Getting Started
- You can launch JupyterLab from the command line or from Anaconda Navigator.
- You can use a JupyterLab notebook to edit and run Python.
- Notebooks can include both code and markdown (text) cells.
Data visualization with Pandas and Matplotlib
- Load the libraries you need into Python and use their common nicknames (for example, import pandas as pd and import matplotlib.pyplot as plt).
- Use pandas to load your data (pd.read_csv()) and to explore it (the .head(), .tail(), and .info() methods).
- The .plot() method on your DataFrame is a good starting point for plotting.
- Matplotlib allows you to customize every aspect of your plot. Start with the plt.subplots() function to create a figure object and the number of axes (or subplots) you need.
- Export plots to a file using the .savefig() method.
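The load, explore, plot, and export steps above might look like the following minimal sketch. It assumes a small, hypothetical CSV held in memory (in practice you would pass a file path to pd.read_csv()), and the column names, figure size, and output filename weight_plots.png are illustrative choices, not part of the lesson.

```python
import io
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; normally this would be pd.read_csv("your_file.csv")
csv_text = """year,species,weight
2000,DM,40
2000,DO,52
2001,DM,42
2001,PF,8
2002,DO,48
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3))  # first 3 rows
print(df.tail(2))  # last 2 rows
df.info()          # column names, dtypes, and non-null counts

# Quick plot straight from the DataFrame
ax = df.plot(x="year", y="weight", kind="scatter")

# Full control with Matplotlib: one figure holding two axes side by side
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))
df.plot(x="year", y="weight", kind="scatter", ax=ax1)
df["species"].value_counts().plot(kind="bar", ax=ax2)
ax1.set_title("Weight by year")
ax2.set_title("Records per species")
fig.tight_layout()

# Export the figure to a file (hypothetical filename)
out_path = Path("weight_plots.png")
fig.savefig(out_path, dpi=150)
```

Passing ax=ax1 tells the pandas plotting method to draw into an axes you created with plt.subplots(), which is one common way to combine the quick .plot() route with Matplotlib's finer control.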
Exploring and understanding data
- pandas DataFrames carry many methods that can help you explore the properties and distribution of data.
- Using the help function, reading error messages, and asking for help are all good strategies when things go wrong.
- The type of an object determines what kinds of operations you can perform on and with it.
- In an assignment, Python evaluates the whole expression on the right-hand side, step by step, before assigning the final result to the variable on the left.
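A short sketch of these points, using a small hypothetical DataFrame. It shows a couple of DataFrame exploration methods, how an object's type changes what an operator does, and how the right-hand side of an assignment is evaluated before the name is bound. The data and variable names are illustrative only.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "species": ["DM", "DO", "DM", "PF"],
    "weight": [40, 52, 42, 8],
})

# Summary statistics for a numeric column (count, mean, std, min, quartiles, max)
summary = df["weight"].describe()
print(summary)
print(df["species"].unique())  # distinct values in a column

# The type of an object determines what you can do with it
print(type(df))            # <class 'pandas.core.frame.DataFrame'>
print(type(df["weight"]))  # <class 'pandas.core.series.Series'>
print("3" + "4", 3 + 4)    # string concatenation vs integer addition: 34 7

# help(df.describe) would print the documentation for the describe method

# The right-hand side is evaluated fully before assignment:
# 2 + 3 * 4 -> 2 + 12 -> 14, and only then is 14 bound to total
total = 2 + 3 * 4

# This is why a variable can appear on both sides: the old value is used first
count = 5
count = count + 1
```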
Indexing, Slicing and Subsetting DataFrames
- In Python, portions of data can be accessed using indices, slices, column headings, and condition-based subsetting.
- Python uses 0-based indexing, in which the first element of a list, tuple, or other sequence has an index of 0.
- Pandas enables common data exploration steps such as data indexing, slicing and conditional subsetting.
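The following sketch illustrates positional indexing, slicing, column selection, and condition-based subsetting on a hypothetical DataFrame. Note the difference between .iloc, whose slices exclude the stop position, and .loc, whose label slices include the end label.

```python
import pandas as pd

# 0-based indexing on a plain Python list
letters = ["a", "b", "c"]
first_letter = letters[0]  # "a"

# Hypothetical data for illustration
df = pd.DataFrame({
    "species": ["DM", "DO", "DM", "PF", "DO"],
    "weight": [40, 52, 42, 8, 48],
})

first_row = df.iloc[0]              # position 0 is the first row
first_three = df.iloc[0:3]          # positions 0, 1, 2 (stop value 3 is excluded)
weights = df["weight"]              # select a column by its heading
subset = df.loc[1:3, ["species"]]   # label-based: rows labelled 1 to 3 inclusive
heavy = df[df["weight"] > 41]       # condition-based subsetting
dm_or_do = df[df["species"].isin(["DM", "DO"])]  # rows matching any listed value
```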
Combining DataFrames
- Pandas' merge and concat functions can be used to combine subsets of a DataFrame, or even data from different files.
- The join method combines DataFrames based on their index or on a column.
- Joining two DataFrames can be done in multiple ways (for example left, right, inner, and outer) depending on what data must be in the final DataFrame.
- The to_csv method can be used to write out DataFrames in CSV format.
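A minimal sketch of combining DataFrames, assuming two small hypothetical tables that share a species_id column. The species code "XX" is deliberately missing from the lookup table so that the effect of the how argument is visible.

```python
import pandas as pd

# Hypothetical tables for illustration
surveys = pd.DataFrame({"species_id": ["DM", "DO", "PF", "XX"],
                        "weight": [40, 52, 8, 15]})
species = pd.DataFrame({"species_id": ["DM", "DO", "PF"],
                        "genus": ["Dipodomys", "Dipodomys", "Perognathus"]})

# concat stacks DataFrames; here two row subsets are put back together
stacked = pd.concat([surveys.iloc[:2], surveys.iloc[2:]], ignore_index=True)

# merge combines on a shared column; how controls which keys are kept
inner = pd.merge(surveys, species, on="species_id", how="inner")  # "XX" dropped
left = pd.merge(surveys, species, on="species_id", how="left")    # "XX" kept, genus is NaN

# join aligns rows on the index of the other DataFrame
joined = surveys.set_index("species_id").join(
    species.set_index("species_id"), how="left"
)

# With no path, to_csv returns the CSV text as a string;
# inner.to_csv("merged.csv", index=False) would write a file instead
csv_text = inner.to_csv(index=False)
```

An inner merge keeps only keys present in both tables, while a left merge keeps every row of the left table and fills missing values with NaN.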
Data Workflows and Automation
- Loops help automate repetitive tasks over sets of items.
- Loops combined with functions provide a way to process data more efficiently than we could by hand.
- Conditional statements enable execution of different operations on different data.
- Functions enable code reuse.
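As a sketch of how loops, conditionals, and functions fit together, the example below labels each record with a conditional rule and then loops over years, writing one CSV file per year into a temporary directory. The helpers size_class and write_year_file, the 20-unit threshold, and the file naming scheme are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def size_class(weight, threshold=20):
    """Hypothetical rule: label a weight as 'small', 'large', or 'unknown'."""
    if pd.isna(weight):
        return "unknown"
    elif weight < threshold:
        return "small"
    else:
        return "large"


def write_year_file(df, year, out_dir):
    """Write the rows for one year to <out_dir>/surveys_<year>.csv and return the path."""
    path = Path(out_dir) / f"surveys_{year}.csv"
    df[df["year"] == year].to_csv(path, index=False)
    return path


# Hypothetical data; None becomes NaN in the numeric weight column
surveys = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001, 2002],
    "weight": [40, None, 42, 8, 48],
})

# The conditional function is reused for every value in the column
surveys["size"] = [size_class(w) for w in surveys["weight"]]

# The loop automates one file per year instead of writing each by hand
out_dir = tempfile.mkdtemp()
written = []
for year in sorted(surveys["year"].unique()):
    written.append(write_year_file(surveys, year, out_dir))
```

Because the per-year logic lives in a function, adding a new year to the data requires no new code: the loop picks it up automatically.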