Getting Started


  • You can launch JupyterLab from the command line or from Anaconda Navigator.
  • You can use a JupyterLab notebook to edit and run Python.
  • Notebooks can include both code and markdown (text) cells.

Data visualization with Pandas and Matplotlib


  • Load the libraries you need into Python and give them their conventional aliases (for example, import pandas as pd).
  • Use pandas to load your data with pd.read_csv() and to explore it with the .head(), .tail(), and .info() methods.
  • The .plot() method on your DataFrame is a good plotting starting point.
  • Matplotlib allows you to customize every aspect of your plot. Start with the plt.subplots() function, which creates a figure object and as many axes (subplots) as you need.
  • Export plots to a file using the .savefig() method.
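
The points above can be sketched in a few lines. This is only an illustration: the column names, the values, and the output file name are made up, and the data is read from an in-memory string rather than a real CSV file.

```python
import io
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for a CSV file on disk.
csv_text = io.StringIO(
    "village,household_size,rooms\n"
    "A,3,1\n"
    "A,7,1\n"
    "B,10,1\n"
    "B,7,3\n"
    "C,6,1\n"
)

df = pd.read_csv(csv_text)  # load the data into a DataFrame
print(df.head(3))           # first three rows
print(df.tail(2))           # last two rows
df.info()                   # column types and non-null counts

# Quick first look using the DataFrame's own .plot() method.
quick_ax = df.plot(x="household_size", y="rooms", kind="scatter")

# More control: create the figure and axes explicitly, then customize.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["household_size"], df["rooms"])
ax.set_xlabel("Household size")
ax.set_ylabel("Rooms")
ax.set_title("Rooms by household size")

# Save the figure to a file (a temporary directory is used here).
out_path = os.path.join(tempfile.gettempdir(), "rooms_by_household_size.png")
fig.savefig(out_path, dpi=150)
plt.close("all")
```

In a notebook you would normally pass a file path to pd.read_csv() and leave out the backend selection, since figures are displayed inline.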

Exploring and understanding data


  • pandas DataFrames carry many methods that can help you explore the properties and distribution of data.
  • Using the help function, reading error messages, and asking for help are all good strategies when things go wrong.
  • The type of an object determines what kinds of operations you can perform on and with it.
  • Python evaluates the expressions on the right-hand side of an assignment one by one before assigning the final result to the variable.
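
A short sketch of these ideas, using a small made-up DataFrame (the column names and values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical values for illustration.
df = pd.DataFrame({"rooms": [1, 1, 3, 1], "members": [3, 7, 7, 6]})

# DataFrame and Series methods summarize the distribution of the data.
summary = df["members"].describe()   # count, mean, std, min, quartiles, max
unique_rooms = df["rooms"].unique()  # distinct values in a column
print(summary)
print(unique_rooms)

# The type of an object determines which operations are valid.
a = 7
b = "7"
print(type(a), type(b))
try:
    a + b  # adding an int and a str is not defined
except TypeError as err:
    message = str(err)
    print("TypeError:", message)  # the message says which types clashed

# The right-hand side is evaluated completely (both sums, then the
# division) before the result is bound to the name on the left.
members_per_room = df["members"].sum() / df["rooms"].sum()
print(members_per_room)
```

Calling help(df.describe) in a notebook prints the documentation for that method, which is often the quickest way to check what an unfamiliar method returns.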

Indexing, Slicing and Subsetting DataFrames


  • In Python, portions of data can be accessed using indices, slices, column headings, and condition-based subsetting.
  • Python uses 0-based indexing, in which the first element in a list, tuple, or any other sequence has an index of 0.
  • Pandas enables common data exploration steps such as data indexing, slicing and conditional subsetting.
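
The different ways of selecting data might look like this. The DataFrame below is invented for the example; only the selection syntax matters.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {
        "village": ["A", "A", "B", "B", "C"],
        "members": [3, 7, 10, 7, 6],
        "rooms": [1, 1, 1, 3, 1],
    }
)

# 0-based indexing: the first element of a list has index 0.
values = [10, 20, 30]
first = values[0]

# Column headings select one column (a Series) or several (a DataFrame).
one_column = df["members"]
two_columns = df[["village", "rooms"]]

# Slicing rows: the start is included, the stop is excluded.
first_three = df[0:3]

# Position-based selection with .iloc (stop excluded).
by_position = df.iloc[0:2, 1:3]

# Label-based selection with .loc (the stop label IS included).
by_label = df.loc[0:2, ["village", "members"]]

# Condition-based subsetting with a boolean mask.
large = df[df["members"] > 6]
print(large)
```

Note the difference between .iloc and .loc: a slice of positions excludes its end point, while a slice of labels includes it.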

Combining DataFrames


  • Pandas’ merge and concat can be used to combine subsets of a DataFrame, or even data from different files.
  • The join method combines DataFrames based on index or column.
  • Joining two DataFrames can be done in multiple ways (left, right, inner, and outer), depending on which rows must appear in the final DataFrame.
  • to_csv can be used to write out DataFrames in CSV format.
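
As a rough sketch, here is how these functions fit together. The two tables and the key_id column are hypothetical, and the CSV is written to an in-memory buffer instead of a file so that the example is self-contained.

```python
import io

import pandas as pd

# Two hypothetical tables that share a key column.
households = pd.DataFrame(
    {"key_id": [1, 2, 3, 4], "village": ["A", "A", "B", "C"]}
)
crops = pd.DataFrame(
    {"key_id": [2, 3, 3, 5], "crop": ["maize", "beans", "maize", "rice"]}
)

# concat stacks DataFrames; here two subsets are put back together.
first_half = households.iloc[:2]
second_half = households.iloc[2:]
stacked = pd.concat([first_half, second_half], ignore_index=True)

# merge combines tables on a shared column; `how` picks the join type.
inner = pd.merge(households, crops, on="key_id", how="inner")  # keys in both
left = pd.merge(households, crops, on="key_id", how="left")    # all households
right = pd.merge(households, crops, on="key_id", how="right")  # all crop rows

# join does the same kind of operation, matching on the index by default.
joined = households.set_index("key_id").join(
    crops.set_index("key_id"), how="inner"
)

# to_csv writes a DataFrame in CSV format (to a buffer in this sketch).
buffer = io.StringIO()
inner.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Rows without a match in a left or right join are kept, with missing values (NaN) in the columns that come from the other table.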

Data Workflows and Automation


  • Loops help automate repetitive tasks over sets of items.
  • Loops combined with functions provide a way to process data more efficiently than we could by hand.
  • Conditional statements enable execution of different operations on different data.
  • Functions enable code reuse.
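
A minimal sketch combining a loop, a conditional, and a function. The function name, the thresholds, and the household sizes are all invented for illustration.

```python
def classify_household(members):
    """Return a size label for a household (thresholds are hypothetical)."""
    if members <= 3:
        return "small"
    elif members <= 7:
        return "medium"
    else:
        return "large"


# Hypothetical household sizes.
household_sizes = [3, 7, 10, 7, 6]

# The loop applies the same function to every item in the list.
labels = []
for size in household_sizes:
    labels.append(classify_household(size))

# A second loop tallies how often each label occurs.
counts = {}
for label in labels:
    counts[label] = counts.get(label, 0) + 1

print(labels)
print(counts)
```

Because the classification rule lives in one function, changing a threshold means editing a single place, and the loop can be reused unchanged on any other list of sizes.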