Learning Objectives

Python Overview

In R I Want     In Python I Use
Base R          numpy
dplyr/tidyr     pandas
ggplot2         matplotlib/seaborn
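
To make the first two rows of this mapping concrete, here is a minimal sketch on made-up data. The data frame `df` and its values are hypothetical (the column names are borrowed from mpg only for familiarity), and the R lines in the comments are rough equivalents rather than exact translations.

```python
import numpy as np
import pandas as pd

# Base R: x <- c(1, 2, 3, 4); mean(x)
x = np.array([1.0, 2.0, 3.0, 4.0])
x_mean = x.mean()

# dplyr: df %>% filter(hwy > 25) %>% group_by(drv) %>% summarize(mean_hwy = mean(hwy))
# Toy values, not the real mpg data
df = pd.DataFrame({
    "drv": ["f", "f", "r", "4", "4"],
    "hwy": [29, 31, 26, 20, 27],
})
summary = (
    df[df["hwy"] > 25]                # keep rows with hwy above 25
    .groupby("drv", as_index=False)   # one group per drive type
    .agg(mean_hwy=("hwy", "mean"))    # named aggregation, like summarize()
)

print(x_mean)
print(summary)
```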

Import Matplotlib and Seaborn, and Load Dataset

R

library(ggplot2)
data("mpg")

All other code will be Python unless otherwise marked.

import matplotlib.pyplot as plt # base plotting functionality
import seaborn as sns           # Original interface
import seaborn.objects as so    # ggplot2-like interface
mpg = r.mpg                     # R's mpg data frame, accessed through reticulate's r object

Basics

One Quantitative Variable: Histogram

One Categorical Variable: Barplot

Dodging Barplots

One Quantitative Variable, One Categorical Variable: Boxplot

Two Quantitative Variables: Scatterplot

Faceting

Customizing Look

Exercises

Consider the Palmer penguins data, which you can load via:

penguins = sns.load_dataset("penguins")
penguins.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 344 entries, 0 to 343
## Data columns (total 7 columns):
##  #   Column             Non-Null Count  Dtype  
## ---  ------             --------------  -----  
##  0   species            344 non-null    object 
##  1   island             344 non-null    object 
##  2   bill_length_mm     342 non-null    float64
##  3   bill_depth_mm      342 non-null    float64
##  4   flipper_length_mm  342 non-null    float64
##  5   body_mass_g        342 non-null    float64
##  6   sex                333 non-null    object 
## dtypes: float64(4), object(3)
## memory usage: 18.9+ KB
  1. Make a visualization of bill length versus bill depth, annotated by species.

  2. Add OLS lines for each species to the same plot object you created in part 1 (don’t rerun so.Plot()).

  3. Use pandas.cut() to convert body mass into five equally spaced levels.

  4. Facet your plot from part 2 by the above transformation. You will have to redo the object since we are using a different data frame here.

  5. Make a visualization for the number of each species in the dataset. Make sure you have good labels.
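
As a hint for part 3, here is a minimal sketch of how pandas.cut() behaves on made-up numbers (not the penguins data). Passing an integer to `bins` requests that many equal-width intervals between the minimum and maximum; missing values stay missing. The label names are hypothetical.

```python
import pandas as pd

# Toy masses in grams (made-up values)
toy = pd.DataFrame(
    {"body_mass_g": [2700.0, 3200.0, 3750.0, 4100.0, 4800.0, 5400.0, 6300.0]}
)

# bins=5 splits the observed range into five equal-width intervals
toy["mass_level"] = pd.cut(toy["body_mass_g"], bins=5)

# Optionally attach readable labels instead of interval notation
level_names = ["very light", "light", "medium", "heavy", "very heavy"]
toy["mass_label"] = pd.cut(toy["body_mass_g"], bins=5, labels=level_names)

print(toy["mass_level"].cat.categories)
print(toy["mass_label"].value_counts(sort=False))
```

The result is an ordered categorical column, which can then be used like any other categorical variable, for example as a faceting variable.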