Learning Objectives

Python Overview

In R I Want     In Python I Use
Base R          numpy
dplyr/tidyr     pandas
ggplot2         matplotlib/seaborn

Import Matplotlib and Seaborn, and Load Dataset

R

library(ggplot2)
data("mpg")

All other code will be Python unless otherwise marked.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Access the R data frame from Python through reticulate's r object
mpg = r.mpg

Show each plot with plt.show(), then clear the current figure with plt.clf() before drawing the next one.
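A minimal sketch of the show-then-clear pattern. It assumes a small made-up data frame (toy is hypothetical, not the mpg data) and the non-interactive Agg backend, so that it runs without a display. It suggests why clearing matters: plt.clf() removes the axes from the current figure, so the next plot starts from an empty canvas.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for mpg
toy = pd.DataFrame({"displ": [1.8, 2.0, 2.8, 3.1, 4.2],
                    "hwy": [29, 31, 26, 27, 23]})

plt.figure()                        # start from a fresh figure
sns.scatterplot(x="displ", y="hwy", data=toy)
axes_before = len(plt.gcf().axes)   # the scatterplot lives in one Axes
# plt.show() would display the plot here in an interactive session

plt.clf()                           # clear the current figure
axes_after = len(plt.gcf().axes)    # nothing is left to draw over

print(axes_before, axes_after)
```

Without the plt.clf() call, a second plotting call would draw on top of the first plot's axes.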

One Quantitative Variable: Histogram

One Categorical Variable: Barplot

One Quantitative Variable, One Categorical Variable: Boxplot

Two Quantitative Variables: Scatterplot

Lines/Smoothers

  • Use sns.regplot() to make a scatterplot with a regression line or a lowess (locally weighted) smoother.

  • Regression line with a 95% confidence interval:

    sns.regplot(x='displ', y='hwy', data=mpg)
    plt.show()

    plt.clf()

  • Lowess smoother with the confidence interval removed. Pass ci=None (the Python object), not the string 'None':

    sns.regplot(x='displ', y='hwy', data=mpg, lowess=True, ci=None)
    plt.show()

    plt.clf()
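The effect of the ci argument can be checked on a small synthetic data frame. The sketch below is only an illustration: toy and its values are made up, and the Agg backend is an assumption for running without a display. It sticks to the linear fit because lowess=True additionally needs the statsmodels package to be installed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: a noisy downward trend, loosely shaped like displ vs. hwy
rng = np.random.default_rng(0)
x = np.linspace(1.5, 7.0, 40)
toy = pd.DataFrame({"displ": x,
                    "hwy": 36 - 3.5 * x + rng.normal(0, 2, x.size)})

fig, (ax_ci, ax_no_ci) = plt.subplots(1, 2, figsize=(8, 3))

# Default: regression line plus a bootstrapped 95% confidence band
sns.regplot(x="displ", y="hwy", data=toy, ax=ax_ci)

# ci=None: regression line only
sns.regplot(x="displ", y="hwy", data=toy, ci=None, ax=ax_no_ci)

# The points are one collection; the confidence band adds a second one
print(len(ax_ci.collections), len(ax_no_ci.collections))
```

Counting the artists on each Axes is one way to confirm that the band is really gone and not just drawn transparently.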

Annotating by Third Variable

  • Use the hue or style arguments to annotate by a categorical variable:

    sns.scatterplot(x='displ', y='hwy', hue='class', data=mpg)
    plt.show()

    plt.clf()

    sns.scatterplot(x='displ', y='hwy', style='class', data=mpg)
    plt.show()

    plt.clf()

  • Use the hue or size arguments to annotate by a quantitative variable:

    sns.scatterplot(x='cty', y='hwy', hue='displ', data=mpg)
    plt.show()

    plt.clf()

    sns.scatterplot(x='cty', y='hwy', size='displ', data=mpg)
    plt.show()

    plt.clf()
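These arguments can also be combined in one call. The sketch below assumes a tiny made-up data frame (toy, with hypothetical values) that borrows the column names used above. It maps the same column to hue and style for a categorical variable, and to hue and size for a quantitative one.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with the same column names as mpg
toy = pd.DataFrame({
    "cty": [18, 21, 16, 14, 24, 11],
    "hwy": [29, 30, 25, 20, 33, 15],
    "displ": [1.8, 2.0, 2.8, 4.0, 1.6, 5.7],
    "class": ["compact", "compact", "midsize", "suv", "compact", "suv"],
})

fig, (ax_cat, ax_num) = plt.subplots(1, 2, figsize=(9, 3.5))

# Categorical variable: color and marker shape both encode class
sns.scatterplot(x="cty", y="hwy", hue="class", style="class",
                data=toy, ax=ax_cat)

# Quantitative variable: color and point size both encode displ
sns.scatterplot(x="cty", y="hwy", hue="displ", size="displ",
                data=toy, ax=ax_num)

# Collect the legend labels of the categorical panel
cat_labels = [t.get_text() for t in ax_cat.get_legend().get_texts()]
print(cat_labels)
```

Encoding one variable twice is redundant, but it can keep a plot readable in grayscale.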

Two Categorical Variables: Mosaic Plot

Facets

Labels

Saving Figures

  1. First, assign the plot to an object. sns.scatterplot() returns a matplotlib Axes, not a figure.

    scatter = sns.scatterplot(x='displ', y='hwy', data=mpg)

  2. Extract the figure from the Axes. Assign this to an object.

    fig = scatter.get_figure()

  3. Save the figure.

    fig.savefig('./scatter.pdf')
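The three steps can be run end to end as in the sketch below. The data frame toy and the temporary output directory are assumptions made for illustration; with the real data, mpg and './scatter.pdf' would take their place.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for mpg
toy = pd.DataFrame({"displ": [1.8, 2.0, 2.8, 3.1, 4.2],
                    "hwy": [29, 31, 26, 27, 23]})

# 1. Draw the plot; seaborn returns the matplotlib Axes it drew on
scatter = sns.scatterplot(x="displ", y="hwy", data=toy)

# 2. Get the Figure that contains those Axes
fig = scatter.get_figure()

# 3. Save it; the file extension selects the output format
out_path = os.path.join(tempfile.mkdtemp(), "scatter.pdf")
fig.savefig(out_path)

saved_bytes = os.path.getsize(out_path)
print(out_path, saved_bytes)
```

Changing the extension, for example to .png, writes a different format through the same savefig() call.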