Plotting with Matplotlib and Seaborn
Learning Objectives
- Data visualization with seaborn and matplotlib
- Chapter 4 of Python Data Science Handbook.
Python Overview
In R I Want | In Python I Use |
---|---|
Base R | numpy |
dplyr/tidyr | pandas |
ggplot2 | matplotlib/seaborn |
Import Matplotlib and Seaborn, and Load Dataset
R
library(ggplot2)
data("mpg")
All other code will be Python unless otherwise marked.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
mpg = r.mpg
Show and clear plots.
- Use plt.show() to display a plot.
- Use plt.clf() to clear a figure when making a new plot.
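To see what plt.clf() does, the following minimal sketch draws on the current figure, clears it, and checks that no axes remain. It uses only matplotlib and made-up numbers. The Agg backend line is an assumption so the snippet runs without a display; it can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in an interactive session
import matplotlib.pyplot as plt

fig = plt.gcf()                 # the current figure (created if none exists)
plt.plot([1, 2, 3], [2, 4, 8])  # draw something on it
axes_before = len(fig.axes)     # one set of axes now exists

plt.clf()                       # clear the figure so the next plot starts empty
axes_after = len(fig.axes)      # no axes remain after clearing
```

Skipping plt.clf() between plots usually means the next plot is drawn on top of the previous one.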
One Quantitative Variable: Histogram
sns.histplot() makes a histogram:
sns.histplot(x='hwy', data=mpg)
plt.show()
plt.clf()
One Categorical Variable: Barplot
Use sns.countplot() to make a barplot to look at the distribution of a categorical variable:
sns.countplot(x='class', data=mpg)
plt.show()
plt.clf()
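Bars are often easier to compare when sorted by frequency. One possible approach, sketched here on a hypothetical toy data frame, is to pass the categories sorted by count to the order argument of sns.countplot().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the class column of mpg
toy = pd.DataFrame({"class": ["suv", "compact", "suv", "pickup", "suv", "compact"]})

# value_counts() sorts from most to least frequent
order = toy["class"].value_counts().index
ax = sns.countplot(x="class", data=toy, order=order)
heights = [p.get_height() for p in ax.patches]  # bar heights are the counts
plt.clf()
```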
One Quantitative Variable, One Categorical Variable: Boxplot
Use sns.boxplot() to make boxplots:
sns.boxplot(x='class', y='hwy', data=mpg)
plt.show()
plt.clf()
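Each box spans the first to the third quartile, with a line at the median. To read those numbers directly, a small pandas sketch can compute them per group. The toy data frame is hypothetical and only illustrates the calculation.

```python
import pandas as pd

# Hypothetical stand-in for the class and hwy columns of mpg
toy = pd.DataFrame({
    "class": ["compact"] * 5 + ["suv"] * 5,
    "hwy":   [26, 27, 29, 31, 33, 15, 17, 18, 20, 25],
})

# One row per class, one column per quantile (0.25, 0.5, 0.75)
quartiles = toy.groupby("class")["hwy"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)
```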
Two Quantitative Variables: Scatterplot
Use sns.scatterplot() to make a basic scatterplot:
sns.scatterplot(x='displ', y='hwy', data=mpg)
plt.show()
plt.clf()
Lines/Smoothers
Use sns.regplot() to make a scatterplot with a regression line or a loess smoother.
Regression line with 95% confidence interval.
sns.regplot(x='displ', y='hwy', data=mpg)
plt.show()
plt.clf()
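By default, sns.regplot() draws an ordinary least squares line. To get the slope and intercept of such a line as numbers, one option is np.polyfit() with degree 1. The data below are hypothetical and lie exactly on y = 2x + 1, so the fit is easy to check by hand.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Degree-1 least squares fit; coefficients come back highest degree first
slope, intercept = np.polyfit(x, y, deg=1)
```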
Loess smoother with confidence interval removed.
# lowess=True requires the statsmodels package; ci takes None, not the string 'None'
sns.regplot(x='displ', y='hwy', data=mpg, lowess=True, ci=None)
plt.show()
plt.clf()
Annotating by Third Variable
Use the hue or style arguments to annotate by a categorical variable:
sns.scatterplot(x='displ', y='hwy', hue='class', data=mpg)
plt.show()
plt.clf()
sns.scatterplot(x='displ', y='hwy', style='class', data=mpg)
plt.show()
plt.clf()
Use the hue or size arguments to annotate by a quantitative variable:
sns.scatterplot(x='cty', y='hwy', hue='displ', data=mpg)
plt.show()
plt.clf()
sns.scatterplot(x='cty', y='hwy', size='displ', data=mpg)
plt.show()
plt.clf()
Two Categorical Variables: Mosaic Plot
Usually, you should just show a table of proportions when you have two categorical variables.
pd.crosstab(mpg['class'], mpg['drv'], normalize='all')
drv                4         f         r
class
2seater     0.000000  0.000000  0.021368
compact     0.051282  0.149573  0.000000
midsize     0.012821  0.162393  0.000000
minivan     0.000000  0.047009  0.000000
pickup      0.141026  0.000000  0.000000
subcompact  0.017094  0.094017  0.038462
suv         0.217949  0.000000  0.047009
pd.crosstab(mpg['class'], mpg['drv'], normalize='index')
drv                4         f         r
class
2seater     0.000000  0.000000  1.000000
compact     0.255319  0.744681  0.000000
midsize     0.073171  0.926829  0.000000
minivan     0.000000  1.000000  0.000000
pickup      1.000000  0.000000  0.000000
subcompact  0.114286  0.628571  0.257143
suv         0.822581  0.000000  0.177419
pd.crosstab(mpg['class'], mpg['drv'], normalize='columns')
drv                4         f     r
class
2seater     0.000000  0.000000  0.20
compact     0.116505  0.330189  0.00
midsize     0.029126  0.358491  0.00
minivan     0.000000  0.103774  0.00
pickup      0.320388  0.000000  0.00
subcompact  0.038835  0.207547  0.36
suv         0.495146  0.000000  0.44
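The three normalize options differ in what sums to one: the whole table ('all'), each row ('index'), or each column ('columns'). The sketch below checks this on a hypothetical toy data frame; the values are made up and are not the mpg data.

```python
import pandas as pd

# Hypothetical stand-in for the class and drv columns of mpg
toy = pd.DataFrame({
    "class": ["compact", "compact", "compact", "suv", "suv", "pickup"],
    "drv":   ["f", "f", "4", "4", "r", "4"],
})

by_all = pd.crosstab(toy["class"], toy["drv"], normalize="all")      # all cells sum to 1
by_row = pd.crosstab(toy["class"], toy["drv"], normalize="index")    # each row sums to 1
by_col = pd.crosstab(toy["class"], toy["drv"], normalize="columns")  # each column sums to 1

print(by_all.values.sum())   # total over the whole table
print(by_row.sum(axis=1))    # row totals
print(by_col.sum(axis=0))    # column totals
```

If a picture is still wanted, one possible option is a stacked barplot of the row proportions, for example by_row.plot(kind='bar', stacked=True).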
Facets
Use sns.FacetGrid() followed by the map_dataframe() method to plot facets. You pass arguments to the plot (sns.histplot() or sns.scatterplot() etc.) inside the map function.
g = sns.FacetGrid(data=mpg, row='drv', col='class')
g = g.map_dataframe(sns.histplot, x='hwy', kde=False)
plt.show()
plt.clf()
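The same pattern works with other plotting functions. As a sketch, the following facets a scatterplot by a single variable and labels the axes with the grid's set_axis_labels() method. The toy data frame is a hypothetical stand-in for mpg, and the Agg backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for a few rows of mpg
toy = pd.DataFrame({
    "displ": [1.8, 2.0, 2.8, 3.1, 4.2, 5.3],
    "hwy":   [29, 31, 26, 27, 23, 20],
    "drv":   ["f", "f", "f", "4", "4", "4"],
})

g = sns.FacetGrid(data=toy, col="drv")                    # one column per drv level
g = g.map_dataframe(sns.scatterplot, x="displ", y="hwy")  # plot arguments go here
g.set_axis_labels("Displacement", "Highway")
n_panels = g.axes.size                                    # number of facet panels
plt.close("all")                                          # FacetGrid creates its own figure
```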
Labels
Assign the plot to an object. Then use the set_*() methods to add labels.
scatter = sns.scatterplot(x='displ', y='hwy', data=mpg)
scatter.set_xlabel('Displacement')
scatter.set_ylabel('Highway')
scatter.set_title('Highway versus Displacement')
plt.show()
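The object returned by sns.scatterplot() is a matplotlib Axes, so the labels can also be set in a single call to its set() method. This is a sketch on a hypothetical toy data frame, not the mpg data; the Agg backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for a few rows of mpg
toy = pd.DataFrame({"displ": [1.8, 2.8, 4.2, 5.3], "hwy": [29, 26, 23, 20]})

ax = sns.scatterplot(x="displ", y="hwy", data=toy)
# Equivalent to calling set_xlabel(), set_ylabel(), and set_title() separately
ax.set(xlabel="Displacement", ylabel="Highway", title="Highway versus Displacement")
```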
Saving Figures
First, assign a figure to an object.
scatter = sns.scatterplot(x='displ', y='hwy', data=mpg)
Extract the figure. Assign this to an object.
fig = scatter.get_figure()
Save the figure.
fig.savefig('./scatter.pdf')
You can do all of these steps in one statement by chaining the method calls, similar to piping in R.
sns.scatterplot(x='displ', y='hwy', data=mpg) \
  .get_figure() \
  .savefig('./scatter.pdf')
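savefig() infers the file format from the extension and accepts options such as dpi and bbox_inches. The sketch below writes a PNG to a temporary directory; the toy data frame, the output path, and the option values are illustrative assumptions rather than recommendations.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for a few rows of mpg
toy = pd.DataFrame({"displ": [1.8, 2.8, 4.2, 5.3], "hwy": [29, 26, 23, 20]})

ax = sns.scatterplot(x="displ", y="hwy", data=toy)
fig = ax.get_figure()

# Hypothetical output location in a fresh temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "scatter.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")  # trim surrounding whitespace
saved = os.path.exists(out_path)
```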