In R I Want | In Python I Use |
---|---|
Base R | numpy |
dplyr/tidyr | pandas |
ggplot2 | matplotlib/seaborn |
Let’s import the numpy package:
Python
import numpy as np
In Python, we assign variables with `=`, not `<-`:
Python
x = 10
x
## 10
The arithmetic operations (`+`, `-`, `*`, `/`) are exactly the same:
Python
x * 2
x + 2
x / 2
x - 2
x ** 2 # square
x % 2 # remainder
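A few operators differ from R even though the basic four match. The short sketch below illustrates the ones R users most often trip over; the variable names are only for illustration.

```python
x = 10

# In Python, ^ is bitwise XOR, not exponentiation; use ** instead.
power = x ** 2      # 100
xor = x ^ 2         # 8, not 100

# / always returns a float; // is floor division, like R's %/%.
true_div = x / 4    # 2.5
floor_div = x // 4  # 2
remainder = x % 4   # 2, like R's %%

print(power, xor, true_div, floor_div, remainder)
```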
Comments also begin with `#`:
Python
# This is a comment
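Help files are called with `help()`, much as in R. The `?min` shorthand works only in IPython and Jupyter, not in the plain Python interpreter.
Python
help(min)
# ?min  (IPython/Jupyter only)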
Python lists are like R lists in that they can hold elements of different types. You create Python lists with brackets `[]`:
Python
x = ["hello", 1, True]
x
## ['hello', 1, True]
NumPy arrays are the Python equivalent of R vectors (where each element is the same type). You use the `array()` function of the numpy package to create a NumPy array (note that you give it a list as input):
Python
vec = np.array([2, 3, 5, 1])
vec
## array([2, 3, 5, 1])
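Because every element of an array shares one type, NumPy converts mixed input to a common type, much as `c()` does in R. A minimal sketch, with illustrative variable names:

```python
import numpy as np

ints = np.array([2, 3, 5, 1])
mixed_numbers = np.array([1, 2.5])   # the integer is promoted to a float
mixed_text = np.array([1, "a"])      # everything becomes a string

print(ints.dtype)            # an integer dtype, typically int64
print(mixed_numbers.dtype)   # float64
print(mixed_text)            # ['1' 'a']
```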
You can do vectorized operations on NumPy arrays
Python
vec + 2
vec - 2
vec * 2
vec / 2
2 / vec
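Plain Python lists do not behave this way, which is one reason to convert to an array first. The sketch below contrasts the two; `values` is just an illustrative name.

```python
import numpy as np

values = [2, 3, 5, 1]
vec = np.array(values)

doubled_list = values * 2   # repeats the list: [2, 3, 5, 1, 2, 3, 5, 1]
doubled_array = vec * 2     # multiplies each element: array([ 4,  6, 10,  2])

print(doubled_list)
print(doubled_array)
```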
Two vectors of the same size can be added/subtracted/multiplied/divided:
Python
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
x + y
x - y
x / y
x * y
x ** y
You extract individual elements just like in R, using brackets `[]`:
Python
vec[0]
vec[0:2]
Extract arbitrary elements by passing an index array:
Python
ind = np.array([0, 2])
vec[ind]
# or
vec[np.array([0, 2])]
Key Difference: Python starts counting from 0, not 1. So the first element of a vector is `vec[0]`, not `vec[1]`.
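Two related differences are worth a quick sketch: a slice excludes its end index, and a negative index counts from the end rather than dropping an element as it would in R.

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

first = vec[0]        # 2, the first element
first_two = vec[0:2]  # array([2, 3]): positions 0 and 1; the end index 2 is excluded
last = vec[-1]        # 1: negative indices count from the end

print(first, first_two, last)
```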
Combine two arrays via `np.concatenate()` (notice the use of brackets here):
Python
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
np.concatenate([x, y])
## array([1, 2, 3, 4, 5, 6, 7, 8])
In R, we have functions that operate on objects (e.g. `log(x)`, `sort(x)`, etc.).
Python also has functions that operate on objects. But objects usually have functions associated with them directly. You access these functions by placing a period after the object name. These functions are called “methods”. Use tab completion to browse the available methods of an object.
Python
vec.sort() # sort in place (returns None)
vec.min() # minimum
vec.max() # maximum
vec.mean() # mean
vec.sum() # sum
vec.var() # variance (divides by n; R's var() divides by n - 1)
But there are still loads of useful functions that operate on objects.
Python
np.sort(vec)
np.min(vec)
np.max(vec)
np.mean(vec)
np.sum(vec)
np.var(vec)
np.size(vec)
np.exp(vec)
np.log(vec)
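The method and function versions are not always interchangeable. As a minimal sketch, the example below shows that `np.sort()` returns a sorted copy while the `.sort()` method changes the array itself, and that the `ddof=1` argument gives the sample variance that R's `var()` reports.

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

sorted_copy = np.sort(vec)   # returns a new sorted array; vec is unchanged
result = vec.sort()          # sorts vec in place and returns None

pop_var = vec.var()            # divides by n (the NumPy default)
sample_var = vec.var(ddof=1)   # divides by n - 1, like R's var()

print(sorted_copy, result, vec)
print(pop_var, sample_var)
```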
Python uses `True` and `False`. It uses the same comparison operators as R:
Python
vec > 3
vec < 3
vec == 3
vec != 3
vec <= 3
vec >= 3
The logical operators are `&` (And), `|` (Or), and `~` (Not). Key Difference: “Not” uses a different character, `~` rather than R's `!`.
Python
np.array([True, True, False, False]) & np.array([True, False, True, False])
## array([ True, False, False, False])
np.array([True, True, False, False]) | np.array([True, False, True, False])
## array([ True,  True,  True, False])
~np.array([True, True, False, False])
## array([False, False,  True,  True])
You subset a vector using Booleans as you would in R
Python
vec[vec <= 3]
## array([2, 3, 1])
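Conditions can be combined before subsetting. One thing to watch, sketched below: each comparison needs its own parentheses, because `&` binds more tightly than the comparison operators in Python.

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

# Parentheses around each comparison are required.
in_range = (vec >= 2) & (vec <= 3)

inside = vec[in_range]     # array([2, 3])
outside = vec[~in_range]   # array([5, 1])

print(inside, outside)
```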
When you are dealing with single logicals rather than arrays of logicals, use `and`, `or`, and `not` instead:
Python
True and False
## False
True or False
## True
not True
## False
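The reverse does not work: applying `and` to whole arrays raises an error, because Python cannot reduce an array with several elements to a single truth value. A small sketch, where the names `a` and `b` are only for illustration:

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

elementwise = a & b   # works element by element

try:
    a and b           # asks for one truth value for a whole array
    raised = False
except ValueError:
    raised = True     # NumPy refuses because the truth value is ambiguous

print(elementwise, raised)
```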
Exercise: Consider two vectors \[ y = (1, 7, 1, 2, 8, 2)\\ x = (4, 6, 2, 7, 8, 2) \] Calculate their inner product: \[ y_1x_1 + y_2x_2 + y_3x_3 + y_4x_4 + y_5x_5 + y_6x_6 \] Do this using vectorized operations.
Exercise: Provide two ways of extracting the 2nd and 5th elements of this vector:
Python
x = np.array([4, 7, 8, 1, 2])
Exercise: Extract all elements from the previous vector between 5 and 8 (inclusive). Use predicates.