Numpy
Learning Objectives
- Python basics.
- Numpy arrays.
- Chapter 2 of the Python Data Science Handbook.
Python Overview
| In R, I use | In Python, I use |
|---|---|
| Base R | numpy |
| dplyr/tidyr | pandas |
| ggplot2 | matplotlib/seaborn |
Numpy Arrays
Let’s import the numpy package:

```python
import numpy as np
```
In Python, we assign variables with `=`, not `<-`:
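```python
x = 10
x
```

```
10
```

The arithmetic operators `+`, `-`, `*`, and `/` are the same as in R. Exponentiation is `**` (not `^`) and the remainder is `%` (not `%%`):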
```python
x * 2
x + 2
x / 2
x - 2
x ** 2  # square
x % 2   # remainder
```

Comments also begin with `#`:
```python
# This is a comment
```

You access help files in a similar way:
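```python
help(min)
?min  # IPython/Jupyter only
```

The `?min` form works in IPython and Jupyter; in the standard Python interpreter it is a syntax error, so use `help(min)` there.

Python lists are like R lists in that they can hold elements of different types. You create Python lists with brackets `[]`: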
```python
x = ["hello", 1, True]
x
```

```
['hello', 1, True]
```

NumPy arrays are the Python equivalent of R vectors (every element has the same type). You create a NumPy array with the `np.array()` function, passing it a list as input:
```python
vec = np.array([2, 3, 5, 1])
vec
```

```
array([2, 3, 5, 1])
```

You can do vectorized operations on NumPy arrays:
```python
vec + 2
vec - 2
vec * 2
vec / 2
2 / vec
```

Two arrays of the same size can be added, subtracted, multiplied, and divided element-wise:
```python
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
x + y
x - y
x / y
x * y
x ** y
```

You extract individual elements much as in R, using brackets `[]`:
```python
vec[0]    # first element
vec[0:2]  # elements 0 and 1; the end of a slice is excluded
```

Extract arbitrary elements by passing an index array:
```python
ind = np.array([0, 2])
vec[ind]  # or vec[np.array([0, 2])]
```

Key Difference: Python starts counting from 0, not 1. So the first element of a vector is `vec[0]`, not `vec[1]`.

Combine two arrays with `np.concatenate()`. Notice that the arrays are passed inside a list (the brackets):
```python
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
np.concatenate([x, y])
```

```
array([1, 2, 3, 4, 5, 6, 7, 8])
```
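For R users, two arithmetic operators from earlier in this section deserve a second look. This minimal sketch (plain Python; the variable names are only illustrative) shows that `^` is bitwise exclusive-or rather than exponentiation, and that `//` is floor division, the counterpart of R's `%/%`.

```python
# ^ is NOT exponentiation in Python; on integers it is bitwise exclusive-or
power = 2 ** 3      # exponentiation, like R's 2^3
xor = 2 ^ 3         # bitwise XOR: 0b10 ^ 0b11 == 0b01

# / always returns a float; // floors the quotient, like R's %/%
true_div = 7 / 2
floor_div = 7 // 2
remainder = 7 % 2   # like R's 7 %% 2

print(power, xor, true_div, floor_div, remainder)  # prints: 8 1 3.5 3 1
```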
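The list and array examples above look alike, but the same operators do very different things to them, which is why arrays (not lists) play the role of R vectors. A short comparison, where `lst` and `arr` are illustrative names:

```python
import numpy as np

lst = [2, 3, 5, 1]
arr = np.array(lst)

# On lists, * repeats and + concatenates; no arithmetic happens
lst_times_2 = lst * 2     # [2, 3, 5, 1, 2, 3, 5, 1]
lst_plus = lst + [10]     # [2, 3, 5, 1, 10]

# On NumPy arrays, the same operators work element-wise, like R vectors
arr_times_2 = arr * 2     # element-wise: 4, 6, 10, 2
arr_plus_10 = arr + 10    # element-wise: 12, 13, 15, 11

# Arrays coerce every element to one type (the dtype); lists do not
mixed = np.array([1, 2.5])  # the integer 1 is upcast to a float
print(mixed.dtype)          # float64
```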
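Element-wise arithmetic between two arrays is described above for arrays of the same size, and that restriction is real: unlike R, NumPy does not recycle a shorter vector. A scalar or a length-1 array is stretched to match, but other mismatched lengths raise a `ValueError`. A minimal check (names are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# A scalar or a length-1 array is broadcast to every element
plus_scalar = x + 10
plus_len1 = x + np.array([10])

# R would recycle c(10, 20) to c(10, 20, 10, 20); NumPy raises instead
try:
    x + np.array([10, 20])
    recycled = True
except ValueError as err:
    recycled = False
    print("ValueError:", err)
```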
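Because indexing starts at 0 and slices exclude their end point, translating R indexing takes some care. Negative indices are a second trap: in R, `vec[-1]` drops the first element, while in NumPy it selects the last one. This sketch reuses the values of `vec` from above and uses `np.delete()` as one way to get R's drop-by-index behavior; the R equivalents in the comments are for orientation only.

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

first = vec[0]                     # R: vec[1]
first_two = vec[0:2]               # R: vec[1:2]; the slice end (2) is excluded
last = vec[-1]                     # R: vec[length(vec)], NOT R's vec[-1]
all_but_last = vec[:-1]            # every element except the last
picked = vec[[0, 2]]               # a plain list of indices also works; R: vec[c(1, 3)]
dropped_first = np.delete(vec, 0)  # R: vec[-1]; returns a new array
```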
Useful functions over vectors
In R, functions operate on objects (e.g., `log(x)`, `sort(x)`).

Python also has functions that operate on objects, but objects usually have functions attached to them directly as well. You access these by typing a period after the object name; they are called “methods”. Use tab completion to scroll through the available methods of an object.
```python
vec.sort()  # sorts vec in place (modifies vec) and returns None
vec.min()   # minimum
vec.max()   # maximum
vec.mean()  # mean
vec.sum()   # sum
vec.var()   # variance
```

NumPy also provides many functions that take the array as an argument:
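```python
np.sort(vec)  # returns a sorted copy; vec is unchanged
np.min(vec)
np.max(vec)
np.mean(vec)
np.sum(vec)
np.var(vec)
np.size(vec)  # number of elements
np.exp(vec)
np.log(vec)   # natural logarithm, like R's log()
```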
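One behavioral difference between the method and function styles above matters in practice: the `vec.sort()` method sorts the array in place and returns `None`, whereas `np.sort(vec)` returns a sorted copy and leaves the original alone, which is closer to R's `sort(x)`. A quick demonstration on a fresh array (the name `a` is illustrative):

```python
import numpy as np

a = np.array([2, 3, 5, 1])

sorted_copy = np.sort(a)   # new sorted array
unchanged = a.tolist()     # a still holds the original order here
result = a.sort()          # sorts a itself; the method returns None
```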
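`np.var()` (and the `.var()` method) divide by n by default, while R's `var()` divides by n - 1 (the sample variance). Passing `ddof=1` reproduces the R result. Worked by hand for `vec = [2, 3, 5, 1]`: the mean is 2.75 and the squared deviations sum to 8.75, so the two versions are 8.75 / 4 and 8.75 / 3.

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

pop_var = np.var(vec)             # divides by n: 8.75 / 4 = 2.1875
sample_var = np.var(vec, ddof=1)  # divides by n - 1, matching R's var()
sample_sd = np.std(vec, ddof=1)   # likewise matches R's sd()
```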
Booleans (Python’s logicals)
Python uses `True` and `False` (R uses `TRUE` and `FALSE`). It uses the same comparison operators as R:
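```python
vec > 3
vec < 3
vec == 3
vec != 3
vec <= 3
vec >= 3
```

The element-wise logical operators are:

| Operator | Meaning |
|---|---|
| `&` | And |
| `\|` | Or |
| `~` | Not |

Key Difference: “Not” uses a different character (`~` in NumPy, `!` in R).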
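When combining comparisons with `&`, `|`, or `~`, wrap each comparison in parentheses. These operators bind more tightly than `<` or `>`, so `vec > 1 & vec < 5` is parsed as `vec > (1 & vec) < 5` and raises an error rather than meaning what it would in R. A small sketch with the values of `vec` used throughout (`between`, `outside`, and `negated` are illustrative names):

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

# Parentheses make each comparison happen before the element-wise operator
between = (vec > 1) & (vec < 5)    # [True, True, False, False]
outside = (vec <= 1) | (vec >= 5)  # [False, False, True, True]
negated = ~between                 # equals outside for this vec

# Without parentheses, Python builds a chained comparison that ends up
# asking for the truth value of a whole array, which raises ValueError
try:
    vec > 1 & vec < 5
    ok = True
except ValueError:
    ok = False
```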
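The keywords `and`, `or`, and `not`, which work on single `True`/`False` values, cannot take the place of the operators in this table when the operands are arrays: Python needs a single truth value, and NumPy refuses to guess whether a multi-element array counts as true. A minimal check, with `a` and `b` as illustrative names:

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

elementwise = a & b   # element-wise and: [True, False, False, False]

try:
    a and b           # needs bool(a), which is ambiguous for an array
    keyword_works = True
except ValueError:
    keyword_works = False
```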
```python
np.array([True, True, False, False]) & np.array([True, False, True, False])
```

```
array([ True, False, False, False])
```

```python
np.array([True, True, False, False]) | np.array([True, False, True, False])
```

```
array([ True,  True,  True, False])
```

```python
~np.array([True, True, False, False])
```

```
array([False, False,  True,  True])
```

You subset a vector using Booleans as you would in R:
```python
vec[vec <= 3]
```

```
array([2, 3, 1])
```

When you are dealing with single logical values instead of arrays of logicals, use `and`, `or`, and `not`:
```python
True and False
```

```
False
```

```python
True or False
```

```
True
```

```python
not True
```

```
False
```

Exercise: Consider two vectors

\[
\begin{aligned}
y &= (1, 7, 1, 2, 8, 2)\\
x &= (4, 6, 2, 7, 8, 2)
\end{aligned}
\]

Calculate their inner product

\[
y_1x_1 + y_2x_2 + y_3x_3 + y_4x_4 + y_5x_5 + y_6x_6
\]

using vectorized operations.
Exercise: Provide two ways of extracting the 2nd and 5th elements of this vector:

```python
x = np.array([4, 7, 8, 1, 2])
```

Exercise: Extract all elements of the previous vector that lie between 5 and 8 (inclusive). Use predicates.