Numpy
Learning Objectives
- Python basics.
- Numpy arrays.
- Chapter 2 of Python Data Science Handbook.
Python Overview
In R I Want | In Python I Use |
---|---|
Base R | numpy |
dplyr/tidyr | pandas |
ggplot2 | matplotlib/seaborn |
Numpy Arrays
Let’s import the numpy package:
Python
import numpy as np
In Python, we assign variables with =, not <-:
Python
x = 10
x
10
The arithmetic operations (+, -, *, /) are exactly the same as in R. Exponentiation uses ** (not ^), and the remainder uses % (not %%):
Python
x * 2
x + 2
x / 2
x - 2
x ** 2  # square
x % 2   # remainder
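A small sketch of why the R habits matter here, assuming x = 10 as above (the names squared, remainder, ratio, and xor are illustrative). In Python, ^ is bitwise XOR rather than a power, so typing the R form silently returns a different number instead of raising an error.

```python
x = 10

squared = x ** 2    # R: x^2    -> 100
remainder = x % 3   # R: x %% 3 -> 1
ratio = x / 4       # true division, as in R -> 2.5
xor = x ^ 2         # NOT a power: bitwise XOR of 1010 and 0010 -> 8

print(squared, remainder, ratio, xor)  # 100 1 2.5 8
```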
Comments also begin with a #:
Python
# This is a comment
Help files are called the same way (the ?min form works in IPython and Jupyter, not in the plain Python interpreter):
Python
help(min)
?min
Python lists are like R lists, in that they can hold elements of different types. You create Python lists with brackets []:
Python
x = ["hello", 1, True]
x
['hello', 1, True]
NumPy arrays are the Python equivalent of R vectors (every element has the same type). You use the array() function of the numpy package to create a NumPy array (note that you give it a list as input):
Python
vec = np.array([2, 3, 5, 1])
vec
array([2, 3, 5, 1])
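Because every element must share one type, np.array() coerces mixed input to a common type, much as c() does in R. A minimal sketch (mixed_num and mixed_str are illustrative names):

```python
import numpy as np

# Ints mixed with floats: everything becomes a float, like c(1, 2.5) in R
mixed_num = np.array([1, 2.5])
print(mixed_num.dtype)  # float64

# Numbers mixed with strings: everything becomes a string, like c(1, "a") in R
mixed_str = np.array([1, "a"])
print(mixed_str.tolist())  # ['1', 'a']
```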
You can do vectorized operations on NumPy arrays:
Python
vec + 2
vec - 2
vec * 2
vec / 2
2 / vec
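Vectorized arithmetic is a property of NumPy arrays, not of plain Python lists. This sketch contrasts the two, with lst as an illustrative name: multiplying a list by 2 repeats it instead of doubling its elements.

```python
import numpy as np

lst = [2, 3, 5, 1]
vec = np.array(lst)

doubled_array = vec * 2  # elementwise: 4, 6, 10, 2
doubled_list = lst * 2   # list repetition: the four elements twice

print(doubled_array.tolist())  # [4, 6, 10, 2]
print(doubled_list)            # [2, 3, 5, 1, 2, 3, 5, 1]
```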
Two arrays of the same size can be added, subtracted, multiplied, and divided elementwise:
Python
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
x + y
x - y
x / y
x * y
x ** y
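Unlike R, NumPy does not recycle a shorter vector to match a longer one: mismatched sizes raise a ValueError, although a length-1 array (or a plain number) is stretched to fit. A minimal sketch; the helper try_add is hypothetical and only exists to catch that error.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
short = np.array([1, 2])

def try_add(a, b):
    """Return a + b, or None if NumPy refuses to combine the shapes."""
    try:
        return a + b
    except ValueError:
        return None

print(try_add(x, short))           # None: no R-style recycling
print(try_add(x, np.array([10])))  # [11 12 13 14]: length 1 is stretched
```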
You extract individual elements just like in R, using brackets []. A range such as vec[0:2] returns the elements at positions 0 and 1: the stop index is excluded.
Python
vec[0]
vec[0:2]
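A few more slicing forms, sketched on the same vec (the variable names are illustrative). Note that negative positions count back from the end, whereas in R a negative index drops elements.

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

first_two = vec[0:2]   # positions 0 and 1; stop index 2 is excluded
from_second = vec[1:]  # omit the stop to run to the end
up_to_third = vec[:3]  # omit the start to begin at position 0
last = vec[-1]         # negative index counts from the end

print(first_two.tolist(), from_second.tolist(), up_to_third.tolist(), last)
```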
Extract arbitrary elements by passing an index array:
Python
ind = np.array([0, 2])
vec[ind]
# or
vec[np.array([0, 2])]
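The index does not have to be a NumPy array: a plain list of positions works too, and the positions may come in any order or repeat. A short sketch with illustrative names:

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

picked = vec[[0, 2]]        # plain list as index -> elements 2 and 5
reordered = vec[[3, 0, 0]]  # order is respected, repeats allowed -> 1, 2, 2

print(picked.tolist(), reordered.tolist())  # [2, 5] [1, 2, 2]
```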
Key Difference: Python starts counting from 0, not 1. So the first element of a vector is vec[0], not vec[1].
Combine two arrays via np.concatenate() (notice that the arrays are passed inside a list, hence the brackets):
Python
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
np.concatenate([x, y])
array([1, 2, 3, 4, 5, 6, 7, 8])
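The list passed to np.concatenate() can hold any number of arrays of different lengths, which makes it the closest analogue of R's c(a, b, c). A minimal sketch with illustrative arrays:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3])
c = np.array([4, 5, 6])

combined = np.concatenate([a, b, c])  # R equivalent: c(a, b, c)
print(combined.tolist())  # [1, 2, 3, 4, 5, 6]
```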
Useful functions over vectors
In R, we have functions that operate on objects (e.g. log(x), sort(x), etc.). Python also has functions that operate on objects, but objects usually also have functions attached to them directly. You access these functions by typing a period after the object name. These functions are called “methods”. Use tab completion to scroll through the available methods of an object.
Python
vec.sort()  # sort (in place: modifies vec and returns None)
vec.min()   # minimum
vec.max()   # maximum
vec.mean()  # mean
vec.sum()   # sum
vec.var()   # variance
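The sort method behaves differently from the others: it reorders the array itself and returns None, whereas the function np.sort() returns a sorted copy and leaves the original alone. A sketch of the difference (sorted_copy and result are illustrative names):

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

sorted_copy = np.sort(vec)  # new sorted array; vec is unchanged here
result = vec.sort()         # sorts vec itself and returns None

print(sorted_copy.tolist())  # [1, 2, 3, 5]
print(result)                # None
print(vec.tolist())          # [1, 2, 3, 5]  (vec is now sorted)
```

So writing vec = vec.sort() would replace vec with None; use np.sort(vec) when you want to keep the original order.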
NumPy also has loads of useful functions that take the array as an argument:
Python
np.sort(vec)
np.min(vec)
np.max(vec)
np.mean(vec)
np.sum(vec)
np.var(vec)
np.size(vec)
np.exp(vec)
np.log(vec)
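One difference from R worth checking: np.var() and vec.var() divide by n by default, while R's var() divides by n - 1. Passing ddof=1 gives the R value. A sketch on vec = [2, 3, 5, 1], whose squared deviations from the mean 2.75 sum to 8.75:

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

pop_var = np.var(vec)             # divides by n:     8.75 / 4 = 2.1875
sample_var = np.var(vec, ddof=1)  # divides by n - 1: 8.75 / 3, what R's var() returns

print(pop_var, sample_var)
```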
Booleans (Python’s logicals)
Python uses True and False (rather than R’s TRUE and FALSE). It uses the same comparison operators as R:
Python
vec > 3
vec < 3
vec == 3
vec != 3
vec <= 3
vec >= 3
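As in R, a comparison returns an array of Booleans, and True counts as 1 in arithmetic, so summing or averaging a comparison counts matches. A small sketch with illustrative names:

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

is_big = vec > 2             # False, True, True, False
n_big = np.sum(is_big)       # number of True values -> 2
share_big = np.mean(is_big)  # proportion of True values -> 0.5

print(is_big.tolist(), n_big, share_big)
```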
The logical operators are listed below. Key Difference: “Not” uses a different character (~ rather than R’s !).
- & (and)
- | (or)
- ~ (not)
Python
np.array([True, True, False, False]) & np.array([True, False, True, False])
array([ True, False, False, False])
np.array([True, True, False, False]) | np.array([True, False, True, False])
array([ True, True, True, False])
~np.array([True, True, False, False])
array([False, False, True, True])
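When the operands are comparisons, wrap each one in parentheses. In Python, & and | bind more tightly than > and <, so vec > 1 & vec < 5 is parsed as vec > (1 & vec) < 5 and fails. A minimal sketch:

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

# Parentheses are required around each comparison when combining with & or |
in_range = (vec > 1) & (vec < 5)  # True, True, False, False
outside = ~in_range               # False, False, True, True

print(in_range.tolist(), outside.tolist())
```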
You subset a vector using Booleans as you would in R:
Python
vec[vec <= 3]
array([2, 3, 1])
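A Boolean index also works on the left of an assignment, matching R's x[x > 3] <- 3. This sketch works on a copy (the name capped is illustrative) so that vec itself is not modified:

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

capped = vec.copy()     # copy so the original vec is left untouched
capped[capped > 3] = 3  # like capped[capped > 3] <- 3 in R

print(capped.tolist())  # [2, 3, 3, 1]
print(vec.tolist())     # [2, 3, 5, 1]
```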
When you are dealing with single logical values rather than arrays of logicals, use and, or, and not:
Python
True and False
False
True or False
True
not True
False
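The split matters in practice: and expects a single True or False, so applying it to two Boolean arrays raises a ValueError rather than combining them elementwise. A sketch of both cases, with illustrative names:

```python
import numpy as np

x = 7
scalar_ok = x > 5 and x < 10  # single values: and / or / not

arr = np.array([3, 7, 12])
arr_ok = (arr > 5) & (arr < 10)  # arrays: & / | / ~

try:
    (arr > 5) and (arr < 10)  # `and` cannot reduce an array to one True/False
    raised = False
except ValueError:
    raised = True

print(scalar_ok, arr_ok.tolist(), raised)  # True [False, True, False] True
```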
Exercise: Consider two vectors
\[
y = (1, 7, 1, 2, 8, 2), \quad x = (4, 6, 2, 7, 8, 2).
\]
Calculate their inner product:
\[
y_1x_1 + y_2x_2 + y_3x_3 + y_4x_4 + y_5x_5 + y_6x_6
\]
Do this using vectorized operations.
Exercise: Provide two ways of extracting the 2nd and 5th elements of this vector:
Python
x = np.array([4, 7, 8, 1, 2])
Exercise: Extract all elements from the previous vector between 5 and 8 (inclusive). Use predicates.