NumPy

Author

David Gerard

Published

June 3, 2025

Learning Objectives

Python Overview

In R I Want     In Python I Use
Base R          numpy
dplyr/tidyr     pandas
ggplot2         matplotlib/seaborn

NumPy Arrays

  • Let’s import the numpy package:

    Python

    import numpy as np
  • In Python, we assign variables with =, not <-

    Python

    x = 10
    x
    10
  • The arithmetic operations (+, -, *, /) are the same as in R. Exponentiation uses ** (not ^) and the remainder uses % (not %%)

    Python

    x * 2
    x + 2
    x / 2
    x - 2
    x ** 2 # square
    x % 2 # remainder
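  • As a quick sketch of where the operators differ from R: / always returns a float, // is floor division (R's %/%), and % is the remainder (R's %%). The names a and b below are only for illustration.

```python
a = 10
b = 3

true_div = a / b    # true division always gives a float
floor_div = a // b  # floor division, like R's %/%
remainder = a % b   # remainder, like R's %%
power = a ** 2      # exponentiation, like R's ^

print(true_div, floor_div, remainder, power)
```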
  • Comments also begin with a #:

    Python

    # This is a comment
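  • Help files are called in a similar way. help() works in any Python session, while the ? shortcut works only in IPython and Jupyter

    Python

    help(min)
    # IPython/Jupyter only:
    ?min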
  • Python lists are like R lists, in that they can hold elements of different types. You create Python lists with brackets []

    Python

    x = ["hello", 1, True]
    x
    ['hello', 1, True]
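  • A small sketch of how lists behave, which may surprise R users: + and * on lists concatenate and repeat rather than doing arithmetic. The names items, combined, and repeated are only for illustration.

```python
items = ["hello", 1, True]

first = items[0]      # 'hello' (indexing starts at 0)
n_items = len(items)  # 3

# On lists, + concatenates and * repeats; neither does arithmetic
combined = [1, 2] + [3, 4]  # [1, 2, 3, 4]
repeated = [1, 2] * 2       # [1, 2, 1, 2]

print(first, n_items, combined, repeated)
```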
  • NumPy arrays are the Python equivalent of R vectors (where each element is the same type). You use the array() function of the numpy package to create a NumPy array (note that you give it a list as input)

    Python

    vec = np.array([2, 3, 5, 1])
    vec
    array([2, 3, 5, 1])
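  • Because every element of an array shares one type, NumPy converts mixed input to a common type, much as c() does in R. The sketch below inspects the dtype attribute; the array names are only for illustration.

```python
import numpy as np

ints = np.array([2, 3, 5, 1])
mixed = np.array([2, 3.5, 5])       # the integers are upcast to float
strings = np.array([1, "a", True])  # everything becomes a string

print(ints.dtype)     # an integer type (the exact width depends on the platform)
print(mixed.dtype)    # float64
print(strings.dtype)  # a Unicode string type
```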
  • You can do vectorized operations on NumPy arrays

    Python

    vec + 2
    vec - 2
    vec * 2
    vec / 2
    2 / vec
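  • For reference, here is a sketch of what a few of these operations return for the same vec. Each operation produces a new array and leaves vec itself unchanged.

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

plus_two = vec + 2  # array([4, 5, 7, 3])
doubled = vec * 2   # array([ 4,  6, 10,  2])
halved = vec / 2    # array([1. , 1.5, 2.5, 0.5]); division gives floats

print(plus_two, doubled, halved)
print(vec)  # vec is unchanged
```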
  • Two vectors of the same size can be added/subtracted/multiplied/divided:

    Python

    x = np.array([1, 2, 3, 4])
    y = np.array([5, 6, 7, 8])
    x + y
    x - y
    x / y
    x * y
    x ** y
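  • Unlike R, which recycles the shorter vector, NumPy raises an error when two arrays of different lengths (neither of length one) are combined. A minimal sketch, where short is a hypothetical array added for illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
short = np.array([1, 2, 3])  # hypothetical array of a different length

total = x + y  # elementwise: array([ 6,  8, 10, 12])

try:
    x + short  # shapes (4,) and (3,) cannot be combined
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```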
  • You extract individual elements just like in R, using brackets []

    Python

    vec[0] # first element
    vec[0:2] # first two elements (the stop index is excluded)
  • Extract arbitrary elements by passing an index array:

    Python

    ind = np.array([0, 2])
    vec[ind]
    # or
    vec[np.array([0, 2])]
  • Key Difference: Python starts counting from 0, not 1. So the first element of a vector is vec[0], not vec[1].
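
  • The indexing rules above can be sketched as follows. Note that a negative index counts from the end rather than dropping an element as it would in R. The variable names are only for illustration.

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

first = vec[0]         # 2, the first element
last = vec[-1]         # 1, negative indices count from the end
first_two = vec[0:2]   # array([2, 3]); the stop index 2 is excluded
from_second = vec[1:]  # array([3, 5, 1])
picked = vec[[0, 2]]   # array([2, 5]); a plain list also works as an index
```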

  • Combine two arrays via np.concatenate() (note that the arrays are passed together as a single list, hence the brackets)

    Python

    x = np.array([1, 2, 3, 4])
    y = np.array([5, 6, 7, 8])
    np.concatenate([x, y])
    array([1, 2, 3, 4, 5, 6, 7, 8])

Useful functions over vectors

  • In R, functions operate on objects (e.g. log(x), sort(x), etc).

  • Python also has functions that operate on objects. But objects usually have functions associated with them directly. You access these functions with a period after the object name. These functions are called “methods”. Use tab completion to scroll through the available methods of an object.

    Python

    vec.sort() # sort in place (modifies vec, returns None)
    vec.min() # minimum
    vec.max() # maximum
    vec.mean() # mean
    vec.sum() # sum
    vec.var() # variance (divides by n, unlike R's var())
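  • One difference from R worth checking: by default the var() method divides by n, while R's var() divides by n - 1. A sketch using the ddof argument to get the R-style value:

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

pop_var = vec.var()         # divides by n (the default, ddof=0)
samp_var = vec.var(ddof=1)  # divides by n - 1, matching R's var()

print(pop_var)   # 2.1875
print(samp_var)  # about 2.9167
```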
  • But NumPy still has many useful functions that take the object as an argument.

    Python

    np.sort(vec)
    np.min(vec)
    np.max(vec)
    np.mean(vec)
    np.sum(vec)
    np.var(vec)
    np.size(vec)
    np.exp(vec)
    np.log(vec)
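  • The method and the function are not always interchangeable. As a sketch, np.sort() returns a sorted copy, while the sort() method rearranges the array itself and returns None:

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

sorted_copy = np.sort(vec)  # new sorted array; vec is untouched
print(vec)                  # still in the original order

result = vec.sort()         # sorts vec in place
print(result)               # None: the method returns nothing
print(vec)                  # vec itself is now sorted
```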

Booleans (Python’s logicals)

  • Python uses True and False. It uses the same comparison operators as R

    Python

    vec > 3
    vec < 3
    vec == 3
    vec != 3
    vec <= 3
    vec >= 3
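  • Each comparison returns an array of Booleans. Since True counts as 1 and False as 0, summing or averaging that array counts matches, as in R. A short sketch (mask, n_above, and share_above are illustrative names):

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

mask = vec > 3             # array([False, False,  True, False])
n_above = mask.sum()       # 1 element is greater than 3
share_above = mask.mean()  # 0.25, the proportion greater than 3

print(mask, n_above, share_above)
```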
  • The elementwise logical operators are listed below. Key Difference: “Not” uses ~ rather than R's !.

    • & And
    • | Or
    • ~ Not

    Python

    np.array([True, True, False, False]) & np.array([True, False, True, False])
    array([ True, False, False, False])
    np.array([True, True, False, False]) | np.array([True, False, True, False])
    array([ True,  True,  True, False])
    ~np.array([True, True, False, False])
    array([False, False,  True,  True])
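  • When combining comparisons, wrap each one in parentheses, because & and | are evaluated before > and < in Python. A minimal sketch with illustrative names:

```python
import numpy as np

vec = np.array([2, 3, 5, 1])

# Parentheses are needed: & and | bind more tightly than > and <
both = (vec > 1) & (vec < 5)    # array([ True,  True, False, False])
either = (vec < 2) | (vec > 4)  # array([False, False,  True,  True])
neither = ~either               # array([ True,  True, False, False])

print(both, either, neither)
```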
  • You subset a vector using Booleans as you would in R (here vec is in its original order, array([2, 3, 5, 1]))

    Python

    vec[vec <= 3]
    array([2, 3, 1])
  • When you are dealing with single logicals, instead of arrays of logicals, use the keywords and, or, and not instead

    Python

    True and False
    False
    True or False
    True
    not True
    False
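  • The keywords do not work on arrays with more than one element: Python cannot reduce such an array to a single True or False and raises a ValueError. A sketch of the error, together with the any() and all() methods that do reduce an array to one value (flags is an illustrative name):

```python
import numpy as np

flags = np.array([True, True, False, False])

try:
    flags and flags  # ambiguous for arrays with more than one element
    raised = False
except ValueError:
    raised = True

any_true = flags.any()  # True: at least one element is True
all_true = flags.all()  # False: not every element is True

print(raised, any_true, all_true)
```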
  • Exercise: Consider two vectors \[ y = (1, 7, 1, 2, 8, 2)\\ x = (4, 6, 2, 7, 8, 2) \] Calculate their inner product: \[ y_1x_1 + y_2x_2 + y_3x_3 + y_4x_4 + y_5x_5 + y_6x_6 \] Do this using vectorized operations.

  • Exercise: Provide two ways of extracting the 2nd and 5th elements of this vector

    Python

    x = np.array([4, 7, 8, 1, 2])
  • Exercise: Extract all elements from the previous vector between 5 and 8 (inclusive). Use predicates.