```{raw} html
``` # NumPy ```{index} single: Python; NumPy ``` ```{contents} Contents :depth: 2 ``` ```{epigraph} "Let's be clear: the work of science has nothing whatever to do with consensus. Consensus is the business of politics. Science, on the contrary, requires only one investigator who happens to be right, which means that he or she has results that are verifiable by reference to the real world. In science consensus is irrelevant. What is relevant is reproducible results." -- Michael Crichton ``` ## Overview [NumPy](https://en.wikipedia.org/wiki/NumPy) is a first-rate library for numerical programming * Widely used in academia, finance and industry. * Mature, fast, stable and under continuous development. We have already seen some code involving NumPy in the preceding lectures. In this lecture, we will start a more systematic discussion of both * NumPy arrays and * the fundamental array processing operations provided by NumPy. ### References * [The official NumPy documentation](http://docs.scipy.org/doc/numpy/reference/). ## NumPy Arrays ```{index} single: NumPy; Arrays ``` The essential problem that NumPy solves is fast array processing. The most important structure that NumPy defines is an array data type formally called a [numpy.ndarray](http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html). NumPy arrays power a large proportion of the scientific Python ecosystem. Let's first import the library. ```{code-block} python3 import numpy as np ``` To create a NumPy array containing only zeros we use [np.zeros](http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html#numpy.zeros) ```{code-block} python3 a = np.zeros(3) a ``` ```{code-block} python3 type(a) ``` NumPy arrays are somewhat like native Python lists, except that * Data *must be homogeneous* (all elements of the same type). * These types must be one of the [data types](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html) (`dtypes`) provided by NumPy. The most important of these dtypes are: * float64: 64 bit floating-point number * int64: 64 bit integer * bool: 8 bit True or False There are also dtypes to represent complex numbers, unsigned integers, etc. On modern machines, the default dtype for arrays is `float64` ```{code-block} python3 a = np.zeros(3) type(a[0]) ``` If we want to use integers we can specify as follows: ```{code-block} python3 a = np.zeros(3, dtype=int) type(a[0]) ``` ### Shape and Dimension ```{index} single: NumPy; Arrays (Shape and Dimension) ``` Consider the following assignment ```{code-block} python3 z = np.zeros(10) ``` Here `z` is a *flat* array with no dimension --- neither row nor column vector. The dimension is recorded in the `shape` attribute, which is a tuple ```{code-block} python3 z.shape ``` Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma). To give it dimension, we can change the `shape` attribute ```{code-block} python3 z.shape = (10, 1) z ``` ```{code-block} python3 z = np.zeros(4) z.shape = (2, 2) z ``` In the last case, to make the 2 by 2 array, we could also pass a tuple to the `zeros()` function, as in `z = np.zeros((2, 2))`. ### Creating Arrays ```{index} single: NumPy; Arrays (Creating) ``` As we've seen, the `np.zeros` function creates an array of zeros. You can probably guess what `np.ones` creates. Related is `np.empty`, which creates arrays in memory that can later be populated with data ```{code-block} python3 z = np.empty(3) z ``` The numbers you see here are garbage values. (Python allocates 3 contiguous 64 bit pieces of memory, and the existing contents of those memory slots are interpreted as `float64` values) To set up a grid of evenly spaced numbers use `np.linspace` ```{code-block} python3 z = np.linspace(2, 4, 5) # From 2 to 4, with 5 elements ``` To create an identity matrix use either `np.identity` or `np.eye` ```{code-block} python3 z = np.identity(2) z ``` In addition, NumPy arrays can be created from Python lists, tuples, etc. using `np.array` ```{code-block} python3 z = np.array([10, 20]) # ndarray from Python list z ``` ```{code-block} python3 type(z) ``` ```{code-block} python3 z = np.array((10, 20), dtype=float) # Here 'float' is equivalent to 'np.float64' z ``` ```{code-block} python3 z = np.array([[1, 2], [3, 4]]) # 2D array from a list of lists z ``` See also `np.asarray`, which performs a similar function, but does not make a distinct copy of data already in a NumPy array. ```{code-block} python3 na = np.linspace(10, 20, 2) na is np.asarray(na) # Does not copy NumPy arrays ``` ```{code-block} python3 na is np.array(na) # Does make a new copy --- perhaps unnecessarily ``` To read in the array data from a text file containing numeric data use `np.loadtxt` or `np.genfromtxt`---see [the documentation](http://docs.scipy.org/doc/numpy/reference/routines.io.html) for details. ### Array Indexing ```{index} single: NumPy; Arrays (Indexing) ``` For a flat array, indexing is the same as Python sequences: ```{code-block} python3 z = np.linspace(1, 2, 5) z ``` ```{code-block} python3 z[0] ``` ```{code-block} python3 z[0:2] # Two elements, starting at element 0 ``` ```{code-block} python3 z[-1] ``` For 2D arrays the index syntax is as follows: ```{code-block} python3 z = np.array([[1, 2], [3, 4]]) z ``` ```{code-block} python3 z[0, 0] ``` ```{code-block} python3 z[0, 1] ``` And so on. Note that indices are still zero-based, to maintain compatibility with Python sequences. Columns and rows can be extracted as follows ```{code-block} python3 z[0, :] ``` ```{code-block} python3 z[:, 1] ``` NumPy arrays of integers can also be used to extract elements ```{code-block} python3 z = np.linspace(2, 4, 5) z ``` ```{code-block} python3 indices = np.array((0, 2, 3)) z[indices] ``` Finally, an array of `dtype bool` can be used to extract elements ```{code-block} python3 z ``` ```{code-block} python3 d = np.array([0, 1, 1, 0, 0], dtype=bool) d ``` ```{code-block} python3 z[d] ``` We'll see why this is useful below. An aside: all elements of an array can be set equal to one number using slice notation ```{code-block} python3 z = np.empty(3) z ``` ```{code-block} python3 z[:] = 42 z ``` ### Array Methods ```{index} single: NumPy; Arrays (Methods) ``` Arrays have useful methods, all of which are carefully optimized ```{code-block} python3 a = np.array((4, 3, 2, 1)) a ``` ```{code-block} python3 a.sort() # Sorts a in place a ``` ```{code-block} python3 a.sum() # Sum ``` ```{code-block} python3 a.mean() # Mean ``` ```{code-block} python3 a.max() # Max ``` ```{code-block} python3 a.argmax() # Returns the index of the maximal element ``` ```{code-block} python3 a.cumsum() # Cumulative sum of the elements of a ``` ```{code-block} python3 a.cumprod() # Cumulative product of the elements of a ``` ```{code-block} python3 a.var() # Variance ``` ```{code-block} python3 a.std() # Standard deviation ``` ```{code-block} python3 a.shape = (2, 2) a.T # Equivalent to a.transpose() ``` Another method worth knowing is `searchsorted()`. If `z` is a nondecreasing array, then `z.searchsorted(a)` returns the index of the first element of `z` that is `>= a` ```{code-block} python3 z = np.linspace(2, 4, 5) z ``` ```{code-block} python3 z.searchsorted(2.2) ``` Many of the methods discussed above have equivalent functions in the NumPy namespace ```{code-block} python3 a = np.array((4, 3, 2, 1)) ``` ```{code-block} python3 np.sum(a) ``` ```{code-block} python3 np.mean(a) ``` ## Operations on Arrays ```{index} single: NumPy; Arrays (Operations) ``` ### Arithmetic Operations The operators `+`, `-`, `*`, `/` and `**` all act *elementwise* on arrays ```{code-block} python3 a = np.array([1, 2, 3, 4]) b = np.array([5, 6, 7, 8]) a + b ``` ```{code-block} python3 a * b ``` We can add a scalar to each element as follows ```{code-block} python3 a + 10 ``` Scalar multiplication is similar ```{code-block} python3 a * 10 ``` The two-dimensional arrays follow the same general rules ```{code-block} python3 A = np.ones((2, 2)) B = np.ones((2, 2)) A + B ``` ```{code-block} python3 A + 10 ``` ```{code-block} python3 A * B ``` In particular, `A * B` is *not* the matrix product, it is an element-wise product. ### Matrix Multiplication ```{index} single: NumPy; Matrix Multiplication ``` With Anaconda's scientific Python package based around Python 3.5 and above, one can use the `@` symbol for matrix multiplication, as follows: ```{code-block} python3 A = np.ones((2, 2)) B = np.ones((2, 2)) A @ B ``` (For older versions of Python and NumPy you need to use the [np.dot](http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html) function) We can also use `@` to take the inner product of two flat arrays ```{code-block} python3 A = np.array((1, 2)) B = np.array((10, 20)) A @ B ``` In fact, we can use `@` when one element is a Python list or tuple ```{code-block} python3 A = np.array(((1, 2), (3, 4))) A ``` ```{code-block} python3 A @ (0, 1) ``` Since we are post-multiplying, the tuple is treated as a column vector. ### Mutability and Copying Arrays NumPy arrays are mutable data types, like Python lists. In other words, their contents can be altered (mutated) in memory after initialization. We already saw examples above. Here's another example: ```{code-block} python3 a = np.array([42, 44]) a ``` ```{code-block} python3 a[-1] = 0 # Change last element to 0 a ``` Mutability leads to the following behavior (which can be shocking to MATLAB programmers...) ```{code-block} python3 a = np.random.randn(3) a ``` ```{code-block} python3 b = a b[0] = 0.0 a ``` What's happened is that we have changed `a` by changing `b`. The name `b` is bound to `a` and becomes just another reference to the array (the Python assignment model is described in more detail {doc}`later in the course