Disclaimer: This Jupyter Notebook contains content generated with the assistance of AI. While every effort has been made to review and validate the outputs, users should independently verify critical information before relying on it. The SELENE notebook repository is constantly evolving. We recommend downloading or pulling the latest version of this notebook from GitHub.

NumPy — Basic Tutorial¶

This notebook provides a basic tutorial on NumPy, one of the most essential libraries in the Python data science and machine learning ecosystem. NumPy (short for Numerical Python) provides powerful tools for working with numerical data, enabling efficient storage, manipulation, and computation on large arrays and matrices. Its performance and flexibility make it the foundation upon which many higher-level libraries such as Pandas, SciPy, and scikit-learn are built.

At its core, NumPy introduces the ndarray (n-dimensional array), a data structure that is both memory-efficient and fast. Unlike Python's built-in lists, NumPy arrays allow for vectorized operations, meaning that computations can be applied to entire arrays at once without writing explicit loops. This not only results in cleaner and more readable code but also achieves significant performance improvements due to optimized implementations in C and Fortran under the hood.

Learning NumPy is essential for anyone interested in data analysis, scientific computing, or machine learning. It provides the mathematical foundation for many algorithms and models, and understanding how to efficiently manipulate arrays is a skill that translates directly into better performance and cleaner implementations in applied projects. Even when working with more abstract or high-level frameworks, a solid grasp of NumPy concepts allows you to understand what happens behind the scenes.

Moreover, popular deep learning libraries such as PyTorch and TensorFlow adopt much of NumPy's naming conventions and array semantics. Functions like reshape, sum, or transpose behave very similarly in both NumPy and PyTorch, making the transition from NumPy to deep learning frameworks smooth and intuitive. For this reason, learning NumPy is not just about understanding one library but also about building a strong conceptual foundation for the entire Python scientific computing stack.

In this tutorial, we will explore the core features of NumPy, including array creation, indexing, slicing, reshaping, broadcasting, and basic mathematical operations. By the end, you will have a practical understanding of how to work efficiently with numerical data and be well-prepared to apply these concepts to more advanced libraries and real-world problems.

Setting up the Notebook¶

Make Required Imports¶

This notebook requires the import of different Python packages as well as additional Python modules that are part of the repository. If a package is missing, use your preferred package manager (e.g., conda or pip) to install it. If the code cell below runs without any errors, all required packages and modules have been successfully imported.

In [1]:
import numpy as np

from sklearn.metrics.pairwise import euclidean_distances
from sklearn.preprocessing import StandardScaler

Set Print Options¶

By default, NumPy prints multidimensional arrays with high precision and, where applicable, in scientific notation. To make the output easier to read, we limit it to 3 decimal places and suppress scientific notation. Feel free to remove this line and see how the output changes.

In [2]:
np.set_printoptions(precision=3, suppress=True)

Multidimensional Arrays¶

In NumPy, arrays are the core concept and the foundation of everything the library does. A NumPy array — officially called an ndarray (n-dimensional array) — is a powerful and efficient data structure that can represent data in one or more dimensions. For example, a 1D array is like a list of numbers, a 2D array represents a matrix with rows and columns, and higher-dimensional arrays (3D, 4D, etc.) can represent complex data such as images, time series, or tensors. Unlike regular Python lists, NumPy arrays are homogeneous (all elements share the same data type) and stored in contiguous memory blocks, allowing for extremely fast numerical computations.

Multidimensional arrays are so central to NumPy because they enable vectorized operations and allow performing computations on entire datasets at once without explicit Python loops. This leads to cleaner, more concise code and significant performance gains. Arrays also provide the foundation for many advanced operations such as reshaping, broadcasting, and matrix manipulation, which we will cover later. In fact, understanding how NumPy arrays work is essential for mastering not only NumPy itself but also other scientific and machine learning libraries like PyTorch and TensorFlow, which build upon the same array-based concepts.

Basic Usage¶

Creating Arrays¶

NumPy provides several ways to create arrays, but two of the most commonly used functions are array() and asarray(). Both functions can convert existing data such as Python lists, tuples, or nested sequences into NumPy arrays. The main difference lies in how they handle existing NumPy arrays: array() always creates a new copy of the data by default, while asarray() avoids making a copy when the input is already an ndarray. This distinction can be important for performance, especially when working with large datasets, as unnecessary copying can increase memory usage and slow down computations.

In practice, array() is generally used when you want to ensure that your data is fully independent of the original source, such as when you plan to modify the array without affecting the input. On the other hand, asarray() is useful when you want to safely convert inputs to arrays while preserving memory efficiency. For example, when passing data between functions that expect NumPy arrays, np.asarray() ensures the correct type without redundant copying. Understanding the difference between these two functions helps you write more efficient and predictable code when initializing and managing data in NumPy.
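The copy-vs-no-copy behavior described above is easy to verify directly. The following is a minimal sketch (the variable names are just for illustration):

```python
import numpy as np

original = np.array([1, 2, 3])

# asarray() returns the input unchanged if it is already an ndarray ...
no_copy = np.asarray(original)
print(no_copy is original)      # True -> same object, no copy made

# ... while array() creates an independent copy by default.
independent = np.array(original)
print(independent is original)  # False -> new array with its own data

# Modifying the copy leaves the original untouched.
independent[0] = 99
print(original[0])              # still 1
```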

Let's first consider the case of creating a NumPy array from a regular Python list (or tuple). In this case, both array() and asarray() behave the same way: they create a new NumPy array that is a copy of the input data. The difference between the two functions only matters when the input is already a NumPy array. Also, when creating a NumPy array from a Python list, the list must be well-structured and consistent. This means that all elements should be of the same data type (for example, all integers or all floats). Additionally, if the list contains nested lists, each inner list must have the same length, forming a regular grid of rows and columns. If these conditions are not met, NumPy will be unable to interpret the data as a proper multidimensional array and may either create an array of type object (which is inefficient and not truly numerical) or raise an error.

The code cell below shows several example definitions of a Python list, where the first two can be converted into proper NumPy arrays, while the other two will either throw an error or result in arrays that (arguably) do not contain the expected values. You should try each definition of py_list and run the code cell to see how NumPy behaves in each case.

In [3]:
py_list =  [[2, 0, 1], [1, 3, 2]]     ## Correct format: consistent sizes, same datatype (here: int)
#py_list =  [[2, 0, 1], [1, 3, 2.0]]   ## Acceptable format: consistent sizes, all entries treated as floats/double
#py_list =  [[2, 0, 1], [1, 3]]        ## Inconsistent sizes: result is not a 2D array but 1D array with lists as entries
#py_list =  [[2, 0, 1], [1, 3, '2']]   ## Different data types: all entries are treated as string

a = np.array(py_list)

print(a)
[[2 0 1]
 [1 3 2]]

It is also possible to specify the data type of the array elements using the dtype argument in both array() and asarray(). By default, NumPy automatically infers the most suitable type based on the input data (for example, integers, floats, or strings), but you can explicitly set dtype to control how the data is stored and interpreted. For instance, using dtype=float will convert all elements to floating-point numbers, even if the original data contains integers; see the code cell below. This flexibility is especially useful when you need to ensure consistent data types for mathematical operations, improve memory efficiency, or match the requirements of specific algorithms.

In [4]:
a = np.array([[2, 0, 1], [1, 3, 2]], dtype=float)

print(a)
[[2. 0. 1.]
 [1. 3. 2.]]

While 2D arrays are easier to print and interpret, NumPy arrays can have any number of dimensions (called axes). The following example creates a 3D array, so the input is a list of lists of lists.

In [5]:
a = np.array([[[2, 0, 1], [1, 3, 2]], [[4, 4, 6], [3, 5, 1]]])

print(a)
[[[2 0 1]
  [1 3 2]]

 [[4 4 6]
  [3 5 1]]]

Describing Arrays¶

NumPy provides several convenient attributes to describe the structure and contents of an array: ndim, shape, and dtype. The ndim attribute returns the number of dimensions (axes) in the array (for example, 1 for a vector, 2 for a matrix, or more for higher-dimensional arrays). The shape attribute gives a tuple describing the size of the array along each dimension, such as (3, 4) for an array with 3 rows and 4 columns. Finally, the dtype attribute specifies the data type of the array's elements, such as int32, float64, or bool, indicating how the data is stored in memory. Together, these attributes provide a complete description of a NumPy array's structure, helping you understand its dimensionality, layout, and element type; all of which are crucial for performing correct and efficient numerical computations.

To show some examples, let's first create a simple 3D array a. The various definitions of a below differ with respect to the specified data type. You are encouraged to try each definition of a to see how it affects the outputs of subsequent code cells.

In [6]:
a = np.array([[[2, 0, 1], [1, 3, 2]], [[4, 4, 6], [3, 5, 1]]], dtype=np.int64)
#a = np.array([[[2, 0, 1], [1, 3, 2]], [[4, 4, 6], [3, 5, 1]]], dtype=np.int32)
#a = np.array([[[2, 0, 1], [1, 3, 2]], [[4, 4, 6], [3, 5, 1]]], dtype=np.float32)

Using np.array.ndim we can now ask for the number of dimensions (axes) of our array a — of course, for this simple example where we created a manually, we already know that the array has 3 axes.

In [7]:
print(f"Number of axes: {a.ndim}")
Number of axes: 3

Similarly, we can use np.array.dtype to get the data type of the array. More information about NumPy data types can be found in the official NumPy documentation.

In [8]:
print(f"Data type: {a.dtype}")
Data type: int64

While ndim gives us the number of dimensions/axes, we also often need to know the size of each axis. This information comes from the shape of a NumPy array, represented as a tuple of integers. For example, a 1D array with 5 elements has a shape of (5,), while a 2D array with 3 rows and 4 columns has a shape of (3, 4). In terms of code, array.shape returns the shape of a NumPy array — as shown below for our array a.

In [9]:
print(a.shape)
(2, 2, 3)

The shape essentially tells you how the data is organized in memory and determines how operations like reshaping, broadcasting, or indexing can be applied. Understanding an array's shape is fundamental when working with NumPy, as many array operations require matching or compatible shapes to function correctly; we will cover concrete examples later in the notebook.

Side note: When working with higher-dimensional arrays in NumPy, the traditional terms "rows" and "columns" quickly become insufficient and even confusing. These terms are meaningful for 2D data, such as matrices, where we can visualize data arranged in rows and columns. However, once we move beyond two dimensions, there is no intuitive concept of rows or columns anymore. Instead, each additional dimension adds another level of structure to the data that cannot be accurately described using just "rows" and "columns". For this reason, it is more meaningful and precise to use the terms "dimensions" and "axes". This way of describing arrays scales naturally to any number of dimensions and provides a consistent framework for reasoning about multidimensional data operations such as slicing, reshaping, and broadcasting.

Auxiliary Methods for Array Creation¶

In addition to creating arrays from existing data using array() or asarray(), NumPy provides a variety of auxiliary methods that allow you to generate arrays of a specific form quickly and efficiently. These methods are especially useful for initializing arrays when the exact values are not yet important, or when you need arrays with predictable patterns for testing, prototyping, or mathematical operations. Examples include zeros() to create arrays filled with zeros, ones() for arrays filled with ones, arange() for sequences of evenly spaced values, linspace() for linearly spaced values over a range, and eye() for identity matrices. Using these specialized methods not only saves time but also ensures that the resulting arrays have the correct shape, data type, and structure, providing a solid foundation for further computation.

Let's start with np.zeros() to create an array with all entries being 0; a very common way to initialize an array. The main input parameter is the shape of the resulting array as a tuple.

In [10]:
a = np.zeros((3,5))

print(a) 
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

Similarly, ones() creates an array with all entries being 1. The main input parameter is again the shape of the resulting array as a tuple.

In [11]:
a = np.ones((3,5))

print(a) 
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]

Instead of creating an array with all entries being 0 or 1, with full() we can create arrays where all entries have a specified value of our choice. Here, apart from the shape of the resulting array as a tuple, we also need to specify the fill value.

In [12]:
a = np.full((3,5), 5.8)

print(a) 
[[5.8 5.8 5.8 5.8 5.8]
 [5.8 5.8 5.8 5.8 5.8]
 [5.8 5.8 5.8 5.8 5.8]]

Lastly, instead of fixed values, NumPy also provides auxiliary methods to initialize arrays with random values. In the code cell below, random.rand creates an array with all entries being random values between $0$ and $1$. The main input parameter is the shape of the resulting array. There are several related methods to generate random values, e.g., random.randint, random.randn, and many more.

In [13]:
a = np.random.rand(3,5)

print(a)  
[[0.959 0.723 0.926 0.416 0.106]
 [0.064 0.577 0.915 0.6   0.473]
 [0.241 0.434 0.46  0.743 0.107]]
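The related random methods mentioned above differ mainly in the distribution they sample from. The following is a brief sketch of two of them:

```python
import numpy as np

# randint(low, high, size): uniformly distributed integers in [low, high)
ints = np.random.randint(0, 10, size=(2, 3))
print(ints)

# randn(d0, d1, ...): samples from the standard normal distribution
# (note: the shape is given as separate arguments, not as a tuple)
normals = np.random.randn(2, 3)
print(normals)
```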

The arange() method in NumPy is used to create arrays with evenly spaced values within a specified range. It works similarly to Python's built-in range() function but returns a NumPy array instead of a list, which allows for efficient numerical computations. The method typically takes three arguments:

  • start (the first value, inclusive)
  • stop (the end value, exclusive)
  • step (the spacing between values, defaulting to 1).

For example, arange(0, 10, 2) creates the array [0, 2, 4, 6, 8]. Because it returns a NumPy array, the resulting values can be directly used in mathematical operations, indexing, and broadcasting, making arange() a convenient tool for generating sequences of numbers for computations or simulations.

In [14]:
a = np.arange(0, 10, 2)

print(a)
[0 2 4 6 8]

The linspace() method in NumPy is used to create arrays of evenly spaced numbers over a specified interval, similar to arange(), but with a key difference: instead of specifying the step size, you specify the number of values you want in the array. For example, linspace(0, 1, 5) generates the array [0., 0.25, 0.5, 0.75, 1.], producing exactly 5 equally spaced values between 0 and 1, inclusive.

In [15]:
a = np.linspace(0, 1, 5)

print(a)
[0.   0.25 0.5  0.75 1.  ]

The main difference from arange() is that arange() requires a fixed step size, which may result in a number of elements that depends on the interval and step, and may sometimes exclude the end point due to floating-point rounding. In contrast, linspace() guarantees a specific number of points, and by default includes the end point, making it particularly useful for creating arrays for plotting, interpolation, or any scenario where a fixed number of samples over a range is desired.
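The endpoint behavior can be seen directly by comparing the two methods over the same interval:

```python
import numpy as np

# arange() with a float step excludes the stop value ...
a = np.arange(0, 1, 0.25)
print(a)    # [0.   0.25 0.5  0.75]

# ... while linspace() includes it by default and guarantees the count.
b = np.linspace(0, 1, 5)
print(b)    # [0.   0.25 0.5  0.75 1.  ]

# linspace(..., endpoint=False) mimics the arange() behavior.
c = np.linspace(0, 1, 4, endpoint=False)
print(c)    # [0.   0.25 0.5  0.75]
```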

The last auxiliary method we look at is eye(), which creates an array representing an identity matrix. The main input parameter is the size of the matrix. Note that an identity matrix is a square 2D matrix, so a single integer value is sufficient.

In [16]:
a = np.eye(3)

print(a)  
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

In short, NumPy provides a variety of auxiliary methods for creating arrays that serve common and practical purposes. These methods allow you to quickly generate arrays with specific patterns, fixed values, or random numbers without manually specifying each element. They are especially useful for initializing arrays for computations, testing, or simulations, ensuring that arrays have the desired shape, size, and data type from the start. By using these methods, you can write cleaner, more efficient code and focus on the operations and algorithms rather than manually constructing arrays.

Views of Arrays¶

Once we have an array, we typically want to manipulate it either by changing its shape or its values — as we will see in subsequent sections. When manipulating arrays, we need to consider the concept of a view. A view of an array is a new array object that shares the same data as the original array but may have a different shape, type, or indexing. This means that a view does not copy the underlying data but provides a different way of looking at (or "viewing") the same data in memory.

As a consequence, when you modify the elements of a view, the changes are reflected in the original array, and vice versa, because both arrays refer to the same data buffer. However, since the view is a separate object, you can modify its metadata (like shape or strides) independently. It is therefore important to keep in mind that some array manipulation methods return a view, while others return a copy, depending on whether the operation can safely share the same underlying data. Methods like reshaping, slicing, or transposing often return views, meaning they create a new array object that references the same data in memory without duplication. In contrast, operations that require rearranging or modifying the actual data layout, such as fancy indexing, concatenation, or copy(), return copies that have their own independent memory. Understanding this distinction is important for writing memory-efficient and predictable code. We will see examples for both cases throughout the notebook.
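A minimal sketch of this difference: slicing returns a view, while fancy indexing returns a copy.

```python
import numpy as np

a = np.arange(6)       # [0 1 2 3 4 5]

view = a[1:4]          # slicing -> view on the same data
view[0] = 99
print(a)               # [ 0 99  2  3  4  5] -> change is visible in a

copy = a[[1, 2, 3]]    # fancy indexing -> independent copy
copy[0] = -1
print(a)               # unchanged by the modification of the copy
```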


Array Manipulation¶

NumPy provides a series of useful methods to manipulate arrays, where the manipulation does not involve changing individual elements but rather the shape of the array. Some of the most common array manipulation methods in NumPy are those that allow you to reshape, combine, split, or otherwise reorganize arrays. Key methods include:

  • transpose(): swaps or permutes the axes of an array, commonly used for matrices.

  • reshape(): changes the shape of an array without altering its data, useful for converting between different dimensions.

  • flatten() / ravel(): convert a multidimensional array into a 1D array.

  • concatenate() / stack() / hstack() / vstack(): combine multiple arrays along a specified axis.

  • split() / hsplit() / vsplit(): divide arrays into smaller sub-arrays.

  • resize(): changes the size of an array, which may alter its contents if expanded or truncated.

These methods are essential for preparing data, adjusting array layouts for computations, and ensuring arrays are compatible for operations such as broadcasting, mathematical calculations, or machine learning workflows. They form the backbone of array manipulation in NumPy and are widely used in scientific computing and data processing.

Transposing Arrays¶

Transposing a multidimensional array in NumPy means rearranging its axes, effectively changing the order in which data is indexed. For two-dimensional arrays (matrices), this operation swaps rows and columns — for example, an element at position (i, j) becomes (j, i). However, in higher-dimensional arrays, transposition generalizes this idea by permuting the array's axes according to a specified order. NumPy provides the transpose() method (or the shorthand .T for 2D arrays) to perform this transformation efficiently without copying the data in memory.

The example in the code cell below transposes a 2D array (i.e., a matrix), as should be familiar from your Linear Algebra classes. Notice the different ways to express this operation.

In [17]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

a_transposed = a.T
#a_transposed = a.transpose()
#a_transposed = np.transpose(a)
print(f"Transposed array (shape: {a_transposed.shape}):\n{a_transposed}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Transposed array (shape: (4, 3)):
[[ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]
 [ 4  8 12]]

This idea generalizes to arrays with more than 2 dimensions/axes. Let's create a 4D array and transpose it; since we do not care about the actual values but only about the shape of the array, we can simply use ones() for this example.

In [18]:
a = np.ones((2,3,4,5))
print(f"Shape of original array:\n{a.shape}\n")

a_transposed = a.transpose()
print(f"Shape of transposed array:\n{a_transposed.shape}")
Shape of original array:
(2, 3, 4, 5)

Shape of transposed array:
(5, 4, 3, 2)

In simple terms, by default, transposing an array reverses the order of all axes. However, we can also explicitly specify the new order of the axes by providing a tuple. To give an example, in the code cell below, we again first create a 4D array (containing only $1$s) but then swap only the first axis (Index 0) and the last axis (Index 3).

In [19]:
a = np.ones((2,3,4,5))
print(f"Shape of original array:\n{a.shape}\n")

a_transposed = a.transpose((3,1,2,0))
print(f"Shape of transposed array:\n{a_transposed.shape}")
Shape of original array:
(2, 3, 4, 5)

Shape of transposed array:
(5, 3, 4, 2)

Transposing multidimensional arrays is important because it allows you to change the orientation of data, which is often required to perform mathematical operations or align arrays with different shapes. Many algorithms, especially in linear algebra, data processing, and machine learning, assume that data follows a specific axis order (for example, treating rows as samples and columns as features). Transposing helps ensure the data fits these expected formats without duplicating or reshaping the underlying data in memory, making it both efficient and flexible.

In practice, common use cases include preparing data for matrix multiplication (where dimensions must align), converting between row-major and column-major formats, manipulating images or tensors in deep learning (e.g., changing from channel-first to channel-last layouts), and performing operations like dot products or broadcasting that require specific dimension orders. It is also essential in scientific computing when handling multidimensional datasets such as time-series, spatial grids, or simulation data, where axis alignment determines how computations are applied.
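For instance, converting an image from a channel-first to a channel-last layout, a common step in deep learning pipelines, is a single axis permutation. A sketch with a dummy image (the sizes are arbitrary):

```python
import numpy as np

# Dummy RGB image in channel-first layout: (channels, height, width)
img_chw = np.zeros((3, 32, 64))

# Permute the axes to channel-last layout: (height, width, channels)
img_hwc = img_chw.transpose((1, 2, 0))

print(img_chw.shape)  # (3, 32, 64)
print(img_hwc.shape)  # (32, 64, 3)
```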

Reshaping Arrays¶

Reshaping multidimensional arrays in NumPy means changing the way data is organized into dimensions (or axes) without altering the underlying elements. In other words, the total number of elements in the array remains the same, but their arrangement into rows, columns, or higher dimensions changes. This is done using the reshape() method, which allows you to specify a new shape that is compatible with the original number of elements. For example, an array with shape (4, 3) can be reshaped into (3, 4) or (3, 2, 2), (6, 2), etc. since all those configurations still contain $12$ elements — in other words, the product of all axis sizes results in $12$.

To illustrate this, let's first create a (4, 3) array and reshape it to shape (3, 4); see below.

In [20]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

a_reshaped = a.reshape(4,3)
#a_reshaped = np.reshape(a, (4,3))
print(f"Reshaped array (shape: {a_reshaped.shape}):\n{a_reshaped}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Reshaped array (shape: (4, 3)):
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

Important: Notice that the shape of the reshaped array is the same as after using transpose() on the same input array. However, the two resulting arrays differ with respect to their elements. This means that these are not equivalent operations, and you should never use np.reshape() to swap the axes of an array.

The reshape() method returns a view of the original array whenever possible, meaning it does not copy the underlying data if the new shape is compatible with the array's memory layout. However, if a view cannot be created (for example, when the desired shape would require elements to be stored in a different memory order) then the method returns a copy instead.
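Whether reshape() returned a view or a copy can be checked by modifying the result and inspecting the original array; a short sketch:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Contiguous case: reshape() returns a view, so changes propagate.
b = a.reshape(2, 6)
b[0, 0] = 99
print(a[0, 0])   # 99 -> b shares memory with a

# Non-contiguous case: reshaping a transposed array forces a copy.
c = a.T.reshape(-1)
c[0] = -1
print(a[0, 0])   # still 99 -> c has its own data
```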

As mentioned above, as long as the product of the axis sizes matches, the number of dimensions of the new shape does not matter. All the examples in the code below are valid ways to reshape our (3, 4) array, as the product of all new axis sizes is $12$. Of course, in practice, reshaping should result in meaningful new arrays.

In [21]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

# All the commands below will work just fine.
a_reshaped = a.reshape(6,2)
#a_reshaped = a.reshape(2,6)
#a_reshaped = a.reshape(3,2,2)
#a_reshaped = a.reshape(1,1,6,1,1,2,1,1)
#a_reshaped = a.reshape(2,1,3,2,1)
#a_reshaped = a.reshape(1,12)
#a_reshaped = a.reshape(1,1,12)
#a_reshaped = a.reshape(1,1,1,12)
#a_reshaped = a.reshape(1,1,1,1,12)
#...

print(f"Reshaped array (shape: {a_reshaped.shape}):\n{a_reshaped}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Reshaped array (shape: (6, 2)):
[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [11 12]]

Just to give an example, the following reshaping attempts fail since the product of the dimensions of the new shape is not 12.

In [22]:
#a.reshape(2, 1, 5)  # <-- This will fail (product is 10)
#a.reshape(8, 2)  # <-- That will fail (product is 16)

In practice, the size of an array may not be known ahead of time, making it difficult to guarantee that a new shape will be valid. reshape() therefore accepts the special dimension size -1, telling the method to calculate the size of that dimension automatically. A very common use case is to flatten a multidimensional array into a 1D array, where we do not really care about the number and sizes of the original dimensions. We can do this for our example array a as follows:

In [23]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

a_reshaped = a.reshape(-1)
print(f"Reshaped array (shape: {a_reshaped.shape}):\n{a_reshaped}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Reshaped array (shape: (12,)):
[ 1  2  3  4  5  6  7  8  9 10 11 12]

The -1 parameter can also be used in different parts of the new shape. For example, let's assume we need to reshape our (3, 4) array into another 3D array with the first and last axis being of size $2$. This means we can use -1 to automatically calculate the size of the remaining dimensions; which here is $3$ since $2\times 3\times 2 = 12$.

In [24]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

a_reshaped = a.reshape(2,-1,2)
print(f"Reshaped array (shape: {a_reshaped.shape}):\n{a_reshaped}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Reshaped array (shape: (2, 3, 2)):
[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]

It is easy to see that -1 can only be used once; otherwise the calculation of the missing dimensions would generally be ambiguous. Also, it must be possible to find a missing dimension size such that the product of all dimensions matches. As such, the following two examples will fail:

In [25]:
#a.reshape(-1, 3, -1)   # ValueError: can only specify one unknown dimension
#a.reshape(-1, 3, 3)    # ValueError: cannot reshape array of size 12 into shape (3,3) -- 3*3*?=12 does not work out

Reshaping multidimensional arrays is important because it allows you to reorganize data into the structure required for specific computations, algorithms, or machine learning models; all without changing the actual data values. Many operations in scientific computing, linear algebra, or deep learning expect data in a particular shape (for example, 2D matrices for matrix multiplication or 4D tensors for image batches). Reshaping makes it easy to convert between these representations efficiently, helping ensure compatibility between different processing steps while maintaining performance.

In practice, common use cases include preparing input data for machine learning models (e.g., reshaping flat vectors into 2D or 4D tensors), converting between batches and single samples, flattening arrays before feeding them into fully connected neural network layers, or organizing data for broadcasting operations. It is also widely used in image and signal processing, where reshaping helps handle multidimensional data such as color channels, time steps, or spatial dimensions in a structured and meaningful way.

Apart from the general-purpose reshape() method, NumPy also has the two single-purpose methods flatten() and ravel() to convert a multidimensional array into a 1D array. We already saw that we can achieve this with reshape(-1); the code cell below gives us the same result using flatten().

In [26]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

a_flattened = a.flatten()
print(f"Flattened array (shape: {a_flattened.shape}):\n{a_flattened}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Flattened array (shape: (12,)):
[ 1  2  3  4  5  6  7  8  9 10 11 12]

Alternatively, we can use np.ravel() as shown below.

In [27]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

a_raveled = a.ravel()
print(f"Raveled array (shape: {a_raveled.shape}):\n{a_raveled}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Raveled array (shape: (12,)):
[ 1  2  3  4  5  6  7  8  9 10 11 12]

NumPy provides both flatten() and ravel() because they serve slightly different purposes regarding memory handling and performance: flatten() always returns a copy of the original array's data, and ravel() returns a view of the array whenever possible. In short, NumPy includes both because flatten() prioritizes safety and data integrity, while ravel() prioritizes efficiency and low memory use.
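This difference is easy to demonstrate by modifying the results and checking the original array:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat = a.flatten()   # always a copy
flat[0] = 99
print(a[0, 0])       # 0 -> the original is unaffected

rav = a.ravel()      # a view when possible (here: contiguous input)
rav[0] = 77
print(a[0, 0])       # 77 -> the change is visible in a
```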

Combining Arrays¶

Combining arrays in NumPy using functions like concatenate() or stack() means joining multiple arrays together into a single, larger array along a specified axis. The concatenate() function joins existing arrays along an existing dimension, while stack() creates a new dimension in the result, effectively adding another level of depth to the data structure. Both methods are used to organize and unify data stored in separate arrays for more convenient computation and analysis. The main purpose of using these functions is to efficiently assemble data without copying or manually iterating over elements. This is especially useful in scientific computing and machine learning, where data often comes in parts (such as batches, features, or samples) that need to be merged for processing.

Let's first consider the concatenate() method. It provides a flexible way to combine arrays that share the same shape in all dimensions except the one along which concatenation takes place — specified by the axis parameter. For example, assuming a 2D array (i.e., a matrix), if you concatenate along axis=0 (rows), all arrays must have the same number of columns. Also, the arrays must have compatible data types (e.g., all numeric, or all strings) so that NumPy can store them in a single resulting array. The code cell below shows an example. Since both 2D arrays a and b have the same shape, we can concatenate them along either of the two axes.

In [28]:
a = np.array([[1,1,1,1], [2,2,2,2], [3,3,3,3]])
b = np.array([[4,4,4,4], [5,5,5,5], [6,6,6,6]])
print(f"Original array a (shape: {a.shape}):\n{a}\n")
print(f"Original array b (shape: {b.shape}):\n{b}\n")

ab_concat = np.concatenate((a,b), axis=0)
print(f"Concatenated array along 1st axis (shape: {ab_concat.shape}):\n{ab_concat}\n")

ab_concat = np.concatenate((a,b), axis=1)
print(f"Concatenated array along 2nd axis (shape: {ab_concat.shape}):\n{ab_concat}\n")
Original array a (shape: (3, 4)):
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]

Original array b (shape: (3, 4)):
[[4 4 4 4]
 [5 5 5 5]
 [6 6 6 6]]

Concatenated array along 1st axis (shape: (6, 4)):
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]
 [4 4 4 4]
 [5 5 5 5]
 [6 6 6 6]]

Concatenated array along 2nd axis (shape: (3, 8)):
[[1 1 1 1 4 4 4 4]
 [2 2 2 2 5 5 5 5]
 [3 3 3 3 6 6 6 6]]

The stack() method differs from concatenate() in that it joins arrays by adding a new dimension, while concatenate() joins arrays along an existing dimension. In other words, stack() increases the dimensionality of the output array by one, whereas concatenate() preserves the original number of dimensions. For instance, stacking two 1D arrays creates a 2D array, while concatenating them keeps the result 1D. Additionally, stack() requires that all input arrays have exactly the same shape, since it cannot merge arrays of different sizes along any axis. Let's see how this works using the same two 2D arrays a and b as before.
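To make the 1D case concrete first: concatenating two 1D arrays keeps the result 1D, while stacking them adds a new axis and yields a 2D array.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(np.concatenate((x, y)))  # [1 2 3 4 5 6], shape (6,)
print(np.stack((x, y)))        # [[1 2 3], [4 5 6]], shape (2, 3)
```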

In [29]:
a = np.array([[1,1,1,1], [2,2,2,2], [3,3,3,3]])
b = np.array([[4,4,4,4], [5,5,5,5], [6,6,6,6]])
print(f"Original array a (shape: {a.shape}):\n{a}\n")
print(f"Original array b (shape: {b.shape}):\n{b}\n")

ab_stacked = np.stack((a,b), axis=0)
print(f"Stacked array along 1st axis (shape: {ab_stacked.shape}):\n{ab_stacked}\n")

ab_stacked = np.stack((a,b), axis=1)
print(f"Stacked array along 2nd axis (shape: {ab_stacked.shape}):\n{ab_stacked}\n")
Original array a (shape: (3, 4)):
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]

Original array b (shape: (3, 4)):
[[4 4 4 4]
 [5 5 5 5]
 [6 6 6 6]]

Stacked array along 1st axis (shape: (2, 3, 4)):
[[[1 1 1 1]
  [2 2 2 2]
  [3 3 3 3]]

 [[4 4 4 4]
  [5 5 5 5]
  [6 6 6 6]]]

Stacked array along 2nd axis (shape: (3, 2, 4)):
[[[1 1 1 1]
  [4 4 4 4]]

 [[2 2 2 2]
  [5 5 5 5]]

 [[3 3 3 3]
  [6 6 6 6]]]

Notice that the stacked arrays are now 3D arrays. The stack() method is commonly needed when you want to combine multiple arrays into a single higher-dimensional array — that is, when each input array should become a distinct "slice" or layer along a new axis. This is especially useful when you have multiple arrays of identical shape that represent related data samples, channels, or time steps, and you want to treat them as a single structured dataset. Typical use cases include:

  • Creating batches of data for machine learning, where each input array represents one sample and you want a 3D array with shape (batch_size, height, width) or (batch_size, features).
  • Combining color channels in image processing (e.g., stacking red, green, and blue arrays to form an RGB image).
  • Building multi-dimensional datasets such as time series or volumetric data, where each array corresponds to a time frame or depth slice.

In short, stack() is used whenever you need to group multiple arrays of the same shape into a single, higher-dimensional structure for easier collective processing. Beyond stack(), NumPy also provides several additional methods to make common stacking tasks more convenient.
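The color-channel use case from the list above can be sketched as follows. The three 2x2 channel arrays here are made-up values for illustration; stacking them along a new last axis produces an array in the common (height, width, channels) image layout:

```python
import numpy as np

# Hypothetical 2x2 single-channel images (values chosen for illustration)
red   = np.array([[255, 0], [0, 0]])
green = np.array([[0, 255], [0, 0]])
blue  = np.array([[0, 0], [255, 0]])

rgb = np.stack((red, green, blue), axis=-1)  # the new axis becomes the last one
print(rgb.shape)   # (2, 2, 3)
print(rgb[0, 0])   # [255 0 0], i.e., the top-left pixel is pure red
```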

The methods vstack(), hstack(), and dstack() are specialized forms of stack() that simplify stacking arrays along specific, commonly used axes. Despite their names, their behavior is often closer to concatenate() with a preset axis value; they are designed to make code more readable and intuitive for particular stacking directions.

  • vstack() (vertical stack): Stacks arrays vertically, meaning along axis=0 for 1D arrays and along the first axis (rows) for higher-dimensional arrays. It is equivalent to stack(arrays, axis=0) when working with 1D data but behaves like concatenation along rows for 2D arrays.
  • hstack() (horizontal stack): Stacks arrays horizontally, meaning along axis=1 for 2D arrays (columns). For 1D arrays, it is equivalent to concatenation since they only have one axis.
  • dstack() (depth stack): Stacks arrays along the third dimension, i.e., axis=2, creating a depth-wise combination. It is commonly used for stacking 2D arrays (like image channels) into a 3D array.

The code cell below shows an example for using vstack(), hstack(), and dstack().

In [30]:
a = np.array([[1,1,1,1], [2,2,2,2], [3,3,3,3]])
b = np.array([[4,4,4,4], [5,5,5,5], [6,6,6,6]])
print(f"Original array a (shape: {a.shape}):\n{a}\n")
print(f"Original array b (shape: {b.shape}):\n{b}\n")

ab_vstacked = np.vstack((a,b))
print(f"Stacked array along 1st axis (shape: {ab_vstacked.shape}):\n{ab_vstacked}\n")

ab_hstacked = np.hstack((a,b))
print(f"Stacked array along 2nd axis (shape: {ab_hstacked.shape}):\n{ab_hstacked}\n")

ab_dstacked = np.dstack((a,b))
print(f"Stacked array along 3rd axis (shape: {ab_dstacked.shape}):\n{ab_dstacked}\n")
Original array a (shape: (3, 4)):
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]

Original array b (shape: (3, 4)):
[[4 4 4 4]
 [5 5 5 5]
 [6 6 6 6]]

Stacked array along 1st axis (shape: (6, 4)):
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]
 [4 4 4 4]
 [5 5 5 5]
 [6 6 6 6]]

Stacked array along 2nd axis (shape: (3, 8)):
[[1 1 1 1 4 4 4 4]
 [2 2 2 2 5 5 5 5]
 [3 3 3 3 6 6 6 6]]

Stacked array along 3rd axis (shape: (3, 4, 2)):
[[[1 4]
  [1 4]
  [1 4]
  [1 4]]

 [[2 5]
  [2 5]
  [2 5]
  [2 5]]

 [[3 6]
  [3 6]
  [3 6]
  [3 6]]]

Notice how the results for vstack() and hstack() match the ones for concatenate() using axis=0 and axis=1, since we stack along an existing axis in both cases. However, since a and b are 2D arrays, dstack() stacks them along a new 3rd axis.

Splitting Arrays¶

Instead of combining multiple arrays into one, we can also split an array into multiple smaller arrays. The general purpose of splitting multidimensional arrays is to divide large datasets into smaller, manageable sub-arrays for easier processing, analysis, or computation. By breaking an array into parts, you can apply different operations to each segment, parallelize computations, or selectively manipulate sections of the data without affecting the whole array. Splitting is particularly useful when working with high-dimensional data, such as images, time series, or multi-channel signals, where processing the entire dataset at once may be inefficient or unnecessary.

Common use cases include batch processing in machine learning, where a large dataset is split into smaller batches for training; image processing, where an image array may be divided into patches for filtering or feature extraction; and data analysis, where a 2D table of data can be split into separate columns or row groups for statistical computations. Splitting is also useful for parallel computation, enabling different parts of an array to be processed concurrently across multiple cores or machines. In essence, splitting arrays allows more flexible, memory-efficient, and modular handling of multidimensional data.

Let's first consider the split() method, which is used to divide an array into multiple sub-arrays along a specified axis. It allows you to partition an array either into equal-sized sections or at specific index positions, depending on the argument provided. The basic form is split(array, indices_or_sections, axis=0), where indices_or_sections can be an integer (for equal splits) or a list of indices (to split at certain points). The axis parameter determines the direction of the split — rows (axis=0), columns (axis=1), or higher dimensions. In the code cell below we split our previous (3, 4) array row-wise (axis=0) into 3 equal parts. Naturally, since we only have 3 rows, the resulting subarrays will have only 1 row each.

In [31]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

for idx, subarray in enumerate(np.split(a, 3, axis=0)):
    print(f"Subarray {idx} (shape: {subarray.shape}):\n{subarray}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray 0 (shape: (1, 4)):
[[1 2 3 4]]

Subarray 1 (shape: (1, 4)):
[[5 6 7 8]]

Subarray 2 (shape: (1, 4)):
[[ 9 10 11 12]]

Using a value of 3 works here since the number of rows (i.e., the size of the first axis) is divisible by $3$. Instead of splitting row-wise, we can also split the same array column-wise. Let's first do this such that all resulting subarrays have only a single column. For this we have to use the value 4 and axis=1.

In [32]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

for idx, subarray in enumerate(np.split(a, 4, axis=1)):
    print(f"Subarray {idx} (shape: {subarray.shape}):\n{subarray}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray 0 (shape: (3, 1)):
[[1]
 [5]
 [9]]

Subarray 1 (shape: (3, 1)):
[[ 2]
 [ 6]
 [10]]

Subarray 2 (shape: (3, 1)):
[[ 3]
 [ 7]
 [11]]

Subarray 3 (shape: (3, 1)):
[[ 4]
 [ 8]
 [12]]

Since the original array a has four columns (i.e., the size of the 2nd axis is $4$), we can also split it so that we get two subarrays with $2$ columns each. For this, we only need to change the parameter value from 4 to 2 — which works since $4$ is divisible by $2$ and therefore ensures subarrays of equal size.

In [33]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

for idx, subarray in enumerate(np.split(a, 2, axis=1)):
    print(f"Subarray {idx} (shape: {subarray.shape}):\n{subarray}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray 0 (shape: (3, 2)):
[[ 1  2]
 [ 5  6]
 [ 9 10]]

Subarray 1 (shape: (3, 2)):
[[ 3  4]
 [ 7  8]
 [11 12]]

When we want to split an array into subarrays that are not of equal size, we need to specify a 1D array of sorted integers whose entries indicate where along the axis the array is split. Of course, these entries must be valid indices into the array along the specified axis. For example, in the code cell below we split our array a along axis=1 (columns) at indices 1 and 3. As a result, we get $3$ subarrays with $1$, $2$, and $1$ columns.

In [34]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

for idx, subarray in enumerate(np.split(a, [1,3], axis=1)):
    print(f"Subarray {idx} (shape: {subarray.shape}):\n{subarray}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray 0 (shape: (3, 1)):
[[1]
 [5]
 [9]]

Subarray 1 (shape: (3, 2)):
[[ 2  3]
 [ 6  7]
 [10 11]]

Subarray 2 (shape: (3, 1)):
[[ 4]
 [ 8]
 [12]]

Recall that split() with an integer value $k$ specifying the number of subarrays will throw an error if the size of the specified axis is not divisible by $k$. However, in practice, we often want to relax this constraint and accept that the last subarray might be smaller in case the size of the axis is not divisible by $k$. In this case, we can use array_split(). This allows us, for example, to split array a row-wise into $2$ subarrays even though it has only $3$ rows.

In [35]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

for idx, subarray in enumerate(np.array_split(a, 2, axis=0)):
    print(f"Subarray {idx} (shape: {subarray.shape}):\n{subarray}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray 0 (shape: (2, 4)):
[[1 2 3 4]
 [5 6 7 8]]

Subarray 1 (shape: (1, 4)):
[[ 9 10 11 12]]

Analogous to stack(), NumPy also provides the axis-specific splitting methods vsplit(), hsplit(), and dsplit() for convenience. For example, instead of writing split(a, 2, axis=1) we can also use hsplit(a, 2) to get the same result; just run the code cell below and compare its output with the corresponding result above.

In [36]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

for idx, subarray in enumerate(np.hsplit(a, 2)):
    print(f"Subarray {idx} (shape: {subarray.shape}):\n{subarray}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray 0 (shape: (3, 2)):
[[ 1  2]
 [ 5  6]
 [ 9 10]]

Subarray 1 (shape: (3, 2)):
[[ 3  4]
 [ 7  8]
 [11 12]]

Side note: All split methods ultimately use array slicing under the hood to create the resulting subarrays. When you split an array, NumPy does not copy the data for each piece; instead, it typically creates views that reference portions of the original array's memory using slice indices. This slicing-based approach makes splitting operations very efficient, since no new data is allocated. However, if the input array is not contiguous in memory or requires reshaping for the split, NumPy may create copies instead of views. In general, though, NumPy's split methods rely on index-based slicing as their core mechanism for partitioning arrays. We cover slicing in more detail when we talk about array indexing in a later section.
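This view behavior can be checked directly: modifying a subarray returned by split() also changes the original array. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
top, bottom = np.split(a, [1], axis=0)  # split after the first row

top[0, 0] = 99        # modify the first subarray...
print(a[0, 0])        # ...and the original array changes as well: 99
print(np.shares_memory(top, a))  # True, the subarray is a view
```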

Resizing Arrays¶

The np.resize() function is used to change the shape and size of an array. Unlike the reshape() method, which requires the total number of elements to stay the same, np.resize() can alter the total number of elements, truncating or repeating data as needed, and returns a new array with the requested shape. It is especially useful when you need a quick way to ensure consistent array sizes, though it should be used carefully since it may lead to unintended repetition or truncation of data.

The slightly more intuitive behavior of resize() is when the size and shape of the new array are smaller than those of the original array. In this case, the method simply truncates data from the original array to fit the new shape and size. The code cell below shows an example of this, where a (3, 4) array is resized to shape (2, 2), which of course can no longer hold all the entries of the original array.

In [37]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

a_resized = np.resize(a, (2,2))
print(f"Resized array (shape: {a_resized.shape}):\n{a_resized}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Resized array (shape: (2, 2)):
[[1 2]
 [3 4]]

Important: Notice how the truncation of the data is not along both axes at the same time, as you might expect. In other words, resize() does not truncate the data by simply "cutting away" the entries that are not included in the new shape, which would result in the following array:

[[1 2]
 [5 6]]

Instead, for multidimensional arrays, the method fills data in row-major (C-style) order. In simple terms, this means you can assume that the original array first gets flattened into a 1D array, which for our example array a would look as follows:

[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]

Then, the resize() method truncates this flattened array to the maximum number of entries the new array can hold. Since we resize a to a (2, 2) array, it can only hold 4 values, which will be the first 4 values of the flattened version of a. In the last step, we only need to reshape the 1D array containing the 4 remaining entries into the (2, 2) output, giving us the output from the previous code cell.
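This flatten-truncate-reshape mental model can be reproduced step by step; the sketch below rebuilds the (2, 2) result by hand and compares it to np.resize():

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

manual = a.flatten()[:4].reshape(2, 2)   # flatten, keep the first 4, reshape
print(manual)                            # [[1 2] [3 4]]
print(np.array_equal(manual, np.resize(a, (2, 2))))  # True
```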

resize() also allows us to resize an array to any larger shape and size (in terms of the number of entries). If the new size is larger than the original, the method automatically repeats the data to fill the new array. The code cell below shows how this works.

In [38]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

a_resized = np.resize(a, (4,5))
print(f"Resized array (shape: {a_resized.shape}):\n{a_resized}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Resized array (shape: (4, 5)):
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12  1  2  3]
 [ 4  5  6  7  8]]

While maybe not as intuitive, this repetition of entries is consistent with the behavior when truncating entries. Again, first assume a gets flattened into a 1D array containing all $3\times 4 = 12$ entries (see above). However, now the size of the output array is larger, since we specify a shape of (4, 5), which holds $20$ entries. Therefore, the method simply starts repeating entries from the beginning, giving us the following new 1D array:

[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8]

Reshaping this new 1D array to (4, 5) then gives us the final output of the resize() method.
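Again, this flatten-repeat-reshape model can be reproduced by hand; the sketch below uses np.tile() to repeat the flattened entries before truncating to 20 values and reshaping:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

flat = a.ravel()                  # the 12 original entries
repeated = np.tile(flat, 2)[:20]  # repeat from the beginning, keep 20 entries
manual = repeated.reshape(4, 5)
print(np.array_equal(manual, np.resize(a, (4, 5))))  # True
```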

Note that in both previous examples, we kept the number of axes the same — the original array a and both resized versions were 2D arrays. However, this is not a mandatory requirement, and basically any output shape and size is possible. Based on the idea of first flattening the input array to a 1D array, truncating or repeating entries, and then reshaping the 1D array to the specified target shape, you should easily see how this works in the general case. Still, let's consider one more example:

In [39]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Original array (shape: {a.shape}):\n{a}\n")

a_resized = np.resize(a, (4,2,5))
print(f"Resized array (shape: {a_resized.shape}):\n{a_resized}\n")
Original array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Resized array (shape: (4, 2, 5)):
[[[ 1  2  3  4  5]
  [ 6  7  8  9 10]]

 [[11 12  1  2  3]
  [ 4  5  6  7  8]]

 [[ 9 10 11 12  1]
  [ 2  3  4  5  6]]

 [[ 7  8  9 10 11]
  [12  1  2  3  4]]]

Just looking at these examples, it should be obvious that resizing should be used with care, since it can change the total number of elements, potentially leading to data loss or unintended duplication. If the new size is smaller, excess elements are discarded, and if it is larger, the function repeats existing data to fill the new shape, which can produce misleading or meaningless values. Also note the difference between the np.resize() function used above, which returns a new array, and the ndarray.resize() method, which modifies the original array in place, pads with zeros instead of repeating data, and will raise an error if the array's memory is shared (e.g., if it is referenced by a view). For these reasons, resizing is powerful but potentially destructive, so best use it only when you are sure that altering the data and its shape will not cause inconsistencies.
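The difference between the function and the method can be seen in a small sketch: np.resize() returns a new array and repeats data to fill it, while the ndarray.resize() method changes the array in place and pads with zeros (refcheck=False is passed here to sidestep reference checks in interactive environments):

```python
import numpy as np

a = np.arange(1, 7)                # [1 2 3 4 5 6]

grown = np.resize(a, (2, 4))       # function: new array, data is repeated
print(grown)                       # [[1 2 3 4] [5 6 1 2]]

b = np.arange(1, 7)
b.resize((2, 4), refcheck=False)   # method: in place, padded with zeros
print(b)                           # [[1 2 3 4] [5 6 0 0]]
```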


Array Indexing¶

Array indexing refers to the process of accessing or modifying specific elements, axes, or subarrays within a larger array using indices. It allows users to retrieve or manipulate data efficiently without the need for explicit loops. The main purpose of array indexing is to enable efficient data manipulation and analysis by providing direct, fine-grained control over array elements. It is essential for tasks like filtering, reshaping, or updating values in large datasets. Array indexing in NumPy can be done using different methods:

  • Integer indexing: Integer indexing allows you to access individual elements or subsets of an array by specifying the indices explicitly. You can use a single integer to access a specific element or use an array of integers to access multiple elements simultaneously.

  • Slicing: Slicing is a common indexing technique that allows you to extract a contiguous portion of an array. It involves specifying a range of indices using the colon (:) symbol. Slicing can be used to retrieve rows, columns, or sub-arrays from a larger array.

  • Boolean indexing: Boolean indexing enables you to select elements from an array based on a given condition or a boolean mask. You can create a boolean mask by applying a logical condition to the array, and then use the mask to extract elements that satisfy the condition.

  • Fancy indexing: Fancy indexing refers to indexing an array using an array of indices or a boolean mask. It allows for advanced indexing operations, such as selecting specific rows or columns in a specific order or combining multiple indexing techniques.

Array indexing in NumPy is highly flexible and allows for both read and write operations. You can access and modify individual elements, extract subsets, or assign new values to specific locations within an array. This versatility makes array indexing a powerful tool for data manipulation and analysis.
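As a quick preview of the write side, the same index expressions can appear on the left-hand side of an assignment:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

a[0, 0] = 99   # overwrite a single element
a[1] = 0       # overwrite an entire row (the scalar is broadcast)
print(a)
```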

Integer Indexing¶

The most basic way to access array elements is by using scalar values to express the array indices. This is similar to indexing basic lists of lists in Python but also provides a simplified and more convenient notation. In the example below, we extract the value located at the 3rd entry (Index 2) with respect to the first axis and the 2nd entry (Index 1) with respect to the second axis.

In [40]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

#value = a[2][1]  # Notation for basic Python lists of lists; valid notation for NumPy arrays as well but less common
value = a[2,1]  # More common and more convenient notation for NumPy arrays
#value = a[3,1]  # Will throw an out-of-bounds error

print(f"Value: {value}")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Value: 10

Note that we do not necessarily have to specify an index for every axis. For example, if we specify only a single integer index for our 2D array, we get the complete entry along that axis (i.e., all values with respect to the remaining axes), which in the simple example below using our common 2D array a is simply the first row.

In [41]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

print(f"Values: {a[0]}")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Values: [1 2 3 4]

If we want to get the first column instead of the first row of a, we first have to understand slicing.

Slicing¶

Slicing in NumPy is a technique used to access a contiguous subset of elements from an array by specifying index ranges along one or more dimensions. It uses the familiar Python slice notation start:stop:step, allowing you to extract rows, columns, or subarrays efficiently without copying data. Because slicing typically returns a view rather than a copy, it provides a lightweight way to work with portions of large datasets while maintaining memory efficiency. The main purpose of slicing in multidimensional arrays is to simplify data selection and manipulation. It enables you to focus on specific regions of an array, such as subsets of images, time windows in signals, or particular features in datasets, without looping through elements manually.

Each slice indexes the elements with respect to only one axis (dimension). So an $n$-dimensional array may require the definition of $n$ slices. For example, if we want to get the first $2$ rows from our 2D array a, we can do this in two different ways:

In [42]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[0:2]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}\n")

subarray = a[0:2,:]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}\n")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (2, 4)):
[[1 2 3 4]
 [5 6 7 8]]

Subarray (shape: (2, 4)):
[[1 2 3 4]
 [5 6 7 8]]

Since we are only interested in slicing along the first axis, we can actually omit the slice for the second axis. However, we can also add the slice for the second axis, which is just : (or ::), since we want all entries with respect to the second axis. In contrast, if we want to, say, get the first $2$ columns (axis=1), we have to specify the slice for the first axis (axis=0) to avoid any ambiguity.

In [43]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[:,0:2]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}\n")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (3, 2)):
[[ 1  2]
 [ 5  6]
 [ 9 10]]

Of course, we can use restrictive slices regarding all dimensions. For example, in the code cell below, we extract all values that refer to the first $2$ entries with respect to both axes.

In [44]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[0:2,0:2]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}\n")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (2, 2)):
[[1 2]
 [5 6]]

As it is already common for standard Python lists, the indexing can also be done with respect to the end of arrays.

In [45]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[-2:,-2:]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}\n")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (2, 2)):
[[ 7  8]
 [11 12]]

Slicing is a very powerful and flexible way to index multidimensional arrays. These different ways of slicing can be arbitrarily combined to extract the relevant elements from an array. To show a last example, the code cell below uses slicing to access every other column of the first $2$ rows of our 2D array a.

In [46]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[0:2,::2]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}\n")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (2, 2)):
[[1 3]
 [5 7]]

Since slicing uses only start, stop, and step for indexing, the result will always be a regular subarray of the original array. That means, for example, it is not possible to return an array with an arbitrary order of the elements for each dimension. In more detail, slicing can only extract regular, evenly spaced regions of an array, defined by continuous ranges and constant step sizes, so it cannot directly select arbitrary or non-contiguous elements. In such cases, advanced indexing methods like integer array indexing or boolean indexing are required (see below).

Another limitation is that slicing typically returns a view rather than a copy, meaning changes to the sliced array will modify the original data. While this improves memory efficiency, it can lead to unintended side effects if you modify slices unintentionally. Additionally, slices cannot change the shape or structure of the array beyond simple subsetting, and using negative steps or complex slice combinations may sometimes produce confusing results if not handled carefully.
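The side-effect behavior is easy to demonstrate: writing into a slice modifies the original array, while an explicit .copy() decouples the two. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

s = a[0:2, 0:2]   # a view onto a
s[0, 0] = 99      # writes through to the original
print(a[0, 0])    # 99

c = a[0:2, 0:2].copy()  # an independent copy
c[0, 0] = -1
print(a[0, 0])    # still 99, the original is unaffected
```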

Integer Array Indexing¶

Integer array indexing in NumPy is an advanced form of indexing that allows you to select arbitrary elements from an array using arrays of integer indices. Unlike slicing, which only extracts contiguous regions, integer array indexing can pick elements from any positions in any order. The purpose of integer array indexing is to enable non-contiguous, selective, and repeatable access to array elements, which is especially useful in data manipulation, sampling, or reshaping tasks. Because NumPy evaluates integer indexing efficiently, it allows complex element selection without the need for explicit loops, making it both concise and performant.

To give a simple example, the code cell below shows how to extract the last and the first row from our input array a. Notice how integer array indexing now allows us to change the order of the rows (compared to just slicing).

In [47]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[[-1,0]]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}\n")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (2, 4)):
[[ 9 10 11 12]
 [ 1  2  3  4]]

To get the last and the first columns (Axis 1) instead of the rows (Axis 0) from a, we can combine integer array indexing with slicing to specify that we want all entries with respect to the first axis.

In [48]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[:,[-1,0]]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}\n")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (3, 2)):
[[ 4  1]
 [ 8  5]
 [12  9]]

Of course, we can now also use integer arrays for more than just one dimension. In the example below, we combine the two previous examples by using the index array [-1,0] for both rows and columns (i.e., Axis 0 and Axis 1). Before looking at the output, first think about what result you would expect from this operation.

In [49]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[[-1,0], [-1,0]]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}\n")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (2,)):
[12  1]

This result is arguably not that intuitive. However, it becomes clearer when the integer array indexing is rewritten as multiple basic integer indexing operations — which is what is going on under the hood when using multiple index arrays for different axes. In short, when multiple integer index arrays are used for different axes, the arrays are combined element-wise to select elements from the array. This means that NumPy pairs elements from each index array based on their position in the arrays: the first elements of each index array select one element, the second elements select another, and so on. The code cell below illustrates this.

In [50]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = np.array([a[-1,-1], a[0,0]])
print(f"Subarray (shape: {subarray.shape}):\n{subarray}\n")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (2,)):
[12  1]

Of course, this implies that each index array must have the same shape (or be broadcastable to a common shape — we talk about broadcasting further below), and the resulting selection corresponds to the combination of all positions according to that shape. For example, the following statement would throw an error since no element-wise combination is possible:

In [51]:
#subarray = a[[2,1], [0,3,1]]   # np.array([a[2,0], a[1,3], a[?,1]])
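When the index arrays are broadcastable rather than identically shaped, NumPy first broadcasts them to a common shape and then pairs the entries element-wise. For example, a (2, 1) row index combined with a (2,) column index yields a (2, 2) selection:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

rows = np.array([[0], [2]])   # shape (2, 1)
cols = np.array([1, 3])       # shape (2,), broadcast against rows

print(a[rows, cols])          # shape (2, 2): rows 0 and 2, columns 1 and 3
# [[ 2  4]
#  [10 12]]
```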

Boolean Indexing¶

Boolean indexing is an advanced indexing technique that uses boolean arrays (arrays of True and False values) to select elements from another array. Each boolean value indicates whether the element at the same position should be included in the result, where True means "select this element" and False means "skip it". The main purpose of boolean indexing is to filter or extract elements that satisfy specific conditions without writing explicit loops.

A boolean index is an array of the same shape as the input array — or one broadcastable to it (discussed later) — with all elements being True (element fulfills condition) or False (element does not fulfill condition). For example, given our 2D array a and a condition a > 5, the resulting boolean index looks as follows:

In [52]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

boolean_index = (a > 5)

print(f"Boolean index (shape: {boolean_index.shape}):\n{boolean_index}")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Boolean index (shape: (3, 4)):
[[False False False False]
 [False  True  True  True]
 [ True  True  True  True]]

Now this boolean index can be used to get all the elements from the array a where the respective element in the boolean index is True:

In [53]:
subarray = a[boolean_index]

print(f"Subarray (shape: {subarray.shape}): {subarray}")
Subarray (shape: (7,)): [ 6  7  8  9 10 11 12]

Important: Note that the resulting array is a 1D-array since the distribution of True and False values generally does not induce a valid multidimensional array structure (see the example boolean_index above). In practice, the two statements above can be combined into one to make the code more concise:

In [54]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[a > 5]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (7,)):
[ 6  7  8  9 10 11 12]

Like other advanced indexing methods, boolean indexing is quite powerful. For example, we can combine multiple logical conditions to find values of interest. Below we use two conditions to find all values in a that are larger than 5 and also even — to check if a value is even, we can check whether the value modulo $2$ is $0$.

In [55]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[(a > 5) & (a % 2 == 0)]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (4,)):
[ 6  8 10 12]
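Boolean indexing can also appear on the left-hand side of an assignment to modify the selected elements in place — a handy pattern for clipping or cleaning data. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Set every element greater than 5 to 0 in place.
a[a > 5] = 0
print(a)
# [[1 2 3 4]
#  [5 0 0 0]
#  [0 0 0 0]]
```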

Another example defines the condition on only a subset of the array. In the code cell below we again check whether the values are larger than $5$, but only in the first column of a using slicing. The result will be the last row of a; see below. Before reading the explanation of this behavior, first try to see if you can figure it out yourself.

In [56]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

subarray = a[a[:,0] > 5]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Subarray (shape: (1, 4)):
[[ 9 10 11 12]]
In [57]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f"Array (shape: {a.shape}):\n{a}\n")

boolean_index = (a[:, 0] > 5)

print(f"Boolean index (shape: {boolean_index.shape}):\n{boolean_index}")
Array (shape: (3, 4)):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Boolean index (shape: (3,)):
[False False  True]

The boolean index is now only a 1D array since we only consider the first column of a. This means that the shape of the boolean index and the shape of a are now different. However, both shapes are broadcast-compatible, which still allows us to use this boolean index to get the relevant values from a. More specifically, since the boolean index is of shape (3,) and a is of shape (3, 4), the index gets broadcast across all $4$ columns of a. And since the index has a True only in the last entry, we get the last entry of each column, which is simply the last row of a.

In [58]:
subarray = a[boolean_index]
print(f"Subarray (shape: {subarray.shape}):\n{subarray}")
Subarray (shape: (1, 4)):
[[ 9 10 11 12]]

We will cover the concept of broadcasting in much more detail later. However, this first example of broadcasting should already give you the basic idea.


Array Math¶

Mathematical operations in NumPy provide powerful tools to manipulate and analyze numerical data efficiently. They allow you to perform complex computations directly on arrays without the need for explicit loops. The main types of mathematical operations in NumPy include:

  • Element-wise operations perform arithmetic computations such as addition, subtraction, multiplication, division, or exponentiation on corresponding elements of arrays. These operations are fast, vectorized, and can also leverage broadcasting to work with arrays of different shapes.

  • Aggregate operations compute summary statistics or reductions over arrays, such as sum(), mean(), min(), max(), or std(). These operations condense large datasets into meaningful numerical insights.

  • Linear algebra operations go beyond element-wise computations to include operations like the dot product, matrix multiplication, transposition, and determinant calculation. These are essential for advanced numerical analysis, data transformations, and machine learning algorithms.

Let's look at these three types of operations in more detail by showing some examples.

Elementwise Operations¶

Element-wise operations in NumPy are arithmetic computations that are applied independently to each corresponding element of one or more arrays. This means that when two arrays of the same shape are combined with operators like +, -, *, /, or **, NumPy performs the operation on each pair of elements at the same position and returns a new array with the results. These operations are implemented in a vectorized manner, allowing them to execute very efficiently in compiled code without explicit Python loops. Element-wise operations form the foundation for most numerical computations in NumPy, enabling fast and expressive manipulation of large datasets.

For element-wise operations between two arrays to be valid, both arrays either have to have the same shape or the two shapes must be broadcast-compatible. We cover broadcasting in more detail later, so in the following we assume that the two arrays indeed have the same shape. With this assumption in mind, let's go through the basic math operations using two simple example arrays a and b.

In [59]:
a = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])
b = np.array([[4, 1, 1, 2], [1, 4, 2, 2]])

print(f"Input array a:\n{a}\n")
print(f"Input array b:\n{b}\n")

print(f"Addition (a+b):\n{a+b}\n")
print(f"Subtraction (a-b):\n{a-b}\n")
print(f"Multiplication (a*b):\n{a*b}\n")
print(f"Division (a/b):\n{a/b}\n")
print(f"Modulo (a%b):\n{a%b}\n")
Input array a:
[[1 2 3 4]
 [2 2 3 3]]

Input array b:
[[4 1 1 2]
 [1 4 2 2]]

Addition (a+b):
[[5 3 4 6]
 [3 6 5 5]]

Subtraction (a-b):
[[-3  1  2  2]
 [ 1 -2  1  1]]

Multiplication (a*b):
[[4 2 3 8]
 [2 8 6 6]]

Division (a/b):
[[0.25 2.   3.   2.  ]
 [2.   0.5  1.5  1.5 ]]

Modulo (a%b):
[[1 0 0 0]
 [0 2 1 1]]

NumPy also provides a wide range of unary operators that perform element-wise operations on arrays, meaning each operation is applied independently to every element. Unary operators involve only one array (or operand) and include mathematical operations such as negation (-a), absolute value (np.abs(a)), square root (np.sqrt(a)), exponential (np.exp(a)), and trigonometric functions like np.sin(a) or np.cos(a). Each of these operations returns a new array where every element is the result of applying the operation to the corresponding element of the original array. The code cell below shows a few examples.

In [60]:
a = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])

print(f"Input array a:\n{a}\n")

print(f"Negation:\n{-a}\n")
print(f"Absolute:\n{np.abs(a)}\n")
print(f"Square root:\n{np.sqrt(a)}\n")
print(f"Sine:\n{np.sin(a)}\n")
Input array a:
[[1 2 3 4]
 [2 2 3 3]]

Negation:
[[-1 -2 -3 -4]
 [-2 -2 -3 -3]]

Absolute:
[[1 2 3 4]
 [2 2 3 3]]

Square root:
[[1.    1.414 1.732 2.   ]
 [1.414 1.414 1.732 1.732]]

Sine:
[[ 0.841  0.909  0.141 -0.757]
 [ 0.909  0.909  0.141  0.141]]

Of course, NumPy has many more such unary operations; check out the documentation. Note that the element-wise arithmetic operations mentioned above can also be expressed using built-in functions. For example, a + b yields the same result as np.add(a, b); see below.

In [61]:
a = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])
b = np.array([[4, 1, 1, 2], [1, 4, 2, 2]])

print(f"Input array a:\n{a}\n")
print(f"Input array b:\n{b}\n")

print(f"Addition (np.add(a, b)):\n{np.add(a, b)}\n")
Input array a:
[[1 2 3 4]
 [2 2 3 3]]

Input array b:
[[4 1 1 2]
 [1 4 2 2]]

Addition (np.add(a, b)):
[[5 3 4 6]
 [3 6 5 5]]

Aggregation Operations¶

Aggregation operations are operations that compute a single summary value (or a smaller set of values) from an entire array or along specific axes. Instead of operating element by element, these functions reduce an array's data into meaningful statistics or summaries. Common aggregate functions include np.sum() for summation, np.mean() for the average, np.min() and np.max() for finding extreme values, np.std() and np.var() for measuring variability, and np.median() for determining central tendency. These functions can be applied to an entire array or along a chosen axis, allowing users to summarize data across rows, columns, or higher dimensions.

In the following, let's consider np.sum() for our examples; the other aggregation operations behave the same way. First, we can sum up all the elements in an array (i.e., without specifying any axis):

In [62]:
a = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])

print(f"Input array a:\n{a}\n")

print(f"Sum of all elements:\n{np.sum(a)}\n")
Input array a:
[[1 2 3 4]
 [2 2 3 3]]

Sum of all elements:
20

By specifying axis=0, we can sum elements along the first axis. This means we sum the first elements of all rows, the second elements of all rows, and so on. Hence, the result is not a single number but an array with all the sums:

In [63]:
a = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])

print(f"Input array a:\n{a}\n")

print(f"Sums along Axis 0:\n{np.sum(a, axis=0)}\n")
Input array a:
[[1 2 3 4]
 [2 2 3 3]]

Sums along Axis 0:
[3 4 6 7]

Of course, we can do the same along the rows with axis=1, summing up all elements of the first row, all elements of the second row, and so on... well, our example array a has only 2 rows.

In [64]:
a = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])

print(f"Input array a:\n{a}\n")

print(f"Sums along Axis 1:\n{np.sum(a, axis=1)}\n")
Input array a:
[[1 2 3 4]
 [2 2 3 3]]

Sums along Axis 1:
[10 10]

We can also sum the array entries along multiple axes at the same time. Since our example array a is of shape (2, 4) and therefore has only $2$ axes, we can only sum along Axes 0 and 1. And since a has only these two axes, summing the entries along both (i.e., all) axes simply yields the sum of all entries again, as shown in the code cell below.

In [65]:
a = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])

print(f"Input array a:\n{a}\n")

print(f"Sums along Axis 0 & 1:\n{np.sum(a, axis=(0,1))}\n")
Input array a:
[[1 2 3 4]
 [2 2 3 3]]

Sums along Axis 0 & 1:
20
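The other aggregation functions follow exactly the same axis semantics. A small sketch below also shows the keepdims parameter, which preserves the reduced axis with size 1 so the result stays broadcast-compatible with the original array:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])

# Other aggregations use the same axis semantics as np.sum().
print(np.mean(a, axis=0))   # [1.5 2.  3.  3.5]
print(np.max(a, axis=1))    # [4 3]

# keepdims=True keeps the reduced axis with size 1.
print(np.sum(a, axis=1, keepdims=True).shape)  # (2, 1)
```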

Linear Algebra Operations¶

Linear algebra operations in NumPy extend array manipulation to include mathematical computations involving vectors, matrices, and higher-dimensional tensors. These operations go beyond simple element-wise arithmetic and include functions such as dot products, matrix multiplication, transposition, determinants, eigenvalue decomposition, and solving systems of linear equations. NumPy provides a dedicated module, numpy.linalg, which contains optimized routines for performing these calculations efficiently, often using highly optimized BLAS and LAPACK libraries under the hood.

To show an example, let's assume we want to calculate the following matrix-vector multiplication:

$$ \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 2 & 3 & 3 \end{bmatrix} * \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 30 \\ 27 \end{bmatrix} $$

For this operation, NumPy provides the dot() method to compute the dot product.

In [66]:
matrix = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])
vector = np.array([[1], [2], [3], [4]])

print(f"Matrix:\n{matrix}\n")
print(f"Vector:\n{vector}\n")

print(f"Dot product:\n{np.dot(matrix, vector)}\n")
#print(f"Dot product:\n{matrix.dot(vector)}\n")   # Same effect
Matrix:
[[1 2 3 4]
 [2 2 3 3]]

Vector:
[[1]
 [2]
 [3]
 [4]]

Dot product:
[[30]
 [27]]

Of course, this example can easily be extended to a product between two matrices. For example, assume we now want to compute the following:

$$ \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 2 & 3 & 3 \end{bmatrix} * \begin{bmatrix} 4 & 1 \\ 1 & 4 \\ 1 & 2 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 17 & 23 \\ 19 & 22 \end{bmatrix} $$

Again, we can easily compute this using the dot() method.

In [67]:
matrix1 = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])
matrix2 = np.array([[4, 1, 1, 2], [1, 4, 2, 2]]).T

print(f"Matrix 1:\n{matrix1}\n")
print(f"Matrix 2:\n{matrix2}\n")

print(f"Matrix product:\n{np.dot(matrix1, matrix2)}\n")
Matrix 1:
[[1 2 3 4]
 [2 2 3 3]]

Matrix 2:
[[4 1]
 [1 4]
 [1 2]
 [2 2]]

Matrix product:
[[17 23]
 [19 22]]

When performing matrix multiplication on arrays with more than two dimensions, the operation is applied in a batch-wise or broadcasted manner. This means that NumPy treats the last two dimensions of each array as matrices to be multiplied, while all preceding dimensions are considered as batch dimensions that are broadcasted according to standard broadcasting rules.

For example, if you have an array A of shape (2, 3, 4) and another array B of shape (2, 4, 5), NumPy will interpret each pair of $3\times 4$ and $4\times 5$ matrices as a separate multiplication for each of the two batches, producing an output of shape (2, 3, 5). This behavior allows efficient computation of multiple matrix products in parallel without using explicit loops. Here is the same example implemented using NumPy — let's ignore the exact values and focus only on the shapes:

In [68]:
A = np.random.rand(2, 3, 4)
B = np.random.rand(2, 4, 5)

C = np.matmul(A, B)
#C = A @ B  # Same effect
print(C.shape)
(2, 3, 5)

The matmul() method, available as np.matmul() or via the @ operator, is used to perform matrix multiplication. matmul() is particularly powerful for higher-dimensional arrays. For 2D arrays, it behaves like standard matrix multiplication. For arrays with more than two dimensions, it performs batch matrix multiplication, treating the last two dimensions as matrices and broadcasting over the leading dimensions. This makes it ideal for operations on batches of matrices, which are common in machine learning, physics simulations, and other applications involving multidimensional data. Here's a comparison table showing the difference between dot() and matmul() for different array dimensions:

| Array Dimensions | np.dot() Behavior | np.matmul() Behavior |
|---|---|---|
| 1D · 1D | Returns inner product (scalar) | Returns inner product (scalar) |
| 1D · 2D | Sum-product over the 1D array and the second-to-last axis of the 2D array $\rightarrow$ returns 1D vector | Treats 1D as row vector $\rightarrow$ returns 1D vector |
| 2D · 1D | Treats 1D as column vector $\rightarrow$ returns 1D vector | Treats 1D as column vector $\rightarrow$ returns 1D vector |
| 2D · 2D | Standard matrix multiplication | Standard matrix multiplication |
| ND (N>2) · ND (N>2) | No broadcasting; sum-product over last axis of A and second-to-last axis of B | Batch matrix multiplication with broadcasting over leading dimensions |
| ND · 2D / 2D · ND | Limited support; may require reshaping | Supports batch operations consistently |

This table highlights that the main advantage of matmul() is its consistent handling of higher-dimensional arrays and batch operations, whereas dot() is more limited to 1D and 2D arrays.
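The difference in the ND · ND row is easy to verify by comparing output shapes. A short sketch: for 3D inputs, matmul() batches over the leading axis, while dot() combines all leading axes of both operands:

```python
import numpy as np

A = np.random.rand(2, 3, 4)
B = np.random.rand(2, 4, 5)

# matmul() treats the leading dimension as a batch dimension.
print(np.matmul(A, B).shape)  # (2, 3, 5)

# dot() combines ALL leading axes of both operands instead:
# result shape = A.shape[:-1] + B.shape[:-2] + B.shape[-1:]
print(np.dot(A, B).shape)     # (2, 3, 2, 5)
```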


Broadcasting¶

Broadcasting in NumPy is a powerful mechanism that allows arithmetic operations to be performed on arrays of different shapes without explicitly reshaping or replicating data. The concept relies on "stretching" the smaller array along the axis where its size is 1 (or missing) so that it matches the shape of the larger array. This enables element-wise operations between arrays that would otherwise be incompatible, without consuming extra memory for repeated copies. Broadcasting is applicable whenever the shapes of two arrays are compatible according to specific rules: starting from the trailing axes, the axes must either be equal, or one of them must be $1$.

To better understand this, let's look at a concrete example where we add a (2, 4) array a and a (2, 1) array b. Although both arrays differ in the size of the second axis, the size of the second axis of b is $1$, so both shapes are broadcast-compatible. If we add both arrays, b is added to each entry of a along the second axis. Or in simple terms, since b represents a column vector, we add b to all $4$ columns of a; here is the corresponding code:

In [69]:
a = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])
b = np.array([[1], [3]])

print(f"Array a (shape: {a.shape}):\n{a}\n")
print(f"Array b (shape: {b.shape}):\n{b}\n")

print(f"Sum a+b:\n{a+b}\n")
Array a (shape: (2, 4)):
[[1 2 3 4]
 [2 2 3 3]]

Array b (shape: (2, 1)):
[[1]
 [3]]

Sum a+b:
[[2 3 4 5]
 [5 5 6 6]]

We can do the same with a row vector, i.e., an array of shape (4,) or (1, 4) (both work here). Broadcasting works in this case because both arrays now match with respect to the second axis, differ with respect to the first axis, and the size of the first axis of b is $1$. This makes both shapes broadcast-compatible once again.

In [70]:
a = np.array([[1, 2, 3, 4], [2, 2, 3, 3]])
b = np.array([[1, 1, 1, 1]])  # Shape: (1, 4)
#b = np.array([1, 1, 1, 1])   # Shape: (4,) — works as well


print(f"Array a (shape: {a.shape}):\n{a}\n")
print(f"Array b (shape: {b.shape}):\n{b}\n")

print(f"Sum a+b:\n{a+b}\n")
Array a (shape: (2, 4)):
[[1 2 3 4]
 [2 2 3 3]]

Array b (shape: (1, 4)):
[[1 1 1 1]]

Sum a+b:
[[2 3 4 5]
 [3 3 4 4]]

Broadcasting also works for multidimensional arrays with more than $2$ axes, following the exact same rules. For more details, check out the documentation on broadcasting.
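As a quick sketch of the higher-dimensional case: the shapes are compared from the trailing axes backwards, and a missing leading axis is treated as size 1:

```python
import numpy as np

# Shapes (2, 3, 4) and (3, 1): compared from the trailing axes,
# 4 vs 1 -> stretch b, 3 vs 3 -> match, 2 vs (missing) -> stretch b.
a = np.ones((2, 3, 4))
b = np.arange(3).reshape(3, 1)

print((a + b).shape)  # (2, 3, 4)
```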


Arg* Methods¶

The arg* methods are functions that return the indices of elements in an array that satisfy certain conditions, rather than the values themselves. The most commonly used methods are np.argmax() and np.argmin(), which return the index of the maximum or minimum element in an array, respectively. These methods can operate on the entire array or along a specified axis in multidimensional arrays, making it easy to identify the location of extreme values in large datasets.

For a concrete example, let's consider np.argmax(). The method returns the index of the first occurrence of the maximum value in an array. It works by scanning the array (or along a specified axis for multidimensional arrays) to identify the element with the largest value and then returning its position rather than the value itself. By default, the returned index refers to the flattened version of the array, which implies that the index is always a single value. This also means that, if we want to use the index to actually get the maximum value, we need to either

  • Flatten the array so that the index/position match, or
  • Convert the scalar index to the corresponding multidimensional index with respect to the shape of the input array.

The code below shows a simple example using both approaches.

In [71]:
np.random.seed(1)
a = np.random.randint(1, 20, size=(3, 5))

print(f"Input array a:\n{a}\n")

max_idx = np.argmax(a)
max_val = a.ravel()[max_idx]
print(f"Index of the first maximum value: {max_idx} (value: {max_val})")

max_idx_multi = np.unravel_index(max_idx, a.shape)
max_val = a[max_idx_multi]
print(f"Multidimensional index of the first maximum value: {max_idx_multi} (value: {max_val})")
Input array a:
[[ 6 12 13  9 10]
 [12  6 16  1 17]
 [ 2 13  8 14  7]]

Index of the first maximum value: 9 (value: 17)
Multidimensional index of the first maximum value: (1, 4) (value: 17)

As mentioned above, we can also get the indices of the maximum values along individual axes instead of the complete array. In this case, we get multiple indices:

In [72]:
np.random.seed(1)
a = np.random.randint(1, 20, size=(3, 5))

print(f"Input array a:\n{a}\n")

print(f"Indices of maximum values for each column:\n{np.argmax(a, axis=0)}\n")
print(f"Indices of maximum values for each row:\n{np.argmax(a, axis=1)}\n")
Input array a:
[[ 6 12 13  9 10]
 [12  6 16  1 17]
 [ 2 13  8 14  7]]

Indices of maximum values for each column:
[1 2 1 2 1]

Indices of maximum values for each row:
[2 4 3]

For example, the result [1 2 1 2 1] shows the indices of the 5 maximum values for the 5 columns of a. Since the first column is [6 12 2], the maximum value is at Index 1, represented by [1 ...] and so on. Naturally, np.argmin() works exactly the same just for finding the indices of the minimum values.

Another common method is np.argsort(), which returns the indices that would sort an array. That is, the method sorts the elements of the array but returns the indices rather than the actual values. In contrast to np.argmax() and np.argmin(), np.argsort() sorts with respect to the last axis (axis=-1) by default. So to sort regardless of any axis — i.e., over the flattened array — one has to explicitly set axis=None.

In [73]:
np.random.seed(1)
a = np.random.randint(1, 20, size=(3, 5))

print(f"Input array a:\n{a}\n")

print(f"Indices of sorted array entries:\n{np.argsort(a, axis=None)}\n")
Input array a:
[[ 6 12 13  9 10]
 [12  6 16  1 17]
 [ 2 13  8 14  7]]

Indices of sorted array entries:
[ 8 10  0  6 14 12  3  4  1  5 11  2 13  7  9]

Again, the order of indices refers to a flattened version of the input array. For example, in our example array a, the smallest value is 1 and can be found at index/position 8 with respect to the flattened version of a. Thus, 8 is the first entry in the returned array of indices; see the output of the previous code cell.
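Since the indices refer to the flattened array, indexing a flattened view with them recovers the values in sorted order — a quick sanity check:

```python
import numpy as np

np.random.seed(1)
a = np.random.randint(1, 20, size=(3, 5))

# The indices from argsort(axis=None) refer to the flattened array,
# so applying them to a.ravel() yields the values in sorted order.
idx = np.argsort(a, axis=None)
print(a.ravel()[idx])
```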

When setting axis=-1 or any other valid value — here either 0 or 1 since a has only 2 axes — we get the indices that would sort the values with respect to the chosen dimension/axis. For example:

In [74]:
np.random.seed(1)
a = np.random.randint(1, 20, size=(3, 5))

print(f"Input array a:\n{a}\n")

print(f"Indices of sorted array entries (axis=1):\n{np.argsort(a, axis=1)}\n")
# np.argsort(a, axis=-1)  # Same effect since a has only the two axes 0 and 1.
Input array a:
[[ 6 12 13  9 10]
 [12  6 16  1 17]
 [ 2 13  8 14  7]]

Indices of sorted array entries (axis=1):
[[0 3 4 1 2]
 [3 1 0 2 4]
 [0 4 2 1 3]]

This result tells us that, e.g., for the first row, the smallest value (here: $6$) is at Index $0$ and the largest value (here: $13$) at Index $2$.
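To turn such per-row index arrays back into values, NumPy provides np.take_along_axis(), which applies the indices along the chosen axis — effectively sorting each row. A small sketch:

```python
import numpy as np

np.random.seed(1)
a = np.random.randint(1, 20, size=(3, 5))

# Apply the per-row index arrays from argsort() along axis 1,
# which reorders (sorts) each row of a.
idx = np.argsort(a, axis=1)
print(np.take_along_axis(a, idx, axis=1))
```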


Practical Example Use Cases¶

When you're new to NumPy (or data science with Python in general), the usefulness and power of indexing, slicing, broadcasting, and so on are unlikely to be obvious from just going through this notebook and running the basic examples. The benefits of NumPy really begin to shine when you actually use it to solve problems and perform computations on NumPy arrays. Therefore, in the last part of the notebook, we go through some concrete use cases to see how NumPy can make our lives easier. The two use cases are still very simple, but you should still be able to appreciate that solving them using NumPy yields a shorter and cleaner implementation — in practice, typically also a much faster one, although given the toy data we use here, you won't see a noticeable difference.

Finding the K-Nearest Neighbors¶

The k-nearest neighbors (k-NN) are data points in a dataset that are most similar or closest to a given query point in a feature space. In the k-NN algorithm, the k nearest neighbors are identified based on a distance metric, such as Euclidean distance or cosine similarity. These neighbors play a crucial role in making predictions or determining the similarity of the query point. The k-nearest neighbor (k-NN) problem is a widely used algorithm in machine learning and data mining. It is a non-parametric method used for classification and regression tasks. Some common applications of the k-NN problem include:

  • Classification: k-NN can be used for classifying data points into different categories. Given a new data point, the algorithm identifies the k nearest neighbors based on a distance metric (such as Euclidean distance) and assigns the class label that is most common among its k nearest neighbors.

  • Regression: In addition to classification, k-NN can also be used for regression tasks. Instead of assigning class labels, the algorithm estimates the value of a target variable based on the average or weighted average of the target values of its k nearest neighbors.

  • Recommender systems: k-NN can be applied to recommend items to users based on their similarity to other users or items. For example, in collaborative filtering, the algorithm can find similar users and recommend items that those similar users have liked or purchased.

  • Anomaly detection: k-NN can be used to identify outliers or anomalies in a dataset. The algorithm measures the distance of a data point to its k nearest neighbors, and if a point is significantly distant from its neighbors, it can be considered as an anomaly.

  • Image recognition: k-NN can be used in image recognition tasks. Given an input image, the algorithm can compare it with a set of labeled images to determine the class label of the input image based on the labels of its k nearest neighbors.

  • Text classification: k-NN can be applied to classify text documents based on their similarity. By representing documents as vectors, such as using the term frequency-inverse document frequency (TF-IDF) representation, the algorithm can find the k most similar documents and assign a class label based on those neighbors.

These are just a few examples of the many applications of the k-nearest neighbor problem. Its simplicity and effectiveness make it a popular choice in various fields where pattern recognition and similarity-based analysis are required. For the following example, we create a random dataset D of $10$ data points, each featuring $5$ attributes/features/coordinates. This means that D can be represented as a matrix (2D array) of shape (10, 5). We also need a data point x for which we want to find the k-nearest neighbors. Naturally, x must have the same $5$ attributes/features/coordinates.

In [75]:
np.random.seed(10) # Just to ensure that we get the same random numbers

D = np.random.rand(10, 5)
x = np.random.rand(5)

print('Dataset D:')
print(D) 
print()
print('Data point x:')
print(x)
Dataset D:
[[0.771 0.021 0.634 0.749 0.499]
 [0.225 0.198 0.761 0.169 0.088]
 [0.685 0.953 0.004 0.512 0.813]
 [0.613 0.722 0.292 0.918 0.715]
 [0.543 0.142 0.373 0.674 0.442]
 [0.434 0.618 0.513 0.65  0.601]
 [0.805 0.522 0.909 0.319 0.09 ]
 [0.301 0.114 0.829 0.047 0.626]
 [0.548 0.819 0.199 0.857 0.352]
 [0.755 0.296 0.884 0.326 0.165]]

Data point x:
[0.393 0.093 0.821 0.151 0.384]

Given x, we now want to find the, say, $3$ most similar data points, i.e., its $3$-nearest neighbors. Since we have numerical features, we can directly apply the Euclidean distance to measure the similarity between two data points. Given two multidimensional data points p and q with n features each, the Euclidean distance d(p, q) is defined as:

$$\large d(p, q) = \sqrt{\sum_{i=1}^n (p_i - q_i)^2} = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \dots + (p_n - q_n)^2 } $$

Since we have $10$ data points, we will get $10$ distances from which we then pick the $3$ shortest ones. So let's first compute all the $10$ distances between x and the data points in D. Using the built-in methods and the concept of broadcasting, NumPy makes this step very straightforward. In a sense, we can directly implement the formula given above — particularly notice the first line where we use broadcasting to subtract x from all points in D; no loop or anything else required!

In [76]:
# Use broadcasting to subtract x from all data points in D
R = D - x

# Square all elements
R = np.square(R)

# Sum up all squared elements for each row
distances = np.sum(R, axis=1)

# Calculate the final square roots
distances = np.sqrt(distances)

# Print the 10 distances
print(distances)
[0.744 0.361 1.344 1.192 0.709 0.817 0.69  0.28  1.199 0.504]

Now that we have all $10$ distances, we only need to find the $3$ shortest ones. In fact, we want to find the positions/indices of the smallest values in distances as they correspond to the $3$ most similar data points. Again, NumPy has us covered.

In [77]:
# Get the indices of the 3 most similar data points
top_i = np.argsort(distances)[:3]

print(top_i)
[7 1 9]

Now we know which data points in D are the ones most similar to x. Of course, we can also print those $3$ data points:

In [78]:
D[top_i]
Out[78]:
array([[0.301, 0.114, 0.829, 0.047, 0.626],
       [0.225, 0.198, 0.761, 0.169, 0.088],
       [0.755, 0.296, 0.884, 0.326, 0.165]])
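The whole pipeline — distances, ranking, selection — can be condensed even further using np.linalg.norm(), which computes the Euclidean norm along a chosen axis and thus replaces the square/sum/sqrt steps:

```python
import numpy as np

np.random.seed(10)  # Same seed as above for reproducibility
D = np.random.rand(10, 5)
x = np.random.rand(5)

# Broadcasting subtracts x from every row; norm() reduces each row to its
# Euclidean distance; argsort() ranks them; [:3] picks the 3 nearest.
top_i = np.argsort(np.linalg.norm(D - x, axis=1))[:3]
print(top_i)  # [7 1 9]
```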

Side note: Calculating Euclidean distances between vectors is such a common task that packages such as scikit-learn provide ready-made methods for it. But note that under the hood, scikit-learn itself relies heavily on NumPy! The code cell below computes the Euclidean distances between x and all data points in D using the method euclidean_distances() provided by scikit-learn.

In [79]:
from sklearn.metrics.pairwise import euclidean_distances

euclidean_distances(D, x.reshape(1, -1))
Out[79]:
array([[0.744],
       [0.361],
       [1.344],
       [1.192],
       [0.709],
       [0.817],
       [0.69 ],
       [0.28 ],
       [1.199],
       [0.504]])

Of course, this result matches our distances array, where we "manually" computed all Euclidean distances.

Feature Scaling: Standardization¶

Feature scaling or standardization is an important preprocessing step in data mining and machine learning. It involves transforming the features or variables of a dataset to a common scale or range. Here are the key reasons why feature scaling is important:

  • Avoiding bias due to different scales: Features in a dataset often have different scales and units of measurement. For example, one feature might have values ranging from 0 to 100, while another feature could range from 0 to 1000. These differences in scales can introduce bias in certain machine learning algorithms that are sensitive to the magnitude of features. Scaling the features ensures that they are on a similar scale, preventing certain features from dominating the learning process solely because of their larger values.

  • Enhancing convergence and performance: Many optimization algorithms used in machine learning, such as gradient descent, rely on the assumption that features are on a similar scale. When features have vastly different scales, the convergence of these algorithms can be slow or inefficient. By scaling the features, optimization algorithms can converge more quickly, leading to faster and more accurate models.

  • Facilitating distance-based algorithms: Algorithms that utilize distances or similarities between data points, such as k-nearest neighbors (k-NN) or clustering algorithms, are sensitive to the scale of features. Without proper scaling, features with larger scales can dominate the distance calculations and influence the results disproportionately. Scaling the features ensures that all features contribute fairly to the distance calculations and allows for more reliable and meaningful comparisons.

  • Improving model performance and interpretation: Feature scaling can positively impact the performance of various machine learning models. Models such as support vector machines (SVMs), k-NN, and neural networks often benefit from feature scaling, as it aids in finding optimal hyperplanes, improving decision boundaries, and ensuring efficient model training. Furthermore, feature scaling can make the coefficients or weights in linear models more interpretable and comparable, as they represent the importance or contribution of each feature on a common scale.

Common methods of feature scaling include normalization (scaling features to a specific range, e.g., [0, 1]) and standardization (scaling features to have zero mean and unit variance). The choice of scaling method depends on the specific requirements and characteristics of the dataset and the machine learning algorithms being used. Overall, feature scaling or standardization is crucial in data mining as it promotes fairness, efficiency, and effectiveness in various machine learning algorithms, improves model performance, and aids in meaningful interpretation of results.

To illustrate the purpose of feature scaling, take a second look at the formula for calculating the Euclidean distance; here for just 2 features:

$$\large d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 } $$

As you can see, the Euclidean Distance depends on the differences between the respective feature values. This can cause a problem if the two features are of very different magnitudes. For example, assume that the values of Feature 1 are around $0.0001$ while the values of Feature 2 are around $1,000$. In that case, the values of Feature 1 will hardly affect the Euclidean Distance between two data points, since $(p_1 - q_1)^2$ will be negligible compared to $(p_2 - q_2)^2$.
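The numbers below make this concrete. The two data points are hypothetical, with values chosen to mirror the magnitudes mentioned above ($0.0001$ vs. $1,000$): the Euclidean distance over both features is essentially identical to the distance computed from Feature 2 alone.

```python
import numpy as np

# Two hypothetical data points with features of very different magnitudes
p = np.array([0.0001, 1000.0])
q = np.array([0.0009, 1250.0])

d = np.sqrt(np.sum((p - q) ** 2))    # Euclidean distance over both features
d_feature2_only = abs(p[1] - q[1])   # "distance" using Feature 2 alone

# Both values are approximately 250; Feature 1 barely contributes
print(d, d_feature2_only)
```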

Feature scaling aims to remedy this issue by bringing all features to a common order of magnitude. One approach for feature scaling is standardization (scaling features to have zero mean and unit variance). If $x$ is a feature with mean $\mu$ and standard deviation $\sigma$, and $x_i$ is the feature value for data point $i$, each feature value gets standardized by subtracting the feature mean and dividing by the feature standard deviation.

$$\large x_i = \frac{x_i - \mu}{\sigma} $$

Let's see how this looks in an example. First, we create another random dataset; in this case, however, we artificially scale the values of the different features up or down. For example, the first feature will be of magnitude $0.01$, while the third feature will be of magnitude $100$.

In [80]:
np.random.seed(10) # Just to ensure that we get the same random numbers

D = np.random.rand(10, 5) * np.array([0.05, 10, 500, 0.1, 50])

print('Dataset D:')
print(D)
Dataset D:
[[  0.039   0.208 316.824   0.075  24.925]
 [  0.011   1.981 380.265   0.017   4.417]
 [  0.034   9.534   1.974   0.051  40.631]
 [  0.031   7.218 145.938   0.092  35.729]
 [  0.027   1.422 186.67    0.067  22.092]
 [  0.022   6.178 256.569   0.065  30.052]
 [  0.04    5.216 454.324   0.032   4.523]
 [  0.015   1.14  414.341   0.005  31.314]
 [  0.027   8.193  99.474   0.086  17.583]
 [  0.038   2.96  441.968   0.033   8.251]]

Without feature scaling, calculating the Euclidean Distance would typically be dominated by Feature 3. This would make Feature 3 more important than the other features, which in practice is usually not a justified assumption. Using the built-in methods np.mean() and np.std(), we can easily calculate the mean and standard deviation for each feature. Note that we need to specify the axis parameter; otherwise, we would calculate the mean and standard deviation over all values in D.

In [81]:
col_mean = np.mean(D, axis=0)
col_std = np.std(D, axis=0)

print(col_mean, "<= Means for all features")
print(col_std, "<= Standard deviations for all features")
[  0.028   4.405 269.835   0.052  21.952] <= Means for all features
[  0.009   3.123 149.052   0.028  12.324] <= Standard deviations for all features

With the $5$ means and $5$ standard deviations for all $5$ features, we can again use broadcasting to update each feature value with respect to its "own" mean and standard deviation.

In [82]:
D_standardized = (D - col_mean) / col_std

print(D_standardized)
[[ 1.078 -1.344  0.315  0.807  0.241]
 [-1.817 -0.776  0.741 -1.256 -1.423]
 [ 0.622  1.643 -1.797 -0.035  1.516]
 [ 0.237  0.901 -0.831  1.408  1.118]
 [-0.134 -0.955 -0.558  0.541  0.011]
 [-0.709  0.568 -0.089  0.457  0.657]
 [ 1.257  0.26   1.238 -0.722 -1.414]
 [-1.415 -1.046  0.969 -1.691  0.76 ]
 [-0.107  1.213 -1.143  1.191 -0.355]
 [ 0.989 -0.463  1.155 -0.699 -1.112]]

As you can see, now all features are roughly in the same ballpark, with no features standing out and potentially dominating the calculation of Euclidean Distances.
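We can verify this claim directly by rebuilding the dataset from above (same seed and scaling factors) and checking how much of the squared distance between the first two data points is due to Feature 3, before and after standardization.

```python
import numpy as np

# Recreate the dataset from above (same seed and per-feature scaling)
np.random.seed(10)
D = np.random.rand(10, 5) * np.array([0.05, 10, 500, 0.1, 50])
D_std = (D - D.mean(axis=0)) / D.std(axis=0)

# Distance between the first two data points, before and after scaling
d_raw = np.linalg.norm(D[0] - D[1])
d_scaled = np.linalg.norm(D_std[0] - D_std[1])

# Fraction of the squared distance contributed by Feature 3 (index 2)
share_f3_raw = (D[0, 2] - D[1, 2]) ** 2 / d_raw ** 2
share_f3_scaled = (D_std[0, 2] - D_std[1, 2]) ** 2 / d_scaled ** 2

print(f"Raw data:          Feature 3 share = {share_f3_raw:.1%}")
print(f"Standardized data: Feature 3 share = {share_f3_scaled:.1%}")
```

In the raw data, Feature 3 accounts for roughly 90% of the squared distance; after standardization, its share drops to a small fraction, so all features contribute on comparable terms.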

Side note: Again, standardization or normalization of the data is a very common task, and as such there are off-the-shelf solutions available to accomplish this. The code below uses the method provided by scikit-learn; the results should of course be identical. You can check out the StandardScaler class of scikit-learn.

In [83]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

D_standardized_sklearn = scaler.fit_transform(D)

print(D_standardized_sklearn)
[[ 1.078 -1.344  0.315  0.807  0.241]
 [-1.817 -0.776  0.741 -1.256 -1.423]
 [ 0.622  1.643 -1.797 -0.035  1.516]
 [ 0.237  0.901 -0.831  1.408  1.118]
 [-0.134 -0.955 -0.558  0.541  0.011]
 [-0.709  0.568 -0.089  0.457  0.657]
 [ 1.257  0.26   1.238 -0.722 -1.414]
 [-1.415 -1.046  0.969 -1.691  0.76 ]
 [-0.107  1.213 -1.143  1.191 -0.355]
 [ 0.989 -0.463  1.155 -0.699 -1.112]]
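To confirm that the manual computation and scikit-learn agree, we can compare the two result arrays with np.allclose(). (Both use the population standard deviation, i.e., ddof=0, so the values match up to floating-point precision.)

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Recreate the dataset from above (same seed and per-feature scaling)
np.random.seed(10)
D = np.random.rand(10, 5) * np.array([0.05, 10, 500, 0.1, 50])

# Manual standardization vs. scikit-learn's StandardScaler
manual = (D - D.mean(axis=0)) / D.std(axis=0)
sklearn_result = StandardScaler().fit_transform(D)

print(np.allclose(manual, sklearn_result))  # True
```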

Summary¶

NumPy is a fundamental Python library for numerical computing that provides support for efficient manipulation of large arrays and matrices, along with a wide range of mathematical operations. At its core, NumPy introduces the ndarray, a high-performance multidimensional array object, which allows for fast, vectorized operations on large datasets. Unlike Python's native lists, NumPy arrays are memory-efficient, support element-wise arithmetic, and integrate seamlessly with low-level optimized code, making them ideal for computationally intensive tasks.

One of the key strengths of NumPy is its comprehensive set of array manipulation and mathematical functions, including element-wise operations, aggregate functions, and linear algebra routines. Its support for broadcasting allows arithmetic between arrays of different shapes without explicit loops, enabling concise and readable code. Additionally, NumPy's arg* methods, matrix operations, and random number generation utilities make it invaluable for data preprocessing, statistical analysis, and simulation tasks, which are common in data science and scientific computing.

Another major advantage of NumPy is its efficiency and speed. Because most operations are implemented in optimized C and Fortran code under the hood, array computations are often orders of magnitude faster than equivalent Python code using loops. This performance is crucial when working with large datasets, performing numerical simulations, or training machine learning models, where computational bottlenecks can otherwise slow down workflows significantly.

Being proficient in NumPy is particularly important for machine learning and deep learning because it forms the foundation for more advanced frameworks such as PyTorch and TensorFlow. These frameworks use concepts like multidimensional tensors, broadcasting, and vectorized operations that are directly inspired by NumPy. A solid understanding of NumPy not only helps in writing efficient code but also makes it easier to learn and understand the behavior of these libraries, enabling faster experimentation and model development.

In summary, NumPy is more than just a library for arrays; it is a cornerstone of modern data science and scientific computing. Its strengths in efficient array manipulation, fast mathematical operations, and support for linear algebra make it indispensable for tasks ranging from data analysis to machine learning. Mastering NumPy provides a strong foundation for working with other Python-based computational frameworks and is an essential skill for anyone pursuing careers in data science, machine learning, artificial intelligence, or scientific research.

In [ ]: