Problem set for the SciPy 2010 NumPy Tutorial

These files may be downloaded from http://mentat.za.net/numpy/kittens

Please do explore beyond the problems given, and feel free to ask questions at any time.

Note

Solutions to some problems are provided in the source tree as problem_name_solution.py. Do not look at these until you've made an attempt yourself!

P1: The NumPy N-dimensional array

  1. What is the maximum number of dimensions a NumPy array can have? Use one of the array constructors (np.zeros, np.empty, np.random.random, etc.) to find out.

  2. Construct the following two arrays:

    x = np.array([[1, 2], [3, 4]], order='C', dtype=np.uint8)
    y = np.array([[1, 2], [3, 4]], order='F', dtype=np.uint8)
    

    Compare the bytes they store in memory by using

    [ord(c) for c in x.data]
    

    Note that, even though these arrays store data in different memory order, they are identical from the user's perspective.

    print x
    print y
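
    The `ord` expression above relies on Python 2, where iterating over `x.data` yields one-character strings. The following is a minimal sketch of the same kind of inspection that should also run on Python 3, assuming NumPy 1.9 or later for `tobytes`. The helper name `raw_bytes` and the 2x3 example array are made up for illustration and are not part of the tutorial's source tree.

```python
import numpy as np

def raw_bytes(arr):
    """Return the bytes of a contiguous array in memory order, as ints."""
    # order='A' dumps Fortran-contiguous arrays column by column and
    # everything else row by row, i.e. the layout actually held in memory.
    return list(bytearray(arr.tobytes(order='A')))

c = np.arange(6, dtype=np.uint8).reshape((2, 3))   # C (row-major) layout
f = np.asfortranarray(c)                           # same values, F layout

print(raw_bytes(c))
print(raw_bytes(f))
print((c == f).all())   # element-wise, the two arrays compare equal
```

    Wrapping the result in `bytearray` is only there so that the list contains integers on both Python 2 and Python 3.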
    

  3. Examine the bytes stored by the following array (using the "ord" trick shown above).

    x = np.array([[1, 2], [3, 4]], dtype=np.uint32)
    

    Note that, on most laptops, the byte order will be little-endian, i.e. least significant byte first.
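
    To see what the two byte orders look like independently of the machine at hand, one option is to request them explicitly in the dtype string. This is only an illustrative sketch; the value 258 and the variable names are arbitrary choices, not part of the exercise.

```python
import sys
import numpy as np

# 258 == 0x00000102, so its two non-zero bytes are easy to spot.
little = np.array([258], dtype='<u4')   # '<' forces little-endian storage
big = np.array([258], dtype='>u4')      # '>' forces big-endian storage

little_bytes = list(bytearray(little.tobytes()))
big_bytes = list(bytearray(big.tobytes()))

print(sys.byteorder)    # byte order of the machine running this code
print(little_bytes)     # least significant byte first
print(big_bytes)        # most significant byte first
```

    Both arrays hold the same value; only the order of the four bytes in memory differs.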

  4. Create a 3x3 ndarray called x. Slice out the first row and call that y. Convince yourself that y is a view onto x, i.e. that y's base is x.

    y.base is x
    

    Modify y and see whether x changes.

  5. Advanced: Attempt the Fortran-ordering quiz.

P2: Broadcasting

  1. Reproduce z from the following snippet, using broadcasting instead of mgrid. Hint: Use ogrid.

    x, y = np.mgrid[:10, :5]
    z = x + y
    

  2. In our solution, broadcasting is used "behind the scenes". To see what happens more clearly, apply np.broadcast_arrays on the x and y from ogrid. This should correspond to the x and y produced by mgrid.
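
    As a rough illustration of what `np.broadcast_arrays` returns, here is a sketch on two small hand-built arrays (deliberately not the ones from the exercise). The names `col` and `row` are arbitrary.

```python
import numpy as np

col = np.arange(3).reshape((3, 1))   # shape (3, 1)
row = np.arange(2).reshape((1, 2))   # shape (1, 2)

# The addition broadcasts both operands to shape (3, 2) on the fly.
total = col + row

# broadcast_arrays returns the expanded operands explicitly.
col_full, row_full = np.broadcast_arrays(col, row)

print(col_full.shape, row_full.shape)
print(col_full.strides)   # the stride along the repeated axis is 0
```

    The expanded arrays are views with a zero stride along the repeated axis, which is worth keeping in mind when thinking about the next question.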

  3. Benchmark the two approaches (mgrid vs ogrid), using IPython's %timeit magic. Can you explain the difference in execution time?

  4. Given a list of 3-dimensional coordinates,

    [[1, 2, 10],
     [3, 4, 20],
     [5, 6, 30],
     [7, 8, 40]]
    

    Normalise each coordinate by dividing it by its Z (third) element. For example, the first row becomes:

    [1/10, 2/10, 10/10]
    

P3: Indexing

  1. Create a 3x3 ndarray. Use fancy indexing to slice out the diagonal elements.

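  2. Predict and verify the shape of the following slicing operation. Remember: the index arrays are broadcast against each other first; because slices separate them here, the broadcast dimensions come first in the result, followed by the sliced dimensions.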

    x = np.empty((10, 8, 6, 5, 4))
    
    idx0 = np.zeros((3, 8), dtype=int)
    idx1 = np.zeros((3, 1), dtype=int)
    
    x[1:2, idx0, 1:3, 3:4, idx1]
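
    The ordering rule for mixed index arrays and slices can be tried out on a smaller case first. The sketch below uses a made-up 3-dimensional array and a single index array `i`; it is meant as a warm-up, not as the answer to the question above.

```python
import numpy as np

x = np.empty((5, 6, 7))
i = np.zeros((2, 3), dtype=int)   # index array with shape (2, 3)

# Index arrays separated by a slice: the broadcast shape (2, 3) comes
# first, followed by the sliced axis of length 6.
separated = x[i, :, i]

# Index arrays next to each other: the broadcast shape replaces the
# indexed axes in place, so the leading sliced axis stays in front.
adjacent = x[:, i, i]

print(separated.shape)
print(adjacent.shape)
```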
    

  3. Advanced: This is not strictly speaking a question on indexing, but it's a fun exercise either way.

    Construct an array

    x = np.arange(12, dtype=np.int32).reshape((3, 4))
    

    so that x is

    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    

    Now, provide to np.lib.stride_tricks.as_strided the strides necessary to view a sliding 2x2 window over this array. The output should be

    array([[[[ 0,  1],
             [ 4,  5]],
    
            [[ 1,  2],
             [ 5,  6]],
    
            [[ 2,  3],
             [ 6,  7]]],
    
    
           [[[ 4,  5],
             [ 8,  9]],
    
            [[ 5,  6],
             [ 9, 10]],
    
            [[ 6,  7],
             [10, 11]]]], dtype=int32)
    

    The code is of the form

    z = as_strided(x, shape=(2, 3, 2, 2),
                      strides=(..., ..., ..., ...))
    

    This sort of stride manipulation is very handy when applying region-based statistics or operators.
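
    A one-dimensional version of the same idea may help before tackling the 2x2 case. This is a sketch under the assumption of a contiguous int32 input; the window length of 3 and the variable names are arbitrary.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(6, dtype=np.int32)
step = x.strides[0]   # bytes between consecutive elements (4 for int32)

# Each of the 4 rows starts one element later than the previous one, and
# each row walks 3 consecutive elements, so both strides equal `step`.
windows = as_strided(x, shape=(4, 3), strides=(step, step))

print(windows)
print(windows.mean(axis=1))   # a per-window statistic on the strided view
```

    Note that `as_strided` does no bounds checking, so a wrong shape or stride can read memory outside the array.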

P4: Structured Arrays

  1. Design a data-type for storing the following record:

    • Timestamp in nanoseconds (a 64-bit unsigned integer)
    • Position (x- and y-coordinates, stored as floating point numbers)

    Use it to represent the following data:

    x = np.array([(100, (0, 0.5)),
                  (200, (0, 10.3)),
                  (300, (5.5, 15.1))], dtype=XXX)
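
    For reference, nested fields in a structured data-type can be declared by giving a field a list of sub-fields as its type. The sketch below uses a different, hypothetical record (a label plus an RGB colour) so that the design of the timestamp/position type is still left to you.

```python
import numpy as np

# Hypothetical record: an integer label plus a nested 'rgb' sub-record.
pixel = np.dtype([('label', np.uint16),
                  ('rgb', [('r', np.uint8),
                           ('g', np.uint8),
                           ('b', np.uint8)])])

# Each record is a tuple; the nested field is itself a tuple.
a = np.array([(1, (255, 0, 0)),
              (2, (0, 128, 255))], dtype=pixel)

print(a['label'])       # one field across all records
print(a['rgb']['g'])    # a field of the nested sub-record
```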
    

  2. Consider structured_arrays/data.txt. Modify load_txt_template.py to load the data in this file. This requires specifying a data-type that encapsulates a record such as

    # name  x       y       block - 2x3 ints
    aaaa    1.0     8.0     1 2 3 4 5 6
    

  3. Create two structured arrays of your choosing. Use the np.savez command to save these to a single data-file. Load the data-file using np.load and confirm whether the data survived the round-trip. (Saving data using save or savez is highly recommended over pickling.)
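
    The mechanics of a `np.savez` / `np.load` round-trip can be sketched with two plain (non-structured) arrays. An in-memory `io.BytesIO` buffer stands in for a file on disk here, and the keyword names `first` and `second` are arbitrary.

```python
import io
import numpy as np

a = np.arange(5)
b = np.linspace(0.0, 1.0, 3)

buf = io.BytesIO()          # stands in for a file on disk
np.savez(buf, first=a, second=b)

buf.seek(0)                 # rewind before reading the archive back
data = np.load(buf)

print(sorted(data.files))
print(np.array_equal(data['first'], a), np.array_equal(data['second'], b))
```

    The keyword names passed to `np.savez` become the keys under which the arrays are retrieved after loading.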

P5: Universal Functions

  1. Modify the code provided in ufunc to create your own universal function.

P6: Array Interface

Documentation for NumPy's __array_interface__ may be found in the online docs.
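
To get a feel for what such a dictionary contains, one can inspect the `__array_interface__` of an ordinary ndarray. This is only an illustrative sketch; the small uint8 array is an arbitrary choice.

```python
import numpy as np

a = np.arange(3, dtype=np.uint8)
iface = a.__array_interface__

print(iface['shape'])     # dimensions of the array
print(iface['typestr'])   # byte order, kind and item size
print(iface['data'])      # (address of the first byte, read-only flag)
print(iface['version'])   # version of the interface protocol
```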

  1. The author of a third-party package (array_interface/mutable_str.py) provides a string class that allocates its own memory:

    In [1]: from mutable_str import MutableString
    
    In [2]: s = MutableString('abcde')
    
    In [3]: print s
    abcde
    

    You'd like to view these mutable strings as ndarrays, in order to manipulate the underlying memory.

    1. Add an __array_interface__ dictionary attribute to s, then convert s to an ndarray. Use the given array_interface/template.py as a guide.
    2. Add 1 to the array. Now print the original string to confirm that its value was modified.

P7: Optimisation: Demo Only