Scientific Tools for Python

NumPy, SciPy and Matplotlib

Authors: Stéfan van der Walt (DSP Lab, Univ of Stellenbosch)
Albert Strasheim (DSP Lab, Univ of Stellenbosch)
Neilen Marais (CEMAGG, Univ of Stellenbosch)
Date: 09/09/2006

Normally I give these talks to convince people to move away from MATLAB to Python. However, in this case, you already know that Python is the best thing since sliced bread. Further, you are not all scientists, so I'd rather proceed to give you a quick overview of the different packages we use, with some emphasis on wrapping C and Fortran code. If you are interested in more detail, you are welcome to talk to me afterwards.

Pieces of the Puzzle

Being the array class underlying all other scientific packages, NumPy needs to be relatively small and easy to build. It therefore contains no Fortran code (although there are algorithms in C that derive from Fortran implementations). It also makes use of BLAS, LAPACK and ATLAS if those are installed.

SciPy, on the other hand, contains a lot of Fortran code.

NumPy

SciPy

Pretty pictures with matplotlib

More pretty pictures

radar.png

Basic array operations

Constructing an array:

import numpy as N
x = N.array([1,2,3])
x = N.array([1,2,3], dtype=int)
x = N.zeros((10,10,3), dtype=float)
x = N.empty((10,10,3), dtype=float)

Slicing over known dimensions:

>>> import numpy as N
>>> x = N.random.random((5,5))
>>> x
array([[ 0.36155788,  0.94092246,  0.87766115,  0.38876014,  0.14242888],
       
[ 0.41110453,  0.85805215,  0.77545163,  0.81612129,  0.68321265],
       
[ 0.8418767 ,  0.16841387,  0.7550585 ,  0.99356193,  0.65077119],
       
[ 0.67116546,  0.97694933,  0.37857987,  0.71123056,  0.43778407],
       
[ 0.52968691,  0.8107153 ,  0.65152697,  0.97804839,  0.27858883]])
>>> x[0,:]
array([ 0.36155788,  0.94092246,  0.87766115,  0.38876014,  0.14242888])

Slicing over all unknown dimensions:

x = N.random.rand((5,5,3))
x[0,1,1]
x[1,...]

Further array manipulation

Reshaping:

>>> x = N.arange(12)
>>> x
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> x.reshape((4,3))
array([[ 0,  1,  2],
       
[ 3,  4,  5],
       
[ 6,  7,  8],
       
[ 9, 10, 11]])

Convenient axis-wise operators:

>>> import numpy as N
>>> x = N.ones(12).reshape((4,3))
>>> x
array([[ 1.,  1.,  1.],
       
[ 1.,  1.,  1.],
       
[ 1.,  1.,  1.],
       
[ 1.,  1.,  1.]])
>>> x.sum()
12.0
>>> x.sum(axis=0)
array([ 4.,  4.,  4.])
>>> x.sum(axis=1)
array([ 3.,  3.,  3.,  3.])

Record arrays

Example:

In [1]: import numpy as N

In [2]: dt = N.dtype([('id',int),('score',float)])

In [3]: N.zeros(15,dtype=dt)
Out[3]:
array([(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0),
       
(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0),
       
(0, 0.0), (0, 0.0), (0, 0.0)],
      
dtype=[('id', '<i4'), ('score', '<f8')])

In [4]: users = N.zeros(15,dtype=dt)

In [5]: users['id']
Out[5]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [6]: users['score']
Out[6]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        
0.,  0.])

Arrays in memory

Changing dtype and viewing memory:

>>> import numpy as N
>>> x = N.array([255,255], N.uint8)
>>> x.astype(N.uint16)
array([255, 255], dtype=uint16)
>>> x.view(N.uint16)
array([65535], dtype=uint16)

Array interface

Python-side:

In [4]: x = N.array([1,2,3])

In [5]: x.__array_interface__
Out[5]:
{'data': (137220152, False),
 
'descr': [('', '<i4')],
 
'shape': (3,),
 
'strides': None,
 
'typestr': '<i4',
 
'version': 3}

C-side:

typedef struct {
    
int two;              /* contains the integer 2 -- simple sanity check */
    
int nd;               /* number of dimensions */
    
char typekind;        /* kind in array --- character code of typestr */
    
int itemsize;         /* size of each element */
    
int flags;            /* flags indicating how the data should be interpreted */
                          
/*   must set ARR_HAS_DESCR bit to validate descr */
    
Py_intptr_t *shape;   /* A length-nd array of shape information */
    
Py_intptr_t *strides; /* A length-nd array of stride information */
    
void *data;           /* A pointer to the first element of the array */
    
PyObject *descr;      /* NULL or data-description (same as descr key
                                  of __array_interface__) -- must set ARR_HAS_DESCR
                                  flag or this will be ignored. */

} PyArrayInterface;

Extending numpy

Further resources