Stéfan van der Walt, University of Stellenbosch
>>> import numpy as np
>>> np.random.seed([1, 2, 3])
Broadcasting
Structured arrays
Views
Sub-classing
(We'll skip this today)
Optimisation tricks
Gems
>>> x = np.arange(4)
>>> x
array([0, 1, 2, 3])
>>> x + 3
array([3, 4, 5, 6])
>>> x = np.zeros((3,4))
>>> y = np.zeros((3,1))
>>> (x + y).shape
(3, 4)
Creating a grid of values:
>>> x, y = np.ogrid[1:10, 1:5]
>>> x
array([[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])
>>> y
array([[1, 2, 3, 4]])
>>> x + 1j*y
array([[ 1.+1.j, 1.+2.j, 1.+3.j, 1.+4.j],
[ 2.+1.j, 2.+2.j, 2.+3.j, 2.+4.j],
[ 3.+1.j, 3.+2.j, 3.+3.j, 3.+4.j],
[ 4.+1.j, 4.+2.j, 4.+3.j, 4.+4.j],
[ 5.+1.j, 5.+2.j, 5.+3.j, 5.+4.j],
[ 6.+1.j, 6.+2.j, 6.+3.j, 6.+4.j],
[ 7.+1.j, 7.+2.j, 7.+3.j, 7.+4.j],
[ 8.+1.j, 8.+2.j, 8.+3.j, 8.+4.j],
[ 9.+1.j, 9.+2.j, 9.+3.j, 9.+4.j]])
Timing two different approaches:
>>> def grid_old_way():
... x, y = np.mgrid[1:100, 1:100]
... return x + 1j*y
>>> def grid_new_way():
... x, y = np.ogrid[1:100, 1:100]
... return x + 1j*y
>>> timeit grid_new_way()
1000 loops, best of 3: 301 µs per loop
>>> timeit grid_old_way()
1000 loops, best of 3: 1.44 ms per loop
>>> x = np.zeros((3, 5))
>>> y = np.zeros(8)
>>> (x[..., None] + y).shape
(3, 5, 8)
Compare dimensions, starting from last
Match when either dimension is one or None, or if dimensions are equal
Scalar 2D 3D No go ( ,) (3, 4) (3, 5, 1) (3, 5, 2) (3,) (3, 1) ( 8) ( 8) ---- ------ ---------- --------- (3,) (3, 4) (3, 5, 8) XXX
>>> book = np.arange(1000).reshape(100,10)
>>> match = np.array([np.arange(100, 110),
... np.arange(155, 165),
... np.arange(173, 183)])
>>> for v in match:
... min_dist = np.inf
... for d in book:
... dist = np.sum((v - d)**2)
... if dist < min_dist:
... best_match = d
... min_dist = dist
... print best_match, min_dist
[100 101 102 103 104 105 106 107 108 109] 0
[150 151 152 153 154 155 156 157 158 159] 250
[170 171 172 173 174 175 176 177 178 179] 90
>>> distances = (book - match[:, None, :])
>>> distances *= distances # square in-place
>>> distances = distances.sum(axis=2)
>>> chosen = distances.argmin(axis=1)
>>> book[chosen]
array([[100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
[150, 151, 152, 153, 154, 155, 156, 157, 158, 159],
[170, 171, 172, 173, 174, 175, 176, 177, 178, 179]])
>>> distances[range(len(match)), chosen]
array([ 0, 250, 90])
Not ideal -- the temporary array created has a large memory footprint:
( 100, 10) (3, 1, 10) ------------ (3, 100, 10)
Usually, the number of vectors compared would be much greater, which exacerbates the problem.
Date: Wed, 16 Jul 2008 16:45:37 -0500 From: <Jack.Cook@> To: <numpy-discussion@scipy.org> Subject: [Numpy-discussion] Numpy Advanced Indexing Question Greetings, I have an I,J,K 3D volume of amplitude values at regularly sampled time intervals. I have an I,J 2D slice which contains a time (K) value at each I, J location. What I would like to do is extract a subvolume at a constant +/- K window around the slice. Is there an easy way to do this using advanced indexing or some other method? Thanks in advanced for your help. - Jack
Robert Kern responds:
cube[:, :, K-half_width:K+half_width]
But Jack's problem turned out to be a bit more tricky:
I can understand how this works if K is a constant time value but in my case K varies at each location in the two-dimensional slice. In other words, if I was doing this in a for loop I would do something like this for i in range(numI): for j in range(numJ): k = slice(i,j) trace = cube(i,j,k-half_width:k+half_width) # shove trace in sub volume
First part of Robert's answer, shortened for tutorial purposes:
It would be great if you could just use slices for the first two axes: cube[:, :, slice + numpy.arange(-half_width, half_width)] but the semantics of that are a bit different for reasons I can explain later, if you want.
Why doesn't this work?
Diagonal
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 9 |
>>> x = np.arange(9).reshape((3,3))
>>> x
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> x[:, [0, 1, 2]]
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.array([x[:, 0], x[:, 1], x[:, 2]])
array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])
Safety First!
Prepare Your Foil Deflector Beanies
Two-step process to predict the outcome:
>>> x = np.random.random((15,12,16,3))
>>> index_one = np.array([[0, 1], [2, 3], [4, 5]])
>>> index_one.shape
(3, 2)
>>> index_two = np.array([[0, 1]])
>>> index_two.shape
(1, 2)
x[5:10, index_one, :, index_two].shape
>>> x = np.random.random((15,12,16,3))
>>> index_one = np.array([[0, 1], [2, 3], [4, 5]])
>>> index_one.shape
(3, 2)
>>> index_two = np.array([[0, 1]])
>>> index_two.shape
(1, 2)
Broadcast index_one and index_two:
(3, 2) # shape of index_one (1, 2) # shape of index_two ------ (3, 2)
Keeping this in mind, we may just be able to predict the following output:
>>> x[5:10, index_one, :, index_two].shape
(3, 2, 5, 16)
TIP: Use numpy.newaxis to change array dimensions to appease the Big Broadcaster.
>>> ni, nj, nk = (10, 15, 20)
# Make a fake data cube such that cube[i,j,k] == k for all i,j,k.
>>> cube = np.empty((ni,nj,nk), dtype=int)
>>> cube[:,:,:] = np.arange(nk)[np.newaxis, np.newaxis, :]
# Pick out a random fake horizon in k.
>>> kslice = np.random.randint(5, 15, size=(ni, nj))
>>> kslice
array([[ 6, 9, 11, 10, 9, 10, 8, 13, 10, 12, 13, 9, 12, 5, 6],
[ 7, 9, 6, 14, 11, 8, 12, 7, 12, 9, 7, 9, 8, 10, 13],
[10, 14, 9, 13, 12, 11, 13, 6, 11, 9, 14, 12, 6, 8, 12],
[ 5, 11, 8, 14, 10, 10, 10, 9, 10, 5, 7, 11, 9, 13, 8],
[ 7, 8, 8, 5, 13, 9, 11, 13, 13, 12, 13, 11, 12, 5, 11],
[11, 9, 13, 14, 6, 7, 6, 14, 10, 6, 8, 14, 14, 14, 14],
[10, 12, 6, 7, 8, 6, 10, 9, 13, 6, 14, 10, 12, 10, 10],
[10, 12, 10, 9, 11, 14, 9, 6, 7, 13, 6, 11, 8, 11, 8],
[13, 14, 7, 14, 6, 14, 6, 8, 14, 7, 14, 12, 8, 5, 10],
[13, 5, 9, 7, 5, 9, 13, 10, 13, 7, 7, 9, 14, 13, 11]])
>>> half_width = 3
# These two replace the empty slices for the first two axes.
>>> idx_i = np.arange(ni)[:, np.newaxis, np.newaxis]
>>> idx_j = np.arange(nj)[np.newaxis, :, np.newaxis]
# This is the substantive part that picks out the window
>>> idx_k = kslice[:, :, np.newaxis] + \
... np.arange(-half_width, half_width+1) # (10, 15, 7)
cube[idx_i, idx_j, idx_k]
Apply the rule -- broadcast array indices, then add slices (but there are no slices in this case):
(ni, 1, 1 ) # idx_i (1 , nj, 1 ) # idx_j (ni, nj, 2*half_width + 1 ) # idx_k --------------------------- (ni, nj, 7)
Phew, tough one. Moving on to easier things...
>>> smallcube = cube[idx_i,idx_j,idx_k]
>>> smallcube.shape
(10, 15, 7)
# Now verify that our window is centered on kslice everywhere:
>>> smallcube[:,:,3]
array([[ 6, 9, 11, 10, 9, 10, 8, 13, 10, 12, 13, 9, 12, 5, 6],
[ 7, 9, 6, 14, 11, 8, 12, 7, 12, 9, 7, 9, 8, 10, 13],
[10, 14, 9, 13, 12, 11, 13, 6, 11, 9, 14, 12, 6, 8, 12],
[ 5, 11, 8, 14, 10, 10, 10, 9, 10, 5, 7, 11, 9, 13, 8],
[ 7, 8, 8, 5, 13, 9, 11, 13, 13, 12, 13, 11, 12, 5, 11],
[11, 9, 13, 14, 6, 7, 6, 14, 10, 6, 8, 14, 14, 14, 14],
[10, 12, 6, 7, 8, 6, 10, 9, 13, 6, 14, 10, 12, 10, 10],
[10, 12, 10, 9, 11, 14, 9, 6, 7, 13, 6, 11, 8, 11, 8],
[13, 14, 7, 14, 6, 14, 6, 8, 14, 7, 14, 12, 8, 5, 10],
[13, 5, 9, 7, 5, 9, 13, 10, 13, 7, 7, 9, 14, 13, 11]])
>>> (smallcube[:,:,3] == kslice).all()
True
Reading this in most languages requires somewhat messy code. Not too bad, but not very pretty either:
while ((count > 0) && (n <= NumPoints))
% get time - I8 [ms]
[lw, count] = fread(fid, 1, 'uint32');
if (count > 0) % then carry on
uw = fread(fid, 1, 'int32');
t(1,n) = (lw+uw*2^32)/1000;
% get number of bytes of data
numbytes = fread(fid, 1, 'uint32');
% read sMEASUREMENTPOSITIONINFO (11 bytes)
m(1,n) = fread(fid, 1, 'float32'); % az [rad]
m(2,n) = fread(fid, 1, 'float32'); % el [rad]
m(3,n) = fread(fid, 1, 'uint8'); % region type
m(4,n) = fread(fid, 1, 'uint16'); % region ID
g(1,n) = fread(fid, 1, 'uint8');
numsamples = (numbytes-12)/2; % 2 byte integers
a(:,n) = fread(fid, numsamples, 'int16');
The NumPy solution:
dt = np.dtype([('time', np.uint64),
('size', np.uint32),
('position', [('az', np.float32),
('el', np.float32),
('region_type', np.uint8),
('region_ID', np.uint16)]),
('gain', np.uint8),
('samples', (np.int16,2048))])
data = np.fromfile(f, dtype=dt)
Access this data using dictionary-like syntax:
data['position']['az']
Date: Wed, 16 Jul 2008 16:45:37 -0500 From: Amir To: <numpy-discussion@scipy.org> Subject: [Numpy-discussion] interleaved indexing A very beginner question about indexing: let x be an array where n = len(x). I would like to create a view y of x such that: y[i] = x[i:i+m,...] for each i and a fixed m << n so I can do things like numpy.cov(y). With n large, allocating y is a problem for me. Currently, I either do for loops in cython or translate operations into correlate() but am hoping there is an easier way, maybe using fancy indexing or broadcasting. Memory usage is secondary to speed, though.
Given input data:
>>> x = np.arange(20).reshape([4, 5])
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
What he wants is a view like this (assume m == 2):
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9]],
[[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]]])
To do this, we need to know the following terms:
>>> x.strides
(20, 4)
>>> np.int32().itemsize
4
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9]],
[[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]]])
We now need to manipulate the array shape and strides. The output shape must be (3, 2, 5), i.e. 3 items, each containing two rows (m == 2), and each row having 5 elements.
The strides need to change from (20, 4), to (20, 20, 4). Each item in the new output array starts at a new row, that each row consists of 20 bytes (5 elements of 4 bytes each), and each element occupies 4 bytes (int32).
Let's do it:
>>> from numpy.lib import stride_tricks
>>> stride_tricks.as_strided(x, shape=(3, 2, 5),
... strides=(20, 20, 4))
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9]],
[[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]]])
Or the old way (which is basically what as_strided does under the hood):
>>> d = dict(x.__array_interface__)
>>> d['shape'] = (3, 2, 5)
>>> d['strides'] = (20, 20, 4)
>>> class Arr:
... __array_interface__ = d
... base = x
>>> np.array(Arr())
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9]],
[[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]]])
Two important methods:
class InfoArray(np.ndarray):
def __new__(subtype, data, info=None, dtype=None, copy=False):
subarr = np.array(data, dtype=dtype, copy=copy)
subarr = subarr.view(subtype)
if info is not None:
subarr.info = info
elif hasattr(data, 'info'):
subarr.info = data.info
return subarr
def __array_finalize__(self, obj):
self.info = getattr(obj, 'info', {})
I would like to thank