Friday, July 1, 2016

Numpy: Arrays, Vectors and Matrices

Define numpy arrays of different shapes:

import numpy as np
a = np.array([1, 2])
b = np.array([[1, 2], [3, 4]])

print a.shape, type(a)
print b.shape, type(b)

(2,) <type 'numpy.ndarray'>
(2, 2) <type 'numpy.ndarray'>

Numpy arrays are not exactly row or column vectors or matrices. Of course, we can build proper row/column vectors and matrices using the np.matrix construct. There is even a simpler, Matlab/Octave-like way to build matrices: use ";" to start a new row, and commas or spaces to separate the elements within a row.

am = np.matrix([1, 2]).T  # column vector
bm = np.matrix('1, 2; 3, 4')

print am.shape, type(am)
print bm.shape, type(bm)

(2, 1) <class 'numpy.matrix'>
(2, 2) <class 'numpy.matrix'>



Matrix Operations


The array "a" is not a linear algebra vector, so taking its transpose doesn't produce the expected result: a 1-D array has no second axis to swap, and a.T returns it unchanged.

print a, a.T
[1 2] [1 2]

But it works for matrices defined as arrays of arrays.

print b, b.T
[[1 2] [3 4]] [[1 3] [2 4]]

To multiply vectors and matrices that are ndarrays, use the "dot" function.

print np.dot(b,a)
[ 5 11]
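As a side note, np.dot also handles the case where both arguments are 1-D arrays: it returns their scalar inner product. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2])
# For two 1-D arrays, np.dot returns the scalar inner product.
print(np.dot(a, a))  # 1*1 + 2*2 = 5
```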

For actual matrix objects, the * operator is overloaded to perform matrix multiplication.

print bm*am
[[ 5] [11]]

np.dot(bm,am)
matrix([[ 5], [11]])


Converting array to row or column vector


It is easy to convert a numpy array into a linear algebra row or column vector, either with the "reshape" command (a.reshape(2, 1)) or, alternatively, with np.newaxis:
a = np.array([1, 2])
a = a[:, np.newaxis] # makes into column vector

print a, a.shape, type(a)
[[1] [2]] (2, 1) <type 'numpy.ndarray'>
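Related to reshape: passing -1 as one of the dimensions tells numpy to infer that size from the array length, which is handy when you don't want to hard-code the vector length. A small sketch:

```python
import numpy as np

a = np.array([1, 2])
col = a.reshape(-1, 1)  # column vector; -1 lets numpy infer the row count
row = a.reshape(1, -1)  # row vector
print(col.shape, row.shape)  # (2, 1) (1, 2)
```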

But you need to be careful with the * operator. Don't use it for matrix multiplication unless all the underlying objects are matrix types; on plain arrays it multiplies elementwise.

print b*a
[[1 2] [6 8]]

Each row of b was multiplied elementwise by the corresponding element of the column vector a. This is numpy broadcasting, not matrix multiplication.
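The contrast between the two operations can be seen side by side. This sketch uses the same b and column-vector a as above:

```python
import numpy as np

b = np.array([[1, 2], [3, 4]])
a = np.array([1, 2])[:, np.newaxis]  # column vector, shape (2, 1)

print(b * a)         # broadcasting: row i of b is scaled by a[i]
print(np.dot(b, a))  # true matrix product: [[5], [11]]
```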



Efficiency


That said, np.dot on plain array types appears to be the fastest of the three alternatives.

%timeit bm*am
The slowest run took 9.73 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 5.76 µs per loop
%timeit np.dot(b,a)
The slowest run took 7.84 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.52 µs per loop
%timeit np.dot(bm,am)
The slowest run took 7.88 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 3.3 µs per loop
