
Friday, April 19, 2013

Links:

Sunday, April 14, 2013

Linear Least Squares: What technique should I use? (part 1)

The quintessential least-squares problem looks like A x = b, where A_{m \times n} is a "tall" matrix with m > n, x_{n \times 1} is the vector we are trying to "regress", and b_{m \times 1} is a vector as tall as A.

Since we have more equations or constraints (m) than unknowns (n), we can't solve the problem in the usual sense. Hence we seek a least-squares solution, which can be obtained by hitting both sides of the original equation with the transpose A', to obtain the so-called normal equation, A' A x = A' b. Now we have a linear system of n equations and n unknowns, which can be solved for x.
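In NumPy terms (a quick sketch with made-up data, not the Matlab example discussed below), forming and solving the normal equation looks like this:

```python
import numpy as np

# A hypothetical tall system: m = 100 equations, n = 3 unknowns
rng = np.random.default_rng(0)
m, n = 100, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Normal equation: A' A x = A' b
x = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's built-in least-squares solver
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Both routes should land on the same x for a well-conditioned A.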

There are three methods one can use:
  1. Direct solution of Normal Equations: The idea is to factorize A' A using Cholesky decomposition, and solve the resulting system using forward and back substitution. This is straightforward, fast (asymptotic cost \sim m n^2 + n^3/3), but susceptible to round-off. We will explore this last part in a little bit.
  2. QR Decomposition: The idea is to first factorize A = QR. Then, we can write the normal equations as A' A x = A' b \implies R' Q' Q R x = R' Q' b. Since Q is orthogonal, we can simplify the equations to the triangular system R x = Q' b. The asymptotic cost is 2mn^2 - 2n^3/3. This method is less susceptible to round-off.
  3. SVD: We can always bring out the ultimate thermonuclear weapon. We first factorize A = U S V'. The normal equations can then be simplified as above into S V' x = U' b. The asymptotic cost is 2mn^2 + 11 n^3. This method is also less susceptible to round-off. An advantage is that, if required, we can throw some regularization at the problem to make it less nasty.
In the usual case when m \gg n, the cost of SVD and QR is comparable. However as m \sim n, SVD becomes more than an order of magnitude slower.
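As a sketch of the three routes (in NumPy, with an illustrative random A and b; the sizes are arbitrary assumptions), all of which should agree on a well-conditioned problem:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 4))
b = rng.standard_normal(50)

# 1. Normal equations via Cholesky: A'A = L L'
L = np.linalg.cholesky(A.T @ A)
y = np.linalg.solve(L, A.T @ b)   # forward substitution
x1 = np.linalg.solve(L.T, y)      # back substitution

# 2. Thin QR: A = QR, then solve the triangular system R x = Q' b
Q, R = np.linalg.qr(A)
x2 = np.linalg.solve(R, Q.T @ b)

# 3. Thin SVD: A = U S V', then x = V (U' b / s)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x3 = Vt.T @ ((U.T @ b) / s)
```

The differences between the three show up only once A gets ill-conditioned, which is exactly the round-off story above.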

Let us explore the relative merits of the three methods by considering a Matlab example.

Thursday, April 11, 2013

Chris Jordan

So I was at a dinner this evening, where this guy Chris Jordan gave an amazing talk.

You really should check out his artwork. He specializes in "garbage" - using photographs to give us a sense of the maddening material consumption in our society. It is funny how pictures can make big mind-numbing numbers (of discarded plastic bottles, cellphones, cars etc.) spring to life; how they replace apathy with grief.

His remarkable ability to make us "feel" these big numbers is captured in this TED talk.

A particularly poignant story that he told revolves around the albatrosses on Midway, and the amount of plastic found in the guts of dead baby albatrosses. As a friend apparently said to him, "if I start crying, I am worried I will never stop."


Tuesday, April 9, 2013

Conditional Column Switching in Octave or Matlab

Consider a simple Matlab/Octave exercise.

I have an N by 2 matrix x. Thus, x(i,1) and x(i,2) are the entries of the i-th row. Say I want to rearrange x so that the first entry in any row is larger than the second. That is, I want to switch the entries of a row, only if a certain condition is met.

It is easy to write this in an element-by-element fashion:

for i = 1:N
  if x(i,2) > x(i,1)
    % swap the two entries of row i
    c = x(i,1);
    x(i,1) = x(i,2);
    x(i,2) = c;
  end  % plain "end" works in both Matlab and Octave
end

But we all know that element-by-element operations in Matlab or Octave suck! Here is a vectorized alternative:


SwitchCond = x(:,2) > x(:,1);   % rows that need their entries switched
x(SwitchCond,[1,2]) = x(SwitchCond,[2,1]);

Besides being faster, it is also more succinct.
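(As an aside, for anyone working in Python instead, the same logical-indexing trick carries over to NumPy boolean masks; a minimal sketch with made-up data:)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 2))

# rows where the second entry exceeds the first
cond = x[:, 1] > x[:, 0]
x[cond] = x[cond][:, ::-1]   # reverse (i.e. swap) the entries of those rows
```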

Wednesday, April 3, 2013

Links

1. The Art of Splash: Beautiful pictures of fluids in motion!

2. Nice back and forth between Deborah Meier and Eric Hanushek at Bridging Differences.

3. Yet another writing guide, with some nice references.