## Sunday, April 14, 2013

### Linear Least Squares: What technique should I use? (part 1)

The quintessential least-squares problem looks like $A x = b,$ where $A_{m \times n}$ is a "tall" matrix with $m > n$, $x_{n \times 1}$ is the vector we are trying to "regress", and $b_{m \times 1}$ is a vector as tall as $A$.

Since we have too many equations or constraints (m) compared with unknowns (n), we can't solve the problem in the usual sense. Hence we seek a least-squares solution, which can be obtained by hitting both sides of the original equation with the transpose $A'$, to obtain the so called normal equation, $A' A x = A' b.$Now we have a linear system $n$ equations and $n$ unknowns, which can be solved for $x$.

There are three methods one can use:
1. Direct solution of Normal Equations: The idea is to factorize $A' A$ using Cholesky decomposition, and solve the resulting system using forward and back substitution. This is straightforward, fast (asymptotic cost $\sim m n^2 + n^3/3$), but susceptible to round-off. We will explore this last part in a little bit.
2. QR Decomposition: The idea is to first factorize $A = QR$. Then, we can write the normal equations as $A' A x = b \implies R' Q' Q R x = R' Q' b.$ Since $Q$ is orthogonal, we can simplify the equations to the triangular system $R x = Q' b.$ The asymptotic cost is $2mn^2 - 2n^3/3$. This method is less susceptible to round-off.
3. SVD: We can always bring out the ultimate thermonuclear weapon. We first factorize $A = U S V'$. The normal equations can then be simplified as above into $S V' x = U' b.$The asymptotic cost is $2mn^2 + 11 n^3$. This method is also less susceptible to round-off. An advantage is that, if required, we can throw in some regularization at a problem to make it less nasty.
In the usual case when $m \gg n$, the cost of SVD and QR is comparable. However as $m \sim n$, SVD becomes more than an order of magnitude slower.

Let us explore the relative merits of the three methods by considering a Matlab example.