The quintessential least-squares problem looks like \[A x = b,\] where \(A_{m \times n}\) is a "tall" matrix with \(m > n\), \(x_{n \times 1}\) is the vector we are trying to "regress", and \(b_{m \times 1}\) is a vector as tall as \(A\).
Since we have more equations or constraints (\(m\)) than unknowns (\(n\)), we can't solve the problem in the usual sense. Hence we seek a least-squares solution, which can be obtained by hitting both sides of the original equation with the transpose \(A'\), to obtain the so-called normal equation, \[A' A x = A' b.\] Now we have a linear system with \(n\) equations and \(n\) unknowns, which can be solved for \(x\).
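To see why this gives the least-squares solution, note that the normal equation is exactly the condition that the gradient of the squared residual vanishes: \[\frac{\partial}{\partial x}\,\|A x - b\|_2^2 = 2 A'(A x - b) = 0 \quad\Longleftrightarrow\quad A' A x = A' b.\] In other words, solving the normal equation minimizes \(\|A x - b\|_2\), which is what "least squares" means.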
There are three methods one can use:
- Direct solution of the normal equations: The idea is to factorize \(A' A\) using the Cholesky decomposition and solve the resulting system using forward and back substitution. This is straightforward and fast (asymptotic cost \(\sim m n^2 + n^3/3\)), but susceptible to round-off. We will explore this last part in a little bit.
- QR decomposition: The idea is to first factorize \(A = QR\). Then, we can write the normal equations as \[A' A x = A' b \implies R' Q' Q R x = R' Q' b.\] Since \(Q\) has orthonormal columns (\(Q' Q = I\)), we can simplify the equations to the triangular system \[R x = Q' b.\] The asymptotic cost is \(2mn^2 - 2n^3/3\). This method is less susceptible to round-off.
- SVD: We can always bring out the ultimate thermonuclear weapon. We first factorize \(A = U S V'\). The normal equations can then be simplified as above into \[S V' x = U' b.\] The asymptotic cost is \(2mn^2 + 11 n^3\). This method is also less susceptible to round-off. An advantage is that, if required, we can throw in some regularization to make an ill-conditioned problem less nasty.
Let us explore the relative merits of the three methods by considering a Matlab example.
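As a starting point, here is a minimal sketch along these lines in Matlab/Octave syntax; the dimensions and the random \(A\) and \(b\) below are arbitrary illustrative choices standing in for real data.

```matlab
% Small synthetic overdetermined system (arbitrary illustrative sizes).
m = 100; n = 5;
A = randn(m, n);
b = randn(m, 1);

% 1. Normal equations via Cholesky: A'*A = R'*R, then two triangular solves.
R = chol(A' * A);              % upper-triangular R with R'*R = A'*A
x_chol = R \ (R' \ (A' * b));

% 2. QR decomposition: A = Q*R (economy size), then solve R*x = Q'*b.
[Q, Rq] = qr(A, 0);
x_qr = Rq \ (Q' * b);

% 3. SVD: A = U*S*V' (economy size), then x = V*(S \ (U'*b)).
[U, S, V] = svd(A, 0);
x_svd = V * (S \ (U' * b));

% Compare against the built-in backslash solution.
x_ref = A \ b;
fprintf('Cholesky difference: %g\n', norm(x_chol - x_ref));
fprintf('QR difference:       %g\n', norm(x_qr   - x_ref));
fprintf('SVD difference:      %g\n', norm(x_svd  - x_ref));
```

For a well-conditioned \(A\) like the random one above, all three methods agree to machine precision; the differences show up once \(A\) becomes ill-conditioned.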