## Tuesday, July 3, 2012

### Linear Least Squares in GNU Octave with Standard Errors

Consider a linear model with $N$ observations of the quantity $Y$, as a function of $p$ regressors, $Y = \sum \beta_i X_i$.

$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} X_{11} & X_{12} & ... & X_{1p} \\ X_{21} & X_{22} & ... & X_{2p} \\ & & \ddots & \\ X_{N1} & X_{N2} & ... & X_{Np} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} + \epsilon,$
where $\epsilon$ is normally distributed error. Each row corresponds to an observation, and each column and associated $\beta$ corresponds to a parameter to be regressed

In general, this can be written as $Y = X \beta$.

As a simple illustrative case consider fitting the model $Y = \beta_0 + \beta_1 X$. The above equation becomes: $\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_{2} \\ \vdots & \vdots \\ 1 & X_{N} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}.$

Given the expressions in the wikipedia article on LLS, we can easily write an Octave program that takes in y and X as above, and spits out the best estimate, the standard error on those estimates, and the set of residuals.

%
% For y = X b + gaussian noise:
compute b, standard error on b, and residuals
%

function [b se r] = lls(y, X)

[Nobs, p] = size(X);          % size of data

b   = (X'*X)\X'*y;            % b = estimate beta
df  = Nobs - p;               % degrees of freedom

r   = y - X * b;              % residuals
s2  = (r' * r)/df;            % SSER
varb = s2 * ((X'*X)\eye(p));  % variance of "b"
se   = sqrt(diag(varb));      % standard errors

endfunction

To test the model I "create" the data y = 1 + 2x + white noise.

> x = linspace(0,1,10)';
> y = 1 + 2 * x + 0.05 * randn(size(x));
> X = [ones(t,1) x];
> [beta se] = lls(y,X)

beta =

0.98210
2.03113

se =

0.039329
0.066302

A plot of the data and the regression looks like this: