Consider a linear model with \(N\) observations of a quantity \(Y\) as a function of \(p\) regressors, \(Y = \sum_{i=1}^{p} \beta_i X_i\).

\[\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1p} \\ X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{N1} & X_{N2} & \cdots & X_{Np} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} + \epsilon,\]

where \(\epsilon\) is a vector of normally distributed errors. Each row corresponds to an observation, and each column, with its associated \(\beta\), corresponds to a parameter to be estimated.

In general, this can be written as \(Y = X \beta + \epsilon\).

As a simple illustrative case consider fitting the model \(Y = \beta_0 + \beta_1 X\). The above equation becomes: \[\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_{2} \\ \vdots & \vdots \\ 1 & X_{N} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}.\]
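Minimizing the sum of squared residuals \(S(\beta)\) gives the least-squares solution; sketching the derivation:

\[S(\beta) = (Y - X\beta)^T (Y - X\beta), \qquad \frac{\partial S}{\partial \beta} = -2\, X^T (Y - X\beta) = 0,\]

which yields the normal equations and their solution,

\[X^T X \,\hat\beta = X^T Y \quad\Longrightarrow\quad \hat\beta = (X^T X)^{-1} X^T Y,\]

with residuals \(r = Y - X\hat\beta\), error-variance estimate \(s^2 = r^T r / (N - p)\), and \(\operatorname{Var}(\hat\beta) = s^2 (X^T X)^{-1}\). These are exactly the quantities computed in the program below.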

Given the expressions in the Wikipedia article on linear least squares, we can easily write an Octave function that takes \(y\) and \(X\) as above and returns the best estimate of \(\beta\), the standard errors on those estimates, and the residuals.

```
%
% For y = X b + gaussian noise:
% compute b, standard error on b, and residuals
%
function [b se r] = lls(y, X)
  [Nobs, p] = size(X);           % size of the data
  b = (X'*X)\(X'*y);             % least-squares estimate of beta
  df = Nobs - p;                 % degrees of freedom
  r = y - X*b;                   % residuals
  s2 = (r'*r)/df;                % estimated error variance (SSE/df)
  varb = s2 * ((X'*X)\eye(p));   % covariance matrix of b
  se = sqrt(diag(varb));         % standard errors
endfunction
```
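For readers working in Python rather than Octave, here is an equivalent sketch using NumPy. The function name `lls` and its interface simply mirror the Octave version; the random seed and the demo at the bottom are my own additions.

```python
import numpy as np

def lls(y, X):
    """Least-squares fit of y = X b + noise.

    Returns the estimate b, standard errors se, and residuals r.
    """
    nobs, p = X.shape
    # Solve the normal equations (X'X) b = X'y.
    b = np.linalg.solve(X.T @ X, X.T @ y)
    df = nobs - p                       # degrees of freedom
    r = y - X @ b                       # residuals
    s2 = (r @ r) / df                   # estimated error variance (SSE/df)
    varb = s2 * np.linalg.inv(X.T @ X)  # covariance matrix of b
    se = np.sqrt(np.diag(varb))         # standard errors
    return b, se, r

# Demo: y = 1 + 2x + white noise, as in the test below.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = 1 + 2 * x + 0.05 * rng.standard_normal(x.size)
X = np.column_stack([np.ones_like(x), x])
b, se, r = lls(y, X)
print(b, se)
```

Using `np.linalg.solve` on the normal equations keeps the code close to the Octave original; for ill-conditioned design matrices, `np.linalg.lstsq` (which uses an SVD) is the numerically safer choice.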

To test the function, I "create" data from \(y = 1 + 2x\) plus white noise:

```
> x = linspace(0,1,10)';
> y = 1 + 2*x + 0.05*randn(size(x));
> X = [ones(size(x)) x];
> [beta se] = lls(y, X)
beta =

   0.98210
   2.03113

se =

   0.039329
   0.066302
```
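As a sanity check, both estimates lie within two standard errors of the true values:

\[0.98210 \pm 2(0.039329) = [0.904,\, 1.061] \ni 1, \qquad 2.03113 \pm 2(0.066302) = [1.899,\, 2.164] \ni 2,\]

so the fit recovers the intercept of 1 and slope of 2 used to generate the data.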

*(A plot of the data and the fitted regression line appeared here.)*
