Wednesday, May 22, 2013

Ranbaxy and Gupta

Read two negative pieces about India/Indians this past weekend! Long and interesting reads.

1. Rajat Gupta's fall from grace:
The 64-year-old Gupta, who remains free on appeal, has vigorously maintained his innocence. But even as his appeal is heard this week, the fundamental question behind his case remains a mystery. Why would one of the most revered C.E.O.’s of his generation, who retired with a fortune worth some $100 million, show such bad judgment?
2. The unsettling fraud at Ranbaxy: Parts of this story made the hair on the back of my neck stand up!
Thakur left Kumar's office stunned. He returned home that evening to find his 3-year-old son playing on the front lawn. The previous year in India, the boy had developed a serious ear infection. A pediatrician prescribed Ranbaxy's version of amoxiclav, a powerful antibiotic. For three scary days, his son's 102° fever persisted, despite the medicine. Finally, the pediatrician changed the prescription to the brand-name antibiotic made by GlaxoSmithKline (GSK). Within a day, his fever disappeared. Thakur hadn't thought about it much before. Now he took the boy in his arms and resolved not to give his family any more Ranbaxy drugs until he knew the truth.

Monday, May 20, 2013

Distribution of Birthdays and Student Performance

I accidentally chanced upon the part of Malcolm Gladwell's book "Outliers" where he talks about how the selection system for junior ice-hockey leagues in Canada strongly favors older kids in a particular cohort, and how that effect lingers for a while.

Here's Gladwell describing the idea in an ESPN interview.
It's a beautiful example of a self-fulfilling prophecy. In Canada, the eligibility cutoff for age-class hockey programs is Jan. 1. Canada also takes hockey really seriously, so coaches start streaming the best hockey players into elite programs, where they practice more and play more games and get better coaching, as early as 8 or 9. But who tends to be the "best" player at age 8 or 9? The oldest, of course -- the kids born nearest the cut-off date, who can be as much as almost a year older than kids born at the other end of the cut-off date. When you are 8 years old, 10 or 11 extra months of maturity means a lot.
The data on that site, and many others such as this one, seem to bear it out. Of course, reality may be more complicated, but there does seem to be a germ of truth in Gladwell's simplified narrative.
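
The mechanism is easy to illustrate with a toy Octave simulation (a sketch with entirely made-up numbers, not data from either study): if measured "ability" at selection time is real talent plus a small maturity edge, then picking the top few percent over-represents kids born just after the cutoff.

N      = 10000;                    % toy cohort
month  = randi(12, N, 1);          % birth month, 1 = January (just after the cutoff)
talent = randn(N, 1);              % intrinsic ability
edge   = 0.05 * (12 - month);      % small maturity bonus for early-born kids
[s, idx] = sort(talent + edge, 'descend');
elite  = month(idx(1:500));        % "stream" the top 5% into the elite program
hist(elite, 1:12)                  % histogram skews toward early birth months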

Here (pdf) is another study looking at reading skills as a function of birthday distribution, where a similar effect is found.

Why should you care?

Suppose, like me, you have a child whose birthday falls in that weird window (due to arbitrary school cut-off ages) where he or she can be either the oldest or the youngest kid in the class (depending on your decision to wait an additional year or enroll right away).

I haven't converged on an answer yet, but I know years from now, regardless of what I do, it will all be my fault!
 

Saturday, May 11, 2013

LLS: What technique should I use? (part 3)

This is part 3 of a series. The first two posts are here and here.

I am going to use the same example as last time.

A      = vander(0:0.05:1,4);
[m,n]  = size(A);
b      = A * ones(n,1) + 0.01 * randn(m,1);
x_true = A\b
x_true =

   1.04597
   0.97136
   0.97724
   1.00963

I got a slightly different answer than last time because the noise term is random. Still, the solution is close to what we expect (all ones).

Now I am going to severely restrict the number of significant digits to 2. All matrices and vectors will be rounded off to this level of precision. Thus, the direct solution using normal equations is given by:


nsig = 2;                    % significant digits retained everywhere
A    = roundsd(A,nsig);
b    = roundsd(b,nsig);
L    = chol(A'*A,'lower');   % Cholesky factor of the normal equations
L    = roundsd(L,nsig);
x_ne = (L*L')\(A'*b)

x_ne =

   27.41699
  -36.71613
   14.55408
    0.30160

Whoa! Something bad happened!

If I do the same thing (round off all matrices to 2 significant digits) with QR and SVD, I get, respectively:

x_qr =

   1.23084
   0.69962
   1.05513
   1.00643

x_svd =

   1.38864
   0.60425
   0.96928
   1.02319

The normal equations do terribly, while QR and SVD hold up reasonably well. The reason for this is straightforward. The condition number of \(A\) was about 100 (which is the same as the condition number of its transpose), which makes the condition number of \(A'A\) about 10000. Since we expect to lose about 4 digits to this ill-conditioning (a worst-case estimate), restricting ourselves to 2 significant digits screws everything up.

On the other hand, the condition number of \(R\) is the same as that of \(A\). The QR implementation has no \(R'R\) term; the condition number does not get squared! Hence, we expect to lose at most 2 digits (a conservative estimate, usually much less), and QR does not suffer as much.
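
You can check the squaring directly in Octave (the exact numbers vary a little with the rounding, but the pattern is stark):

cond(A)            % roughly 1e2 for this Vandermonde matrix
cond(A'*A)         % roughly 1e4 -- the square of cond(A)
[Q R] = qr(A, 0);  % economy-size QR
cond(R)            % same as cond(A); QR never forms A'*A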

Between QR and SVD, the QR solution tends to be better (as measured by the distance from the true solution). But we can "regularize" the SVD solution if we want by discarding the components corresponding to the smallest singular values, as sketched below.
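
Here is a minimal sketch of that truncated-SVD regularization; the cutoff below is arbitrary and purely for illustration:

[U S V] = svd(A);
s = diag(S);
k = sum(s > 0.01*s(1));   % keep singular values above a (made-up) relative cutoff
x_tsvd = V(:,1:k) * ((U(:,1:k)'*b) ./ s(1:k))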


Wednesday, May 8, 2013

Assessing Assessors

An interesting piece in the Chronicle (via Rationally Speaking) on "Who's assessing the assessors' assessors?"

How do we know that outcomes-assessment tests accurately measure something over and beyond the instructors' grades?
We don't know that the outcomes-assessment tool reliably indicates student achievement. We can't merely assume without reason that it measures learning outcomes, and, by the same reasoning that justified outcomes assessment to start with, we need some other means of assessment to determine student success or failure. Once we use that new tool, then we can see how accurate outcomes assessment was. Let's call this new procedure outcomes-assessment assessment.
And onwards to infinite regress.

Interesting "practical" counterpoint in the comments (Manyul Im):
The article suffers from a mischaracterization of the motivation for assessment. It's not radical doubt about the role or effectiveness of grading as a measuring tool for learning outcomes that motivates assessment. It's just the desire to provide a second-level check on the effectiveness of such tools. That's a fair institutional structure to catch, prevent, correct, or improve ongoing systems of measurement. Think of it as an extra quality check mechanism. There's nothing epistemologically suspect about checking quality twice. And that doesn't lead us down some path of infinite quality checks. For example, I don't need a spellcheck for the spellcheck for the spellcheck, etc. I just need spellcheck sometimes.
Interesting conversation.

Monday, May 6, 2013

Links:

1. The Mind of a Con Man: The unraveling of yet another web of lies.

2. Math with Bad Drawings: Blog with lots of whiteboard pictures (via FlowingData)

Wednesday, May 1, 2013

LLS: What technique should I use? (part 2)

I previously outlined three different numerical approaches to the least-squares problem \(\min_x \|Ax - b\|_2\), whose solution satisfies the normal equations \[A' A x = A' b.\]In this part, I am going to set up the problem and outline the solution method/code. We will delay a detailed discussion of the susceptibility to round-off until the next part in this series.
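
For reference, the normal equations drop out of the minimization itself: setting the gradient of the squared residual to zero gives \[\nabla_x \|Ax - b\|_2^2 = 2 A'(Ax - b) = 0 \quad \Rightarrow \quad A'Ax = A'b.\]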

I am going to choose the matrix A to be a 21 (m) by 4 (n) Vandermonde matrix,
\[A=\begin{bmatrix}
1 & p_1 & p_1^2 & p_1^3\\
1 & p_2 & p_2^2 & p_2^3\\
\vdots & \vdots & \vdots & \vdots \\
1 & p_{21} & p_{21}^2 & p_{21}^3
\end{bmatrix},\]where \(p_i = 0.05\,(i-1)\) are evenly spaced grid-points between 0 and 1. This is a discrete polynomial approximation or regression problem, although what it is does not really matter in this case study.

To keep things under control, I am going to make up the vector \(b\) by assuming for a brief moment that \(x\) is a vector of all ones, and adding some white noise. In Octave, I do all this via:

A     = vander(0:0.05:1,4);
[m,n] = size(A);
b     = A * ones(n,1) + 0.01 * randn(m,1);
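
One small caveat: Octave's vander() orders the columns from the highest power down, while the matrix displayed above uses increasing powers. The two differ only by a column permutation, which does not change the quality of the least-squares fit, and since \(b\) is built from \(x\) = ones(n,1) it produces the same \(b\). An explicit construction matching the displayed matrix would be:

p     = (0:0.05:1)';                  % the 21 grid points
A_inc = [ones(size(p)) p p.^2 p.^3];  % columns 1, p, p^2, p^3, as displayed
% A_inc should equal fliplr(vander(p, 4))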

Once I have a noisy \(b\), I am going to pretend I don't know which \(x\) I used, and try to recover it by regression. In Octave, the least-squares solution can be computed directly with the backslash operator:

x_true = A\b

x_true =

   0.92492
   1.12503
   0.94494
   1.00460


It turns out that if I solve the problem in double precision, then all the methods yield the same (correct) answer, since the condition number of A (approx. 110) is not terribly bad.

Remember, round-off is purely a manifestation of finite precision, and one can always delay its onset by increasing the precision or word size.

But this hides the underlying disease, which can be fatal if the matrix is poorly conditioned.

Instead of making the matrix A progressively more pathological, I will take the other tack: I will limit the number of significant digits with which we operate.

To see the susceptibility of the solution method to round-off, we need some way of controlling the precision with which numbers are represented. This routine (roundsd) from the Matlab File Exchange allows us to do just that: we can specify the number of significant digits with which to represent any number, vector, or matrix.
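
For example (assuming the usual roundsd(x, n) signature from that File Exchange submission, where n is the number of significant digits):

roundsd(pi, 2)       % ans = 3.1000
roundsd(123.456, 2)  % ans = 120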

Thus, if I set the number of significant digits to a very high 16 (nearly double precision) and carry out the direct solution, QR, or SVD, I get the same answer:

nsig = 16;
A    = roundsd(A,nsig);
b    = roundsd(b,nsig);

1. Direct solution via normal equations:

L    = chol(A'*A,'lower');
L    = roundsd(L,nsig);
x_ne = (L*L')\(A'*b)

x_ne =

   0.92492
   1.12503
   0.94494
   1.00460

2. QR Decomposition: This also gives the same answer (x_qr = x_ne).

[Q R] = qr(A);
Q     = roundsd(Q,nsig);
R     = roundsd(R,nsig);
x_qr  = R\(Q'*b)


x_qr =

   0.92492
   1.12503
   0.94494
   1.00460

3. SVD: This also gives the same answer (x_svd = x_ne). I have tried to save some multiplications by using only the non-zero singular values.

[U S V] = svd(A);

U = roundsd(U,nsig);
S = roundsd(S,nsig);
V = roundsd(V,nsig);
r = rank(A);       % number of non-zero singular values

V = V(:,1:r);      % keep only the first r singular triplets
S = S(1:r,1:r);
U = U(:,1:r);

x_svd = (S*V')\(U'*b)


x_svd =

   0.92492
   1.12503
   0.94494
   1.00460

In the next part, let us analyze the susceptibility to round-off.