Tuesday, July 31, 2012

Jason Alexander on Gun Control

If you haven't already read this post by "George Costanza", you absolutely should. He makes a superb, nuanced case for additional gun control (of assault-rifle-type firearms) by looking at some common and some not-so-common arguments. He does not call for regulating all guns, only assault-type firearms.

To summarize/paraphrase the rebuttal:

Argument 1: The right to bear arms is in the constitution
Rebuttal: If you want to be literal, the second amendment to the constitution confers the right to bear arms only on a well regulated militia.

Argument 2: Forget literal! What about the spirit of the constitution?
Rebuttal: All rights have boundaries. The right to free speech ends before you can shout "fire" in a stadium or maliciously defame someone. Clearly, the right to bear arms does not extend all the way to possessing anti-aircraft missiles, tanks, or chemical weapons.

Argument 3: Guns don't kill, people do. Should you ban all baseball bats because X bludgeoned Y to death with one?
Rebuttal: Baseball bats have other legitimate uses. Assault rifles have no legitimate role outside a battle zone that could not be served by less lethal weapons.

Argument 4: If everyone had a concealed weapon, these psychotic killers could be stopped before they did much damage.
Rebuttal: You mean in a crowded, chaotic environment, with the perpetrator wearing a bulletproof vest? Really?

Argument 5: Regulation wouldn't help. The bad guys would get the bad stuff anyway.
Rebuttal: It would at least deter some psychotics from walking to the nearest KMart to get one. Also, see #2: we already regulate/ban some types of particularly harmful weapons.

Tuesday, July 24, 2012

Too Big to Fail and The Big Short

If I ever had to make a list of the top five world events of my lifetime, the great recession (TGR) of 2008 would probably feature prominently (and I hope it stays that way; I don't think I want to live in "very interesting times").

I recently read two books on TGR: Andrew Ross Sorkin's "Too Big to Fail", and Michael Lewis' "The Big Short". Both books are eminently readable.

Sorkin's book deals with the events leading up to the fall of Lehman Brothers, and reads like a movie screenplay. The "dialogues" of the principal actors in the drama are written in first person. I don't know how, and how accurately, Sorkin managed to do that, but it does make for compelling storytelling.

You get a sense for how chaotic those times were, and how the principals involved had to make important decisions under extreme uncertainty and pressure. And how easy it is for many commentators on the crisis to be Monday morning quarterbacks.

The book provides interesting color on people who have subsequently come to be viewed in the media somewhat one-dimensionally. For example, you learn how Lehman CEO Dick Fuld stood up for the weak in an ROTC camp in his college days, before coming to be almost unanimously reviled as an out-of-touch, and perhaps criminal, operator. You learn how unaware of social niceties former Treasury Secretary Hank Paulson was. I never knew that this Republican former Goldman Sachs CEO was a Toyota-Prius-driving birdwatcher and environmentalist.

I would strongly recommend Sorkin's book for its scene-by-scene portrayal of some very tumultuous times, and for the fullness with which it depicts some of the most reviled people in America today.

If Too Big to Fail is a view from the inside, The Big Short is a view from the outside.

Michael Lewis' book tells the stories of a few unlikely characters who foresaw the financial massacre a few years in advance, and smartly bet against the market. It follows the trail of social misfits like Michael Burry, a medical doctor turned hedge fund manager who was among the first to figure it all out, only to be hounded by his investors during trying times, and Steve Eisman, who managed a fund for Morgan Stanley and lamented that he couldn't short his parent company.

Even if these people knew the whole thing was going to blow up, they did not know when. And even if they bought insurance to bet on the outcome they thought was most likely, they could not be sure that when the house was on fire, the insurer wouldn't go bankrupt. As Warren Buffett put it succinctly: “It's not just who you sleep with, it's also who they are sleeping with.”

Here's a commencement speech by Michael Burry, and another one by Michael Lewis.

Saturday, July 21, 2012

A Twist in the Prisoner's Dilemma

The prisoner's dilemma is a famous model in game theory, which in its basic form can be stated as:
Two men are arrested, but the police do not possess enough information for a conviction. After separating the two men, the police offer both a similar deal: if one testifies against his partner (defects/betrays) and the other remains silent (cooperates), the betrayer goes free and the one who remains silent receives the full one-year sentence. If both remain silent, both are sentenced to only one month in jail on a minor charge. If each 'rats out' the other, each receives a three-month sentence. Each prisoner must choose either to betray or to remain silent, without knowing what the other will decide. What should they do?
Despite its apparent simplicity, it is used as a model in places you probably wouldn't imagine it to be.

The solution to this one-off game is quite simple (if depressing): you should not cooperate. Whatever your partner does, betraying leaves you better off: you go free rather than serve a month if he stays silent, and you serve three months rather than a year if he betrays you.

A more interesting version is the iterated prisoner's dilemma, where the game is played over and over again. For the longest time (at least since I took a course in game theory about 10 years ago), it was widely assumed that, empirically, a simple strategy like "tit-for-tat" offered a decent balance between simplicity and effectiveness.
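
To make the iterated setting concrete, here is a minimal Octave sketch (my own toy example, not from the paper below) that pits tit-for-tat against always-defect, scoring each player by the total months served under the sentences quoted above:

% Iterated prisoner's dilemma: tit-for-tat vs always-defect.
% Moves: 1 = cooperate (stay silent), 2 = defect (betray).
% J(i,j) = months served by a player choosing i against an opponent choosing j.
J = [1 12; 0 3];
nrounds = 100;
m1 = 1; m2 = 2;             % tit-for-tat opens by cooperating; always-defect defects
t1 = 0; t2 = 0;             % accumulated jail time, in months
for k = 1:nrounds
  t1 = t1 + J(m1, m2);
  t2 = t2 + J(m2, m1);
  m1 = m2;                  % tit-for-tat copies the opponent's previous move
  m2 = 2;                   % always-defect keeps defecting
end
printf("tit-for-tat: %d months, always-defect: %d months\n", t1, t2);

Tit-for-tat gives up only the opening round here, and when paired against a copy of itself it cooperates on every round.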

Turns out that there is a new twist in the plot.

William Press (of Numerical Recipes fame) and Freeman Dyson (yes, that Freeman Dyson) recently published a paper in PNAS (open access) that seems to be getting a lot of attention.

There is a nice commentary (pdf) that is easy to follow for those of us who have an undergrad-level understanding of the problem.

Wednesday, July 18, 2012

Splitting a large text file by number of lines and tags

Say you have a big file (text, picture, movie, etc.) and you want to split it into many small parts. You may want to do this to email it to someone in more manageable chunks, or perhaps to analyze it using a program that cannot handle all of the data at once.

The Linux command split lets you chop your file into chunks of a specified size. To break a big file called "BigFile.mpg" into multiple smaller chunks "chunkaa", "chunkab", etc., you say something like:

split -b 10M BigFile.mpg chunk
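
Since split names the pieces in alphabetical order, you can later stitch the original back together with cat:

cat chunk* > BigFile.mpg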

Consider a simpler case, where the big file is a text file. For concreteness assume that BigFile.txt looks like:

# t = 0
particle1-coordinates
particle2-coordinates
...
particleN-coordinates

# t = 1
particle1-coordinates
particle2-coordinates
...
particleN-coordinates
...
# t = tfinal
particle1-coordinates
particle2-coordinates
...
particleN-coordinates

You may generate one of these if you are running a particle-based simulation like MD, and periodically printing out the coordinates of the N particles in your system. For concreteness, say N = 1000 and tfinal = 500.

If this file were too big, and you wanted to split it up into multiple files (one file for each time snapshot), then you could still use the split command as follows:

split -l 1002 BigFile.txt chunks

The 1002 includes the two additional lines: the time stamp and the blank line after the time snapshot.

You can also use awk instead, exploiting the fact that the "#" tag demarcates records:

awk 'BEGIN {c = -1} /#/ {c++; next} {print > (c ".dat")}' BigFile.txt

would do something very similar. It matches the "#" tag, increments a counter, and writes the lines of each time snapshot to its own file (0.dat, 1.dat, and so on). The advantage of this method is that you have more flexibility in naming your chopped pieces, and you don't have to know the value of "N" beforehand.
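
For instance, if you prefer more descriptive file names (the "frame_" prefix here is just an arbitrary choice), you could write:

awk 'BEGIN {c = -1} /#/ {c++; next} {print > ("frame_" c ".dat")}' BigFile.txt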

Finally, say you wanted to create the chopped pieces in a different way. Instead of chopping up timestamps, you want to store the trajectory of each individual particle in a separate file. So while the methods above created one file per snapshot (each with 1000 (+2) lines), you now want one file per particle (1000 files, each with one line per snapshot). One of the easiest ways is to use sed.

With GNU sed, sed -n '1~10p' file prints every tenth line of file, starting with line 1 (the first~step address is a GNU extension). You can use this to write a simple shell script:

npart=1000                  # number of particles, N
ndiff=$((npart + 2))        # lines per snapshot: timestamp + N coordinates + blank line
n=1
while [ $n -le $npart ]
do
  nstart=$((n + 1))         # particle n sits n+1 lines into the first snapshot
  sed -n "${nstart}~${ndiff}p" BigFile.txt > $n.dat
  n=$((n + 1))
done

Note that the sed address ${nstart}~${ndiff}p must reach sed as a single word: quote it as above, and don't put spaces around the "~".
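
As a quick sanity check, each particle file should end up with one line per snapshot:

wc -l 1.dat    # line count = number of snapshots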

Thursday, July 12, 2012

Linear Least Squares in GNU Octave: Part Deux

I recently blogged about doing LLS in GNU Octave, getting not only the best-fit parameters, but also the standard errors associated with those estimates (assuming the error in the data is normally distributed).

Octave has a built-in function to do ordinary least squares: ols. While it does not directly report the standard errors associated with the regressed parameters, it is relatively straightforward to get them from the variables it returns.

Here is how you would solve the same example as before:

Nobs = 10;
x = linspace(0,1,Nobs)';                % regressor
y = 1 + 2 * x + 0.05 * randn(size(x));  % synthetic data: y = 1 + 2x + noise
X = [ones(Nobs,1) x];                   % design matrix [1 x]

[beta sigma] = ols(y,X)                 % beta = estimates, sigma = estimated error variance
yest         = X * beta;                % fitted values

p    = length(beta);             % #parameters
varb = sigma * ((X'*X)\eye(p));  % covariance matrix of beta
se   = sqrt(diag(varb))          % standard errors
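
As a consistency check (assuming the lls function from the earlier post is on your path), calling it on the same data should reproduce both beta and se:

[b2 se2] = lls(y, X)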


Monday, July 9, 2012

EconTalk Podcasts and More

I stumbled upon EconTalk podcasts earlier this year, and have been hooked. I find myself listening to these fascinating long-form (~1 hour) discussions on various "economic" topics with intellectual leaders while exercising, driving, doing dishes, etc.

The whole discussion, while being in a question-answer format, is not really an "interview" in the Charlie Rose sense. The focus is more on the topic, and less on the person.

Here is the description of the program from the website:
The Library of Economics and Liberty carries a weekly podcast, EconTalk, hosted by Russ Roberts. The talk show features one-on-one discussions with an eclectic mix of authors, professors, Nobel Laureates, entrepreneurs, leaders of charities and businesses, and people on the street. The emphases are on using topical books and the news to illustrate economic principles. Exploring how economics emerges in practice is a primary theme. 
The quality of the discussions in the forum is quite extraordinary.

Thursday, July 5, 2012

Milton Friedman videos on YouTube

Milton Friedman may have been a controversial economist, but his videos on YouTube show why he was such an intellectual juggernaut. Here are a few that I used to while away a perfectly enjoyable afternoon.

1. No free lunch:

2. Young Michael Moore challenging Friedman: He was so much thinner then. (Edit: Apparently not the real Michael Moore)

3. An older black and white video:
etc.

Tuesday, July 3, 2012

Linear Least Squares in GNU Octave with Standard Errors

Consider a linear model with \(N\) observations of the quantity \(Y\), as a function of \( p\) regressors, \(Y = \sum \beta_i X_i\).

\[\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} X_{11} & X_{12}  &  ... & X_{1p} \\ X_{21} & X_{22}  & ... & X_{2p} \\  & & \ddots & \\ X_{N1} & X_{N2}  & ... & X_{Np} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} + \epsilon,\]
where \(\epsilon\) is a normally distributed error. Each row corresponds to an observation, and each column and its associated \(\beta\) correspond to a parameter to be regressed.

In compact form, this can be written as \(Y = X \beta + \epsilon\).

As a simple illustrative case consider fitting the model \(Y = \beta_0 + \beta_1 X\). The above equation becomes: \[\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_{2}  \\ \vdots & \vdots \\   1 & X_{N} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}.\]

Given the expressions in the Wikipedia article on LLS, we can easily write an Octave program that takes in y and X as above, and spits out the best estimates, the standard errors on those estimates, and the residuals.
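
For reference, the expressions the program implements are \[\hat{\beta} = (X^T X)^{-1} X^T Y, \qquad s^2 = \frac{r^T r}{N-p}, \qquad \operatorname{Var}(\hat{\beta}) = s^2 (X^T X)^{-1},\] where \(r = Y - X\hat{\beta}\) is the vector of residuals, and the standard errors are the square roots of the diagonal entries of \(\operatorname{Var}(\hat{\beta})\).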

%
% For y = X b + Gaussian noise:
% compute b, standard errors on b, and residuals
%

function [b se r] = lls(y, X)
        
    [Nobs, p] = size(X);          % size of data

    b   = (X'*X)\X'*y;            % b = estimate beta
    df  = Nobs - p;               % degrees of freedom

    r   = y - X * b;              % residuals
    s2  = (r' * r)/df;            % estimated error variance (SSE/df)
    varb = s2 * ((X'*X)\eye(p));  % covariance matrix of "b"
    se   = sqrt(diag(varb));      % standard errors

endfunction

To test the function, I "create" data according to y = 1 + 2x + white noise.

> x = linspace(0,1,10)';
> y = 1 + 2 * x + 0.05 * randn(size(x));
> X = [ones(length(x),1) x];
> [beta se] = lls(y,X)

beta =

   0.98210
   2.03113

se =

   0.039329
   0.066302

A plot of the data and the regression looks like this:
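
In Octave, a plot like that can be reproduced from the variables in the session above with something like:

plot(x, y, 'o', x, X*beta, '-');   % data points and fitted line
xlabel('x'); ylabel('y');
legend('data', 'LLS fit');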