Clueless Fundatma: 2012

Monday, December 24, 2012

Taleb's Fragile Ego and Crusty Bed

I read Nassim Nicholas Taleb's bestseller "The Black Swan" nearly two years ago. You can see what I thought of that book here (some related thoughts here). Let's be charitable, and say I do not have a favorable impression of the book or the guy.

Call it confirmation bias, but I found Tom Bartlett's "non-profile" of NNT, written in the wake of his new book "Antifragile", far more amusing.

Actually, Antifragile feels like a compendium of people and things Taleb doesn't like. He is, for instance, annoyed by editors who "overedit," when what they should really do is hunt for typos; unctuous, fawning travel assistants; "bourgeois bohemian bonus earners"; meetings of any kind; appointments of any kind; doctors; Paul Krugman; Thomas Friedman; nerds; bureaucrats; air conditioning; television; soccer moms; smooth surfaces; Harvard Business School; business schools in general; bankers at the Federal Reserve; bankers in general; economists; sissies; fakes; "bureaucrato-journalistic" talk; Robert Rubin; Google News; marketing; neckties; "the inexorable disloyalty of Mother Nature"; regular shoes.

While reading it, I also found an equally funny but older review of his previous book of mundane aphorisms called the Bed of Crusty.

Check them out.

Thursday, December 20, 2012

TeX: The Enumerate Package

The enumerate package provides a simple, intuitive interface for controlling the appearance of enumeration counters.

You can give the usual enumerate environment an optional argument containing one of A, a, I, i, or 1 to produce uppercase letters, lowercase letters, uppercase Roman numerals, lowercase Roman numerals, or Arabic numerals as counters. You can also throw in extra decorations such as "(a)", "a)", "N 1." etc.

From the documentation:

These letters may be surrounded by any strings involving any other TeX expressions, however the tokens A a I i 1 must be inside a { } group if they are not to be taken as special.

Here is an example from the documentation:

% requires \usepackage{enumerate} in the preamble
\begin{enumerate}[EX i.]
\item one one one one one one one
      one one one one\label{LA}
\item two
    \begin{enumerate}[{example} a)]
    \item one of two one of two
          one of two\label{LB}
    \item two of two
    \end{enumerate}
\end{enumerate}

\begin{enumerate}[{A}-1]
\item one\label{LC}
\item two
\end{enumerate}

Sunday, December 16, 2012

Alcohol, Proof, and Flammability

As this article explains, the concept of alcohol proof is simple: "Double the number listed as the alcohol by volume on the bottle. A spirit with 40 percent alcohol by volume, therefore, is 80 proof."

Thus, if you know the alcohol content on a volume basis, the "proof number" is really redundant.
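The conversion really is a one-liner. A minimal Python sketch, assuming the US convention quoted above (proof = 2 × ABV); the function names are hypothetical:

```python
def abv_to_proof(abv_percent: float) -> float:
    """US proof: twice the alcohol-by-volume percentage."""
    return 2.0 * abv_percent


def proof_to_abv(proof: float) -> float:
    """Inverse of abv_to_proof: halve the US proof number."""
    return proof / 2.0


if __name__ == "__main__":
    print(abv_to_proof(40))  # the 40% spirit from the quote -> 80.0
    print(abv_to_proof(57))  # the gunpowder-test value below -> 114.0
```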

The historical origins are interesting:

In the 18th century, proof was much more straightforward. Liquor was "proofed" at the distillery by adding gunpowder and lighting it on fire. If it didn't light, the alcohol content was too weak. If it burned yellow, too strong. If it burned blue, the proof was just right (that was around 57 percent, or 114 proof).

The flash point (the lowest temperature at which a volatile liquid gives off enough vapor to ignite in air) of a 60% alcohol-water mixture is about 22 °C, which perhaps explains the historical gunpowder test. It also explains why some high-proof spirits come with flame arrestors on the bottle.

Friday, December 7, 2012

Links

1. Intel Xeon Phi: Will it upend the GPGPU market? A very bullish (read: overly optimistic, perhaps?) take is "What will Intel Xeon Phi do to the GPGPU market?". Any way you slice it, this is exciting news for HPC developers and users.

2. GeoGebra: According to Wikipedia, GeoGebra is "an interactive geometry, algebra, and calculus application, intended for teachers and students. Most parts of GeoGebra are free software." I gave it a spin recently, and thought it was a great tool to help school kids cope with the abstractness of math (until they begin to fall in love with it).

3. As someone who is acutely sensitive to "wait times" at grocery store lines and traffic lights, I read the article, "The Ups and Downs of Making Elevators Go", in the WSJ with interest.

Wednesday, December 5, 2012

World's got Talent: Devlin on MOOCs

I read a provocative piece called "The Darwinization of Higher Education" by Keith Devlin, in which he makes a persuasive case for the talent-sniffing abilities of MOOCs (massively open online courses).

He contends that the real benefits of such courses have little to do with their putative goal (mass education):

Forget all those MOOC images of streaming videos of canned lectures, coupled with multiple-choice quizzes. Those are just part of the technology platform. In of themselves, they are not revolutionizing higher education. We have, after all, had distance education in one form or another for over half a century, and online education since the Internet began in earnest over twenty-five years ago. But that familiar landscape corresponds only to the last two letters in MOOC ("online course"). The source of the tsunami lies in those first two letters, which stand for "massively open."

Rather than focusing on the 90% of students who drop out, one can identify the few dozen who survive and flourish, which gives an easy way to scour for talent. Quoting (emphasis mine):

At the level of the individual student, MOOCs are, quite frankly, not that great, and not at all as good as a traditional university education. This is reflected (in part) in those huge dropout rates and the low level of performance of the majority that stick it out. But in every MOOC, a relatively small percentage of students manage to make the course work to their advantage, and do well. And when that initial letter M refers not to tens of thousands but to "millions," those successes become a lot of talented individuals.

One crucial talent in particular that successful MOOC students possess is being highly self-motivated and persistent. Right now, innate talent, self-motivation, and persistence are not enough to guarantee an individual success, if she or he does not live in the right part of the word or have access to the right resources. But with MOOCs, anyone with access to a broadband connection gets an entry ticket. The playing field may still not be level, but it's suddenly a whole lot more level than before. Level enough, in fact.

I've already seen numerous anecdotal examples of this model of talent identification at work in open-source software. In fact, one of my grad-school roommates was found by his employer through his presence in, and contributions to, the Linux ecosystem.

Monday, December 3, 2012

Howto: Convert EPS from Grace to PDF (for pdfLaTeX perhaps?)

I am a big fan of the 2D plotting program Grace, as attested by some of my previous posts.

It lets you create high-quality graphs, which can be exported to a variety of formats. One conspicuous export format missing from the list is PDF.

Even if you use pdfLaTeX, for example, this is only a minor issue: worth a grumble, but probably not a rant, since Linux has a whole slew of tools you can throw at the problem.

Like ps2pdf.

Let's work through an example. Say you have an EPS image (fig.eps) that looks like so:

You say ps2pdf fig.eps and get fig.pdf. You open it up and it looks like, yikes!!!

The bounding box has been obliterated! And no, the -dEPSCrop flag does you no good!

You open up fig.eps in a text editor, and the first two lines (note the "atend") tell you what is going on:

%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: (atend)

The actual bounding box is specified at the very end of the EPS file:

%%BoundingBox: 29 51 716 529

There are two ways to fix this. If this is just a one-off file, you can replace the second line (the one with "atend") with the line at the end that defines the actual bounding box. Save the figure (say as fig1.eps) and say:

ps2pdf -dEPSCrop fig1.eps

And all your troubles are gone.

Another method, which does the same thing but may be more useful if you have to do this to 100 files, is as follows:

1. Use the ps2eps tool to "convert" the image to EPS (this automatically brings the BoundingBox to the top):

ps2eps fig.eps fig1.eps

Note: This may result in a huge fig1.eps, but if your intention is to get a PDF you can delete fig1.eps soon enough.

2. Use the ps2pdf tool:

ps2pdf -dEPSCrop fig1.eps

And you get a nice cropped PDF image.
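For the 100-file case, the manual edit from the first method can also be scripted. A minimal Python sketch, assuming each file has a single "%%BoundingBox: (atend)" header and that the last numeric %%BoundingBox line is the real one (nested, embedded EPS files could break that assumption). The script name hoist_bbox.py and the -bb.eps output suffix are hypothetical:

```python
import re
import sys
from pathlib import Path

# Header line Grace writes, and the numeric box it writes near the end.
ATEND = re.compile(r"^%%BoundingBox:\s*\(atend\)\s*$")
NUMERIC = re.compile(r"^%%BoundingBox:\s*-?\d+\s+-?\d+\s+-?\d+\s+-?\d+\s*$")


def hoist_bounding_box(eps_text: str) -> str:
    """Replace the '(atend)' header line with the last numeric bounding box."""
    lines = eps_text.splitlines(keepends=True)
    header = next((i for i, ln in enumerate(lines) if ATEND.match(ln)), None)
    boxes = [ln for ln in lines if NUMERIC.match(ln)]
    if header is None or not boxes:
        return eps_text  # nothing to fix: leave the text untouched
    # Keep the header line's own line ending (LF or CRLF).
    old = lines[header]
    ending = old[len(old.rstrip("\r\n")):]
    lines[header] = boxes[-1].rstrip("\r\n") + ending
    return "".join(lines)


if __name__ == "__main__":
    # Usage (hypothetical script name): python hoist_bbox.py fig.eps more.eps ...
    for name in sys.argv[1:]:
        src = Path(name)
        # latin-1 maps every byte to one character, so binary data round-trips.
        text = src.read_bytes().decode("latin-1")
        dst = src.with_name(src.stem + "-bb.eps")
        dst.write_bytes(hoist_bounding_box(text).encode("latin-1"))
        print(f"{src} -> {dst}")
```

Each output file can then go through ps2pdf -dEPSCrop exactly as in the one-off case.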

Friday, November 30, 2012

Making "journal quality" graphs in Matlab

A very nice post on making journal-quality graphs through extensive use of handle graphics, at Loren on the Art of MATLAB.

As a side note, the actual blog entry was written by guest blogger Jiro Doke, who I went to school with, as mentioned earlier.

Monday, November 26, 2012

Rainbows are beautiful!

After seeing this lecture, I will never see a rainbow with the same eyes:

Wednesday, November 21, 2012

Kling on Online Education

I recently listened to a podcast of an engaging interview with Arnold Kling at EconTalk. The interview builds on an article Kling wrote earlier in The American, which opens provocatively:

Education is in some respects one of the most stagnant of all major industries. A farmer from 150 years ago would not comprehend a modern farm. A factory worker from 150 years ago would not be able to function in a modern factory. But a professor from 150 years ago could walk into a classroom today and go to work without missing a beat.

At this point you are probably thinking, "this guy sounds like yet another of those guys who thinks online education and the private sector are going to supplant traditional universities." Perhaps, but not quite.

In the article, Kling goes on to argue that MOOCs (massive open online courses) are mostly hype: most (about 90%) of the "tens of thousands" of students who sign up for them give up very early.

We should not be surprised that MOOCs do not benefit most of those who try them. Students differ in their cognitive abilities and learning styles. Even within a relatively homogenous school, you will see students put into separate tracks. If we do not teach the same course to students in a single high school, why would we expect one teaching style to fit all in an unsorted population of tens of thousands?

An online course that has been designed at Stanford is likely to best fit the students who are suited to that particular university. The other beneficiaries are likely to be students who have the right cognitive skills and learning style but happen to be unable to attend college in the United States.

And perhaps a key insight:

The attempt to achieve large scale in college courses is misguided. Instead of trying to come up with a way to extend the same course to tens of thousands of students, educators should be asking the opposite question: How would I teach if I only had one student? Educators with just one student in their class would not teach by lecturing.

Interesting perspective - even if you don't agree with all of it.

Thursday, November 15, 2012

Exporting Matrices in Octave/Matlab to LaTeX format

I often need to include vectors or matrices computed during GNU Octave sessions in my lectures or presentations. Here is a quick program called matrixTeX.m (stored on Google Drive) that takes in a matrix A and spits out the matrix form in LaTeX.

In its simplest form, matrixTeX(A) takes in a matrix A, checks whether the elements are integers or floats, and prints the matrix to the screen in the appropriate format using the amsmath environment bmatrix.

If you'd like the elements of the matrix displayed in a particular format, you can optionally use a second input argument to specify a C-style format string, e.g., matrixTeX(A, '%10.4e') or matrixTeX(A, '%d').

In addition, you can specify a third argument for column alignment, as in matrixTeX(A, '%d', 'r'), where 'r' stands for right alignment. This option uses the bmatrix* environment provided by the mathtools package, which needs to be loaded in the LaTeX preamble.
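The logic described above is easy to sketch outside Octave too. Here is a hypothetical Python analogue (matrix_tex is my own name, and the %g default for non-integer input is an assumption, not necessarily what matrixTeX.m does):

```python
def matrix_tex(a, fmt=None, align=None):
    """Return LaTeX for the 2-D list `a`, mimicking matrixTeX's options.

    fmt   -- C-style format string; defaults to %d for all-integer
             input, else %g (an assumed default)
    align -- optional column alignment such as 'r'; switches to the
             mathtools bmatrix* environment
    """
    if fmt is None:
        all_int = all(float(x).is_integer() for row in a for x in row)
        fmt = "%d" if all_int else "%g"
    env = "bmatrix" if align is None else "bmatrix*"
    opt = "" if align is None else f"[{align}]"
    rows = [" & ".join(fmt % x for x in row) for row in a]
    body = " \\\\\n".join(rows)  # LaTeX row separator: \\ plus newline
    return f"\\begin{{{env}}}{opt}\n{body}\n\\end{{{env}}}"


if __name__ == "__main__":
    print(matrix_tex([[1, -1], [-1, 1]], align="r"))
```

As with the Octave version, the bmatrix* output only compiles if mathtools is loaded.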

Thursday, November 8, 2012

Is consistency overrated?

Jeff Bezos thinks so:

... he shared an enlightened observation about people who are “right a lot”.

He said people who were right a lot of the time were people who often changed their minds. He doesn’t think consistency of thought is a particularly positive trait. It’s perfectly healthy — encouraged, even — to have an idea tomorrow that contradicted your idea today.

He’s observed that the smartest people are constantly revising their understanding, reconsidering a problem they thought they’d already solved. They’re open to new points of view, new information, new ideas, contradictions, and challenges to their own way of thinking.

It reminds me of the line attributed to Keynes: "When the facts change, I change my mind. What do you do, sir?"

Tuesday, October 30, 2012

Small Links

1. Incredibly Small: Wired has some nice microscopy images

2. The strange world of nanoscience

Thursday, October 25, 2012

Creative Household Tips

Regardless of whether the frugality or creativity (or both) appeal to you, the "illustrated" household tips here are worth a read. Some of my favorites include:

A saturated sponge, frozen in a zip-lock bag, makes a drip-free ice pack
Use a clothespin to hold nails while hammering, to save your fingers
Use a spring from an old pen to keep a charger cord (especially on Macs) from breaking
On camping trips, use Doritos instead of starter fluid to kindle fires
Wrap beer bottles in a wet paper towel to chill them really fast
Use nail polish to tell apart similar-looking keys

Thursday, October 18, 2012

Nice Silly Girl!

A Way with Words is an informative and entertaining program and podcast that explores curiosities of words and language. Here is what I learned yesterday:

The word silly didn’t always have its modern meaning. In the 1400s, silly meant happy or blessed. Eventually, "silly" came to mean weak or in need of protection. Other seemingly simple words have shifted meanings as the English language developed: the term girl used to denote either a boy or a girl, and the word nice at one time meant ignorant.

Tuesday, October 16, 2012

Pink is for Boys, Blue is for Girls

I was listening to the weekly podcast "The Reality Check" by the Ottawa Skeptics, and they had a piece about the history of "pink for girls and blue for boys".

This assignment of colors, which is rather arbitrary, is particularly strong in the United States. Its history, discussed in the show (and in this article in Smithsonian Magazine), is fascinating.

Apparently, white was the color of choice for young children of both sexes until almost World War I, partly because colored clothes were more expensive, partly because white clothes could be bleached clean, and partly because of social norms.

Here's where things get interesting.

For example, a June 1918 article from the trade publication Earnshaw's Infants' Department said, “The generally accepted rule is pink for the boys, and blue for the girls. The reason is that pink, being a more decided and stronger color, is more suitable for the boy, while blue, which is more delicate and dainty, is prettier for the girl.” Other sources said blue was flattering for blonds, pink for brunettes; or blue was for blue-eyed babies, pink for brown-eyed babies, according to Paoletti.

In 1927, Time magazine printed a chart showing sex-appropriate colors for girls and boys according to leading U.S. stores. In Boston, Filene’s told parents to dress boys in pink. So did Best & Co. in New York City, Halle’s in Cleveland and Marshall Field in Chicago.

It wasn't until much later that the current practice became the norm. Neither the article nor the show delves into the factors responsible for the reversal. Given how hard it is to change arbitrary norms or formats that have gained some currency (the QWERTY keyboard, driving on the left or right side of the road), the reversal is definitely intriguing.

Wednesday, October 10, 2012

Why is the sky blue during the day and dark at night?

Two nice physics videos that explain part of the question above.

Walter Lewin does two magnificent demonstrations: in the first, he shows that the period of a pendulum is independent of its mass and (for small swings) its amplitude (this part is cool!). In the second, he explains beautifully how Rayleigh and Mie scattering conspire to make the sky blue, clouds white, and sunsets red.

The MinutePhysics channel on YouTube also has a lot of snappy videos. Here is one that tries to explain why the sky is dark at night.

Friday, September 28, 2012

Deresiewicz on "Scientism"

Deresiewicz laments the notion that "science is the only valid form of knowledge".

[editors] want numbers; studies, sociology. Aristotle, Montaigne, and Emerson are not valid authorities on the topic, say, of friendship, but a study of 50 college students is enough to convince an editor of anything.

There definitely is something that rings true in his argument.

However, I find his example, which treats experience and data as equally valid routes to the conclusion that "city life is stressful", somewhat problematic.

For example, many people feel that violent crime has increased, while "scientific" data seem to show otherwise (see this TED talk by Steven Pinker on his book on the topic).

When experience and data diverge, trust the data!

Wednesday, September 26, 2012

Setting up the random number generator seed in Matlab and Octave

Mike Croucher has an interesting post on correctly and incorrectly setting the seed in Matlab.

The key takeaway is that one should not use rand('seed',x) or rand('state',x) to reset the random number seed, since doing so makes Matlab fall back to an inferior generator from older versions. It is preferable to use rng(x) instead.

GNU Octave, on the other hand, takes a less quirky approach.

Using rand('seed',x) switches to the older generators, while rand('state',x) resets the state but still uses a version of the Mersenne Twister.

Monday, September 17, 2012

How to: Crop PDF Images

If you use Matlab (or some other program) to make a plot and then export the figure to a PDF (for inclusion in a TeX document for example), you will end up with plenty of whitespace around the image as below.

It is easy to find the "bounding box" using Ghostscript:

gs -sDEVICE=bbox -dNOPAUSE -dBATCH file.pdf

which after a bunch of messages yields something like:

%%BoundingBox: 89 231 491 545
%%HiResBoundingBox: 89.766138 231.192063 490.841915 544.769913

You can pass those dimensions to the TeX command

\includegraphics[bb = 89 231 491 545 ]{file.pdf}

to trim the image within TeX.

Alternatively, you can use Ghostscript to crop the image itself:

gs -sDEVICE=pdfwrite \
-o trimmedFile.pdf \
-c "[/CropBox [89 231 491 545] /PAGES pdfmark" \
-f file.pdf

where the output file trimmedFile.pdf is generated. You then don't have to worry about the bounding box inside the TeX document.
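If you crop figures often, the measure-then-crop steps above can be glued together. A minimal Python sketch, assuming gs is on your PATH and the PDF has a single page (only the first %%BoundingBox line is used); all helper names are hypothetical:

```python
import re
import subprocess
import sys

# Matches "%%BoundingBox: llx lly urx ury" but not "%%HiResBoundingBox: ...".
BBOX = re.compile(r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)", re.M)


def parse_bbox(gs_output: str) -> tuple:
    """Return (llx, lly, urx, ury) from the first %%BoundingBox line."""
    m = BBOX.search(gs_output)
    if m is None:
        raise ValueError("no %%BoundingBox line found")
    return tuple(int(v) for v in m.groups())


def measure_bbox(pdf: str) -> tuple:
    """Run Ghostscript's bbox device; it reports the box on stderr."""
    res = subprocess.run(["gs", "-sDEVICE=bbox", "-dNOPAUSE", "-dBATCH", pdf],
                         capture_output=True, text=True, check=True)
    return parse_bbox(res.stderr + res.stdout)


def includegraphics_line(pdf: str, bbox) -> str:
    """Build the \\includegraphics call that trims inside TeX."""
    return "\\includegraphics[bb = {} {} {} {}]{{{}}}".format(*bbox, pdf)


def crop_command(src: str, dst: str, bbox) -> list:
    """Argument list for the Ghostscript pdfmark crop."""
    box = " ".join(str(v) for v in bbox)
    return ["gs", "-sDEVICE=pdfwrite", "-o", dst,
            "-c", f"[/CropBox [{box}] /PAGES pdfmark", "-f", src]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python crop_pdf.py file.pdf trimmedFile.pdf
    bb = measure_bbox(sys.argv[1])
    subprocess.run(crop_command(sys.argv[1], sys.argv[2], bb), check=True)
```

Passing the arguments as a list avoids any shell quoting trouble with the pdfmark string.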

Another route is to use the perl script pdfcrop.

pdfcrop file.pdf

produces the appropriately cropped file file-crop.pdf

Sunday, September 9, 2012

Cleve Moler: Matlab, textbooks and blog

As many of you who use Matlab might know, Cleve Moler was a professor of math and computer science at several universities (including Michigan) before he co-founded MathWorks.

I should have known this before, given how much Matlab and GNU Octave I use, but only recently did I find out about two free textbooks that he wrote. They are available as PDFs (either chapter by chapter or in full) on the MathWorks website. The first, "Numerical Computing with MATLAB", is an engaging trek through the usual topics of numerical analysis with Matlab. It is written in an extremely accessible style, and highlights the many special insights he has accumulated over the years. Matlab lets him quickly explore topics, do quick computations, plot and visualize, and mine for knowledge and wisdom.

The other book, "Experiments with MATLAB", is similar in spirit, except that it is targeted at younger audiences. Rather than using a standard numerical analysis course as the skeleton of the book, it jumps from one puzzle to another, and from one curiosity to the next, looking at the underlying patterns and math.

Both books are very highly recommended, regardless of whether you are a student learning this material for the first time, or an instructor who has taught similar topics several times. There is something in it for everybody.

I should also mention that Cleve writes an engaging blog called "Cleve's Corner" which I now have on my Google Reader.

Tuesday, August 28, 2012

Julia: Sounds Exciting!

I've been hearing about the excitement around the Julia language on and off for a while now. It is hard to avoid, given my line of work. Today I took some time to scope out the landscape, and I like what I see!

Here is a quick introduction to why the authors decided to create Julia. Basically, it sets out with a fairly ambitious mandate.

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

It has a Matlab-like syntax and flexibility, and judging from the published benchmarks, it seems to approach the performance of C and Fortran.

The language is still in active development, and if it lives up to some of its promise, I can see it being a game-changer in scientific computing.

Monday, August 20, 2012

Aligning Matrix Columns in LaTeX

The standard amsmath packages provide a convenient interface for writing matrices in LaTeX. For example, the following code produces the matrix:

\[\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}.\]

Clearly the columns could do with some alignment. Of course, you could do it from scratch by using the array environment, but that would mean giving up the convenience of the AMS matrix environments.

Enter the mathtools package, which is an add-on to the amsmath packages. The mathtools package loads the amsmath packages if they haven't already been loaded.

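Once you've loaded the mathtools package (with \usepackage{mathtools}), you can specify the column alignment of your matrices with the starred environment bmatrix* and an optional alignment argument (r, l, or c), as follows:

\[\begin{bmatrix*}[r] 1 & -1 \\ -1 & 1 \end{bmatrix*}.\]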

which yields:

Voila!

Thursday, August 16, 2012

How to Make a Mesh: John Burkardt

My colleague, John, is a gifted teacher and communicator. I have previously linked to the cornucopia of numerical math software that he has obsessively (in his own words) curated.

He recently presented a talk on using publicly available programs to do meshing. It is a non-mathematical, tutorial-like presentation. Even though I haven't had to do any heavy duty meshing in my research thus far, I thoroughly enjoyed his talk.

I'm trying to convince him to videotape the rest of his talks this summer. Let's see what happens.

Monday, August 13, 2012

TeXcount: Number of words in a LaTeX document

Since TeX is really a markup language, counting the number of words in a document is tricky. Obviously, you don't want to literally count words in tags like \chapter{}, \begin{center}, \cite{reference1, reference2} etc.

You may have macros, which need interpretation.

You may have external files that you are collecting together in a master document by using \input{} etc.

In short, it is not as simple as it seems.

You could try front-end programs like Kile or TeXShop, which will give you a simple total count. My front-end program of choice, TeXMaker, does not do it for me.

If your document is very simple, you could try to "detex" the LaTeX tags, and use a simple Linux utility like "wc".
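For a very simple document, the detex-and-count idea can be approximated in a few lines. A deliberately crude Python sketch, assuming no custom macros, no \input files, and no escaped dollar signs; the list of commands whose arguments are skipped is my own hypothetical (and incomplete) choice, so treat this as a sanity check rather than a substitute for TeXcount:

```python
import re

# Commands whose braced argument is markup, not prose (assumed, incomplete list).
NON_PROSE = r"begin|end|cite|label|ref|input|include|usepackage|documentclass"


def rough_tex_word_count(tex: str) -> int:
    """Very rough detex-and-count: no macro expansion, no \\input following."""
    tex = re.sub(r"(?<!\\)%.*", "", tex)                        # comments
    tex = re.sub(r"\$\$.*?\$\$|\$.*?\$", " ", tex, flags=re.S)  # inline/display math
    tex = re.sub(r"\\(?:" + NON_PROSE + r")\*?\s*(?:\[[^\]]*\])?\{[^}]*\}",
                 " ", tex)                                      # markup commands
    tex = re.sub(r"\\[A-Za-z]+\*?", " ", tex)                   # other control words
    words = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", tex)
    return len(words)
```

Note that this counts header text (e.g. the argument of \section) as ordinary words, whereas TeXcount reports headers separately.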

The best solution seems to be TeXcount.

There is a web-interface that lets you paste your TeX document in a web-form.

Alternatively, you can download the script. It is essentially a small Perl program (a 400 kB download in all; the actual script is about 90 kB) called texcount.pl, which you can run quite simply as

perl texcount.pl filename.tex

Here's the (default) output it spits out:

Encoding: ascii
Words in text: 10324
Words in headers: 81
Words in float captions: 219
Number of headers: 30
Number of floats: 5
Number of math inlines: 198
Number of math displayed: 18
Subcounts:
text+headers+captions (#headers/#floats/#inlines/#displayed)
14+9+0 (1/0/0/0) _top_
89+1+0 (1/0/0/0) Section: Introduction
419+2+44 (1/1/0/0) Subsection: Analytical Rheology
646+2+38 (2/1/3/0) Subsection: Polymers
236+3+0 (1/0/0/0) Subsection: Scope and Organization
35+3+0 (1/0/0/0) Section: Motivation and Background
355+2+19 (1/0/2/0) Subsection: Linear Polymers
657+2+24 (1/1/17/1) Subsection: Branched Polymers
205+3+45 (1/1/0/0) Subsection: Model-driven Analytical Rheology
97+6+0 (1/0/0/0) Section: Models for Polymer Dynamics and Rheology
597+2+0 (1/0/2/0) Subsection: Historical Development
872+5+25 (2/1/8/0) Subsection: The Tube Model
211+4+0 (1/0/0/0) Subsection: State of the Art
273+2+0 (1/0/5/0) Subsection: Computational Models
162+3+0 (1/0/0/0) Section: Methods and Progress
1774+5+0 (3/0/108/15) Subsection: Linear Polymers
2955+24+24 (9/0/51/2) Subsection: Branched Polymers
727+3+0 (1/0/2/0) Section: Summary and Perspective

You can exercise significant control over the way it parses the document and reports the results by using options that are described in the manual.

Thursday, August 9, 2012

Waiting For Superman

I finally saw Davis Guggenheim's documentary "Waiting for Superman" (he is also behind "An Inconvenient Truth") on Netflix. The documentary follows the poignant stories of a few earnest kids trying their luck at getting into successful charter schools to escape failing neighborhood public schools. To its credit, the documentary is highly entertaining, takes on a subject that is hard to talk about without attracting enemy fire, and passes the test of good storytelling: it makes us share the hope and disappointment of the kids before and after the lottery. No wonder it has an 89% rating on Rotten Tomatoes.

Weaving fact and argument into the human story, Guggenheim tries to educate us about what is wrong with public education - the distorted incentive structure, teachers unions and tenure, the bureaucratic maze of education administration etc. It is probably fair to say that by the end of the movie, the typical audience is led to believe that more charter schools are an important part of any attempt to resolve the crisis.

Which may or may not be true.

The most cogent counter-argument was Diane Ravitch's article "The Myth of Charter Schools" in the New York Review of Books (and not this insipid rebuttal, IMHO). I like her factual tone, as she carefully destroys most of the intellectual basis of the film.

For example:

... it evaluated student progress on math tests in half the nation’s five thousand charter schools and concluded that 17 percent were superior to a matched traditional public school; 37 percent were worse than the public school; and the remaining 46 percent had academic gains no different from that of a similar public school.
...
The movie asserts a central thesis in today’s school reform discussion: the idea that teachers are the most important factor determining student achievement. But this proposition is false. Hanushek has released studies showing that teacher quality accounts for about 7.5–10 percent of student test score gains. Several other high-quality analyses echo this finding, and while estimates vary a bit, there is a relative consensus: teachers statistically account for around 10–20 percent of achievement outcomes.

You should really read the article in its entirety, as I find myself wanting to excerpt the whole thing.

Monday, August 6, 2012

Don't use gnuplot "fit" blindly!

Gnuplot is a great plotting/data-analysis program. It has a fairly robust tool called "fit" which can be used to perform nonlinear least-squares regressions very conveniently.

It does what it advertises, and in fact does it quite well.

But let us consider a simple cautionary example.

Consider a linear model $Y = P_1 + P_2 X$, with $P_1$ = 2, and $P_2$ = 1.5. Let us generate data using this model, and add some white noise to it. In particular we assume that $Y_e = N(Y,\sigma=0.5)$ is the experimental data which is normally distributed about the linear model with a standard deviation of 0.5.

The red line is the model, the error-bars denote the standard deviation of the Gaussian noise, and the circles represent a particular realization of experimental data.

We can generate experimental data using the prescribed model and noise profile. This is shown in the figure above.

If we use gnuplot "fit" to fit through the data-points above we get:

Final set of parameters Asymptotic Standard Error
======================= ==========================
P1 = 1.98866 +/- 0.1156 (5.812%)
P2 = 1.4667 +/- 0.03817 (2.603%)

This looks quite good. Even the parameter estimates look quite sharp.

In a way, this is expected because "fit" assumes that the error in $Y$ is distributed normally (which in this case it is!).

Now let us convert this idealized problem, which we grok completely, into something slightly more complicated.

Let us apply a nonlinear transformation on all the participating variables. Thus, we define $x = \exp(X)$ and $y=\exp(Y)$. A replotting of this transformed problem looks like:

This picture is the same as the first figure with the exponential transformation applied. The location of the circles and the red line is the same, except that they are transformed. The location of the bottom and top tips of the error bars are also the same, except that they too are transformed. In short, if I plotted this figure on a log-log plot, it would "look" identical to the first figure.

However notice two important differences. One, the size of the error bars is not the same. It covaries with the magnitude of $y$. Two, the error bars are not even symmetrical. In fact, it can be shown that if the variable $Y \sim N(\mu,\sigma)$, then $y = \exp(Y) \sim \text{lgN}(\mu,\sigma)$, where $\text{lgN}$ refers to the log-normal distribution.
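A short calculation makes both observations concrete. Writing $y_\mu = e^{\mu}$ for the transformed model value (a point on the red line), the tips of a $\pm\sigma$ error bar map to

\[ e^{\mu-\sigma} = y_\mu e^{-\sigma}, \qquad e^{\mu+\sigma} = y_\mu e^{\sigma}, \]

so the lower and upper halves of the transformed error bar have lengths

\[ y_\mu\left(1 - e^{-\sigma}\right) \quad \text{and} \quad y_\mu\left(e^{\sigma} - 1\right). \]

Both are proportional to $y_\mu$, and since their difference is $e^{\sigma} + e^{-\sigma} - 2 = 2(\cosh\sigma - 1) > 0$ for any $\sigma > 0$, the upper half is always the longer one. With $\sigma = 0.5$, the two halves are roughly $0.39\,y_\mu$ and $0.65\,y_\mu$.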

Let us throw this non-linear problem at gnuplot. Clearly, we don't expect things to go as smoothly as before, because the distribution of the error violates the Gaussian assumption in gnuplot's "fit".

Final set of parameters Asymptotic Standard Error
======================= ==========================
p1 = 4.68963 +/- 0.7653 (16.32%)
p2 = 1.63074 +/- 0.03327 (2.04%)

Note that not only are the parameter values wrong ($p_1 = \exp(P_1)$ should be 7.39 and not 4.69, and $p_2 = P_2$ should be 1.5 not 1.63), but the error bars are quite meaningless.

It is particularly important to realize that this is *bad* even though the "fit" visually looks better than the actual model!

But lift the veil by replotting the figure above on a log-log scale, and you can guess what exactly is going on!
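One way to avoid the trap, when you know the noise is Gaussian in $Y$, is to fit in the space where that is true. A minimal sketch in Python rather than gnuplot, assuming a hypothetical design of 51 equally spaced $X$ values on [0, 5] and an arbitrary seed, since the original data aren't given. It regenerates synthetic data from the model above, applies the exponential transformation, and recovers $p_1$ and $p_2$ by ordinary least squares on $\log y$ versus $\log x$:

```python
import math
import random


def ols(xs, ys):
    """Ordinary least squares for ys = a + b*xs; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


P1, P2, SIGMA = 2.0, 1.5, 0.5
rng = random.Random(2012)                  # arbitrary fixed seed
X = [0.1 * i for i in range(51)]           # hypothetical design: X in [0, 5]
Y = [P1 + P2 * v + rng.gauss(0.0, SIGMA) for v in X]   # Y_e ~ N(Y, 0.5)

# The transformed problem: y = p1 * x**p2 with p1 = exp(P1), p2 = P2.
x = [math.exp(v) for v in X]
y = [math.exp(v) for v in Y]

# Fitting log(y) against log(x) undoes the transformation, so the noise is
# Gaussian with constant variance again and least squares is appropriate.
a, b = ols([math.log(v) for v in x], [math.log(v) for v in y])
p1_hat, p2_hat = math.exp(a), b

print(f"p1 = {p1_hat:.3f} (true {math.exp(P1):.3f})")
print(f"p2 = {p2_hat:.3f} (true {P2})")
```

In gnuplot terms, the same idea means fitting a straight line to log(y) versus log(x), rather than the power law to y versus x.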

Thursday, August 2, 2012

Krugman on Friedman

Lately, I have become fascinated by Milton Friedman. I just borrowed a copy of "Capitalism and Freedom" from the library, and look forward to reading it.

I found this interesting piece (2007) by Paul Krugman at the New York Review of Books, entitled "Who was Milton Friedman?"

What’s odd about Friedman’s absolutism on the virtues of markets and the vices of government is that in his work as an economist’s economist he was actually a model of restraint. As I pointed out earlier, he made great contributions to economic theory by emphasizing the role of individual rationality—but unlike some of his colleagues, he knew where to stop. Why didn’t he exhibit the same restraint in his role as a public intellectual?

The answer, I suspect, is that he got caught up in an essentially political role. Milton Friedman the great economist could and did acknowledge ambiguity. But Milton Friedman the great champion of free markets was expected to preach the true faith, not give voice to doubts. And he ended up playing the role his followers expected. As a result, over time the refreshing iconoclasm of his early career hardened into a rigid defense of what had become the new orthodoxy.

In the long run, great men are remembered for their strengths, not their weaknesses, and Milton Friedman was a very great man indeed—a man of intellectual courage who was one of the most important economic thinkers of all time, and possibly the most brilliant communicator of economic ideas to the general public that ever lived.

Tuesday, July 31, 2012

Jason Alexander on Gun Control

If you haven't already read this post by "George Costanza", you absolutely should. He makes a superb, nuanced case for additional gun control (of assault-rifle-type firearms) by examining some common and some not-so-common arguments. He does not call for regulating all guns, only assault-type firearms.

To summarize/paraphrase the rebuttal:

Argument 1: The right to bear arms is in the constitution
Rebuttal: If you want to be literal, the second amendment to the constitution ties the right to bear arms to a well regulated militia.

Argument 2: Forget literal! What about the spirit of the constitution?
Rebuttal: All rights have boundaries. The right to free speech ends before you can shout "fire" in a stadium or maliciously defame someone. Clearly, the right to bear arms does not extend all the way to possessing anti-aircraft missiles, tanks, or chemical weapons.

Argument 3: Guns don't kill, people do. Should you ban all baseball bats because X bludgeoned Y to death with one?
Rebuttal: Baseball bats have other legitimate uses. Outside of battle zones, assault rifles serve no legitimate role that less lethal weapons cannot fill.

Argument 4: If everyone had a concealed weapon, these psychotic killers could be stopped before they did much damage.
Rebuttal: You mean in a crowded, chaotic environment, with the perpetrator wearing a bulletproof vest? Really?

Argument 5: Regulation wouldn't help. The bad guys would get the bad stuff anyway.
Rebuttal: It would at least deter some psychotics from walking into the nearest Kmart to get one. Also, see #2: we already regulate or ban some types of particularly harmful weapons.

Friday, July 27, 2012

QuickSort Visualized: Hungarian Dance

Fascinating!

Tuesday, July 24, 2012

Too Big to Fail and The Big Short

If I ever had to make a list of the top five world events of my lifetime, the great recession (TGR) of 2008 would probably feature prominently (I hope so; I don't think I want to live in "very interesting times").

I recently read two books on TGR: Andrew Ross Sorkin's "Too Big to Fail", and Michael Lewis' "The Big Short". Both books are eminently readable.

Sorkin's book deals with the events leading up to the fall of Lehman Brothers, and reads like a movie screenplay. The "dialogues" of the principal actors in the drama are written in first person. I don't know how, and how accurately, Sorkin managed to do that, but it does make for compelling storytelling.

You get a sense of how chaotic those times were, how the principals involved had to make important decisions under extreme uncertainty and pressure, and how easy it is for many commentators on the crisis to be Monday-morning quarterbacks.

The book provides interesting color on people who have since come to be viewed in the media somewhat one-dimensionally. For example, you learn how Lehman CEO Dick Fuld stood up for the weak in an ROTC camp in his college days, before coming to be unanimously reviled as an out-of-touch and perhaps criminal operator. You learn how oblivious to social niceties former Treasury Secretary Hank Paulson was. I never knew that this Republican former Goldman Sachs CEO was a Toyota-Prius-driving birdwatcher and environmentalist.

I would strongly recommend Sorkin's book for the scene-by-scene portrayal of some very tumultuous times, and for the fullness with which it casts some of the most reviled people in America today.

If Too Big to Fail is a view from the inside, The Big Short is a view from the outside.

Michael Lewis' book tells the stories of a few unlikely characters who foresaw the financial massacre a few years early and smartly bet against it. It follows social misfits like Michael Burry, a medical doctor turned hedge fund manager, who was among the first to figure it all out, only to be hounded by his investors during trying times. It also follows Steve Eisman, who managed a fund for Morgan Stanley and lamented that he couldn't short his parent company.

Even if these people knew the whole thing was going to blow up, they did not know when. And even if they bought insurance to bet on the outcome they thought most likely, they could not be sure that, when the house was on fire, the insurer wouldn't go bankrupt. As Warren Buffett succinctly put it, “It's not just who you sleep with, it's also who they are sleeping with.”

Here's a commencement speech by Michael Burry, and another one by Michael Lewis.

Saturday, July 21, 2012

A Twist in the Prisoner's Dilemma

The prisoner's dilemma is a famous model in game theory, which in its basic form can be stated as:

Two men are arrested, but the police do not possess enough information for a conviction. Following the separation of the two men, the police offer both a similar deal—if one testifies against his partner (defects/betrays), and the other remains silent (cooperates/assists), the betrayer goes free and the one that remains silent receives the full one-year sentence. If both remain silent, both are sentenced to only one month in jail for a minor charge. If each 'rats out' the other, each receives a three-month sentence. Each prisoner must choose either to betray or remain silent; the decision of each is kept quiet. What should they do?

Despite its apparent simplicity, it is used as a model in places you probably wouldn't imagine.

The solution to this one-off game is quite simple, if depressing: you should not cooperate. Whatever your partner does, betraying gets you a shorter sentence.
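The dominance argument can be checked mechanically from the sentences in the quoted story. Here is a minimal sketch that encodes those sentences in months (lower is better) and asks, for each partner move, which of my moves is the best response. The function names are my own.

```python
# My sentence in months, keyed by (my move, partner's move),
# using the numbers in the quoted story. "C" = stay silent, "D" = betray.
MY_SENTENCE = {
    ("C", "C"): 1,
    ("C", "D"): 12,
    ("D", "C"): 0,
    ("D", "D"): 3,
}

def best_response(partner_move):
    """The move that minimizes my sentence, given the partner's move."""
    return min(("C", "D"), key=lambda mine: MY_SENTENCE[(mine, partner_move)])

def dominant_move():
    """Return the move that is a best response to every partner move, or None."""
    responses = {best_response(p) for p in ("C", "D")}
    return responses.pop() if len(responses) == 1 else None

print(dominant_move())  # D
```

The depressing part is that when both follow this logic, each serves 3 months. Had both stayed silent, each would have served 1.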

A more interesting version is the iterated prisoner's dilemma, where the game is played over and over again. For the longest time (at least since I took a course in game theory about 10 years ago), the received wisdom was that, empirically, a simple strategy like "tit-for-tat" offered a decent balance between simplicity and effectiveness. Tit-for-tat cooperates on the first round, then copies whatever the opponent did in the previous round.
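To make "tit-for-tat" concrete, here is a minimal sketch of an iterated match. Each round reuses the sentences (in months) from the one-off story. The strategies, the round count, and all names are illustrative choices of mine, not taken from any tournament.

```python
# Sentences in months for (move_a, move_b); lower is better.
SENTENCE = {
    ("C", "C"): (1, 1),
    ("C", "D"): (12, 0),
    ("D", "C"): (0, 12),
    ("D", "D"): (3, 3),
}

def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return "D"

def play(strategy_a, strategy_b, rounds):
    """Play an iterated match and return the total sentences (a, b)."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        sa, sb = SENTENCE[(move_a, move_b)]
        total_a += sa
        total_b += sb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total_a, total_b

print(play(tit_for_tat, tit_for_tat, 10))    # (10, 10)
print(play(tit_for_tat, always_defect, 10))  # (39, 27)
```

Against itself, tit-for-tat locks into mutual cooperation. Against a permanent defector, it loses only the first round and then defends itself.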

Turns out that there is a new twist in the plot.

William Press (of Numerical Recipes fame) and Freeman Dyson ("the") recently published a paper in PNAS (open access) that seems to be getting a lot of attention.

There is a nice commentary (pdf) that is easy to follow for those of us who have an undergrad-level understanding of the problem.

Wednesday, July 18, 2012

Splitting a large text file by number of lines and tags

Say you have a big file (text, picture, movie, etc) and you want to split it into many small parts. You may want to do this to email it to someone in more manageable chunks, or perhaps analyze it using a program that cannot handle all of the data at once.

The Linux command split lets you chop a file into chunks of a specified size. To break a big file called "BigFile.mpg" into smaller chunks named "chunkaa", "chunkab", etc., you say something like:

split -b 10M BigFile.mpg chunk

Consider a simpler case, where the big file is a text file. For concreteness assume that BigFile.txt looks like:

# t = 0
particle1-coordinates
particle2-coordinates
...
particleN-coordinates

# t = 1
particle1-coordinates
particle2-coordinates
...
particleN-coordinates
...
# t = tfinal
particle1-coordinates
particle2-coordinates
...
particleN-coordinates

You may generate one of these if you are running a particle-based simulation like MD and periodically printing out the coordinates of the N particles in your system. For concreteness, say N = 1000 and tfinal = 500.

If this file were too big, and you wanted to split it into multiple files (one file for each time snapshot), you could still use the split command as follows:

split -l 1002 BigFile.txt chunks

The 1002 includes the two additional lines: the time stamp and the blank line after the time snapshot.

You can also use awk instead, exploiting the fact that the "#" tag marks the start of each snapshot:

awk 'BEGIN {c = -1} /^#/ {if (c >= 0) close(c ".dat"); c++; next} {print > (c ".dat")}' BigFile.txt

Each time it matches a "#" tag, it closes the previous file and starts a new one, so the snapshots end up in 0.dat, 1.dat, and so on. Closing each file keeps awk from running out of open file handles when there are hundreds of snapshots. The advantage of this method is that you have more flexibility in naming your chopped pieces, and you don't have to know the value of "N" beforehand.
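The same tag-driven grouping is easy to express in a general-purpose language. Below is a minimal sketch in Python, offered as an alternative to awk. It assumes the file layout shown above: a "#" header, then particle lines, then a blank line. Header and blank lines are dropped, and the function names are hypothetical.

```python
def split_snapshots(lines):
    """Group lines into snapshots, starting a new one at each '#' header.

    Header lines, blank lines, and anything before the first header are dropped.
    """
    snapshots = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            snapshots.append([])
        elif line.strip() and snapshots:
            snapshots[-1].append(line)
    return snapshots

def write_snapshots(path):
    """Write snapshot i of the file at `path` to i.dat in the current directory."""
    with open(path) as f:
        for i, snap in enumerate(split_snapshots(f)):
            with open(f"{i}.dat", "w") as out:
                out.write("\n".join(snap) + "\n")
```

Calling write_snapshots("BigFile.txt") produces 0.dat, 1.dat, and so on, like the awk one-liner. Here the header lines are stripped rather than copied.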

Finally, say you wanted to create chopped pieces in a different way. Instead of chopping up timestamps, you wanted to store the trajectories of individual particles in separate files. So while the methods above would have created 500 files with 1000 (+2) lines each, you now want to create 1000 files with 500 lines. One of the easiest ways is to use sed.

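sed -n '1~10p' prints every tenth line, starting with the first (the first~step address is a GNU sed extension). You can use this to write a simple shell script:

npart=1000
ndiff=$((npart + 2))    # lines per snapshot: header + npart particles + blank line
n=1
while [ $n -le $npart ]
do
    nstart=$((n + 1))   # particle n is on line n+1 of the first snapshot
    sed -n "${nstart}~${ndiff}p" BigFile.txt > $n.dat
    n=$((n + 1))
done

Note the braces in "${ndiff}p": without them, the shell would look for a variable named ndiffp.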
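The sed loop rereads the whole file once per particle, 1000 passes in all. Here is a minimal single-pass sketch in Python, under the same layout assumptions as above (a "#" header, npart particle lines, a blank line). The function names are hypothetical. The trade-off is that it holds every trajectory in memory before writing.

```python
def particle_trajectories(lines, npart):
    """Return a list whose entry k holds every line for particle k+1, in file order."""
    traj = [[] for _ in range(npart)]
    k = None  # position of the next particle line within the current snapshot
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            k = 0
        elif line.strip() and k is not None:
            if k >= npart:
                raise ValueError("snapshot has more than npart particle lines")
            traj[k].append(line)
            k += 1
    return traj

def write_trajectories(path, npart):
    """Write particle n's trajectory to n.dat (n starting at 1), like the sed loop."""
    with open(path) as f:
        traj = particle_trajectories(f, npart)
    for n, lines in enumerate(traj, start=1):
        with open(f"{n}.dat", "w") as out:
            out.write("\n".join(lines) + "\n")
```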

Friday, July 13, 2012

Links:

1. Licensing to Scale

2. Gödel's Incompleteness Theorem

3. Nice Prime

4. Culture of Distraction (video link within the blog)

Thursday, July 12, 2012

Linear Least Squares in GNU Octave: Part Deux

I recently blogged about doing LLS in GNU Octave, getting not only the best fit parameters, but also the standard error associated with those estimates (assuming the error in the data is normally distributed).

Octave has a built-in function to do ordinary least squares: ols. While it does not directly report the standard errors associated with the regressed parameters, it is relatively straightforward to get them from the variables it returns.

Here is how you would solve the same example as before:

Nobs = 10;
x = linspace(0,1,Nobs)';
y = 1 + 2 * x + 0.05 * randn(size(x));
X = [ones(Nobs,1) x];

[beta sigma] = ols(y,X)   % sigma: estimated variance of the errors
yest = X * beta;

p = length(beta); % #parameters
varb = sigma * ((X'*X)\eye(p)); % variance of "b"
se = sqrt(diag(varb)) % standard errors

Monday, July 9, 2012

EconTalk Podcasts and More

I stumbled upon EconTalk podcasts earlier this year, and have been hooked. I find myself listening to these fascinating long-form (~1 hour) discussions on various "economic" topics with intellectual leaders while exercising, driving, doing dishes etc.

The whole discussion, while being in a question-answer format, is not really an "interview" in the Charlie Rose sense. The focus is more on the topic, and less on the person.

Here is the description of the program from the website:

The Library of Economics and Liberty carries a weekly podcast, EconTalk, hosted by Russ Roberts. The talk show features one-on-one discussions with an eclectic mix of authors, professors, Nobel Laureates, entrepreneurs, leaders of charities and businesses, and people on the street. The emphases are on using topical books and the news to illustrate economic principles. Exploring how economics emerges in practice is a primary theme.

The quality of the discussions in the forum is quite extraordinary.

Thursday, July 5, 2012

Milton Friedman videos on YouTube

Milton Friedman may be a controversial economist, but his videos on YouTube reflect why he was such an intellectual juggernaut. Here are a few that I used to while away a perfectly enjoyable afternoon.

1. No free lunch:

2. Young Michael Moore challenging Friedman: He was so much thinner then. (Edit: Apparently not the real Michael Moore)

3. An older black and white video:

etc.

Tuesday, July 3, 2012

Linear Least Squares in GNU Octave with Standard Errors

Consider a linear model with $N$ observations of the quantity $Y$, as a function of $p$ regressors, $Y = \sum_i \beta_i X_i + \epsilon$:

\[\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} X_{11} & X_{12} & ... & X_{1p} \\ X_{21} & X_{22} & ... & X_{2p} \\ & & \ddots & \\ X_{N1} & X_{N2} & ... & X_{Np} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} + \epsilon,\]

where $\epsilon$ is a vector of normally distributed errors. Each row corresponds to an observation, and each column (with its associated $\beta$) corresponds to a parameter to be regressed.

In matrix form, this is $Y = X \beta + \epsilon$.

As a simple illustrative case, consider fitting the model $Y = \beta_0 + \beta_1 X$. The above equation becomes: \[\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_{2} \\ \vdots & \vdots \\ 1 & X_{N} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \epsilon.\]

Given the expressions in the Wikipedia article on LLS, we can easily write an Octave program that takes in y and X as above and returns the best estimate, the standard errors on those estimates, and the residuals.
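For reference, these are the standard textbook expressions the function relies on, with $N$ observations and $p$ parameters as above:

\[ \hat\beta = (X^T X)^{-1} X^T Y, \qquad r = Y - X\hat\beta, \qquad s^2 = \frac{r^T r}{N - p}, \]
\[ \widehat{\operatorname{Var}}(\hat\beta) = s^2\,(X^T X)^{-1}, \qquad \operatorname{se}(\hat\beta_i) = \sqrt{\big[s^2\,(X^T X)^{-1}\big]_{ii}}. \]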

% For y = X b + gaussian noise:
% compute b, the standard errors on b, and the residuals r
function [b se r] = lls(y, X)
  [Nobs, p] = size(X);            % size of data
  b = (X'*X)\X'*y;                % b = least-squares estimate of beta
  df = Nobs - p;                  % degrees of freedom
  r = y - X * b;                  % residuals
  s2 = (r' * r)/df;               % residual variance: SSE / df
  varb = s2 * ((X'*X)\eye(p));    % covariance matrix of b
  se = sqrt(diag(varb));          % standard errors
endfunction
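Here is a rough Python counterpart for readers without Octave. It is a minimal sketch restricted to the straight-line case $Y = \beta_0 + \beta_1 X$ used in the example. For that case, the general matrix formula reduces to the familiar closed-form slope, intercept, and standard errors. The name lls_line is my own.

```python
import math

def lls_line(x, y):
    """Fit y = b0 + b1*x by least squares; return ((b0, b1), (se0, se1), residuals)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need matching x and y with at least 3 points")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    r = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(ri * ri for ri in r) / (n - 2)  # residual variance, df = N - p = N - 2
    se0 = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    se1 = math.sqrt(s2 / sxx)
    return (b0, b1), (se0, se1), r
```

These are the diagonal entries of $s^2 (X^T X)^{-1}$ written out for two parameters. For more regressors, the matrix form in the Octave function is the way to go.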

To test the function, I "create" data from y = 1 + 2x + white noise:

> x = linspace(0,1,10)';
> y = 1 + 2 * x + 0.05 * randn(size(x));
> X = [ones(size(x)) x];
> [beta se] = lls(y,X)
beta =

   0.98210
   2.03113

se =

   0.039329
   0.066302

A plot of the data and the regression looks like this:

Friday, June 29, 2012

Some Language Links of Interest

1. How language affects math:

In English, we say fourteen, sixteen, seventeen, eighteen, and nineteen, so one might expect that we would also say oneteen, twoteen, threeteen, and fiveteen. But we don’t. We use a different form: eleven, twelve, thirteen and fifteen. Similarly, we have forty and sixty, which sound like the words they are related to (four and six). But we also say twenty, thirty and fifty, which only sort of sound like two and three and five. And for that matter, for numbers above twenty, we put the ‘decade’ first and the unit number second (twenty-one, twenty-two) whereas for the teens, we do it the other way around (fourteen, seventeen, and eighteen). The number system in English is clearly highly irregular.

2. Acronyms and Initialisms: I whiled away a good hour on the Wikipedia page. AIDS and SONAR are acronyms (pronounced as one word), while FBI and USA (pronounced as individual letters) are initialisms. Things are sometimes not as clear-cut. What about MS-DOS and MPEG? And what about things like FAQ, SAT, and GRE, which can be used either way?

3. Some tips on good grammar.

Saturday, June 23, 2012

Hussman on Eurobonds

John Hussman on Eurobonds:

With respect to Eurobonds, investors should understand that what is really being proposed is a system where all European countries share the collective credit risk of European member countries, allowing each country to issue debt on that collective credit standing, but leaving the more fiscally responsible ones - Germany and a handful of other European states - actually obligated to make good on the debt.

This is like 9 broke guys walking up to Warren Buffett and proposing that they all get together so each of them can issue "Warrenbonds." About 90% of the group would agree on the wisdom of that idea, and Warren would be criticized as a "holdout" to the success of the plan. You'd have 9 guys issuing press releases on their "general agreement" about the concept, and in his weaker moments, Buffett might even offer to "study" the proposal. But Buffett would never agree unless he could impose spending austerity and nearly complete authority over the budgets of those 9 guys. None of them would be willing to give up that much sovereignty, so the idea would never get off the ground. Without major steps toward fiscal union involving a substantial loss of national sovereignty, the same is true for Eurobonds.

Thursday, June 21, 2012

Linear regression and logarithms don't mix?

You have a power-law model $y=a_0 x^{a_1}$, and a bunch of experimental data points \[\begin{bmatrix}x_1 & y_1 \\ x_2 & y_2 \\ \vdots & \vdots \\ x_n & y_n\end{bmatrix}.\]
To estimate $a_0$ and $a_1$, it is tempting to take the logarithm of both sides, $\log y = \log a_0 + a_1 \log x$, and perform linear regression on suitably transformed experimental data \[\begin{bmatrix}\log x_1 & \log y_1 \\ \log x_2 & \log y_2 \\ \vdots & \vdots \\ \log x_n & \log y_n\end{bmatrix}.\]

Beware! You may get something different from what you expect, and your answers might not mean much. The logarithm changes the error structure. Noise that is additive and Gaussian in $y$ is neither additive nor Gaussian in $\log y$, so ordinary least squares on the transformed data is no longer the maximum-likelihood estimator.

Perhaps you should be doing maximum likelihood instead. Here is a nice tutorial (pdf) on it.
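Here is a minimal sketch of the maximum-likelihood route. It assumes the noise on $y$ is additive, Gaussian, and of constant variance; that assumption is mine, not the post's. Under it, maximizing the likelihood is the same as least squares on the untransformed $y$. For a fixed exponent $a_1$, the best $a_0$ has a closed form, so the sketch profiles it out. It then runs a golden-section search over $a_1$ on a hypothetical bracket [0, 3]. The function names, bracket, and tolerance are all illustrative choices.

```python
import math
import random

def profile_sse(x, y, a1):
    """Sum of squared errors of y = a0*x**a1 at the best a0 for this a1."""
    xp = [xi ** a1 for xi in x]
    a0 = sum(yi * p for yi, p in zip(y, xp)) / sum(p * p for p in xp)  # closed form
    sse = sum((yi - a0 * p) ** 2 for yi, p in zip(y, xp))
    return sse, a0

def fit_power_law_ml(x, y, lo=0.0, hi=3.0, tol=1e-8):
    """Least squares in the original y units, via golden-section search on a1."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - g * (b - a)
    d = a + g * (b - a)
    while b - a > tol:
        if profile_sse(x, y, c)[0] < profile_sse(x, y, d)[0]:
            b, d = d, c          # minimum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d          # minimum lies in [c, b]
            d = a + g * (b - a)
    a1 = (a + b) / 2
    return profile_sse(x, y, a1)[1], a1

# Usage on synthetic data with additive Gaussian noise (hypothetical values).
rng = random.Random(1)
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * xi ** 1.5 + rng.gauss(0, 0.5) for xi in xs]
print(fit_power_law_ml(xs, ys))
```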

PS: Part of the motivation for this post was this.

Monday, June 18, 2012

Paper or Plastic?

I have often seriously thought of looking for the "correct" answer to the ubiquitous "paper or plastic?" question that greets so many of us at the end of each grocery trip.

As I've always suspected, it is a trick question: the correct answer is this:

for sale at richarddawkins.net

A little bit of surfing brought up these two links and some startling facts therein. MSNBC also has a decent interactive presentation.

If you have to choose, however, my take is that, biodegradability aside, plastic trumps paper. Biodegradability is certainly important, but perhaps there is some hope.

To make those innocuous paper bags it takes:

4x more energy
20x more fresh water
far more toxic chemicals that cause greater air and water pollution

In addition, they aren't recycled significantly more.

Tuesday, June 12, 2012

Why "x" became the unknown?

A really short (4 minute) video on TEDx (H/T Ramanan Iyer).

Answer: Translation from Arabic to Spanish.

Sunday, June 10, 2012

Interesting things I learnt recently

1. PLU Codes: I never knew those "Price LookUp" (PLU) codes on fruits and vegetables meant anything:

PLU codes: gardenpartynyc

According to the wikipedia entry:

The code is usually a four-digit number, currently in the 3000–4999 range, identifying the type of bulk produce, including the variety. The Produce Marketing Association suggests an optional convention whereby a fifth digit may be prefixed to the number to indicate if the produce is organic (prefixed by a '9') or genetically modified (prefixed by an '8').

However, adherence to PLU standards is voluntary, and you cannot use it to reliably tell whether something is organic.
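As a toy illustration of the quoted convention, here is a minimal sketch that reads a PLU string. A 4-digit code in the 3000 to 4999 range is taken as is. A fifth leading digit of 9 or 8 flags organic or genetically modified produce. The function name and labels are hypothetical, and because adherence is voluntary, the result is a hint about intent, not a fact about the produce.

```python
def describe_plu(code):
    """Read a PLU code under the optional convention quoted above.

    Returns (base_code, label). Adherence is voluntary, so the label is a
    hint about what the packer intended, not a guarantee.
    """
    code = code.strip()
    if not code.isdigit():
        raise ValueError("PLU codes are numeric")
    if len(code) == 4:
        base, label = code, "unmarked"
    elif len(code) == 5 and code[0] == "9":
        base, label = code[1:], "organic"
    elif len(code) == 5 and code[0] == "8":
        base, label = code[1:], "genetically modified"
    else:
        raise ValueError("not a 4-digit code or a 5-digit code with an 8/9 prefix")
    if not 3000 <= int(base) <= 4999:
        raise ValueError("base code outside the 3000-4999 range quoted above")
    return int(base), label
```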

2. GPU Programming in Matlab: I knew it was possible, but I did not realize how simple it was until I sat through a MathWorks seminar on campus. While it works only with newer NVIDIA cards (that can handle double precision natively) and allows only certain kinds of parallelization, I must admit (with glee) that they've made it much easier to tame problems by throwing additional hardware at them.