Clueless Fundatma: August 2012

Tuesday, August 28, 2012

Julia: Sounds Exciting!

I've been hearing about the excitement around the Julia language on and off for a while now. It is hard to avoid, given my line of work. Today I took some time to scope out the landscape, and I like what I see!

Here is a quick introduction to why the authors decided to create Julia. Basically, it sets out with a fairly ambitious mandate.

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

It has Matlab-like syntax and flexibility, and from the published benchmarks it seems to approach the performance of C and Fortran.

The language is still in active development, and if it lives up to some of its promise, I can see it being a game-changer in scientific computing.

Monday, August 20, 2012

Aligning Matrix Columns in LaTeX

The standard amsmath packages provide a convenient interface for writing matrices in LaTeX. For example, the bmatrix environment produces the matrix:

\[\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}.\]

Clearly the columns could do with some alignment. Of course, you could do it from scratch by using the array environment, but that would mean giving up the convenience of the AMS matrix environments.

Enter the mathtools package, which is an add-on to the amsmath packages. The mathtools package loads the amsmath packages if they haven't already been loaded.

Once you've loaded the mathtools package (using the usepackage command), you can specify the alignment of your matrices with a special starred environment as follows:
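For the matrix above, a minimal version would look something like the following. This is a sketch that assumes right-aligned columns via the optional [r] argument; [l] and [c] are the other choices.

```latex
\documentclass{article}
\usepackage{mathtools} % also loads amsmath
\begin{document}
\[
\begin{bmatrix*}[r]
  1 & -1 \\
 -1 &  1
\end{bmatrix*}.
\]
\end{document}
```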

which yields:

Voila!

Thursday, August 16, 2012

How to Make a Mesh: John Burkardt

My colleague, John, is a gifted teacher and communicator. I have previously linked to the cornucopia of numerical math software that he has obsessively (in his own words) curated.

He recently presented a talk on using publicly available programs to do meshing. It is a non-mathematical, tutorial-like presentation. Even though I haven't had to do any heavy duty meshing in my research thus far, I thoroughly enjoyed his talk.

I'm trying to convince him to videotape the rest of his talks over the summer. Let's see what happens.

Monday, August 13, 2012

TeXcount: Number of words in a LaTeX document

Since TeX is really a markup language, counting the number of words in a document is tricky. Obviously, you don't want to literally count words in tags like \chapter{}, \begin{center}, \cite{reference1, reference2} etc.

You may have macros, which need interpretation.

You may have external files that you are collecting together in a master document by using \input{} etc.

In short, it is not as simple as it seems.

You could try to use front-end programs like kile or TeXShop, which will give you a simple total count. My front-end program of choice, TeXMaker, does not do it for me.

If your document is very simple, you could try to "detex" the LaTeX tags, and use a simple Linux utility like "wc".

The best solution seems to be TeXcount.

There is a web-interface that lets you paste your TeX document in a web-form.

Alternatively, you can download the script. It is essentially a small Perl program (a 400 kB download in all; the actual script is about 90 kB) called texcount.pl, which you can run quite simply as

perl texcount.pl filename.tex

Here's the default output it spits out:

Encoding: ascii
Words in text: 10324
Words in headers: 81
Words in float captions: 219
Number of headers: 30
Number of floats: 5
Number of math inlines: 198
Number of math displayed: 18
Subcounts:
text+headers+captions (#headers/#floats/#inlines/#displayed)
14+9+0 (1/0/0/0) _top_
89+1+0 (1/0/0/0) Section: Introduction
419+2+44 (1/1/0/0) Subsection: Analytical Rheology
646+2+38 (2/1/3/0) Subsection: Polymers
236+3+0 (1/0/0/0) Subsection: Scope and Organization
35+3+0 (1/0/0/0) Section: Motivation and Background
355+2+19 (1/0/2/0) Subsection: Linear Polymers
657+2+24 (1/1/17/1) Subsection: Branched Polymers
205+3+45 (1/1/0/0) Subsection: Model-driven Analytical Rheology
97+6+0 (1/0/0/0) Section: Models for Polymer Dynamics and Rheology
597+2+0 (1/0/2/0) Subsection: Historical Development
872+5+25 (2/1/8/0) Subsection: The Tube Model
211+4+0 (1/0/0/0) Subsection: State of the Art
273+2+0 (1/0/5/0) Subsection: Computational Models
162+3+0 (1/0/0/0) Section: Methods and Progress
1774+5+0 (3/0/108/15) Subsection: Linear Polymers
2955+24+24 (9/0/51/2) Subsection: Branched Polymers
727+3+0 (1/0/2/0) Section: Summary and Perspective

You can exercise significant control over the way it parses the document and reports the results by using options that are described in the manual.

Thursday, August 9, 2012

Waiting For Superman

I finally saw Davis Guggenheim's (also behind "An Inconvenient Truth") documentary "Waiting for Superman" on Netflix. The documentary follows the poignant stories of a few earnest kids trying (their luck) to get into successful charter schools, because of failing neighborhood public schools. To its credit, the documentary is highly entertaining, takes on a subject which is hard to talk about without attracting enemy fire, and passes the test of good story-telling: it makes us share the hope and disappointment of the kids before and after the lottery. No wonder it has an 89% rating on rottentomatoes.

Weaving fact and argument into the human story, Guggenheim tries to educate us about what is wrong with public education - the distorted incentive structure, teachers unions and tenure, the bureaucratic maze of education administration etc. It is probably fair to say that by the end of the movie, the typical audience is led to believe that more charter schools are an important part of any attempt to resolve the crisis.

Which may or may not be true.

The most cogent counter-argument was Diane Ravitch's article "The Myth of Charter Schools" in the New York Review of Books (and not this insipid rebuttal, IMHO). I like her factual tone, as she carefully destroys most of the intellectual basis of the film.

For example:

... it evaluated student progress on math tests in half the nation’s five thousand charter schools and concluded that 17 percent were superior to a matched traditional public school; 37 percent were worse than the public school; and the remaining 46 percent had academic gains no different from that of a similar public school.
...
The movie asserts a central thesis in today’s school reform discussion: the idea that teachers are the most important factor determining student achievement. But this proposition is false. Hanushek has released studies showing that teacher quality accounts for about 7.5–10 percent of student test score gains. Several other high-quality analyses echo this finding, and while estimates vary a bit, there is a relative consensus: teachers statistically account for around 10–20 percent of achievement outcomes.

You should really read the article in its entirety, as I find myself wanting to excerpt the whole thing.

Monday, August 6, 2012

Don't use gnuplot "fit" blindly!

Gnuplot is a great plotting/data-analysis program. It has a fairly robust tool called "fit" which can be used to perform nonlinear least-squares regressions very conveniently.

It does what it advertises, and in fact, does it quite well.

But let us consider a simple cautionary example.

Consider a linear model \(Y = P_1 + P_2 X\), with \(P_1 = 2\) and \(P_2 = 1.5\). Let us generate data using this model, and add some white noise to it. In particular, we assume that the experimental data \(Y_e \sim N(Y, \sigma = 0.5)\) is normally distributed about the linear model with a standard deviation of 0.5.

The red line is the model, the error-bars denote the standard deviation of the Gaussian noise, and the circles represent a particular realization of experimental data.

We can generate experimental data using the prescribed model and noise profile. This is shown in the figure above.

If we use gnuplot "fit" to fit through the data-points above we get:

Final set of parameters Asymptotic Standard Error
======================= ==========================
P1 = 1.98866 +/- 0.1156 (5.812%)
P2 = 1.4667 +/- 0.03817 (2.603%)

This looks quite good. Even the parameter estimates look quite sharp.

In a way, this is expected because "fit" assumes that the error in \(Y\) is distributed normally (which in this case it is!).

Now let us convert this idealized problem, which we grok completely, into something slightly more complicated.

Let us apply a nonlinear transformation on all the participating variables. Thus, we define \(x = \exp(X)\) and \(y=\exp(Y)\). A replotting of this transformed problem looks like:

This picture is the same as the first figure with the exponential transformation applied. The location of the circles and the red line is the same, except that they are transformed. The location of the bottom and top tips of the error bars are also the same, except that they too are transformed. In short, if I plotted this figure on a log-log plot, it would "look" identical to the first figure.

However notice two important differences. One, the size of the error bars is not the same. It covaries with the magnitude of \(y\). Two, the error bars are not even symmetrical. In fact, it can be shown that if the variable \(Y \sim N(\mu,\sigma)\), then \(y = \exp(Y) \sim \text{lgN}(\mu,\sigma)\), where \(\text{lgN}\) refers to the log-normal distribution.
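To make both points concrete, here is a short worked calculation using the \(\sigma = 0.5\) from the example. These are standard properties of the exponential map and the log-normal distribution, not output from the fit. The upper and lower bars around the transformed model value \(e^{\mu}\) are

\[ e^{\mu+\sigma} - e^{\mu} = e^{\mu}\,(e^{0.5} - 1) \approx 0.65\, e^{\mu}, \qquad e^{\mu} - e^{\mu-\sigma} = e^{\mu}\,(1 - e^{-0.5}) \approx 0.39\, e^{\mu}. \]

Both bars scale with \(e^{\mu}\), and the upper one is longer than the lower one by a factor of \(e^{0.5} \approx 1.65\). The mean of \(y\) is also shifted away from the transformed model value:

\[ \mathrm{E}[y] = e^{\mu + \sigma^2/2} = e^{0.125}\, e^{\mu} \approx 1.13\, e^{\mu}, \]

so on average the data sit about 13% above the curve \(e^{\mu}\), which is the median of \(y\).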

Let us throw this non-linear problem at gnuplot. Clearly, we don't expect things to go as smoothly as before, because the distribution of the error violates the Gaussian assumption in gnuplot's "fit".

Final set of parameters Asymptotic Standard Error
======================= ==========================
p1 = 4.68963 +/- 0.7653 (16.32%)
p2 = 1.63074 +/- 0.03327 (2.04%)

Note that not only are the parameter values wrong (\(p_1 = \exp(P_1)\) should be 7.39 and not 4.69, and \(p_2 = P_2\) should be 1.5 not 1.63), but the error bars are quite meaningless.

It is particularly important to realize that this is *bad* even though the "fit" visually looks better than the actual model!

But lift the veil by replotting the figure above on a log-log scale, and you can guess what exactly is going on!
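If you want to play with this outside gnuplot, here is a rough Python sketch of the same experiment. Everything in it is my own assumption for illustration: the 20 design points, the seed, and the helper names (linear_fit, best_p1, power_law_fit). It also does not use gnuplot's fitting algorithm; the direct fit of \(y = p_1 x^{p_2}\) is done with a brute-force scan over \(p_2\), using the fact that for a fixed \(p_2\) the best \(p_1\) has a closed form. It compares an unweighted fit in the transformed variables with a straight-line fit in the log-log variables, where the noise is Gaussian again.

```python
import math
import random


def linear_fit(X, Y):
    """Ordinary least squares for Y = P1 + P2*X; returns (P1, P2)."""
    n = len(X)
    xbar = sum(X) / n
    ybar = sum(Y) / n
    sxx = sum((xi - xbar) ** 2 for xi in X)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(X, Y))
    P2 = sxy / sxx
    return ybar - P2 * xbar, P2


def sse(p1, p2, x, y):
    """Unweighted sum of squared residuals for the model y = p1 * x**p2."""
    return sum((yi - p1 * xi ** p2) ** 2 for xi, yi in zip(x, y))


def best_p1(p2, x, y):
    """For fixed p2 the model is linear in p1, so the optimum is closed-form."""
    num = sum(yi * xi ** p2 for xi, yi in zip(x, y))
    den = sum(xi ** (2 * p2) for xi in x)
    return num / den


def power_law_fit(x, y, p2_guess):
    """Unweighted least squares for y = p1 * x**p2 by scanning p2 on a grid."""
    # Grid over [0.5, 2.5] in steps of 0.001, plus the supplied guess.
    candidates = [p2_guess] + [0.5 + 0.001 * k for k in range(2001)]
    p2 = min(candidates, key=lambda c: sse(best_p1(c, x, y), c, x, y))
    return best_p1(p2, x, y), p2


random.seed(1)  # arbitrary seed, only for repeatability

# Assumed design: 20 equally spaced points; true model Y = 2 + 1.5 X, sigma = 0.5.
X = [0.25 * k for k in range(1, 21)]
Y = [random.gauss(2.0 + 1.5 * Xi, 0.5) for Xi in X]

# Exponential transformation of both variables, as in the post.
x = [math.exp(v) for v in X]
y = [math.exp(v) for v in Y]

# Fit 1: straight line in (X, Y), i.e. in log-log space.
P1_hat, P2_hat = linear_fit(X, Y)

# Fit 2: unweighted least squares directly on (x, y).
p1_nl, p2_nl = power_law_fit(x, y, P2_hat)

print("log-log fit : p1 = %.3f, p2 = %.3f" % (math.exp(P1_hat), P2_hat))
print("direct fit  : p1 = %.3f, p2 = %.3f" % (p1_nl, p2_nl))
print("true values : p1 = %.3f, p2 = 1.500" % math.exp(2.0))
```

The direct fit always achieves a sum of squared residuals in \((x, y)\) no larger than that of the log-log estimate, which is exactly why it can look better on a linear plot while giving worse parameters. Changing the seed or the number of points shows how much the two sets of estimates move around.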

Thursday, August 2, 2012

Krugman on Friedman

Lately, I have become fascinated by Milton Friedman. I just borrowed a copy of "Capitalism and Freedom" from the library, and look forward to reading it.

I found this interesting piece (2007) by Paul Krugman at the New York Review of Books, entitled "Who was Milton Friedman?"

What’s odd about Friedman’s absolutism on the virtues of markets and the vices of government is that in his work as an economist’s economist he was actually a model of restraint. As I pointed out earlier, he made great contributions to economic theory by emphasizing the role of individual rationality—but unlike some of his colleagues, he knew where to stop. Why didn’t he exhibit the same restraint in his role as a public intellectual?

The answer, I suspect, is that he got caught up in an essentially political role. Milton Friedman the great economist could and did acknowledge ambiguity. But Milton Friedman the great champion of free markets was expected to preach the true faith, not give voice to doubts. And he ended up playing the role his followers expected. As a result, over time the refreshing iconoclasm of his early career hardened into a rigid defense of what had become the new orthodoxy.

In the long run, great men are remembered for their strengths, not their weaknesses, and Milton Friedman was a very great man indeed—a man of intellectual courage who was one of the most important economic thinkers of all time, and possibly the most brilliant communicator of economic ideas to the general public that ever lived.