Clueless Fundatma: June 2016

Sunday, June 26, 2016

Background Material For Students Entering SciComp

Our department, Scientific Computing, is highly interdisciplinary, and we get graduate students from very different backgrounds.

Here is a list of resources I recommend incoming grad students look at before they start.

1. Lectures in Differential Equations and Linear Algebra

These video lectures by Gilbert Strang and Cleve Moler present a quick summary of ODEs and major Linear Algebra topics.

2. An introduction to Matlab

In addition to a compiled language like C++/Fortran/Java, it is useful to know either Matlab or Python.

3. Scientific Computing with Python

This contains a set of extremely useful Jupyter (IPython) notebooks (PDF link if you don't have Jupyter installed), which provide a gentle introduction to Python in Scientific Computing. It also includes several advanced topics (parallel programming, incorporating C and Fortran, version control, etc.).

Friday, June 24, 2016

Good and Bad Metrics

We learnt not that long ago that 66 new journals were banned by Thomson-Reuters for abusing impact factors by excessive self-citation. While the crimes committed by these journals may have been egregious, the subtle, and sometimes not-so-subtle, abuse of impact factors is pervasive.

Curiously, I don't find the crimes surprising. In fact, I would be surprised if such manipulations did not occur.

If someone (Thomson-Reuters) tells you, "I will measure your performance by this simple yardstick," and that metric has real consequences (whether libraries buy your journal), then clearly you (the publisher) are going to do everything you can to push that metric as high as you can.

If you build a simple metric or index to quantify complex stuff (academic ranking of universities, IQ to measure smartness, student performance on standardized tests to determine teacher pay etc.), which is linked to a "real" prize, you can rest assured that your metric will be gamed. As I have said before:

I gather this fascination has something to do with our inability to grapple with multidimensional complexity. We try to project a complex high-dimensional space onto a simple scalar. We like scalars because we can intuitively compare two scalars. We can order them, plot them on graphs, and run statistics on them with ease.

We can step outside the academic realm and look at a few examples.

A simple metric works best when the "thing" being measured is itself simple. For example, in a 100m sprint or the high jump, the only thing you care about is the speed or the height, respectively. I think of the underlying quantity as being "one-dimensional". There is nothing to game here (no pun intended); if you can run faster, you deserve to be champion.

One example where a simple metric of a somewhat complex thing actually works alright is Google PageRank. Before Google came along with the really cool idea of "one link, one vote", which reduced the complex task of organizing the relative importance of websites to solving an eigenvalue problem, web search was really hit or miss.
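To make the "eigenvalue problem" concrete, here is a minimal sketch of the textbook version of the idea, solved by power iteration on a toy web of four pages. It is not Google's actual implementation: the function name `pagerank`, the toy `links` matrix, and the damping factor of 0.85 (a commonly quoted value) are all assumptions made for illustration.

```python
import numpy as np

def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration; links[i][j] = 1 if page i links to page j."""
    A = np.asarray(links, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=1, keepdims=True)
    # Each page splits its single vote equally among the pages it links to.
    # A page with no outgoing links is treated as linking to every page.
    P = np.where(out_degree > 0, A / np.maximum(out_degree, 1.0), 1.0 / n)
    # Damped link matrix: every column sums to 1, so its leading eigenvalue
    # is 1 and the corresponding eigenvector is the ranking we want.
    G = damping * P.T + (1.0 - damping) / n
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = G @ rank
        delta = np.abs(new_rank - rank).sum()
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical toy web: pages 1, 2 and 3 all link to page 0; page 0 links to page 1.
links = [[0, 1, 0, 0],
         [1, 0, 0, 0],
         [1, 0, 0, 0],
         [1, 0, 0, 0]]
ranks = pagerank(links)
print(ranks)
```

Page 0 collects the most votes and ends up with the largest score, while pages 2 and 3, which nobody links to, receive only the baseline share from the damping term.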

When people did not know about the metric (PageRank in this case), it worked beautifully. Once the metric was public knowledge, and Google became a virtual monopoly in this business, the business of "search engine optimization" (SEO), which seeks to game the metric, suddenly became very lucrative.

Now Google has to do secret stuff to keep the abusers out. Their efforts, and the consolidation of the web (Wikipedia is the #1 link almost always), have helped, so that the metric has not been completely compromised (or so I think, since I don't know what Google doesn't show me).

This is a useful example, since it exposes the conflict between Google Search users (who want the more relevant results to surface to the top), and websites (who want to surface to the top, regardless of relevance), that Google has to manage.

A final example, whose story perhaps has the greatest relevance to academic short-cut metrics, is the somewhat whimsical metric of FICO credit scores in the US. These scores are supposed to determine a person's credit-worthiness, and have real significance if you want to get a mortgage or car loan. A pesky problem with the score is that it treats a perfectly frugal person, who pays her bills on time (in cash), and has never taken on any form of debt, with contempt.

A bigger problem is the reliability of the score (see Credit Scores: Not-so-Magic-Numbers), perhaps because the key ingredients that go into the score are reasonably well-known and easily gamed. One of the heartening reactions:

Golden West Financial (WB), a longtime FICO skeptic, is one of the few mortgage lenders to minimize its use in recent years—and it credits that decision for its below-average mortgage losses. Now a subsidiary of Wachovia (WB), Golden West's delinquency rate on traditional mortgages is running at 0.75%, vs. 1.04% for the industry. Richard Atkinson, who oversees part of Golden West's mortgage unit from San Antonio, says the bank calls to verify employment, examines a borrower's stock holdings and other assets, and employs a team of appraisers who are judged not by the volume of loans but by the accuracy of the appraisal over the life of the loan. "The way we do business is a lot more costly, and cost was a big reason many competitors embraced credit scoring," he says. "But some of our best borrowers had low FICO scores and our worst had FICO scores of 750."

How great it would be if we academics adopted a similar approach.

Wednesday, June 15, 2016

The Savitzky-Golay Filter

In 1964, Abraham Savitzky and Marcel Golay published a paper "Smoothing and differentiation of data by simplified least squares procedures" in Analytical Chemistry, which has been heralded as one of the 10 most influential papers in the journal's history.

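The Savitzky-Golay filter (SGF) is a digital filter used to smooth noisy data. The basic idea is to slide a window over the dataset and fit a low-order polynomial to the points in each successive window by least squares; the smoothed value is the fitted polynomial evaluated at the center of the window.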

Implementations are available in Octave/Matlab and in recent versions (>0.16) of SciPy for Python.
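The basic idea can be sketched by hand with an explicit polynomial fit in each window. The helper name `savgol_by_hand` and the synthetic noisy sine are assumptions made for illustration; the sketch only handles interior points (the first and last half-window are left unsmoothed), and it is far slower than the library routine, which it is compared against.

```python
import numpy as np
from scipy.signal import savgol_filter

def savgol_by_hand(y, window, degree):
    """Fit a polynomial in each sliding window and evaluate it at the center."""
    y = np.asarray(y, dtype=float)
    half = window // 2
    x = np.arange(-half, half + 1)      # local coordinate, 0 at the window center
    out = y.copy()                      # edge points are left untouched
    for i in range(half, len(y) - half):
        coeffs = np.polyfit(x, y[i - half:i + half + 1], degree)
        out[i] = np.polyval(coeffs, 0.0)
    return out

# Synthetic test data: a sine wave with Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 200)
noisy = np.sin(t) + 0.1 * rng.standard_normal(t.size)

window, degree = 15, 4
manual = savgol_by_hand(noisy, window, degree)
reference = savgol_filter(noisy, window, degree)

# Away from the edges the two should agree to rounding error.
half = window // 2
max_diff = np.max(np.abs(manual[half:-half] - reference[half:-half]))
print(max_diff)
```

Because the fit is linear in the data, the loop is equivalent to convolving with a fixed set of coefficients, which is how the filter is usually applied in practice.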

Here is a potential use case. I did a Lennard-Jones melt simulation using LAMMPS, and obtained the following pair correlation function g(r).

If you look closely, there is a fair amount of noise due to binning.

Let us use SciPy to smooth the noise.

import numpy as np
from scipy.signal import savgol_filter

r, gr = np.loadtxt('gr.dat', unpack=True)  # read r and g(r) columns from disk
gsm = savgol_filter(gr, 15, 4)             # smooth: window of 15 points, degree 4

The second argument (15) is the window or subset size and has to be an odd number; the third argument (4) is the degree of the polynomial to fit, which must be smaller than the window size. When I plot the smoothed curve:

import matplotlib.pyplot as plt

plt.plot(r, gr, '.')            # raw g(r) as points
plt.plot(r, gsm, label='SG')    # Savitzky-Golay smoothed curve

If you look closely, again:

One can experiment with the window size and the degree of polynomial. In general, a larger window or a lower-degree polynomial makes the curve smoother, while a smaller window or a higher-degree polynomial follows the data (and the noise) more closely. The figure below shows windows of size 7 and 31 with a degree 4 polynomial.
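One rough way to quantify the effect of these two knobs, without plotting, is the filter's noise gain: the factor by which it scales the variance of uncorrelated noise, equal to the sum of the squared filter coefficients. The sketch below uses `scipy.signal.savgol_coeffs`; the helper name `noise_gain` and the window/degree combinations are chosen only for illustration, and noise gain is just one proxy for smoothness.

```python
import numpy as np
from scipy.signal import savgol_coeffs

def noise_gain(window, degree):
    """Variance of filtered white noise relative to the input variance."""
    coeffs = savgol_coeffs(window, degree)
    return float(np.sum(coeffs ** 2))

# Same degree, different windows.
gain_w7 = noise_gain(7, 4)
gain_w31 = noise_gain(31, 4)

# Same window, different degrees.
gain_d2 = noise_gain(15, 2)
gain_d4 = noise_gain(15, 4)

print("window  7, degree 4:", gain_w7)
print("window 31, degree 4:", gain_w31)
print("window 15, degree 2:", gain_d2)
print("window 15, degree 4:", gain_d4)
```

A smaller gain means more of the noise is averaged away: the wider window suppresses more noise at a fixed degree, and the lower degree suppresses more noise at a fixed window, at the cost of flattening sharp features such as the first peak of g(r).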

Thursday, June 9, 2016

The Upselling of Grit

You might have seen this TED video on the importance of grit

The key pitch, as summarized by Daniel Engber at Slate, consists of two ideas: (i) grit is among the best predictors of success, and (ii) we can change the level of grit.

The pitch is successful in part because the first idea seems obvious: we all remember examples of underdogs who overcame incredible odds by triumphing over superior enemies (or overwhelming circumstances) through sheer perseverance and hard work. The second idea appeals to our sensibility of fairness by suggesting that your success is not predestined by the circumstances of birth. Rather, it is within the circle of your influence.

Engber puts it this way:

[...] optimistic message that you find in Grit: It’s possible for all of us to change or, as one book puts it, to feel the triumph of a “neuroplastic transformation.” They tell us that we needn’t be the victims of our meager talents or our lousy genes.

Critical examination reveals a more pessimistic picture. A picture in which that ugly monster, IQ, raises its unwelcome head.

In this interview on Vox, "Why IQ matters more than Grit", Resnick has the following exchange with Stuart Ritchie:

BR: I found a lot of this research to be depressing. In your book, you lay out a compelling case that IQ reliably is correlated with longevity, economic success, and physical well-being. You also make it clear that IQ doesn't change all that much throughout our lives. We're kind of stuck with what we've got. I guess I find it unfair.

SR: First of all, the most important thing to say is that it doesn’t matter if it’s depressing if that’s what the research says. One can’t deny it.

Think about how it would be if it was the other way around; there might actually be some bad outcomes.

Because then parents would be able to totally control their kids with bad parenting, and wreck kids’ IQs for the rest of their lives. Governments could have big influences on people’s IQs by enacting different policies toward different sets of people in the country.

It also turns out that IQ is strongly correlated with measures like emotional intelligence and grit itself.

Tuesday, June 7, 2016

Diffusivity Induced Segregation

It takes only a small difference in size or shape for particles to spontaneously demix. The famous "Brazil Nut Effect" is one common example.

There are perhaps sociological analogs, where racial, income-based, or religious clustering arises from small differences. A Google or Google-Scholar search for "auto-segregation" or "self-segregation" brings out many of these examples.

It was with much interest that I read "Binary Mixtures of Particles with Different Diffusivities Demix" (paywalled). The abstract reads:

The influence of size differences, shape, mass, and persistent motion on phase separation in binary mixtures has been intensively studied. Here we focus on the exclusive role of diffusivity differences in binary mixtures of equal-sized particles. We find an effective attraction between the less diffusive particles, which are essentially caged in the surrounding species with the higher diffusion constant. This effect leads to phase separation for systems above a critical size: A single close-packed cluster made up of the less diffusive species emerges. Experiments for testing our predictions are outlined.

There is a non-paywalled video that shows the demixing process in the supplemental materials section.

Here is a decent commentary:

Soluble substances normally become evenly distributed throughout the solvent medium, thanks to passive molecular diffusion. The rate at which this occurs depends on the diffusion constant of the molecule concerned, whose magnitude increases with the temperature. In mixtures that have attained thermal equilibrium, particles of equal size normally exhibit the same diffusion constant. "We were interested in what happens when particles of equal size differ in their diffusion constants," says Simon Weber, first author on the new paper.

Friday, June 3, 2016

Quotables

1. Alain de Botton in "Why You Will Marry the Wrong Person"

The person who is best suited to us is not the person who shares our every taste (he or she doesn’t exist), but the person who can negotiate differences in taste intelligently — the person who is good at disagreement. Rather than some notional idea of perfect complementarity, it is the capacity to tolerate differences with generosity that is the true marker of the “not overly wrong” person. Compatibility is an achievement of love; it must not be its precondition.

2. "Forgiveness means letting go of the hope for a better past." - @LamaSuryaDas

— Jason Garner (@Thejasongarner) June 1, 2016

3. “The perfect man employs his mind as a mirror. It grasps nothing; it refuses nothing. It receives, but does not keep.” -Laozi

— Daily Zen (@dailyzen) May 28, 2016

4. "I can see that you have a complex problem: it has a real and an imaginary part." -- John Tukey

— Data Science Fact (@DataSciFact) May 26, 2016

5. Opportunities multiply as they are seized. (Sun Tzu, The Art of War)

Thursday, June 2, 2016

Nauru and the Curse of Natural Resources

Recently "This American Life" rebroadcast their 2003 show "The Middle of Nowhere". One of the segments was on the tiny Micronesian island (~8 sq. miles) of Nauru.

The story is a tragedy that is still unfolding at a snail's pace. From Wikipedia:

Nauru is a phosphate rock island with rich deposits near the surface, which allowed easy strip mining operations. It has some remaining phosphate resources which, as of 2011, are not economically viable for extraction. Nauru boasted the highest per-capita income enjoyed by any sovereign state in the world during the late 1960s and early 1970s. When the phosphate reserves were exhausted, and the island's environment had been seriously harmed by mining, the trust that had been established to manage the island's wealth diminished in value. To earn income, Nauru briefly became a tax haven and illegal money laundering centre. From 2001 to 2008, and again from 2012, it accepted aid from the Australian Government in exchange for hosting the Nauru detention centre.

After listening to the story, I had to scour for more:

Some pictures (~2012) of the island (Traveling Artist)
More pictures from Sally McInerney
Julie Morse: "Nauru - a human dumping ground"
Nauru's downfall

With so much of the island mined, all that was left was an environmental wasteland riddled with decay. The damage is so severe that 75 per cent of the country is uninhabitable.