Thursday, December 8, 2011

The Next Best Mate

The so-called "Sultan's Dowry Problem" (a.k.a. the "Beauty Pageant Problem", "Secretary Problem", etc.) is sometimes used as a model for searching for a mate. It is a nice model of decision-making under a certain type of uncertainty.

According to the Wikipedia entry:
The basic form of the problem is the following. Imagine an administrator willing to hire the best secretary out of N rankable applicants for a position. The applicants are interviewed one-by-one in random order. A decision about each particular applicant is to be taken immediately after the interview. Once rejected, an applicant cannot be recalled. During the interview, the administrator can rank the applicant among all applicants interviewed so far, but is unaware of the quality of yet unseen applicants.
There is an elegant solution to the problem when the objective of the game is to maximize the probability that the candidate chosen has the highest quality. We assume quality can be reduced to a single number. The so-called "1/e" or "37%" solution to this problem involves letting the first 37% of the candidates pass, while remembering the quality Q of the best candidate in this set. Thereafter, the first candidate whose quality exceeds Q is chosen.

Todd argues that the way humans search for their mates is very different from this optimal solution. He points out that the 37% rule finds the best candidate more often than any other algorithm, namely 37% of the time. However, what happens the other 63% of the time is not very flattering:
For instance, if applied to a set of 100 dowries ranging from 1 to 100, the 37% rule returns an average value of about 82, that is, the mean of all dowries chosen by this rule. Only 67% of the individuals selected by this rule lie in the top 10% of the population, while 8% fall in the bottom 25%. And it takes the 37% rule an average of 74 tests of potential mates (that is, double the 37 that must be checked before selection can begin) before a mate is chosen.
This probably explains why normal people do not apply this strategy. It turns out that normal people tend to use a much smaller "screening" period, and that the length of the screening period depends on your appetite for risk.

If you are fixated on maximizing the probability of ending up with the best candidate, then the 37% rule works fine. But if you are a risk minimizer - if you would rather protect your downside while accepting anybody in the top 10%, for example - the optimal screening period is much shorter: closer to 10%.
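These trade-offs are easy to explore numerically. Below is a minimal Monte Carlo sketch in Python (the language of the original programming assignment is not specified, so this is only illustrative). It compares the 37% rule with a 10% screening period on random permutations of the dowries 1..100, assuming that a searcher who never finds anyone better than the screening-period best is forced to settle for the last candidate. The numbers it reports should land close to the figures quoted above.

```python
import random

def secretary(values, screen_frac):
    """Screen the first fraction, then take the first candidate who beats
    the best of the screened set (or the last one, if nobody does)."""
    k = int(screen_frac * len(values))
    threshold = max(values[:k]) if k > 0 else float("-inf")
    for v in values[k:]:
        if v > threshold:
            return v
    return values[-1]  # forced to settle for the last candidate

def simulate(screen_frac, n=100, trials=20000, seed=0):
    """Return (probability of picking the best, average value picked)."""
    rng = random.Random(seed)
    picks = []
    for _ in range(trials):
        values = list(range(1, n + 1))
        rng.shuffle(values)
        picks.append(secretary(values, screen_frac))
    p_best = sum(1 for p in picks if p == n) / trials
    return p_best, sum(picks) / trials

for frac in (0.37, 0.10):
    p_best, avg = simulate(frac)
    print(f"screen {frac:.0%}: P(best) = {p_best:.3f}, average pick = {avg:.1f}")
```

Running this, the 37% rule wins outright roughly 37% of the time with an average pick near 82, while the 10% screen wins outright less often but delivers a noticeably higher average pick, which is exactly the risk-minimizer's trade.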

I first heard about this problem more than a decade ago, and have been fascinated by it ever since. I used this problem to create a programming assignment in the class I am currently teaching.

Friday, December 2, 2011

Text in Inkscape

Inkscape is a fantastic program for creating vector graphics. It is free, platform independent, refreshingly clean and efficient once you master some keystroke shortcuts, and amazingly powerful in terms of the things it can do for you.

It does have a few pesky quirks though, particularly with text. For example, there is no intrinsic way of creating subscripts or superscripts, and sometimes Greek symbols just do not render properly.

This post explains my workarounds.

1. Subscripts and Superscripts: While there is no built-in way of getting these, you can always select some text, and then press Alt + an arrow key (up, down, left, or right) to nudge the selection in that direction. Here is a screenshot of how that works.

2. Greek Symbols: You can try selecting the Symbol font from the font dialog box, but very often it won't get you anywhere. A handy, if slightly inconvenient, workaround is to use Unicode. If you know the code point from the Unicode Standard (pdf), then you can directly enter the code of the particular character. For example, the code for "beta" is 03B2.
To enter this in Inkscape, first open a text box as usual. Then press Ctrl + U. The status bar at the bottom of the screen prompts you to enter the code. You type 03B2 (or 03b2), and you will see it echo the symbol "beta". You press Enter, and the symbol is inserted at the text cursor.
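If you want to double-check a code point before typing it into Inkscape, any language with Unicode support will do. A quick sanity check in Python, for instance:

```python
import unicodedata

beta = chr(0x03B2)  # the same code you type after Ctrl + U in Inkscape
print(beta)                    # prints the beta symbol
print(unicodedata.name(beta))  # GREEK SMALL LETTER BETA
```

The same trick works in reverse: `unicodedata.lookup("GREEK SMALL LETTER BETA")` returns the character if you remember the name but not the code.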

3. LaTeX Support: Inkscape supports LaTeX expressions out of the box. I did not know this until recently, but you can go to Extensions > Render > LaTeX formula... (in some versions it may be under Effects > instead of Extensions >).

It opens up a dialog box in which you can enter your formula. Inkscape calls LaTeX, converts the DVI output to SVG, and embeds it in the document. Since the result is a scalable equation (or any other LaTeX object), you can now interact with it natively in Inkscape.

Wednesday, November 30, 2011

1. Old issues of Quantum, the magazine of math and science, and a paper with the shortest abstract ("Probably not.") (H/T Mathematics under the Microscope). While the abstract is cute, it reads more like a conclusion than an abstract.

2. A talk by Marten Mickos (former CEO of MySQL). I enjoyed the part where he explains why working from home is harder than working at an office, where you can simply BS your way through.

3. Daniel Kahneman at Google Talks.

Monday, November 21, 2011

Matlab: Profiler and parfor

Two videos on extremely useful but not-so-frequently-used features in Matlab.

1. Profiler: While "profiling" may be a bad word in common parlance, it is a good word in software. It helps you identify potential areas in your program that may be targets for optimization.

2. Parallel "for" loops: An easy way to exploit multicore (or distributed) machines for task-parallel computations. Useless trivia: I went to grad school with the person doing the video (Jiro Doke).
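For readers outside Matlab, the profiling workflow shown in the first video has a close analog in most languages. As a hedged illustration (the function names here are invented for the example), this is how you would do the same thing with Python's standard-library cProfile:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # A deliberately unvectorized loop, the kind a profiler flags.
    total = 0.0
    for i in range(n):
        total += i * i
    return total

def driver():
    return sum(slow_sum(10_000) for _ in range(50))

profiler = cProfile.Profile()
profiler.enable()
driver()
profiler.disable()

# Sort the report by cumulative time, much like Matlab's profile viewer,
# and show the five most expensive entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The report attributes the running time to `slow_sum`, which is the whole point: you optimize the hot spots the profiler identifies, not the ones you guess at.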

Saturday, November 19, 2011

Weekly LinkFest: Econ Edition

1. Steve Keen's lecture on the Great Recession. If you are interested in Behavioral Finance, you should check out the entire YouTube playlist.

2. While on the subject of behavioral economics, check out Dan Ariely's talk on "Money Changes Everything".

3. Michael Mauboussin is an amazing thinker. He works at Legg Mason, yet his writing feels very academic (I mean that in a good way). Here is a very nice collection of his articles, which are well-researched and neatly presented.

4. Surge in rich Chinese who want to come to the US.

Tuesday, November 15, 2011

Elite Institutions and Career Earnings

I finished reading Charles Wheelan's interesting book "Naked Economics: Undressing the Dismal Science". It is a highly entertaining book for anyone with even a passing interest in how the world around us works. In tone, it resembles Freakonomics, but in span it feels much wider and more comprehensive, perhaps because it is not merely a collection of vignettes.

In one of the chapters, he points to an interesting study by Krueger and Dale (2002). Graduates of highly selective schools earn higher salaries later in life than graduates of less selective schools. This does not seem very surprising.

Next, they examined the outcomes of students who were admitted to both a highly selective school and a moderately selective school. The outcome (also the title of their 2002 paper) was that "Children Smart Enough to Get into Elite Schools May Not Need to Bother."

There is a more recent follow-up to that study, which essentially reiterates the same conclusion.

The average SAT score of the most selective school a student applies to is the best predictor of his or her future (monetary) success.

There is an important caveat. Minorities and other disadvantaged students gain the most from choosing an elite school over a less selective one.

Here's an interesting summary of what that means in practical terms:
Mr. Krueger gets the last word:
My advice to students: Don’t believe that the only school worth attending is one that would not admit you. That you go to college is more important than where you go. Find a school whose academic strengths match your interests and that devotes resources to instruction in those fields. Recognize that your own motivation, ambition and talents will determine your success more than the college name on your diploma.
My advice to elite colleges: Recognize that the most disadvantaged students benefit most from your instruction. Set financial aid and admission policies accordingly.

Saturday, November 12, 2011

Bystander Psychology

Recent events at Penn State have no doubt been troubling. Time has an interesting article on bystander psychology (subtitled: why some witnesses of crime do nothing). For those not in the know: the current wide receivers coach and a janitor saw Jerry Sandusky (an extraordinarily celebrated coach at Penn State) in compromising situations over a decade ago, but never called the police. From the article:
(We) would like to believe that no matter how small or scared we were, if we saw a child being raped, we'd step in and stop it, or at the very least call 911 immediately. But social psychology research on "bystander" behavior suggests that many of us might actually turn away.

The most famous instance of witness apathy involves the 1964 murder of 28-year-old Kitty Genovese in New York City. News accounts — and later, social psychology texts — said the victim and her screams were ignored by 38 witnesses as she was stabbed to death on a Queens street. (Genovese's killer was denied parole this week.)

But while research has shown that many such witnesses do fail to intervene, in part because they assume others around them will do so, it turns out that the popular account of the Genovese case is largely urban legend. There were not in fact 38 witnesses, but many fewer, and most onlookers said they did not see or hear the full assault; many of the witnesses did call police.

Still, says Mark Levine, a social psychologist at Lancaster University in the U.K., the Genovese story is a "very powerful parable. It taps into something people feel about human psychology, probably mistakenly: that somehow, when we're with other people, we lose our rational capacity or personal identity, which controls our behavior."

Wednesday, November 9, 2011

Tips on Plotting with Grace

Grace is a nice program to make journal-quality graphs. A while ago, I blogged about how to use it to make inset plots.

If you are a regular user of Grace, the following tips (which I gathered from here and here) can improve your productivity:

1. Make a template plot:

You can make default settings by opening Grace, making your adjustments, and saving the file as Default.agr in

~/.grace/templates/Default.agr

If the .grace/templates folder doesn't exist, create it in your home directory.

I like to make my axis labels larger (usually fontsize 150) so that they remain readable even when the figure is shrunk to fit a single column of a journal article. You can choose the font you like (I use Times-Bold).

In a similar vein, I like my tick marks and legends to be fontsize 125. I also like my symbols "filled".

Once you make a Default.agr, these settings are used any time you open a blank Grace plot.

2. Default Printer:

I usually like to "print" ("export" in Grace) my graphs in EPS format. While it would be nice to set this in the Default.agr above, you cannot. However, you can create a file "~/.grace/gracerc.user" that simply contains the line

HARDCOPY DEVICE "EPS"

3. Font Tool:

Every time you have to write a complex symbol in a text box (while labeling an axis, for example), you can press Ctrl + E, which opens a Font Tool that lets you choose the symbol from a palette.

Short Cuts:
"\x a" produces "alpha" (\x is a proxy for \font{Symbol})
"\f{}" goes back to default font
"\2" is "\font{Times-Bold}"
"\S" is for superscript
"\s" is for subscript

Sunday, November 6, 2011

1. Roger Lowenstein on MF Global: I enjoyed his book "When Genius Failed: The Rise and Fall of Long-Term Capital Management", which informs this perspective.

2. The Divided Brain (RSA Animate)

3. Addiction is about the anticipation of reward: An interesting video. "Dopamine is about the pursuit of happiness." Watch it!

Friday, November 4, 2011

Bruce Lee

I was recently on a plane with a very entertaining colleague from Physics. He was talking with great animation (volume?) about some of his work, when a lady in front of us turned around and said with noticeable exasperation, "Sshhhh. I can't hear myself think!".

My friend quickly looked at her with feigned irritation, and said, "Think? Do you know what Bruce Lee said?".

"Don't think," he said, closing his eyes.

"Feel!" he added, exhaling a deep breath.

I couldn't help a chuckle.

Here's the (YouTube video) scene from "Enter the Dragon."

Wednesday, November 2, 2011

Two GNU Octave Tips

1. Changing directories

The standard command looks very much like the *nix "cd".

cd NameOfDirectory;

But if "NameOfDirectory" were stored in a variable (DirName = "NameOfDirectory";), then trying something like

cd DirName;

would fail, because Octave would literally look for a directory named DirName. On a *nix command line one would circumvent this issue with "cd $DirName", but this does not work in Octave. The solution is to use parentheses. So the following does the trick:

cd (DirName);

2. Quotes

One can use the system command to issue instructions to the shell from within an Octave program. So to list the files in the present working directory one would say:

system('ls');

However, if one wants to use a Unix program or utility that itself uses quotes, then there are parsing problems. So something like

system('awk '{print $0}' infile > outfile')

would not work. The solution is again quite simple: use double quotes around the whole command, so that the single quotes inside are passed through to the shell intact. Hence,

system("awk '{print $0}' infile > outfile")

does what one would expect it to do.

Friday, October 28, 2011

Weekly LinkFest

1. A "Spherical Cow" from Abstruse Goose

2. Best Statistics Question Ever (via FlowingData)?

3. CometDocs, a free online document converter (H/T The Simple Dollar). (Yeah! The PDF -> ODF conversion is not too shabby!)

Tuesday, October 25, 2011

Passing parameters in Matlab or Octave: fzero and quad

Consider a simple function f(x) = 2x written in Matlab/Octave (myfunc.m):

function y = myfunc(x)
  y = 2*x;
end

You could easily use the functions fzero and quad to find a root of f(x) and the definite integral of f(x) between limits, say "a" and "b". You would say something like:

fzero('myfunc', 1.0)

where 1.0 is the guessed root. The answer that Matlab finds is indeed zero. Or,

quad('myfunc', 0.0, 1.0)

where a=0.0 and b=1.0 are the lower and upper limits. The quad function computes the correct answer (1.0).

Now suppose we had a slightly more general function f(x) = c*x, where "c" is a parameter (c=2 in the previous case). That is, we have the Matlab/Octave function:

function y = myfunc(x,c)
  y = c*x;
end

Quite often we want to use fzero and quad on such a function for a particular setting of "c". That is, we would like to be able to pass the parameter "c" to the function. There are several ways of doing this, and some of them (like making "c" a global variable) are ugly. The one I find most convenient uses anonymous functions:

c = 2;
quad(@(x) myfunc(x,c), 0, 1.0)

or,

quad(@(x) myfunc(x,2.0), 0, 1.0)

fzero works in similar fashion. You could say:

fzero(@(x) myfunc(x,2.0), 1.0)

Just to reiterate, this method works in both Matlab and Octave, and you can easily generalize it to pass multiple parameters.

Friday, October 21, 2011

Presentations using Beamer

Finally. Finally, I decided to take the plunge, and started using a LaTeX package called beamer to make presentations.

I had been reluctant to make the transition because (i) like all non-GUI programs/packages, there is a learning curve; (ii) I liked the platform independence of OpenOffice Impress, which, (iii) especially when combined with oooLaTeX, let me display mathematical formulae with the cleanliness of LaTeX; and (iv) my presentations, unlike my documents, don't have many cross-references and citations. A part of me also liked the freeform nature of dropping images wherever I liked, and the ability to sketch up schematic diagrams on the fly.

I gave beamer a test drive, learning from these tutorials. Since I already knew LaTeX, I was up and running in about two hours, which was less time than I had expected. In fact, I just presented slides made with beamer at the Society of Rheology's annual meeting in Cleveland. I expect to use this tool a lot more in the future.

Wednesday, October 19, 2011

Math Links

1. Two nice math videos (H/T Wild About Math)

2. The Cold Hit Problem (or are fingerprints unique?)

Saturday, October 15, 2011

Another Legend Falls

RIP Dennis Ritchie. As Brian Kernighan, a colleague at Bell Labs, wrote, "The tools that Dennis built — and their direct descendants — run pretty much everything today."

As a "tribute", I tried to collect some of his quotes from the Internet (Disclaimer: I have not assessed their veracity):

"A language that doesn't have everything is actually easier to program in than some that do."

"I can't recall any difficulty in making the C language definition completely open - any discussion on the matter tended to mention languages whose inventors tried to keep tight control, and consequent ill fate."

"UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity."

"The only way to learn a new programming language is by writing programs in it."

Friday, October 14, 2011

Profiling Fortran and C programs

Given how easy it is to profile programs written in C and Fortran, I wonder why many people don't do it. Profiling lets you monitor the performance of your code. One way to do it is to use GPROF. Here is how you use it.

1. Compile your code (test.f90) with the -pg flag:

gfortran -pg test.f90 (or gcc -pg test.c)

2. Run the program:

./a.out

This creates a report, gmon.out, which is unfortunately not in readable format.

3. To read it, say:

gprof a.out

You will see a reasonably detailed report on the number of function calls and the time spent in different parts of the program.

Friday, October 7, 2011

Floating Point Number Line

In the class I am teaching this semester, I was talking about the "discreteness" of the floating point number line, and the errors that pop up due to finite precision. One of my students pointed out that he had heard that finite precision was behind some of the early failures of Patriot missiles.

So I went back and researched, and found that on Feb 25, 1991, an incoming Iraqi Scud missile killed 28 soldiers at Dhahran because the Patriot missile that had been fired to intercept it failed. The source of the problem was that the onboard 24-bit computer measured time in integral units of 1/10 of a second (so 35, 36, and 37 instead of 3.5s, 3.6s and 3.7s), and the integer was multiplied by 1/10 on demand. The problem is that 1/10 is a non-terminating sequence (0.0001100110011...) when translated into base-2. This is similar to 1/3 being non-terminating in base-10 (0.333...). When chopped off after 24 bits, the truncation error is about 10^-7 s per tick. While this may seem small, a battery operating for 100 hours would have accumulated a round-off error of about 0.34 seconds. Given the speed of the Patriot missile (about 1700 m/s), it was off-target by about 600 meters.

Interestingly, the Israelis warned the US Army about the problem two weeks before the accident. Their solution: reboot frequently.
While they did not know why this solved the problem, frequent resetting kept the clock time small (well under 100 hours), which meant a smaller accumulated round-off error. Obviously, the problem could also have been avoided by using 1/8 or 1/16 as the unit of time, since these numbers can be represented exactly in base-2.

Images: xkcd and Wired.com

Tuesday, September 27, 2011

Summer Reunion

This August we had a family reunion in the Outer Banks, NC. We (I lost count after two dozen) rented a beautiful beach house near Jennette's Pier in Nags Head for a week. It was a really memorable affair, and hopefully we will get around to doing it more often in the future.

On one of the days we went to the Wright Brothers National Memorial, which is built on the site of their first successful controlled flight. Their story is truly awe-inspiring, and it is hard to come back from the Memorial without a sense of immense respect for their hard work and ingenuity.

Interestingly, a week after we had been there, Hurricane Irene ripped through the Outer Banks. In fact, one of the videos on the Weather Channel had the beach house we stayed in, in the background. As someone remarked, we may have been the last people to have enjoyed the house.

Sunday, September 25, 2011

Achievement Gap

In this nearly hour-long video (via Bridging Differences), an assortment of panelists discuss the nature of the achievement gap between races. All of them agree that there are no simple solutions, which in itself is interesting since both Diane Ravitch and Michelle Rhee are on the panel.

At one point, Ravitch reasserts her view that the original idea behind standardized testing was purely diagnostic. It was meant to be used like a thermometer is used to check temperature. Its widespread current use in penalizing or rewarding schools and teachers defeats that original intent.
Comer backs her up by saying that a thermometer can tell whether a patient has a fever, but tells us nothing useful about what caused it, or how to fix it.

In response, Rhee contends that once you find something amiss in the diagnosis, you do something about it, right? The idea that measurement and a corrective response to that measurement are completely independent of each other is misguided. If students under a particular teacher get low scores year after year, then at some point one has to consider the hypothesis that the teacher needs to go.

Angel Harris also points out that anecdote is not data. Just because a certain model has worked once somewhere doesn't prove that it is a successful model. You have to consider the entire distribution of outcomes under that model.

A very interesting, civil conversation.

Thursday, September 22, 2011

Speeding up Matlab code

Here is a nice video from Mathworks on different ways to speed up Matlab programs. It covers a bunch of techniques, including preallocation and vectorization, which get you a great deal of bang for the buck. It is nearly an hour long.

Saturday, September 17, 2011

How to write gzipped files from a C++ program?

Every so often, you write a C++ program that generates a ton of output. There are cases when you want to write continuously to a compressed gzipped file from the C++ program (rather than using gzip to compress a large file after the program has finished running).

One solution is to use the gzstream library. It "is a small C++ library, basically just a wrapper, that provides the functionality of the zlib C-library in a C++ iostream. It is freely available under the LGPL license."

How do you actually do it? It is quite simple on a standard *nix system.

1. Download and unarchive the tarball into a folder gzstream.

2. Type "cd gzstream" and then "make" at the command prompt. It should build a library called "libgzstream.a".

3. Move the folder to an appropriate location if needed.

4.
In the C++ program file, #include the headers gzstream.h, iostream, and fstream.

5. In your C++ program, say "ogzstream rpout("sigma.gz");", where rpout is the stream handle and sigma.gz is the filename that you want to write to.

6. Write to the file using something like "rpout << setprecision(4) << sigma << endl;"

7. Close the file with "rpout.close();"

8. Finally, to compile the C++ program, say something like: "g++ program.cpp -I./gzstream -L./gzstream -lgzstream -lz" (note that -lgzstream must come before -lz, since gzstream depends on zlib). Here I assume that the gzstream directory is located in the same directory as the C++ program. If this is not true, then you have to change the -I and -L location tags.

9. You are now good to go with "./a.out".

Thursday, September 15, 2011

Math Links

1) A history of Bayes Theorem: I always find articles that put faces and history behind theories and formulae interesting. As you might notice from the comments, statisticians are argumentative :)

2) Why math software cannot be used mindlessly: We are reminded yet again why results from Matlab or Mathematica need to be understood.

Tuesday, September 13, 2011

Linux Mint

You may have seen this xkcd comic before. I guess that can only mean one thing: the year of the Linux Desktop is here!

After a fairly long hiatus, I tried out a Linux distribution that was new to me, called Linux Mint, which has recently surged in popularity. It appears (from less than scientific sources) to be about three times more popular than other distributions with name recognition (Fedora and openSUSE), although it still trails Ubuntu quite a bit. Interestingly, Linux Mint is based on Ubuntu, which in turn is based on Debian.

My first few impressions: it worked just great on the six-year-old laptop that was lying around my house. Installation was a breeze, and since it downloads audio and video codecs, it worked "out of the box". The interface is clean and friendly.
Given the improvements in LibreOffice and the grain of truth in the comic above, I think it might finally be time to wean my parents off their heavily infested Windows installation.

Sunday, September 11, 2011

Ten years since 9/11/2001

A lot of people have very vivid memories of where they were, and what they were doing, when the Twin Towers went down. Compared to most of my memory, which is a sequence of low-resolution bitmap images - colored and "photoshopped" by biases and time - I recollect 9/11 in HD video.

One of my lab-mates was defending his PhD at 9am. Normally, like the majority of grad students at Michigan, I was somewhat nocturnal. The only reason I streamed into the department around 8:30 was to watch my friend defend.

From the moment I entered the department, things looked eerily different. A small TV in the conference room - one that would probably look more at home alongside the other relics encased in a nearby glass exhibit - was turned on. All the faculty and most of the staff seemed to be glued to the screen. It was mildly disorienting to observe a familiar environment cast in an unfamiliar light. Despite the general confusion regarding what was happening, the defense went on as scheduled. You did not need a PhD to know what was on the audience's mind.

These were pre-smartphone days. As soon as the talk was over, one of the faculty ran outside to check what was happening. He returned in about five minutes with the news that one of the towers had collapsed, and that another plane had struck the second tower. People gasped. A few short minutes later, school was called off, and we were sent home.

History sometimes happens when you least expect it.

Saturday, September 10, 2011

Grade Inflation Links

1. A nice infographic: A student from my department (Ian) pointed out that while the graphic is nice, the "embed me" code seems to be tailored to improve its Google PageRank.
At some point they may start advertising links to online colleges and collect a fair amount of money!

2. A treasure trove of data and analysis (check out some of the links) on grade inflation.

Tuesday, August 30, 2011

Economics: A love affair

Bill Gross of Pimco uses the metaphor of love, marriage, and divorce to describe disturbing financial events in Europe, the United States, and the rest of the world. An entertaining take on a sobering state of affairs.

Oh those feisty Europeans! Always fighting like a dating couple and then finally resolving their differences by saying “I do” sometime in the 1950s with the creation of the Common Market and the European Economic Community (EEC). In doing so, France and Germany said “never again,” and even though they didn’t like each other (read “hate”) they decided to make economic lurv in the hopes that they wouldn’t destroy the continent again.

It later turned into a formal union, a European Community (EC), where they invited lots of witnesses to the ceremony and created instant family members, if that’s metaphorically possible. Twenty-seven of them, including Italy, Spain and the U.K. were now relatives despite some liking pasta and others preferring horrid cuisines featuring Shepherd’s Pie or fish and chips.

The marriage progressed to the point of a smaller monetary union sometime in 1999, but critically, without a common budget. Husband and Wife – Germany and Greece – decided to have a joint bank account, but with separate allowances and no oversight. Greece could issue bonds at nearly the same yield as could its Northern hard-working neighbors, but were free to spend it any way they chose. This was an economic version of an open marriage where one party gets to have all the fun and the other worked nine-to-five and came home too exhausted for whoopee.
Sunday, August 21, 2011

Bayesian Quote

"a Bayesian is someone who, vaguely expecting a horse, and glimpsing the tail of a donkey, concludes he has probably seen a mule" - John Hussman

Saturday, August 20, 2011

Compound Interest

You may have heard of the famous rule of 72 (about compound interest): if an investment returns X% annually, then the number of years required to double the principal is about 72/X. So if you get 7% on something, it will take you approximately 72/7 = 10 years to double your initial investment.

A friend asked me yesterday what the origin of this rule was, and I figured it wouldn't be hard to rediscover what someone had already invented. Consider the standard compound interest formula, with the rate written as x = X/100 (in decimal rather than %):

final amount = initial amount * (1+x)^n,

where ^ denotes exponentiation. Since we are looking for a double (final amount = 2 * initial amount), we can take the natural logarithm of both sides of the equation:

ln 2 = n * ln(1+x)

For small values of x (compared to 1), ln(1+x) is approximately x (from its Taylor series expansion). Hence,

n = ln(2)/x = 0.693/x = 69.3/X

We probably use 72 instead of 69.3 because it has a larger number of factors.

Friday, July 29, 2011

Projection

Giridhar Madras on his blog commented on this report. To be perfectly honest, I did not read the entire original report, for two reasons: (i) I don't like artificial boxes, and (ii) I don't like artificial boxes whose labels and contents are not consistent. As soon as I realized this "study" suffered from both these afflictions, I figured I had better things to do.

Let me take reason (ii) first. Teaching load alone is a terrible metric for measuring anything other than teaching load, and even there it is an uneven measure. It is harder to teach engineering design to a small class than to teach an introductory course to a large freshman class. In the same way, research dollars are a pathetically inadequate way to sniff out true "pioneers".
Not everything that can be measured is of value, and not everything of value can be measured. I shudder at the prospect of such studies being taken seriously.

I have never been a big fan of such simple-minded measures. I gather this fascination has something to do with our inability to grapple with multidimensional complexity. We try to project a complex high-dimensional space onto a simple scalar. We like scalars because we can intuitively compare two scalars: we can order them, plot them on graphs, and run statistics on them with ease. Unfortunately, the rules of projection are often arbitrary (as in this study), and the resulting scalar is of marginal value. The trouble is that such scalars get taken seriously anyway. This disease is everywhere: using academic rankings to choose a university, using IQ to measure intelligence, using impact factors to judge journals, and so on.

Monday, July 25, 2011

Links:

1. Khan Academy: Wired has an article on the implications of Khan Academy for the traditional classroom.

Initially, Thordarson thought Khan Academy would merely be a helpful supplement to her normal instruction. But it quickly became far more than that. She’s now on her way to “flipping” the way her class works. This involves replacing some of her lectures with Khan’s videos, which students can watch at home. Then, in class, they focus on working problem sets. The idea is to invert the normal rhythms of school, so that lectures are viewed on the kids’ own time and homework is done at school. It sounds weird, Thordarson admits, but this flipping makes sense when you think about it. It’s when they’re doing homework that students are really grappling with a subject and are most likely to need someone to talk to.

2. The Engineer Guy: What Khan Academy does for high-school topics, this site tries to do for engineering/technology topics.

Monday, July 18, 2011

Is higher education a bad value proposition?
Two recent "finance/econ" type articles seem to take opposite sides in this debate.

On the one hand, Vikram Mansharamani argues that higher education exhibits all the tell-tale signs of a classic bubble (it is useful to keep the recent US housing crisis looming in the background). These include, among others, (a) an unquestioning faith in the asset's value, (b) the availability of easy credit to buy the asset, and (c) the increasing participation of value-insensitive buyers. The net result has been that the price of higher education has outstripped inflation in recent years by more than 5% at public institutions. The total student loan debt is apparently on track to surpass the total credit card debt this year.

The other side of the debate comes from unemployment statistics. Saj Karsan presents a chart which breaks down unemployment numbers by education level. It is nearly 15% for people without a high-school diploma, and about 4.4% for people with a bachelor's degree. As he notes:

Note that the overall unemployment rate in 2007, when the American economy was booming, was 4.6%, which is higher than the current unemployment rate of 4.4% for those with Bachelor's degrees. This data suggests (although it does not prove) that there is a shortage of educated workers in the US.

Thursday, July 14, 2011

Printing Webpages

Many websites offer printer-friendly versions of documents (Google Maps, for instance). When they don't, printing can be a messy affair, with blank pages, useless ads, and corrupted formatting. Recently, I found two nice browser-based utilities that make life a little easier.

1. PrintFriendly: This is a simple tool, without many bells and whistles. You simply enter the web address, and it returns a simplified, printer-friendly version. You can then save it as a PDF or print it directly. It allows you to make minimal changes, such as deleting some items, before you choose to save or print it.

2.
Printwhatyoulike: This site supposedly offers you more control over which items you choose to display. There is also a utility which lets you "zip" pages (combine long articles which require you to press "next" 10 times into a single document). I tried to use this a little, but my results weren't all that great.

Tuesday, July 12, 2011

Orthogonal Polynomials: Mathematica

Orthogonal polynomials are everywhere. The following Mathematica program takes in a weight function w (as a function of x) and the domain (a and b), and spits out the first n corresponding orthonormal polynomials.

OrthoPoly[w_, a_, b_, n_] := Module[{monoBasis, T},
  monoBasis = Table[{x^i}, {i, 0, n - 1}];
  oP = monoBasis;
  oP[[1]] = oP[[1]]/Sqrt[Integrate[w*oP[[1]]*oP[[1]], {x, a, b}]];
  For[i = 2, i <= n, i = i + 1,
   For[T = 0; j = 1, j < i, j = j + 1,
    T = T + Integrate[w*oP[[i]]*oP[[j]], {x, a, b}]*oP[[j]];
    ];
   oP[[i]] = oP[[i]] - T;
   oP[[i]] = oP[[i]]/Sqrt[Integrate[w*oP[[i]]*oP[[i]], {x, a, b}]] // Simplify;
   ];
  oP
  ]

Here's a screenshot, for the first four Chebyshev and Legendre polynomials (click to enlarge). Note that these polynomials are unique only up to a multiplicative constant; orthonormality freezes that constant.

Thursday, July 7, 2011

Gary Taubes: Why we get fat

In this YouTube video, Gary Taubes of "Why We Get Fat" fame delivers an hour-and-a-half-long lecture on Google's campus. Somewhere in the first five minutes, Taubes says that this talk is a Cliff Notes version of the book (which apparently is itself a Cliff Notes version of his more fully fleshed-out book "Good Calories, Bad Calories"). As if all this were not enough, here's a Cliff Notes version of the talk:

Taubes compiles historical data, and argues that casting fat/weight reduction as an "eat less, exercise more" regimen misses the point. I like the example where he says that that is exactly what you would do if you had to work up an appetite.
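Returning to the orthogonal-polynomial post above: for readers without Mathematica, the same Gram-Schmidt construction can be sketched in plain Python for the special case of a constant weight w(x) = 1, using exact fractional arithmetic for the integrals. (The function names are mine; this is a sketch, not a translation of the module.)

```python
from fractions import Fraction
import math

def poly_mul(p, q):
    # multiply two polynomials given as coefficient lists, lowest order first
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def inner(p, q, a, b):
    # <p, q> = integral of p(x) q(x) over [a, b], with weight w(x) = 1
    pq = poly_mul(p, q)
    return sum(c * (Fraction(b)**(k + 1) - Fraction(a)**(k + 1)) / (k + 1)
               for k, c in enumerate(pq))

def ortho_poly(a, b, n):
    # Gram-Schmidt on the monomial basis 1, x, x^2, ...
    basis = []
    for i in range(n):
        p = [Fraction(0)] * n
        p[i] = Fraction(1)                    # start from x^i
        for q in basis:                       # subtract projections
            c = inner(p, q, a, b)
            p = [pi - c * qi for pi, qi in zip(p, q)]
        norm = math.sqrt(inner(p, p, a, b))   # then normalize
        basis.append([float(pi) / norm for pi in p])
    return basis
```

On [-1, 1] this reproduces the orthonormal Legendre polynomials: ortho_poly(-1, 1, 3) gives, as coefficients of (1, x, x^2), approximately [0.707], [0, 1.225], and [-0.791, 0, 2.372].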
The primary claims of Taubes' critics seem to be that (i) he edits conversations with experts (many of his interviewees seem to have "recanted"), and (ii) he commits a crime of omission by not engaging the large body of research which contradicts his claims.

Essentially, he upsets the implied causal connection between weight loss and negative energy balance, as implied by "accumulation = input - output", by suggesting that the "=" does not tell us what causes what. The culprit is apparently insulin, which is released when carbohydrates are consumed. Insulin encourages fat cells to store fat. Mice that are injected with insulin, and then underfed, tend to become obese. The message is therefore to avoid carbohydrates, and actually to consume fat (like the Atkins diet). It is a compelling point of view, but apparently still quite controversial. Here is a story countering Taubes' initial article in the NYT. Here is Taubes' response. Here is the response to the response.

Monday, July 4, 2011

Links: July 4th edition

1. James Altucher on how we can get rid of Congress, and replace the republic with a true democracy. Beneath the cloak of irreverence are a few potentially interesting ideas.

2. BugMeNot: The subtitle "Bypass Compulsory Registration" says it all. It essentially allows you to find or share logins/passwords for sites that force you to register before reading on (via Simple Dollar).

3. A portrait of the irresistible Jon Stewart of "The Daily Show".

Friday, July 1, 2011

Why we have college

Louis Menand in the New Yorker examines the issue.

Soon after I started teaching there (a public school), someone raised his hand and asked, about a text I had assigned, “Why did we have to buy this book?” I got the question in that form only once, but I heard it a number of times in the unmonetized form of “Why did we have to read this book?” I could see that this was not only a perfectly legitimate question; it was a very interesting question.
The students were asking me to justify the return on investment in a college education. I just had never been called upon to think about this before. It wasn’t part of my training. We took the value of the business we were in for granted.

Is the role of higher education to sort students according to intelligence, skill, or merit, or is it to ensure that everyone has access to knowledge and the goodies that accompany it? As he argues:

A lot of confusion is caused by the fact that since 1945 American higher education has been committed to both theories. The system is designed to be both meritocratic (Theory 1) and democratic (Theory 2). Professional schools and employers depend on colleges to sort out each cohort as it passes into the workforce, and elected officials talk about the importance of college for everyone. We want higher education to be available to all Americans, but we also want people to deserve the grades they receive.

And one of the many facts that I did not know:

In 1940, the acceptance rate at Harvard was eighty-five per cent. By 1970, it was twenty per cent. Last year, thirty-five thousand students applied to Harvard, and the acceptance rate was six per cent. ... Columbia, Yale, and Stanford admitted less than eight per cent of their applicants. This degree of selectivity is radical. To put it in some perspective: the acceptance rate at Cambridge is twenty-one per cent, and at Oxford eighteen per cent.

It is an interesting read.

Wednesday, June 22, 2011

Links: Video Edition

I sense a common thread through the following "three" video links. Do you?

1. RSAAnimate: A series of videos (including Ken Robinson's "education" video) using free-form comics to explain important ideas.

2. ChromeBook: Or essentially any other "Google" video.

3. Subversion: A poor man's version, yet very effective! I wish I could do stuff like that :)

Saturday, June 18, 2011

Linux: Forcing cp to overwrite

As a precaution, I have the following three lines in my .bashrc file.
# SAFETY ALIASES
alias rm="rm -i"
alias mv="mv -i"
alias cp="cp -i"

When I try to move or copy something onto a file that already exists, this gives me a warning prompt. So far, so good. Sometimes, though, I intentionally want to overwrite a bunch of files. With mv, I just say something to the effect of

mv -f dir1/*.dat .

to move all the *.dat files from dir1 into the current working directory. Unfortunately,

cp -f dir1/*.dat .

does not work. A trick is to use the command "yes". So

yes | cp -f dir1/*.dat .

seems to fix the problem.

Thursday, June 16, 2011

Contour Length of a Gaussian Bead-Spring Chain

Consider a Gaussian bead-spring chain with N springs of mean-squared length b^2. The distribution of the end-to-end vector R is the standard Gaussian distribution:

P(R) = (3/(2 pi N b^2))^(3/2) exp(-3 R.R/(2 N b^2))

with the well-known properties <R> = 0 and <R.R> = N b^2. However, note that

<|R|> = sqrt(8N/(3 pi)) b,

which means that the average spring length (N = 1) is sqrt(8/(3 pi)) b, which is smaller than b. Therefore the average contour length is not Nb.

Friday, June 10, 2011

Links

1. In defense of inefficient code: Mike Croucher makes a case for working-but-not-particularly-efficient code written by practitioners in high-level languages (Matlab, Mathematica, Python). As he puts it:

It comes down to this. CPU time is cheap. Very cheap. Human time, particularly specialised human time, is expensive.

and also:

In my opinion, high level programming languages such as Mathematica, MATLAB and Python have democratised scientific programming. Now, almost anyone who can think logically can turn their scientific ideas into working code. I’ve seen people who have had no formal programming training at all whip up models, get results and move on with their research. Let’s be clear here – It’s results that matter not how you coded them.

I often use these high-level languages to rapidly prototype new ideas. If and when required, the code can always be translated into C++ or Fortran.
Also, from a practical perspective, inefficient code that runs 10 times slower than highly optimized code is acceptable if it still takes only 10 seconds to run.

2. Geeky jokes on Tanya Khovanova's blog: I like this one:

I just learned that 4,416,237 people got married in the US in 2010. Not to nitpick, but shouldn’t it be an even number?

Obviously, you can understand the audience she caters to from this comment.

Monday, June 6, 2011

Mathematica 8 on sale at Amazon

In preparation for a course I am teaching over Fall, I have been learning/following Mathematica quite closely. I found out last week that Amazon has a sale on "Mathematica 8 Home Edition" for $239.

I've never used Mathematica extensively in the past, for three reasons:
• the price sucks ($1000+, if I try to buy it directly from Wolfram),
• I do not really need it for my research, and
• my department has a site license for teaching, if I need to use it.

But it is an amazing piece of software. I always suspected that it was. Now that I am learning how to use it better, I am finding that it is even more amazing than I thought. Under different circumstances (if none of the above three reasons were valid, for example), I would probably buy it.

Wednesday, June 1, 2011

Why numerical differentiation may be trickier than you think

Here is a link to a fascinating presentation by Harvey Stein ("Risky Measures of Risk: Error Analysis of Numerical Differentiation"). He makes a very "visual" case for why one needs to think carefully before using large step sizes (convexity error) or small step sizes (cancellation error).

Friday, May 27, 2011

Luck or Skill?

I have been re-reading Michael Mauboussin's "Untangling Skill and Luck" (pdf) over the past week. One of my favorite parts is where he discusses the relative composition of luck and skill in a particular activity.

There’s a simple and elegant test of whether there is skill in an activity: ask whether you can lose on purpose. If you can’t lose on purpose, or if it’s really hard, luck likely dominates that activity. If it’s easy to lose on purpose, skill is more important.

The quote is attributed to Annie Duke's 2007 testimony on behalf of the Poker Players Alliance. Although I did not find the exact source, the testimony is itself a fascinating read. At one point she tries to distance poker from other forms of gambling:

There is a critical distinction between poker and other forms of “gambling”, which is the skill level involved to succeed at the game. I cannot stress this point enough: in poker it is better to be skillful than lucky. I ask anyone in this hearing room to name for me the top five professional roulette players in the world or the number one lottery picker in America.
It is just not possible (my apologies to one obvious candidate, Congressman Sensenbrenner). We can, however, have a real discussion about the top five professional poker players, just like we can have a discussion about the top five professional golfers.

Wednesday, May 25, 2011

Gilbert Strang: Video Lectures

Gilbert Strang is an amazing teacher. This summer, I've been watching some of his lecture videos on Linear Algebra, and on Computational Science and Engineering, at MIT OpenCourseWare. Although they are not technologically fancy or gimmicky, they provide a superb introduction. Here are shortcut links:

1. Linear Algebra: I was still an undergrad when these lectures were filmed. As a matter of fact, I was taking a linear algebra course at around the same time.

2. Computational Science and Engineering: Given my current academic home, this is a nice introductory series.

Friday, May 20, 2011

Numerical Differentiation using Mathematica

I am teaching myself Mathematica over summer, and coded up a simple module that computes numerical differentiation formulas automatically. Essentially, the module enables one to compute arbitrary differentiation rules like those explained here. It spits out the formula for an m-point approximation to the n-th derivative, and the leading error term. Here is the (updated) module:

AppDeriv[m_, n_] := Module[{points, IP, nthDeriv, e1},
  points = Table[{x0 + i h, Subscript[f, i]}, {i, 0, m - 1}];
  IP = InterpolatingPolynomial[points, x];
  nthDeriv = D[IP, {x, n}];
  e1 = (1/Factorial[m])*
    D[Apply[Times, (x - Table[{x0 + i h}, {i, 0, m - 1}])], {x, n}];
  Formula = TableForm[
    Table[{Subsuperscript[f, i, n], nthDeriv /. x -> (x0 + i h),
      Superscript[f, m] e1 /. x -> (x0 + i h)}, {i, 0, m - 1}] // Simplify
    ]]

This seems to work alright for me (click to enlarge).
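The formulas such a module generates can be sanity-checked numerically. A small Python sketch (mine, not part of the Mathematica module) for the familiar 3-point second-derivative rule, which also illustrates the convexity-versus-cancellation trade-off from the Harvey Stein talk linked earlier:

```python
import math

def central_second(f, x, h):
    # 3-point rule: f''(x) ~ (f(x - h) - 2 f(x) + f(x + h)) / h^2,
    # with leading truncation error of order h^2
    return (f(x - h) - 2 * f(x) + f(x + h)) / h**2

# Error vs. step size for f = sin, where f''(1) = -sin(1). The error
# shrinks like h^2 at first, then grows again once floating-point
# cancellation in the numerator dominates.
for h in (1e-1, 1e-2, 1e-3, 1e-5, 1e-7):
    err = abs(central_second(math.sin, 1.0, h) + math.sin(1.0))
    print(f"h = {h:.0e}   error = {err:.2e}")
```

Running this shows the error bottoming out at intermediate h, which is exactly why neither very large nor very small step sizes are safe.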
Saturday, May 14, 2011

Test-taking success and "real" success

Interesting article in New York Magazine: "Paper Tigers: What happens to all the Asian-American overachievers when the test-taking ends?" From the article:

Entrance to Stuyvesant, one of the most competitive public high schools in the country, is determined solely by performance on a test: The top 3.7 percent of all New York City students who take the Specialized High Schools Admissions Test hoping to go to Stuyvesant are accepted. There are no set-asides for the underprivileged or, conversely, for alumni or other privileged groups. There is no formula to encourage “diversity” or any nebulous concept of “well-roundedness” or “character.” Here we have something like pure meritocracy. This is what it looks like: Asian-Americans, who make up 12.6 percent of New York City, make up 72 percent of the high school. This year, 569 Asian-Americans scored high enough to earn a slot at Stuyvesant, along with 179 whites, 13 Hispanics, and 12 blacks. Such dramatic overrepresentation, and what it may be read to imply about the intelligence of different groups of New Yorkers, has a way of making people uneasy. But intrinsic intelligence, of course, is precisely what Asians don’t believe in. They believe—and have proved—that the constant practice of test-taking will improve the scores of whoever commits to it. All throughout Flushing, as well as in Bayside, one can find “cram schools,” or storefront academies, that drill students in test preparation after school, on weekends, and during summer break. “Learning math is not about learning math,” an instructor at one called Ivy Prep was quoted in the New York Times as saying. “It’s about weightlifting. You are pumping the iron of math.” Mao puts it more specifically: “You learn quite simply to nail any standardized test you take.”

... And yet the numbers tell a different story.
According to a recent study, Asian-Americans represent roughly 5 percent of the population but only 0.3 percent of corporate officers, less than 1 percent of corporate board members, and around 2 percent of college presidents. There are nine Asian-American CEOs in the Fortune 500. In specific fields where Asian-Americans are heavily represented, there is a similar asymmetry. A third of all software engineers in Silicon Valley are Asian, and yet they make up only 6 percent of board members and about 10 percent of corporate officers of the Bay Area’s 25 largest companies. At the National Institutes of Health, where 21.5 percent of tenure-track scientists are Asians, only 4.7 percent of the lab or branch directors are, according to a study conducted in 2005. ...

“The loudest duck gets shot” is a Chinese proverb. “The nail that sticks out gets hammered down” is a Japanese one. Its Western correlative: “The squeaky wheel gets the grease.”

Tuesday, May 10, 2011

Diane Ravitch

I briefly heard Diane Ravitch on NPR during a recent commute. She commented (I paraphrase) that standardized testing was intended as a diagnostic tool: you find out what is lacking, and you take constructive corrective action. Unfortunately, it has morphed into a punitive assessment tool, which seeks to punish a negative diagnosis.

Here is a link to a bunch of interesting articles by her. In one of them (Ravitch answers Gates):

Gates: "Does she think all those ‘dropout factories’ are lonely?"

Ravitch: "This may come as a surprise to Bill Gates, but the schools he refers to as "dropout factories" enroll large numbers of high-need students. Many of them don't speak or read English; many of them enter high school three and four grade levels behind. He assumes the schools created the problems the students have; but in many cases, the schools he calls "dropout factories" are filled with heroic teachers and administrators trying their best to help kids who have massive learning problems.
"Unless someone from the district or the state actually goes into the schools and does a diagnostic evaluation, it is unfair to stigmatize the schools with the largest numbers of students who are English-language learners, special-education, and far behind in their learning. That's like saying that an oncologist is not as good a doctor as a dermatologist because so many of his patients die. Mr. Gates, first establish the risk factor before throwing around the labels and closing down schools."

Friday, May 6, 2011

Republican Presidential Debate

Last night, I had trouble going to bed, and ended up watching almost the entire Republican presidential primary debate. While many supposed heavyweights were absent, the actual debate was an interesting presentation of the potential diversity in Republican thought. There was Tim Pawlenty, a front-runner; Herman Cain, the African-American pizza guy; Rick Santorum, the social conservative; and Ron Paul and Gary Johnson, the libertarians.

Not surprisingly, I found Ron Paul and Gary Johnson the most interesting candidates. As a theorist, I find their uniformly small-government stance on both fiscal and social issues, ummm, how do I put it, more self-consistent? For example, when asked about gay marriage or prostitution, Ron Paul said a small federal government had no business regulating social aspects of life. Governments should not tell people how to live. I wonder how such positions will play with Republican primary voters.

Wednesday, May 4, 2011

How to convert color EPS figures to grayscale?

Some journals still will not let you use color figures without extracting a ransom. This is a problem because, other than the print versions of journal articles, all the other fora in which figures are presented (presentations, webpages, posters, online journal articles) are color-agnostic. In fact, it is often desirable to present things in color.
I recently found a GPLed Perl tool, pscol, which lets you quickly make grayscale versions of color EPS figures. You can easily weave it into a shell script to make grayscale versions of an entire directory of color EPS figures very quickly. The usage is very simple:

pscol [flags] infilename outfilename

Allowed flags are:

-h      print this message and exit.
-gray   convert RGB colorscale to grayscale.
-0gray  convert RGB colorscale to grayscale (simple).
-cmyk   convert RGB to CMYK.

For simple files, the results are decent.

Wednesday, April 27, 2011

Exascale Computing

Recently, I heard David Keyes talk about exascale computing ("Exaflop/s Seriously!"). There were two interesting lessons.

1. In one of his viewgraphs (which I unfortunately do not have access to), he plotted the peak performance of the top supercomputer in the Top500 versus time. The resulting curve looks familiar (Moore's law), and when drawn on a log-linear scale looks approximately linear. I reproduce a similar-looking curve (but one that combines all 500 top machines) below. The interesting part of the viewgraph was that, on the same plot, he also threw in the performance of the 500th-best machine in the Top500. This was also a linearly increasing curve (on a log-linear plot) that lay somewhat lower than that of the fastest computer. The surprising bit (for me at least) was that the "phase shift" in terms of time was about 8 years. That is, if the top machine in the Top500 were left alone, in 8 years it would be overtaken by everybody else in the elite group. This can have policy implications for HPC centers: rather than trying to build the fastest computer now, perhaps having a long-term plan to keep upgrading the system is more important.

2. We've seen CPU clock speeds stagnate. For HPC centers, which use thousands of CPUs, energy considerations can quickly become the dominant concern. This explains the shift to multicore.
I learned that energy scales roughly as the third power of the frequency. From here, for example: "If the clock rate of the Multi-Core CPUs will be reduced by 20% only, then the energy consumption is reduced to 50% compared to a system running at full clock speed." (This is consistent with cubic scaling: 0.8^3 is approximately 0.51.) On the other hand, for multicore CPUs the energy is only proportional to the first power of the number of cores, explaining the move towards multicore, low-frequency supercomputers.

I don't know if the numbers here include the cost of air-conditioning (I presume they do), but it can be a significant operating cost. I remember that we had a small Beowulf cluster (puny by today's standards), when I was in grad school, that was housed in a student office for some time. The room was unbearable in summer.

Monday, April 25, 2011

Links:

1. Why I "prefer" cheap wine.

2. Why "this is fun" may be a better password than "J4FS<2".

3. Bike paths in Tallahassee (Google Maps) - clicking on a route activates a small description.

Wednesday, April 20, 2011

Scott Adams on Scott Adams

I linked to this article by Scott Adams in the last post. Something very interesting seems to have developed since then (H/T nanopolitan). Essentially, Adams used an alias (plannedchaos) to defend Scott Adams on MetaFilter. An extremely interesting (or lucky?) thing was that right after his first rant, somebody seems to have sniffed him out.

Friday, April 15, 2011

Scott Adams on Real Education

Interesting perspective from the creator of Dilbert. He begins with:

I understand why the top students in America study physics, chemistry, calculus and classic literature. The kids in this brainy group are the future professors, scientists, thinkers and engineers who will propel civilization forward. But why do we make B students sit through these same classes? That's like trying to train your cat to do your taxes—a waste of time and money. Wouldn't it make more sense to teach B students something useful, like entrepreneurship?
And ends with:

Remember, children are our future, and the majority of them are B students. If that doesn't scare you, it probably should.

In many ways, I think it is a critique of standardized testing (which automatically fosters standardized education).

Linear Least Squares in Octave

While the Octave functions ols and leasqr are good for heavy lifting, polyfit is often sufficient for simple linear regressions. Given arrays of data x and y:

c = polyfit(x, y, 1)

gives the regression yfit = c(1) * x + c(2).

Sunday, April 10, 2011

Links: Unhurried Nonfiction

I found Longform.org today, and can foresee spending quite a bit of time here in the future. It works somewhat like slashdot.org. Here is the description from the website:

Longform.org posts new and classic non-fiction articles, curated from across the web, that are too long and too interesting to be read on a web browser. We recommend enjoying them using read later services like Instapaper and Read It Later and feature buttons to save articles with one click.

You can even subscribe to a feed via RSS.

Saturday, April 2, 2011

What's your Nobel number?

We like quantifying things. We design impact factors and h-indices to determine the scientific worth of a journal or a researcher. Sometimes these numbers are meaningful. Sometimes they are gamed. Sometimes they are misused. My pet peeve against these measures (and their extended family) is that they are designed to measure popularity. Unfortunately, they are commonly used as a proxy for scientific quality. Can high-quality stuff be popular? You bet. But the relationship between quality and popularity is tenuous at best. High-quality stuff can stay under the radar for prolonged periods, and flashy low-quality stuff can go platinum. We intuitively understand and appreciate this difference when we judge music, movies, politics, or literature. Love them or hate them, we probably have to learn to live with them.
There is another class of numbers, like the Erdős number, that exist purely for entertainment and tongue-in-cheek bragging value. They measure proximity to greatness, and are related to the six-degrees-of-separation idea.

Let us define a new number (maybe it already exists), which we shall call the "Nobel number", which measures the shortest "collaborative distance" between a scientist and a Nobel Laureate, as measured by co-authorship in the scientific literature. Thus, if you have co-authored a paper with a Nobel Laureate, your Nobel number is 1. Mine is 2.

PS: I saw Energy Secretary Steven Chu on C-SPAN yesterday. He is a Nobel Laureate and co-authored a paper with my PhD advisor, Ron Larson.

Tuesday, March 29, 2011

Polygonal kitchen sinks

I heard about this study in Nature (subscription required) in a recent talk. See some pictures here. When a water jet strikes a flat surface at high Reynolds number, it normally creates a circular hydraulic jump. In the paper linked above, Ellegaard et al. demonstrate that stationary polygonal patterns can be formed (instead of circles) when high-viscosity fluids are used. Fascinating.

Thursday, March 24, 2011

"Visual" Links

1. Radiation Dose Chart: xkcd presents an illuminating graphic. We often forget that we are immersed in radiation.

2. Gallon to Gallon: Stuff that is more expensive than gasoline. It reminded me of an observation a friend made when he first came to the US from India: "It's funny how the price of water is sometimes more than milk, which itself is sometimes more than gasoline, and peanuts and cashews cost about the same!" (via FlowingData). There is also an interesting chart at FlowingData showing the geographic distribution of gasoline prices.

3. Spectral Mesh Compression (pdf lecture): Interesting things happen when you transform them (via Mathematical Poetics).

Friday, March 18, 2011

Finance Links: Skill versus Luck

Two interesting articles:

1. Luck versus skill: How can you tell?
Aswath Damodaran on why it is much harder to separate the two in finance than in other fields like sports.

2. Untangling skill and luck (pdf). The introduction to this piece by Michael Mauboussin is itself enough to make you want to read on:

For almost two centuries, Spain has hosted an enormously popular Christmas lottery. Based on payout, it is the biggest lottery in the world and nearly all Spaniards play. In the mid 1970s, a man sought a ticket with the last two digits ending in 48. He found a ticket, bought it, and then won the lottery. When asked why he was so intent on finding that number, he replied, "I dreamed of the number seven for seven straight nights. And 7 times 7 is 48."

Monday, March 14, 2011

How to place a tight "bounding box" around an EPS image?

Recently, I received an EPS file with extra white space around the actual figure. There are a number of ways of trimming the extra white space, including opening up the EPS file in a text editor and manually redefining the bounding box (this is simpler than it sounds, especially once you've done it a couple of times). However, if you run Linux and have Ghostscript installed (it is there by default on most distributions), you can use the eps2eps wrapper:

eps2eps FileWithWhiteSpace.eps FileTrimmedWhiteSpace.eps

Simple.

Thursday, March 10, 2011

Domain-based overconfidence

I was reading a financial blog that I peruse occasionally. In a recent post, there were two "puzzles". Usually, this is enough to stop me from passing on.

1. The Dow Jones Price Index does not include the effects of dividend reinvestment. If dividends had been considered reinvested in the index since its inception in 1896, at what price level would the index be today? Provide a 90% confidence interval around your answer (i.e., you are 90% confident that your interval includes the right answer).

2. There are 100 bags, each containing 1000 poker chips.
45 bags have 700 black chips and 300 red chips, while 55 bags have 700 red chips and 300 black chips. If you select a bag, what is the probability that most of the chips are black? If you pulled out 12 chips from that bag, and 8 of them are black and 4 of them are red, what is the probability now that most of the chips in the bag are black?

I have seen questions of this type before (a short rant later), and so I was not caught off guard. (Sidenote: puzzles and jokes aren't quite the same when you hear them the second time.) For the first question I answered 1 trillion (the actual answer is about 650,000; the DJIA is currently around 12,000), and for the second I said 0.45 and 0.96 (thanks to Bayes' theorem), which are the correct answers. Typically, people guess much smaller than 650,000 for Q1, and for the second part of Q2, people typically guess between 45% and 75%.

To be perfectly honest, for Q1 I had two numbers in mind. I thought of 1 trillion as the "correct answer" (based on having seen this type of question before), and a much lower (and wrong) 100,000 as a plausible tight upper bound. So in some sense I flunked Q1. Here's my rationalization: the disturbing part is the "90% confidence", which implies that if one were asked such questions 10 times, one should flunk once on average. The problem with gross overestimations (like 1 trillion) is that the odds of getting that occasional (required) wrong answer diminish.

Monday, March 7, 2011

CiteULike: Export and clean BibTeX file

As mentioned in a previous post, CiteULike lets you export citations in BibTeX format, which is useful for including them in LaTeX documents. However, the BibTeX entries it produces contain a lot of metadata that I like to filter out. One could do this manually, which is fine, but many of the cleaning operations are routine. As any self-respecting Linux user would say, it would be nice if one could automate at least part of the clean-up process.
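Returning to the poker-chip puzzle above: the 0.96 answer follows from Bayes' theorem with binomial likelihoods, and is easy to verify. A quick Python check (the script is mine):

```python
from math import comb

# 45 of 100 bags are "mostly black" (70% black chips); 55 are "mostly red" (30% black).
prior = 0.45                              # P(mostly black) before drawing any chips
like_b = comb(12, 8) * 0.7**8 * 0.3**4    # P(8 black of 12 | mostly black)
like_r = comb(12, 8) * 0.3**8 * 0.7**4    # P(8 black of 12 | mostly red)

# Bayes' theorem
posterior = prior * like_b / (prior * like_b + (1 - prior) * like_r)
print(f"P(mostly black | 8 of 12 black) = {posterior:.2f}")   # prints 0.96
```

The evidence of 8 black chips in 12 draws lifts the probability from 0.45 to about 0.96, which is the counterintuitive part: most people update far too timidly.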
For example, a typical BibTeX entry that CiteULike produces looks like:

@article{newman01,
abstract = {{We describe in detail an efficient algorithm for studying site or bond percolation on any lattice. The algorithm can measure an observable quantity in a percolation system for all values of the site or bond occupation probability from zero to one in an amount of time that scales linearly with the size of the system. We demonstrate our algorithm by using it to investigate a number of issues in percolation theory, including the position of the percolation transition for site percolation on the square lattice, the stretched exponential behavior of spanning probabilities away from the critical point, and the size of the giant component for site percolation on random graphs.}},
archivePrefix = {arXiv},
author = {Newman, M. E. J. and Ziff, R. M.},
citeulike-article-id = {3373958},
citeulike-linkout-0 = {http://arxiv.org/abs/cond-mat/0101295},
citeulike-linkout-1 = {http://arxiv.org/pdf/cond-mat/0101295},
citeulike-linkout-2 = {http://dx.doi.org/10.1103/PhysRevE.64.016706},
citeulike-linkout-3 = {http://link.aps.org/abstract/PRE/v64/i1/e016706},
citeulike-linkout-4 = {http://link.aps.org/pdf/PRE/v64/i1/e016706},
day = {8},
doi = {10.1103/PhysRevE.64.016706},
eprint = {cond-mat/0101295},
journal = {Physical Review E},
keywords = {cluster, fast, percolation},
month = {Jun},
number = {1},
pages = {016706+},
posted-at = {2011-01-12 21:22:08},
priority = {2},
publisher = {American Physical Society},
title = {{Fast Monte Carlo algorithm for site or bond percolation}},
url = {http://dx.doi.org/10.1103/PhysRevE.64.016706},
volume = {64},
year = {2001}
}

Typically, I like to get rid of all the "citeulike" tags, and irrelevant metadata such as priority, abstract, etc. Additionally, I like to abbreviate journal titles, and replace author names with initials if necessary. That is quite a bit.
I wrote a quick and dirty sed script, called "clean_citeulike", as below:

s/Physical Review/Phys. Rev./
s/Journal of Rheology/J. Rheol./
s/Rheologica Acta/Rheol. Acta/
s/The Journal of Chemical Physics/J. Chem. Phys./
s/Computer Physics Communications/Comp. Phys. Comm./
s/Macromolecular Theory Simulation/Macromol. Theory Simul./
/author/ s/, \([A-Z]\)[a-z]* /, \1. /g
/author/ s/, \([A-Z]\)[a-z]*}/, \1.}/g
/citeulike/ d
/keywords/ d
/posted-at/ d
/priority/ d
/publisher/ d
/abstract/ d
/month/ d
/url/ d
/day/ d
/issn/ d

Now, all I need to do is run it on the "bib" file (which I call in.bib):

$ sed -f clean_citeulike in.bib

and I get something that looks like:

@article{newman01,
archivePrefix = {arXiv},
author = {Newman, M. E. J. and Ziff, R. M.},
doi = {10.1103/PhysRevE.64.016706},
eprint = {cond-mat/0101295},
journal = {Phys. Rev. E},
number = {1},
pages = {016706+},
title = {{Fast Monte Carlo algorithm for site or bond percolation}},
volume = {64},
year = {2001}
}

Not perfect perhaps, but much simpler to clean manually.
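For the record, the same clean-up can be sketched in Python as well. This is a rough, line-oriented equivalent of the sed script above (the function name clean_bibtex and the abbreviation table are mine, and it does not attempt the author-initial substitution):

```python
# Fields to drop and journal abbreviations, mirroring the sed script.
DROP = ("citeulike", "keywords", "posted-at", "priority", "publisher",
        "abstract", "month", "url", "day", "issn")

ABBREV = {
    "Physical Review": "Phys. Rev.",
    "Journal of Rheology": "J. Rheol.",
    "The Journal of Chemical Physics": "J. Chem. Phys.",
}

def clean_bibtex(text):
    kept = []
    for line in text.splitlines():
        # field name = everything before "=", lowercased
        field = line.strip().split("=")[0].strip().lower()
        if any(field.startswith(d) for d in DROP):
            continue                      # drop unwanted metadata lines
        for full, abbr in ABBREV.items():
            line = line.replace(full, abbr)
        kept.append(line)
    return "\n".join(kept)
```

Running clean_bibtex on the CiteULike entry above keeps author, journal (abbreviated), volume, pages, and year, and drops the citeulike-* and abstract lines, much like the sed version.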