Clueless Fundatma: July 2014

Thursday, July 31, 2014

Links

1. Vax: Understanding the spread of infectious diseases via networks, and the role of tools such as vaccination and quarantine. (via Flowing Data)

Given the recent outbreak of pertussis, and yet another systematic review debunking the autism-vaccination link as profiled here, perhaps this game will help anti-vaccination folks understand how we, like wolves, derive our strength from the pack.

Also gotta check out Samantha Bee's piece (video) for the Daily Show.

2. At a recent conference somebody brought up Stigler's law of eponymy:

Stigler's law of eponymy is a process proposed by University of Chicago statistics professor Stephen Stigler in his 1980 publication "Stigler’s law of eponymy". In its simplest and strongest form it says: "No scientific discovery is named after its original discoverer." Stigler named the sociologist Robert K. Merton as the discoverer of "Stigler's law", so as to avoid this law about laws disobeying its very own decree.

Friday, July 25, 2014

Veusz: Awesome Plotting Software

As a working scientist, I do a lot of data plotting. Most of these plots are for internal consumption, as I try to tease meaning out of data.

I tend to use gnuplot a lot, because I've gotten extremely used to it.

However, every once in a while I have to make a plot for external consumption.

For the longest time, I've relied on Grace for my journal quality plots.

Last week, I discovered Veusz (pronounced "views"). It is a Python-based program for 2D plots that feels truly modern.

Grace hasn't been updated in a while, and while it works fine for the most part, from an aesthetic standpoint it feels like your friend from the eighties who did not realize that bell-bottoms had gone out of fashion.

Veusz is multiplatform (it runs on Linux, Mac OS, and Windows), exports to a wide variety of useful formats (EPS, PDF, SVG, TIFF), and is unfettered by some of the legacy issues surrounding Grace. For example:

multiple plots/insets are a cinch
subscripts/superscripts use LaTeX notation
there is an undo button
scripts are more concise and readable
data can be imported from a wide variety of formats
data files can be linked to instead of being loaded in

It is also possible to write "script" files, and use the program from the command line. All in all, I think this is a program that I will use a lot more in the future. I will post about my experiences after I use it for a while.

Thursday, July 24, 2014

Links:

1. Math Porn: Good for a few laughs.

2. US National Archives to upload all content to Wikimedia. This is bigger than it sounds.

3. Steve Novella and Michael Fullerton argue about the 9/11 conspiracy in a four-part series.

4. Unintended consequences of journal rank

Friday, July 18, 2014

The Top 10 Algorithms

I am teaching a senior undergrad seminar (for Scientific Computing majors) in the Fall semester, and thought it would be a good idea to pick some kind of theme.

After some thought, I figured that "famous algorithms" might be a good theme. I googled "top algorithms" and came up with many lists. Let me go from the bad lists (in my opinion) to the good ones.

A. George Dvorsky has a list of "the 10 algorithms that dominate the world"

1. Google Search
2. Facebook's News Feed
3. OKCupid Date Matching
4. NSA Data Collection, Interpretation, and Encryption
5. "You May Also Enjoy..."
6. Google AdWords
7. High Frequency Stock Trading
8. MP3 Compression
9. IBM's CRUSH
10. Auto-Tune

While the list is interesting, it is somewhat disappointing. It conflates software products with actual algorithms.

HFT is not an algorithm, although the terms "algorithmic trading" and "HFT" are often used synonymously. Sure, there are important algorithms lurking under many of these "products".

B. Marcos Otero has a better list of "the real 10 algorithms that dominate the world"

This list is a reaction to the one above. The author prefaces his comments with:

Now if you have studied algorithms the first thing that could come to your mind while reading the article is “Does the author know what an algorithm is?” or maybe “Facebook news feed is an algorithm?” because if Facebook news feed is an algorithm then you could eventually classify almost everything as an algorithm.

1. Merge Sort, Quick Sort and Heap Sort
2. Fourier Transform and Fast Fourier Transform
3. Dijkstra’s algorithm
4. RSA algorithm
5. Secure Hash Algorithm
6. Integer factorization
7. Link Analysis
8. Proportional Integral Derivative Algorithm
9. Data compression algorithms
10. Random Number Generation

While this is a better list, in the sense that the items listed are usually "real" algorithms, or something close, it has a strong computer science bias. For example, #4, #5, and #6 are all related to cryptography. While cryptography is clearly important, it probably does not account for 30% by weight of the most important algorithms.

C. SIAM has its own list (pdf) of the "top 10 algorithms of the 20th century"

I like this more comprehensive list the best (although I still have some reservations), because the forest in which they hunt is the biggest. Also, the list is a collaboration of two people, which provides some balance in the topics covered.

1. Monte Carlo Method
2. Simplex Method for Linear Programming
3. Krylov Subspace Iteration Methods
4. Decompositional Approach to Matrix Computations
5. Fortran Optimizing Compiler
6. QR Algorithm
7. QuickSort
8. FFT
9. Integer Relation Detection Algorithm
10. Fast Multipole Method

Monday, July 7, 2014

Net Neutrality Explained

Here is a nice illustrated introduction to net neutrality, why it matters, and what one can do about it (until mid-July)!

Along the same lines, and by the same folks: "What's going on with Social Security".

Wednesday, July 2, 2014

On Student Debt

The NYT has this by-now popular article, "The Reality of Student Debt Is Different From the Clichés," asking people to take a chill pill.

It is based largely on a Brookings Institution study, which essentially claims that the sky is not falling. The three main takeaways from that study (emphasis mine):

1. Roughly one-quarter of the increase in student debt since 1989 can be directly attributed to Americans obtaining more education, especially graduate degrees. The average debt levels of borrowers with a graduate degree more than quadrupled, from just under $10,000 to more than $40,000. By comparison, the debt loads of those with only a bachelor’s degree increased by a smaller margin, from $6,000 to $16,000.

2. Increases in the average lifetime incomes of college-educated Americans have more than kept pace with increases in debt loads. Between 1992 and 2010, the average household with student debt saw an increase of about $7,400 in annual income and $18,000 in total debt. In other words, the increase in earnings received over the course of 2.4 years would pay for the increase in debt incurred.

3. The monthly payment burden faced by student loan borrowers has stayed about the same or even lessened over the past two decades. The median borrower has consistently spent three to four percent of their monthly income on student loan payments since 1992, and the mean payment-to-income ratio has fallen significantly, from 15 to 7 percent. The average repayment term for student loans increased over this period, allowing borrowers to shoulder increased debt loads without larger monthly payments.
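The 2.4-year figure in the second takeaway is simply the increase in total debt divided by the increase in annual income, using the numbers quoted there:

```latex
\frac{\$18{,}000}{\$7{,}400\ \text{per year}} \approx 2.4\ \text{years}
```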

The NYT tries to shine a light on the real problem:

The vastly bigger problem is the hundreds of thousands of people who emerge from college with a modest amount of debt yet no degree. For them, college is akin to a house that they had to make the down payment on but can’t live in. In a cost-benefit calculation, they get only the cost. And they are far, far more numerous than bachelor’s degree holders with huge debt burdens.

Here is an attempted "takedown" of the report and the NYT article.

And here is a well-reasoned takedown of the takedown.

Tuesday, July 1, 2014

Passing arguments to sed

Suppose you have a variable var=25, and a file test.dat which contains the word foo.
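If you want to follow along, one way to recreate test.dat is with printf, which reuses its format string for each argument and so writes one word per line. This is just a convenience; any text editor works equally well:

```shell
# Write the five sample words to test.dat, one per line.
printf '%s\n' dim sum foo ping pong > test.dat
```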

$ var=25
$ cat test.dat
dim
sum
foo
ping
pong

You want to replace all instances of foo in the file with foo$var (i.e., foo25) using sed.

You might think of trying the following:

$ sed 's/foo/foo$var/g' test.dat
dim
sum
foo$var
ping
pong

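Clearly not what you expected: the shell does not expand variables inside single quotes, so sed saw the literal text $var. The fix is easy: use double quotes instead of single quotes, so that the shell substitutes the value of var before sed ever runs.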

$ sed "s/foo/foo$var/g" test.dat
dim
sum
foo25
ping
pong
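Echoing the sed script, instead of running it, shows exactly what the shell hands to sed under each quoting style. The sketch below also covers two common pitfalls once variables are involved: a variable name running into the letters that follow it (fixed with ${var}), and a value containing a slash, which collides with the s/// delimiter (fixed by picking another delimiter, which sed allows). The variable dir and its value /usr/local are made up for illustration.

```shell
var=25

# Single quotes: no expansion, prints s/foo/foo$var/g
echo 's/foo/foo$var/g'
# Double quotes: the shell expands $var first, prints s/foo/foo25/g
echo "s/foo/foo$var/g"

# Pitfall 1: "$varbar" would look up a (nonexistent) variable named varbar.
# Braces mark where the name ends; prints foo25bar
echo foo | sed "s/foo/foo${var}bar/"

# Pitfall 2 (hypothetical value): a "/" in the value would end the
# replacement early, so use "|" as the s-command delimiter instead.
dir=/usr/local
echo foo | sed "s|foo|$dir/foo|"        # prints /usr/local/foo

# Alternative: keep the fixed parts in single quotes and splice in only
# the variable, in double quotes; prints foo25
echo foo | sed 's/foo/foo'"$var"'/'
```

Mixing quote styles, as in the last line, keeps the rest of the sed script safe from the shell, which matters when it contains characters such as $ or backslashes that the shell would otherwise interpret inside double quotes. Values containing & or a backslash need escaping as well, since sed treats those specially in the replacement text.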