Saturday, December 16, 2017

Taubes, Sugar, and Fat

Last week, I listened to Shane Parrish's interview with Gary Taubes on the Knowledge Project podcast. Taubes provides an informative historical perspective on some aspects of research in nutrition science.

His view is not charitable. Perhaps deservedly so.

I have to confess that I haven't read the book "The Case Against Sugar", but I have followed Taubes' arguments for quite a while. His thesis, essentially the same as in his previous two books, is that we should ditch a "low-fat high-carb" diet for a "low-carb high-fat (and protein)" diet.

The points he makes are provocative and interesting.

That said, I wish Shane had challenged Taubes more and held him accountable.

This counterpoint by Stephan Guyenet points out numerous plausible flaws in Taubes' thesis. It is worth reading in its entirety, if only for the balance it provides.

A couple of other rebuttals are available here and here.

Tuesday, December 12, 2017

Randomized SVD

Dimension reduction is an important problem in the era of big data. SVD is a classic method for obtaining low-rank approximations of data.

The standard algorithm, which computes all the singular values, is one of the most expensive matrix decompositions: for an m x n matrix it takes on the order of mn min(m, n) operations.

Companies like Facebook or Google deal with huge matrices (big data can be big). Often, they don't care about finding all the singular values; perhaps only the first 10 or 20 matter. Nor do they always need exquisite precision in those singular values. Good approximations might do just fine.

Fortunately, there are randomized algorithms for computing SVDs that rest on a relatively simple idea. One approximates the range of the matrix by multiplying it with a modest number of random vectors, and then works with those much smaller products instead of the full matrix.

The algorithm is fairly simple to implement:
Figure from Erichson et al
In Octave or Matlab, the code can be implemented in about 10 lines.
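For readers without Octave or Matlab, here is a minimal NumPy sketch of the same idea, assuming the standard recipe (Gaussian random vectors, a few power iterations, a QR factorization, then an exact SVD of a small matrix). The function name `randomized_svd` and the defaults for the oversampling `p` and power iterations `q` are my own illustrative choices, not Erichson et al.'s code:

```python
import numpy as np


def randomized_svd(A, k, p=10, q=2, seed=0):
    """Approximate the top-k singular triplets of A.

    Illustrative sketch (not Erichson et al.'s code). Assumed parameters:
    k    : target rank
    p    : oversampling, i.e. extra random vectors (10 is an assumed default)
    q    : number of power iterations (2 is an assumed default)
    seed : seed for the Gaussian random vectors
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = min(k + p, m, n)  # number of random vectors actually used

    # 1. Sample the range of A by multiplying it with l random vectors.
    Omega = rng.standard_normal((n, l))
    Y = A @ Omega

    # 2. Power iterations: apply A A^T a few times so the leading singular
    #    directions dominate; re-orthonormalize each time for stability.
    for _ in range(q):
        Q = np.linalg.qr(Y)[0]
        Z = np.linalg.qr(A.T @ Q)[0]
        Y = A @ Z

    # 3. Orthonormal basis Q (m x l) for the approximate range of A.
    Q = np.linalg.qr(Y)[0]

    # 4. Project A onto that basis and take the exact SVD of the small l x n matrix.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)

    # 5. Lift the left singular vectors back to the original m-dimensional space.
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]


# Usage: a hypothetical 500 x 300 test matrix of exact rank 20.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 300))
U, s, Vt = randomized_svd(A, k=20)
s_exact = np.linalg.svd(A, compute_uv=False)[:20]
print(np.allclose(s, s_exact))
```

The only full-size work is a handful of products with A and QR factorizations of tall, thin matrices; the SVD itself is applied to the small l x n matrix B, which is where the savings come from.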

The resulting truncated SVD can be a surprisingly good approximation, and it can shave multiple orders of magnitude off the computation time (the savings grow as matrices get bigger).

For Python, there are decent implementations of randomized SVD in the sklearn (scikit-learn) package and in Facebook's fbpca package. This blog post shows some code to call these routines and provides some benchmarks.
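As a rough sketch of the sklearn route, the snippet below calls `sklearn.utils.extmath.randomized_svd` and asks only for the leading 10 singular triplets. The test matrix, built with a known and rapidly decaying spectrum so the answer can be checked, is a hypothetical construction of mine, and the values for `n_oversamples` and `n_iter` are illustrative rather than recommended settings:

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

# Hypothetical test matrix A = U0 diag(sigma) V0^T with a known spectrum.
rng = np.random.default_rng(0)
m, n = 2000, 500
U0 = np.linalg.qr(rng.standard_normal((m, n)))[0]  # 2000 x 500, orthonormal columns
V0 = np.linalg.qr(rng.standard_normal((n, n)))[0]  # 500 x 500, orthogonal
sigma = 0.5 ** np.arange(n)                         # singular values 1, 0.5, 0.25, ...
A = (U0 * sigma) @ V0.T

# Request only the top 10 singular triplets (illustrative parameter values).
U, s, Vt = randomized_svd(A, n_components=10, n_oversamples=10,
                          n_iter=4, random_state=0)

print(U.shape, s.shape, Vt.shape)
# Relative error of the 10 computed singular values against the known ones.
print(np.max(np.abs(s - sigma[:10]) / sigma[:10]))
```

How well this works depends on how quickly the singular values decay; with a slowly decaying spectrum, more oversampling or power iterations are needed for the same accuracy.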

Thursday, December 7, 2017

More is Different

Last week, I read a nearly 50-year-old essay by P. W. Anderson (h/t fermatslibrary) entitled "More is Different" (pdf). It is a fascinating opinion piece. Two ideas that stood out:
  • "Quantitative differences become qualitative ones" (Marx)
  • Psychology is not applied biology, nor is biology applied chemistry.
This other essay on the "arrogance of physicists" speaks to a similar point:
But training and experience in physics gives you a very powerful toolbox of techniques, intuitions and approaches to solving problems that molds your outlook and attitude toward the rest of the world. Other fields of science or engineering are limited in their scope. Mathematics is powerful and immense in logical scope, but in the end it is all tautology, as I tease my mathematician friends, with no implied or even desired connection to the real world. Physics is the application of mathematics to reality and the 20th century proved its remarkable effectiveness in understanding that world, from the behavior of the tiniest particles to the limits of the entire cosmos. Chemistry generally confines itself to the world of atoms and molecules, biology to life, wonderful in itself, but confined so far as we know to just this planet. The social sciences limit themselves still further, mainly to the behavior of us human beings - certainly a complex and highly interesting subject, but difficult to generalize from. Engineering also has a powerful collection of intuitions and formulas to apply to the real world, but those tend to be more specific individual rules, rather than the general and universal laws that physicists have found. 
Computer scientists and their practical real-world programming cousins are perhaps closest to physicists in justified confidence in the generality of their toolbox. Everything real can be viewed as computational, and there are some very general rules about information and logic that seep into the intuition of any good programmer. As physics is the application of mathematics to the real world of physical things, so programming is the application of mathematics to the world of information about things, and sometimes those two worlds even seem to be merging.