Tuesday, December 25, 2018

Science Links

1. Donald Knuth: Yoda of Silicon Valley (NYT)

2. Schrodinger Equation or Lunn Equation? (wikipedia, quora, Physics Today)
The Physical Review referee, G. S. Fulcher, found Lunn's paper to be unphysical and impossibly abstract, and he rejected it. Fulcher replaced Lunn as a member of Physical Review's editorial board early in 1922, and Lunn went on to withdraw bitterly from contact with most physicists...
3. CRISPR and human babies (The Atlantic)

QOTD: "Education is learning what man has discovered over the past 5000 years, in 20 years or less." (via @TheWeirdWorld)

Wednesday, December 19, 2018

QuickTip: Default Color Cycle in Matplotlib

For reasonably recent versions of matplotlib [v > 1.5], you can extract the default color scheme into a string array by:

import matplotlib.pyplot as plt
clr = [p['color'] for p in plt.rcParams['axes.prop_cycle']]

Thursday, December 13, 2018

Simplex and George Dantzig

We talked about the Simplex algorithm in class last week. It was invented by George Dantzig as a tool to solve large linear programming problems.

The story of how Dantzig, as a graduate student, mistook two unsolved problems in statistical theory for homework assignments is truly inspiring. It sounds like something you would see in a movie (like Good Will Hunting), except it is true (Snopes).

QuickTip: Extract Bibliography File from Cited References

Suppose you have a couple of big BibTeX databases that you import into a document myDoc.tex, either as,

\addbibresource{database1.bib}
\addbibresource{database2.bib}


or,

\bibliography{database1,database2}

Suppose the databases contain 1000s of records, while your document contains only a few tens. If you want to extract a new ".bib" file only from the references cited in the paper, then you can use the bibexport tool that comes with the TeXLive distribution.

bibexport -o extracted.bib myDoc.aux

This produces a new bibliography file extracted.bib, which contains only those records used in myDoc.tex.

Sunday, December 9, 2018

Pair-Programming and Google

The New Yorker has a riveting story about the unlikely friendship of Jeff Dean and Sanjay Ghemawat, and its role in shaping Google.
Sanjay looked at Jeff. For months, Google had been experiencing an increasing number of hardware failures. The problem was that, as Google grew, its computing infrastructure also expanded. Computer hardware rarely failed, until you had enough of it—then it failed all the time. Wires wore down, hard drives fell apart, motherboards overheated. Many machines never worked in the first place; some would unaccountably grow slower. Strange environmental factors came into play. When a supernova explodes, the blast wave creates high-energy particles that scatter in every direction; scientists believe there is a minute chance that one of the errant particles, known as a cosmic ray, can hit a computer chip on Earth, flipping a 0 to a 1.

Friday, November 30, 2018

Machine Learning and Behavioral Biases

Machine learning, using neural nets for example, helps us tease out hidden nonlinear correlations or patterns. A standard application is digitizing handwritten numerals.

You train the model on a particular dataset, and test it on data that is previously unseen. If the training and test datasets are "similar", then the predictions of the learnt model will be good.

If the test data look nothing like the training data then the ML model will fail (e.g. train on Arabic numerals and test on Roman numerals).

Our brains were trained for thousands of years when we lived in the wild. Survival was key. A significant fraction of the model has been seared into our hardware.

Modern times look nothing like the training data for which our brains were optimized (saber-tooth tigers, food scarcity).

Most cognitive or behavioral biases originate in this mismatch between our training dataset and test dataset.

Tuesday, November 27, 2018

Links: Math Edition

1. How paradoxes shape Mathematics (article links to presentation)

2. Paul Romer on "Jupyter, Mathematica, and the Future of the Research Paper"

3. Wilson's Matrix (Cleve Moler)

4. Is a kB 1000 bytes or 1024 bytes? (John D. Cook)

Wednesday, November 21, 2018

Interesting Quotes

"Luck is probability taken personally.” - Chip Denman (via Penn Jillette and Annie Duke)

"Commenting your code is like cleaning your bathroom - you never want to do it, but it really does create a more pleasant experience for you and your guests." - Ryan Campbell

"Almost everything that you succeed at looks easy in retrospect." - Luis Sordo Vieira (via @strogatz)

"When one teaches, two learn." - Robert Heinlein

"Nice library. Is one of these a trick book?"
"How so?"
"Like you pull it off the shelf and a hidden door opens."
"Oh. Yeah, all of them." - @ASmallFiction

"Read what you love until you love to read." - @naval

Friday, October 26, 2018

Matplotlib Tight Layouts and Plots with Insets

For normal plots or subplots, the tight_layout command does a pretty good job of keeping things from overlapping, and managing the bounding box of the overall figure.

However, if you have a plot with insets etc., then tight_layout can throw a tantrum. Something like: "ValueError: max() arg is an empty sequence".

For such plots, if your axis label gets cut, and you don't want to push the label too close to the axis with something like:

ax1.xaxis.labelpad = -10

then, you can use the bbox_inches = "tight" flag when saving your figure to ensure your axis labels are not clipped.

plt.savefig('xyz.pdf',  bbox_inches = "tight")

Wednesday, October 24, 2018

Image Processing with Python

Basic
  • matplotlib can read png and jpg files as numpy objects.
  • imageio is a newer library that can read and write to a variety of image formats.
  • scipy.ndimage provides some additional functionality for manipulating the image arrays.
Intermediate
  • scikit-image is a library that offers a toolbox comparable to Matlab’s image processing toolbox.
Advanced

The following libraries provide more advanced functions for image manipulation.
Some resources on using Python’s basic image processing capabilities.

Sunday, October 21, 2018

The Secretary Problem

I've been obsessed with the Secretary Problem since I first heard about it. I even wrote about it over 7 years ago, when I used it to fashion a lab exercise in one of my classes.

Here is a nice old article (1989) "Who Solved the Secretary Problem" by Thomas Ferguson. If that link doesn't work, here is the stable url from JSTOR. It allows you to read up to six articles per month without any subscription.

Thursday, October 18, 2018

Jupyter Notebooks: Interacting with Python Files

You can make functions defined inside a python file [myFunctions.py] visible in a Jupyter Notebook by simply importing it as a module.

For example suppose myFunctions.py contains:

$ cat myFunctions.py

def func1():

def func2():

etc.

You can use the functions by importing the python file as a module.

from myFunctions import *

This makes func1() and func2() visible inside the Jupyter notebook.

This feature can be helpful in reducing clutter by moving large walls of code out of the notebook, which can then retain a simpler look and feel.

There are two potential issues:

  • when you make changes to the python file, they are not immediately reflected in the notebook
  • if your python file has any "script"-like commands outside the function definitions, they are executed when the file is imported as a module

The first issue can be taken care of by the magic command %autoreload.

In [1]: %load_ext autoreload

In [2]: %autoreload 2

In [3]: from myFunctions import func1

In [4]: func1()
Out[4]: 42

In [5]: # open myFunctions.py in an editor and change func1() to return 43

In [6]: func1()
Out[6]: 43

The second issue can be resolved by guarding the "script" commands with an appropriate if statement, which ensures that those commands are executed only when the file is run directly, and not when it is imported.

$ cat myFunctions.py

def func1():

def func2():

#  
# Main Driver
# This part is not run when imported as a module
#
if __name__ == '__main__':
    some script commands
    print('something')

Monday, October 8, 2018

The Nature of Code

I accidentally stumbled on Daniel Shiffman's "The Nature of Code" online book. It is the ideal programming 102 book - something interesting to fool around with once you've learned the first few essentials of coding.
The goal of this book is simple. We want to take a look at something that naturally occurs in our physical world, then determine how we can write code to simulate that occurrence.
I love this real-world slant. The applications cover physics, biology, agent-based modeling, automata, neural networks etc.

Give it a look.

Saturday, September 22, 2018

Don't Judge a Book by its Cover

Situations or observations often have an "obvious" first-order explanation. These explanations are attractive and complete.

Sometimes, however, a deeper and far more interesting second order effect lurks under the surface.

Consider this military example of survivorship bias:
During World War II, the statistician Abraham Wald took survivorship bias into his calculations when considering how to minimize bomber losses to enemy fire. Researchers from the Center for Naval Analyses had conducted a study of the damage done to aircraft that had returned from missions, and had recommended that armor be added to the areas that showed the most damage. Wald noted that the study only considered the aircraft that had survived their missions—the bombers that had been shot down were not present for the damage assessment. The holes in the returning aircraft, then, represented areas where a bomber could take damage and still return home safely. Wald proposed that the Navy instead reinforce the areas where the returning aircraft were unscathed, since those were the areas that, if hit, would cause the plane to be lost. His work is considered seminal in the then-fledgling discipline of operational research.
Or perhaps this example of the law of large (or small) numbers from Statistics Done Wrong, which I brought up previously on the blog. The mean is not a reliable metric when the variance is large.
Suppose you’re in charge of public school reform. As part of your research into the best teaching methods, you look at the effect of school size on standardized test scores. Do smaller schools perform better than larger schools? Should you try to build many small schools or a few large schools?

To answer this question, you compile a list of the highest-performing schools you have. The average school has about 1,000 students, but the top-scoring five or ten schools are almost all smaller than that. It seems that small schools do the best, perhaps because of their personal atmosphere where teachers can get to know students and help them individually.

Then you take a look at the worst-performing schools, expecting them to be large urban schools with thousands of students and overworked teachers. Surprise! They’re all small schools too. 
Smaller schools have more widely varying average test scores, entirely because they have fewer students. With fewer students, there are fewer data points to establish the “true” performance of the teachers, and so the average scores vary widely. As schools get larger, test scores vary less, and in fact increase on average.
Or this example from Kahneman on whether praise or criticism improves outcomes, when an Israeli air force instructor claimed:
“On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don’t tell us that reinforcement works and punishment does not, because the opposite is the case.”
The underlying second-order principle that was operative was regression to the mean. Praise or criticism had nothing to do with the observations, and were merely nuisance variables.

Or perhaps the grisly observation that sales of ice-cream and the number of homicides in cities are strongly correlated. Here, the first order explanation might be to dismiss the correlation as spurious.

However, a more careful look might point to an important hidden variable, warm weather, which helps us come up with a causal explanation. Warm weather makes people buy more ice-cream. Warm weather also brings people outdoors, which increases the odds of murders.

Friday, September 7, 2018

Pascal and Fermat

Fermat and Pascal exchanged correspondence discussing the problem of points.

Here is a loose sketch of the problem:

Two players toss a fair coin. Player A gets a point, if it comes up heads, while player B gets a point, if it comes up tails.

They repeat this until one of the players gets to 10 points.

At the start of the game, each player wagers $50 for a total pot of $100.

Suppose the game is interrupted at a certain point due to unavoidable reasons (say player A has 8 points and player B has 7 points).

How should the pot be divided?

At the start of the game, the odds are even, since the coin is fair. However, when the game is interrupted, player A has a higher chance of winning. How can we systematically take this condition into account?

A wonderful exploration of this problem written using "modern" terminology is available here. It outlines the problem, sketches Fermat's and Pascal's approaches, and generalizes the problem and solution.

The original correspondence (translated) is available here (pdf).
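For the interrupted game sketched above (A has 8 points, B has 7, first to 10 wins), the fair split can be worked out with a short counting argument in the spirit of Fermat: imagine playing out the maximum number of tosses that could still be needed, and count the outcomes in which A reaches the target first. The snippet below is a minimal sketch of that idea; the function name win_probability and the variable names are illustrative choices of mine, not something taken from the linked articles.

```python
from fractions import Fraction
from math import comb

def win_probability(a_needs, b_needs):
    """Probability that player A wins, given the points each player still needs."""
    # The game is always settled within a_needs + b_needs - 1 further tosses.
    n = a_needs + b_needs - 1
    # A wins exactly when at least a_needs of those n hypothetical tosses are heads.
    favorable = sum(comb(n, k) for k in range(a_needs, n + 1))
    return Fraction(favorable, 2 ** n)

target, score_a, score_b, pot = 10, 8, 7, 100
p_a = win_probability(target - score_a, target - score_b)
share_a = pot * p_a          # A's fair share of the pot
share_b = pot - share_a      # B gets the remainder
print(p_a, float(share_a), float(share_b))   # 11/16 68.75 31.25
```

Here A needs 2 more points and B needs 3, so at most 4 more tosses are required; A wins in 11 of the 16 equally likely outcomes, which suggests splitting the pot as $68.75 and $31.25.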

Friday, August 24, 2018

Data Visualization Resources

Here are some resources that one of my students (Eitan Lees) shared during a graduate seminar on visualization.
  • The Data Visualisation Catalogue has an exhaustive catalog of different charts and graphical representations. It is a useful first place to browse when figuring out what kind of plot to use.
  • A lot of common "errors", and practical pointers on improving graphical presentation, are available at this wonderful site.
  • A generally good idea is to remove to improve
  • Paletton is a site where you can play with colors and color schemes.

Saturday, August 18, 2018

Mean Squared Displacement

The mean-squared displacement of a particle is defined simply as, \[\rho(t) = \langle r^2(t) \rangle = \int r^2 \, c(r) \, p(r,t) \, dr,\] where \(c(r) = 1\), \(2 \pi r\), and \(4 \pi r^2\) in 1, 2, and 3 dimensions, respectively. This evaluates to \(\langle r^2(t) \rangle = 2dDt\), where \(d\) is the dimension (1, 2, or 3).

One can also compute the variance of the MSD, as \[\text{var}(\rho) = \langle \rho^2(t) \rangle - \left(\langle \rho(t) \rangle\right)^2.\] This can be evaluated as,
\begin{align}
1D:& 2\,(2Dt)^2 = 8D^2t^2\\
2D:& \dfrac{2}{2} (4Dt)^2 = 16 D^2 t^2\\
3D:& \dfrac{2}{3} (6Dt)^2 = 24 D^2 t^2
\end{align}
This can be simplified into a common expression as, \[\text{var}(\rho) = \dfrac{2}{d} \rho^2\]
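A quick Monte Carlo check of these expressions is sketched below. It assumes isotropic diffusion, so that each Cartesian component of the displacement is an independent normal variable with variance \(2Dt\); the values of D, t, and the sample size are arbitrary choices for illustration, and the variance is taken over samples of \(r^2\).

```python
import random
from statistics import fmean

def sample_r2(d, D, t, n, rng):
    """Draw n samples of r^2 for isotropic diffusion in d dimensions."""
    sigma = (2 * D * t) ** 0.5          # each component is N(0, 2Dt)
    return [sum(rng.gauss(0.0, sigma) ** 2 for _ in range(d)) for _ in range(n)]

rng = random.Random(0)
D, t, n = 1.0, 0.5, 100_000             # illustrative values only
results = {}
for d in (1, 2, 3):
    r2 = sample_r2(d, D, t, n, rng)
    msd = fmean(r2)                              # estimate of <r^2>
    var = fmean(v * v for v in r2) - msd ** 2    # <r^4> - <r^2>^2
    results[d] = (msd, var)
    # columns: d, sampled MSD, 2dDt, sampled variance, (2/d) * (2dDt)^2
    print(d, msd, 2 * d * D * t, var, (2 / d) * (2 * d * D * t) ** 2)
```

Up to sampling noise, the second and third columns should agree, as should the fourth and fifth.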

Wednesday, August 15, 2018

Plotting CDF: Note to Self

Consider the histogram of samples from a normal distribution:

import numpy as np
x = np.random.normal(0., 1., size=10000)
pdf, bins = np.histogram(x, density=True)

The size of the array "bins" is not equal to the size of "pdf". Consecutive elements of "bins" specify the left and right edges of a particular bin. Thus, with numpy's default of 10 bins, the "bins" array has 11 elements, while "pdf" has 10 elements.

Note that the matplotlib command "hist" is identical in this regard.

Now suppose you want to compare the histogram with the theoretical PDF (Gaussian). Using the histogram, one could construct an equivalent line chart by taking the mid point of each bin.

import matplotlib.pyplot as plt
# the histogram of the data
pdf, bins, patches = plt.hist(x, 30, density=True, facecolor='green', alpha=0.4)
xpdf = (bins[1:]+bins[:-1])/2 # midpoints
plt.plot(xpdf, pdf, 'o-')

# theoretical curve
xi = np.linspace(-4, 4)
gx = 1/np.sqrt(2.*np.pi)*np.exp(-xi**2/2)
plt.plot(xi, gx, 'k--')

Everything looks fine.

Now let's consider the CDF, and plot it against the theoretical CDF. If I use bin midpoints to plot the empirical CDF I get something funky.

from scipy.special import erf
cdf  = np.cumsum(pdf)
cdf  = cdf/cdf[-1]
plt.plot(xpdf, cdf, 'o')

gcdf = 0.5*(1 + erf(xi/np.sqrt(2.)))
plt.plot(xi, gcdf)

There is a visible offset.

Instead of using bin midpoints, I should use the right limits when plotting the CDF (this makes sense upon a moment's reflection: the cumulative sum up to a bin counts every sample to the left of that bin's right edge).

xcdf = bins[1:]
plt.plot(xcdf, cdf, 'o')

gcdf = 0.5*(1 + erf(xi/np.sqrt(2.)))
plt.plot(xi, gcdf)
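To put a number on the offset without looking at a plot, here is a small plain-Python sketch (standard library only, so the binning is done by hand rather than with np.histogram). It builds 30 equal-width bins spanning the data, accumulates the counts, and compares the empirical CDF with the exact normal CDF at the bin midpoints and at the right edges. The seed and variable names are my own choices.

```python
import random
from statistics import NormalDist

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# equal-width bins spanning the data
nbins = 30
lo, hi = min(x), max(x)
width = (hi - lo) / nbins
edges = [lo + i * width for i in range(nbins + 1)]
counts = [0] * nbins
for xi in x:
    # the largest sample lands exactly on the last edge; keep it in the last bin
    counts[min(int((xi - lo) / width), nbins - 1)] += 1

# empirical CDF: cumulative fraction of samples up to the end of each bin
total, cdf = 0, []
for c in counts:
    total += c
    cdf.append(total / len(x))

exact = NormalDist().cdf
mid_err = max(abs(F - exact((a + b) / 2))
              for F, a, b in zip(cdf, edges[:-1], edges[1:]))
right_err = max(abs(F - exact(b)) for F, b in zip(cdf, edges[1:]))
print(mid_err, right_err)
```

The maximum deviation measured at the midpoints should come out several times larger than the one measured at the right edges, which is the offset seen in the plot.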


Wednesday, August 8, 2018

Links

1. Yogic Capitalism (Bloomberg on Baba Ramdev)

2. Netflix has a decent documentary on him

3. A history of punctuation in English (Ashley Timms)

4. Tim Urban has some career advice

Wednesday, July 25, 2018

Python Default Search Path

Suppose you have a bunch of handy python utility functions. As an example, consider the matrixTeX function, which writes numpy matrices in LaTeX format.

I use this function all the time, and would like it to be visible no matter where I fire up my python session from. To do this, we need to modify python's default search path.

Here is a three-step setup process:

1. Store the Utility Functions in a Subdirectory

There are several ways to do this.

  • create a sub-directory called 'myPython' in my home directory ('/home/sachins/')
  • in '/home/sachins/myPython/' create a file myPyUtils.py
  • put all your utility functions in this file

> cat /home/sachins/myPython/myPyUtils.py

def matrixTeX():
    function definition

def util2():
    function definition

def util3():
    function definition

2. Point Python to the Subdirectory

You can do this temporarily or permanently.

To set it up temporarily, fire up a python session, and append to the sys.path variable to expand the default search path. In our example, I would do the following:

> python3
>>> import sys
>>> sys.path.append('/home/sachins/myPython')

This setting is forgotten when you terminate your python session. Thus, you have to invoke this pair of commands, every time you start a session.

To avoid this, you can set things up so that they are more permanent.

Open your .bashrc file (if you are using bash) and modify the PYTHONPATH variable.

In this case, I  add the following lines:

# set python path to look for scripts
export PYTHONPATH=$PYTHONPATH:/home/sachins/myPython

Now you are all set. We can import the utility functions and use them from anywhere.

3. Import the Utility Functions

Fire up a python session. Once you have expanded the search path (temporarily or permanently), you can import the utility file (myPyUtils.py) as a module, and access its individual functions. As an example,

>>> import numpy as np
>>> from myPyUtils import *
>>> print(matrixTeX(np.array([1,2])))
\begin{bmatrix}
  1 & 2\\
\end{bmatrix}

Notes:
  • the first time you import the file, python3 will create a ".pyc" file in a "__pycache__" folder inside the subdirectory
  • you can be more careful with the namespace during importing
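The temporary route can be tried end-to-end with the sketch below. It uses a throwaway directory and a made-up module called demoUtils (both are stand-ins of mine for '/home/sachins/myPython' and myPyUtils.py), and it imports the module by name instead of using "import *", which is one way of being more careful with the namespace.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# a throwaway directory standing in for the utilities folder
util_dir = Path(tempfile.mkdtemp())
(util_dir / "demoUtils.py").write_text("def answer():\n    return 42\n")

sys.path.append(str(util_dir))      # extend the search path for this session only
importlib.invalidate_caches()       # make sure the freshly written file is noticed
demoUtils = importlib.import_module("demoUtils")   # same as: import demoUtils

print(demoUtils.answer())           # functions stay behind the module name
```

Because the change is made through sys.path, it disappears when the session ends; the PYTHONPATH route is what makes it stick.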

Tuesday, June 26, 2018

QuickTip: Split Single Column into Multiple Columns

Consider a single column file which begins with

$ cat file.txt
1
yes
single
125K
no
2
no
married
100K
no
...

Suppose you want to split it into 5 columns so that it looks like

1 yes single 125K no
2 no married 100K no
...

You can use either,

$ xargs -n5 < file.txt

or if you want some control over the delimiter

$ paste -d, - - - - - < file.txt

1,yes,single,125K,no
2,no,married,100K,no
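If the reshaping has to happen inside a python script rather than at the shell, the same grouping can be sketched in a few lines. The helper name to_rows is a hypothetical choice, and the sample data is simply the first ten lines of the file shown above.

```python
def to_rows(lines, ncols, sep=" "):
    """Group consecutive entries of a single column into rows of ncols fields."""
    items = [ln.strip() for ln in lines if ln.strip()]   # drop blank lines
    return [sep.join(items[i:i + ncols]) for i in range(0, len(items), ncols)]

column = ["1", "yes", "single", "125K", "no",
          "2", "no", "married", "100K", "no"]
rows = to_rows(column, 5, sep=",")
print("\n".join(rows))
```

For a real file, the list could be replaced by the open file object, e.g. to_rows(open("file.txt"), 5).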


Wednesday, June 20, 2018

QuickTip: Adding Search Paths in Octave

Suppose you want a utility (such as matrixTeX.m) to be visible to every octave session.

1. Select a suitable folder to keep your local octave files in (e.g. /home/sachin/octave/ on my Linux box). If such a folder doesn't exist, make one.

2.  Fire up an octave session

3. In Octave, add the path to the path variable:

octave:> addpath(genpath("/home/sachin/octave")) 

4. List directories in default path (your output will likely look different)

octave:> path 
/home/sachin/octave
/home/sachin/octave/io-2.4.11
/home/sachin/octave/io-2.4.11/doc
/home/sachin/octave/io-2.4.11/packinfo
/home/sachin/octave/io-2.4.11/templates

etc.

5. So far, we haven't permanently written the modified path to disk. To do this:

octave:> savepath

Tuesday, June 12, 2018

My LaTeX Cheat Sheet

Here is a link to my LaTeX cheatsheet.

Here is a list of some useful LaTeX related links from the past:

Saturday, June 9, 2018

Links

1. Anti-GMO study fails to replicate (neurologica)

2. When a six-sigma event occurs (John D Cook)

3. Jordan Canonical Form doesn't compute (Cleve Moler)

4. I have no peers! (wondermark)

Wednesday, May 16, 2018

Quotes: Shower Thoughts

I really enjoy some of the observational/insight tweets that "ShowerThoughts" echoes from /r/showerthoughts. Here are some relatively recent ones:
Your DNA contains millions of years worth of software updates. 
At home it’s weird for two people to eat two different things for dinner but at a restaurant it’s weird to order the same thing. 
When people talk about traveling to the past, they worry about radically changing the present by doing something small, but barely anyone in the present really thinks that they can radically change the future by doing something small. 
Shrek is a movie about loving yourself despite your physical appearance, all the while making fun of Lord Farquaad for being short. 
It’s weird that “you’re shit” and “you ain’t shit” are both insults, but “you’re the shit” is a complement. 
Cassettes had side A and side B, therefore it was kind of logical its succesor would be the CD. 
The stock market basically rewards people for liking something before it was cool. 
Our urge to sing along to songs we hear is basically the same thing as wolves howling when they hear a howl. 
I watched my dog chase his tail for 10 minutes and thought “Wow, dogs are so easily entertained”. Then I realized I just watched my dog chase his tail for 10 minutes.

Effective Oral Presentations

Recently, we talked about how to make effective oral presentations in a first year seminar class.

The focus was on technical talks - the ones that are typically accompanied by a slide deck.

I found the following resources quite useful in structuring a discussion:
Check them out.

Thursday, May 3, 2018

Where are you Dear Alphonso?

It is mango season!

I learnt several new things from an excellent piece "We Were Promised the World's Most Delicious Mangoes. They Never Came." at Vice.com.

The story resonated with me, because it connects two geographies (western Maharashtra and south Florida) which I have called home for extended periods, and a fruit I love.

Here are some of the highlights:
  • India grows over 40% of the world's mangoes (over 1000 varieties). The scientific name of the mango is Mangifera indica.
  • 90% of US's mangoes come from Mexico, Peru and Brazil. The 2-3 week ship voyage is the primary barrier for Indian mangoes. Air transport is too expensive.
  • Tommy Atkins and the Kent are the most popular mangoes in the US. These "shitty" mangoes trace their origin to India in the late 1800s, when 12 saplings of a bunch of different varieties were shipped to the USDA.
  • Only one of them, the "Mulgoba", survived and spread through south Florida. The varieties of mango popular in the US are descendants of the Mulgoba that grew on the properties of Thomas Atkins of Broward County and Leith Kent of Coconut Grove.

Monday, April 30, 2018

Matlab and Python

I use Matlab/Octave quite a bit for pre- and post-processing. About 3-4 years ago, I started dabbling with python, and gradually began using it to handle increasing parts of my workflow.

Last week, I chanced upon this webpage at Mathworks, which tries to argue why Matlab is superior to python. Here are the key advantages:
  • The matrix-based MATLAB language lets you express math directly: This is definitely true. Even now, when my work predominantly involves linear algebra, I sometimes use Matlab/Octave. The notation is natural and concise. But this is Matrix Laboratory after all.
  • Engineers and scientists deserve tools that fit the way they work. They shouldn’t have to adapt the way they work to fit their tools: It is true that Matlab documentation feels like it was written with engineers and scientists in mind, while python documentation has a computer-sciency feel. But, the other reasons bundled under this heading don't seem like they apply to me.
  • Proven MATLAB toolboxes provide the functions and capabilities you need. Period.: However, many toolboxes have to be bought separately. The standard scientific python stack is fairly mature at this point. In my mind, the overall library landscape is even richer for python. It reminds me of this xkcd cartoon.
  • MATLAB apps let you complete tasks more easily than with custom programming
  • MATLAB helps automate the entire path – from research to production
  • You can trust the results you get in MATLAB: I lumped these three together, because for some reason, they are non-issues for me. I enjoy prototyping in python. If I am hardware limited, and I have to squeeze performance, then I usually code in C++ or Fortran.

  • MATLAB runs your programs faster – meaning you can try more ideas and solve bigger problems: This is probably true, but ignores the existence of JIT compilation (Numba) or integrating with C++/Fortran (Cython) etc.
Some of the advantages of python over Matlab are listed here, here, and here. If I had to make an intermediate term prediction (10 years), I think python will become more popular than Matlab, among scientists and engineers.

Tuesday, April 24, 2018

Tyranny of Metrics

EconTalk recently had Jerry Muller on the podcast discussing his book "Tyranny of Metrics". He laments our obsession with measurement.

Metrics are fine as a diagnostic tool, he argues, but when they are used as surrogates for success and attached to rewards, things go haywire. There are only a handful of metrics that retain their usefulness once they are widely adopted. The power of incentives prods people to game the statistic.

Muller gives several examples from various fields:

Measuring the success rates of surgeons performing certain operations and making them available seems like a good idea. Undoubtedly, transparency would help patients make better choices. However, after the scorecards became public, surgeons began avoiding complicated cases, which would lower their batting average.
Then there is the phenomenon of goal diversion. A great deal of K-12 education has been distorted by the emphasis that teachers are forced to place on preparing students for standardized tests of English and math, where the results of the tests influence teacher retention or school closings. Teachers are instructed to focus class time on the elements of the subject that are tested (such as reading short prose passages), while ignoring those elements that are not (such as novels). Subjects that are not tested—including civics, art, and history—receive little attention. 
Or, to take an example from the world of business. In 2011 the Wells Fargo bank set high quotas for its employees to sign up customers who were interested in one of its products (say, a deposit account) for additional services, such as overdraft coverage or credit cards. For the bank’s employees, failure to reach the quota meant working additional hours without pay and the threat of termination. The result: to reach their quotas, thousands of bankers resorted to low-level fraud, with disastrous effects for the bank. It was forced to pay a fortune in fines, and its stock price dropped.
Here is yet another book review in Science

Sunday, April 15, 2018

Diffusion in Higher Dimensions

In the previous posts (1 and 2), we wrote down the probability or concentration distribution of a bunch of Brownian diffusors initially at \(x = 0\) (delta function), \[p_{1D}(x, t) = \dfrac{1}{\sqrt{4 \pi Dt}} \exp\left(-\dfrac{x^2}{4Dt}\right)\]
The PDF is normalized on the domain \(x \in (-\infty, \infty)\) so that, \[\int_{-\infty}^{\infty} p_{1D}(x,t)\, dx = 1.\] In 2D, \(\langle r^2(t) \rangle = \langle x^2(t) \rangle + \langle y^2(t) \rangle\). If diffusion is isotropic, then \(\langle r^2(t) \rangle = 2Dt + 2Dt = 4Dt\). In this case,
\begin{align}
p_{2D}(r, t) & = p_{1D}(x, t) \, p_{1D}(y, t)\\
& = \dfrac{1}{\sqrt{4 \pi Dt}} \dfrac{1}{\sqrt{4 \pi Dt}} \exp\left(-\dfrac{1}{2} \dfrac{x^2+y^2}{2Dt}\right)\\
& =\dfrac{1}{4 \pi Dt} \exp\left(-\dfrac{r^2}{4Dt}\right)
\end{align}

The PDF is normalized such that, \[\int_{0}^{\infty} (2\pi r) \, p_{2D}(r,t)\, dr = 1.\]
Finally, for isotropic 3D diffusion, \[p_{3D}(r, t) = \left(\dfrac{1}{4 \pi Dt}\right)^{3/2} \exp\left(-\dfrac{r^2}{4Dt}\right).\] The PDF is normalized such that, \[\int_{0}^{\infty} (4\pi r^2) \, p_{3D}(r,t)\, dr = 1.\] In summary, for \(d\) = 1, 2, or 3 dimensions
\[p_{dD}(r, t) = \left(\dfrac{1}{4 \pi Dt}\right)^{d/2} \exp\left(-\dfrac{r^2}{4Dt}\right).\]
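As a sanity check on the summary formula, the normalization integrals can be evaluated numerically. The sketch below uses a plain midpoint rule; the diffusivity, time, cutoff radius, and number of intervals are arbitrary illustrative choices, and the function names are mine.

```python
from math import exp, pi

def p(r, t, d, D=1.0):
    """Isotropic d-dimensional PDF from the summary formula."""
    return (4 * pi * D * t) ** (-d / 2) * exp(-r * r / (4 * D * t))

def shell(r, d):
    """Geometric weight: 1, 2*pi*r, 4*pi*r^2 for d = 1, 2, 3."""
    return {1: 1.0, 2: 2 * pi * r, 3: 4 * pi * r * r}[d]

def norm(d, t=0.3, D=1.0, rmax=20.0, n=20_000):
    """Midpoint-rule estimate of the normalization integral."""
    a = -rmax if d == 1 else 0.0        # 1D integrates over the whole line
    h = (rmax - a) / n
    mids = (a + (i + 0.5) * h for i in range(n))
    return h * sum(shell(r, d) * p(r, t, d, D) for r in mids)

norms = {d: norm(d) for d in (1, 2, 3)}
print(norms)
```

All three values should be 1 to within the accuracy of the quadrature, provided the cutoff rmax is many times larger than \(\sqrt{2Dt}\).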

Saturday, April 7, 2018

Notebooks and Exploration

The Atlantic has a nice article on the genesis and evolution of Mathematica and Jupyter notebooks, and how the latter was inspired by the former. It is provocatively (unfortunately) titled, "The Scientific Paper is Obsolete".

The article itself is more thoughtful and nuanced.

It is a reflection on the use of notebooks as exploratory vehicles, and as computational essays. This is indeed how I use Jupyter notebooks these days. I use them as a pre-processing tool (exploratory mode) when I have to design a new lecture or lab, or plan a set of new calculations. I also use them as a post-processing tool, especially in my research. Once all the raw computation is done, I can play with the results interactively, and eventually interleave a narrative and charts. This notebook often becomes the starting point of the "Results and Discussion" section of any resulting paper.

Here are some passages from the article that I found interesting or appealing:
The notebook interface was the brainchild of Theodore Gray, who was inspired while working with an old Apple code editor. Where most programming environments either had you run code one line at a time, or all at once as a big blob, the Apple editor let you highlight any part of your code and run just that part. Gray brought the same basic concept to Mathematica, with help refining the design from none other than Steve Jobs.

“I’ve noticed an interesting trend,” Wolfram wrote in a blog post. “Pick any field X, from archeology to zoology. There either is now a ‘computational X’ or there soon will be. And it’s widely viewed as the future of the field.” 
A 1997 essay by Eric S. Raymond titled “The Cathedral and the Bazaar,” in some sense the founding document of the modern open-source movement, challenged the notion that complex software had to be built like a cathedral, “carefully crafted by individual wizards or small bands of mages working in splendid isolation.” Raymond’s experience as one of the stewards of the Linux kernel (a piece of open-source software that powers all of the world’s 500 most powerful supercomputers, and the vast majority of mobile devices) taught him that the “great babbling bazaar of differing agendas and approaches” that defined open-source projects was actually a strength. “The fact that this bazaar style seemed to work, and work well, came as a distinct shock,” he wrote.

The Mathematica notebook is the more coherently designed, more polished product—in large part because every decision that went into building it emanated from the mind of a single, opinionated genius. “I see these Jupyter guys,” Wolfram said to me, “they are about on a par with what we had in the early 1990s.” They’ve taken shortcuts, he said. “We actually want to try and do it right.”

Wednesday, April 4, 2018

Writing Technical Papers

Here is some decent advice on how to improve the quality of technical writing:

Ten simple rules for structuring papers
PLOS Computational Biology, 2017.

Whitesides’ Group: Writing a Paper
Advanced Materials, 2004.

Writing a Research Paper in the Natural Sciences
Graduate Writing Lab, Yale University, 2015.

10 Tips on How to Write Less Badly
Michael Munger, CHE, 2010.

Tuesday, April 3, 2018

Diffusion and Random Walks

In the previous post, we saw how the probability distribution \(p(x,N)\) after \(N\) random steps on a unit lattice is given by, \[p(x, N) = \dfrac{1}{\sqrt{2 \pi N}} \exp\left(-\dfrac{x^2}{2N}\right)\] If the average step size is \(b\) instead of \(b=1\), then we can generalize, and write the formula as:
\[p(x, N) = \dfrac{1}{\sqrt{2 \pi Nb^2}} \exp\left(-\dfrac{x^2}{2Nb^2}\right)\] Now consider a Gaussian random walk in 1D. Suppose the step size at each step is drawn from a normal distribution \(\mathcal{N}(0, 1)\). While it has the same root-mean-square step size as a walk on the unit lattice, an individual step may be shorter or longer than \(b=1\).

In polymer physics, where a Gaussian coil is often used as a model for polymer conformations, \(b\) is called the Kuhn length, and \(N\) is proportional to the molecular weight.

Due to the connection between Brownian motion and random walks, the mean squared distance travelled by a particle in 1D with self-diffusivity \(D\) is \(\langle x^2(t) \rangle = 2Dt\). Similarly, the mean squared end-to-end distance of a Gaussian random walk is given by, \[\langle x^2(N) \rangle = \int_{-\infty}^{\infty} x^2 p(x, N) dx = Nb^2 \equiv 2Dt = \langle x^2(t) \rangle.\] This allows us to re-parameterize the equation for the position of a Brownian diffusor: \[p(x, t) = \dfrac{1}{\sqrt{4 \pi Dt}} \exp\left(-\dfrac{x^2}{4Dt}\right)\] Note the correspondence between \(t\) and \(N\), and between \(b\) and \(\sqrt{2D}\).
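The relation \(\langle x^2(N) \rangle = Nb^2\) is easy to check numerically. Here is a minimal sketch, assuming NumPy is available; the function name, the number of walkers, and the seed are arbitrary choices of mine for illustration.

```python
import numpy as np

def gaussian_walk_msd(n_steps, b=1.0, n_walkers=20000, seed=0):
    """Estimate <x^2(N)> for a 1D walk whose steps are drawn from N(0, b^2)."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(loc=0.0, scale=b, size=(n_walkers, n_steps))
    x_end = steps.sum(axis=1)  # end-to-end displacement of each walker
    return np.mean(x_end**2)

N, b = 100, 2.0
msd = gaussian_walk_msd(N, b)
print(f"simulated <x^2> = {msd:.1f}, theory N b^2 = {N * b**2:.1f}")

# Assumption: one step per unit time (t = N), so N b^2 = 2 D t gives D = b^2 / 2.
D = b**2 / 2
print(f"equivalent diffusivity D = {D}")
```

The simulated value should scatter around \(Nb^2\) by roughly a percent for this many walkers; increasing the number of walkers tightens the agreement.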

Tuesday, March 27, 2018

A Primer on Diffusion: Random Walks in 1D

Consider a particle, initially at the origin, jumping around randomly on a 1D lattice. The particle tosses a fair coin, and decides to jump left or right.

A particular trajectory of the particle may look like the following:


Suppose the particle makes \(n_{+}\) hops to the right, and \(n_{-}\) hops to the left. Then, the total number of steps \(N = n_{+} + n_{-}\), and the position at the end is \(x = n_{+} - n_{-}\).

The process is probabilistic, and the outcome of any single trajectory is impossible to predict. However, let us enumerate the number of ways in which a random walk of \(N\) steps results in \(n_{+}\) hops to the right. This is given by, \begin{align*}
W(x, N) & = {}^N C_{n_{+}}\\
& =  \dfrac{N!}{(N-n_{+})!\,n_{+}!}\\
& = \dfrac{N!}{n_{-}!n_{+}!}
\end{align*} The probability \(p(x, N)\) of ending up at \(x\) after \(N\) steps can be obtained by dividing \(W(x, N)\) by the total number of paths. Since we can make two potential choices at each step, the total number of paths is \(2^N\).
\[p(x, N) = \dfrac{W(x,N)}{2^N}.\]
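For large \(N\), Stirling's approximation is \(N! \approx \sqrt{2 \pi N} (N/e)^N\). For \(x \ll N\), this implies, \[p(x, N) \approx \dfrac{1}{\sqrt{2 \pi N}} \exp\left(-\dfrac{x^2}{2N}\right)\]
The exact binomial distribution and this Gaussian approximation have the same shape. However, because one is a discrete distribution, while the other is continuous, they have different normalizations, and hence different actual values of \(p(x,N)\): the lattice walk can only reach sites with the same parity as \(N\), so each discrete probability is about twice the value of the continuous density at that site.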
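To compare the two expressions directly, here is a small sketch using only the Python standard library. The helper names `p_exact` and `p_gauss` are mine; the first uses \(n_{+} = (N + x)/2\), which follows from \(N = n_{+} + n_{-}\) and \(x = n_{+} - n_{-}\).

```python
from math import comb, exp, pi, sqrt

def p_exact(x, N):
    """Exact lattice-walk probability W(x, N) / 2^N; zero if x is unreachable."""
    if abs(x) > N or (N + x) % 2 != 0:
        return 0.0
    n_plus = (N + x) // 2  # from N = n+ + n- and x = n+ - n-
    return comb(N, n_plus) / 2**N

def p_gauss(x, N):
    """Continuous Gaussian approximation, intended for large N and x << N."""
    return exp(-x**2 / (2 * N)) / sqrt(2 * pi * N)

N = 100
for x in (0, 4, 10, 20):
    exact, approx = p_exact(x, N), p_gauss(x, N)
    print(f"x = {x:3d}  exact = {exact:.5f}  gauss = {approx:.5f}  ratio = {exact / approx:.3f}")
```

The ratio column makes the difference in normalization between the discrete and continuous distributions visible, while the nearly constant ratio across \(x\) shows that the shapes agree.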

Sunday, March 25, 2018

Links: Probability, Statistics, and Monte Carlo

1. A beautiful visual introduction to some concepts in probability and statistics (link)
Seeing Theory was created by Daniel Kunin while an undergraduate at Brown University. The goal of this website is to make statistics more accessible through interactive visualizations (designed using Mike Bostock’s JavaScript library D3.js).
It starts from relatively basic concepts and touches on some intermediate-level topics (Basic Probability, Compound Probability, Probability Distributions, Frequentist Inference, Bayesian Inference, Regression Analysis).

2. You are not a Monte Carlo Simulation (link)

It is now well-established that humans feel the pain of loss more strongly than the pleasure of an equivalent amount of gain. This interesting quirk may be shelved as an unfortunate cognitive bias (like confirmation bias); something for our rational mind to overcome.

However, one can ask the follow-up question: why? A first-level explanation is as follows: Suppose you invest $100. You lose 50% one day, and gain 100% the next day. You are now back to square one ($100 * 0.50 * 2.0 = $100). A gain twice the size of the loss was necessary to stay neutral.

Corey Hoffstein argues remarkably well that for individuals average outcomes are less meaningful than median outcomes. That the logarithmic scale for utility is more appropriate than a linear scale. And that loss aversion - that silly behavioral quirk - might be a powerful survival technique that helps us live to fight another day.
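The gap between average and median outcomes can be illustrated with a quick simulation. This is only a sketch under assumptions of my own, not something taken from Hoffstein's post: every day each person's wealth either doubles or halves with equal probability (the same +100% / -50% moves as in the $100 example), and the number of people and days is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_days = 100_000, 20

# Hypothetical gamble: wealth doubles (+100%) or halves (-50%) each day,
# with equal probability, starting from $100.
factors = rng.choice([2.0, 0.5], size=(n_people, n_days))
wealth = 100 * factors.prod(axis=1)

print("mean wealth:        ", wealth.mean())
print("median wealth:      ", np.median(wealth))
print("fraction below $100:", np.mean(wealth < 100))
```

The mean is pulled up by a handful of very lucky trajectories, while the typical (median) person is no better off than at the start. On a logarithmic scale the two moves are symmetric, which is one way to see why the median is the more representative number for an individual.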

I loved this insightful post. If nothing else, do yourself a favor and read the summary.


Thursday, March 22, 2018

Links to matplotlib Resources

I wanted to pull together a list of matplotlib resources that I need to consult frequently.

1. SciPy Lectures: The entire series is great, including the introduction to matplotlib.

2. Tutorials from J.R. Johansson and Nicholas P. Rougier

3. A couple of my own Jupyter notebooks on customizing styles, and multiplots.

Friday, February 23, 2018

Google Colaboratory

If you need to use and interact with a Jupyter notebook on a computer that does not have it installed, Google Colaboratory seems like a great in-browser solution. I learned about it from a student earlier this semester.

The best part is that you don't need to install any software locally on the machine. The standard scientific/data science python stack (numpy, scipy, sympy, pandas) is available, and you can even "install" some additional packages on the fly using pip install.

It works more or less like Google Docs, in that your documents are saved on Google Drive, and you can collaborate with others in much the same way.

Check it out!

Tuesday, February 20, 2018

One Year Later

One year ago, I decided to get off of Facebook.

It wasn't a carefully thought out decision. I did not weigh the positives against the negatives. I just stopped.

There were some signs of this for a few years. In mid-2016, I wrote:
A few years ago, Facebook was a source of joy in my life. I was actively rediscovering friends who had slipped away over time. Reconnecting, discovering what they were up to, and filling the gap between where we had left and found each other again, ushered in a sense of everyday freshness. 
Over time, as a billion people got onboard, the rate of rediscovery diminished, and so did the excitement of eagerly checking new notifications. These days, most of my Newsfeed is cluttered with click-bait, unscientific bullshit, flashy headlines, and "Hallmark" greetings.
Here is a recent Vanity Fair article that touches on similar issues.
During the past six months alone, countless executives who once worked for the company are publicly articulating the perils of social media on both their families and democracy. Chamath Palihapitiya, an early executive, said social networks “are destroying how society works”; Sean Parker, its founding president, said “God only knows what it’s doing to our children’s brains.” (Just this weekend, Tim Cook, the C.E.O. of Apple, said he won’t let his nephew on social media.) Over the past year, people I have spoken to internally at the company have voiced concerns for what Facebook is doing (or most recently, has done) to society. Many begin the conversation by rattling off a long list of great things that Facebook inarguably does for the world—bring people and communities together, help people organize around like-minded positive events—but, as if in slow motion, those same people recount the negatives. 

Monday, February 12, 2018

Links

1. "Ten Lessons I Wish I Had Learned Before I Started Teaching Differential Equations" (Gian-Carlo Rota)
What can we expect students to get out of an elementary course in differential equations? I reject the "bag of tricks" answer to this question. A course taught as a bag of tricks is devoid of educational value. One year later, the students will forget the tricks, most of which are useless anyway. The bag of tricks mentality is, in my opinion, a defeatist mentality, and the justifications I have heard of it, citing poor preparation of the students, their unwillingness to learn, and the possibility of assigning clever problem sets, are lazy ways out.
2. A web clone of MS Paint (jspaint.ml)

3. Strogatz Lectures on Nonlinear Dynamics and Chaos (youtube)

Sunday, February 4, 2018

Frequentist versus Bayesian Statistics

Jake VanderPlas has a bunch of interesting resources on this fascinating topic.

For example, see this video from SciPy 2014 and the associated conference proceeding.


He also has a nice python-based 5-part series on the same topic.

Tuesday, January 16, 2018

Unsolvability of Quintic Equations

General formulas for roots of quadratic, cubic, and quartic equations can be written in closed form using the following algebraic operations: addition, subtraction, multiplication, division, raising to an integer power, and taking an integer root.

However, roots of general quintic equations, \[ax^5 + bx^4 + cx^3 + dx^2 + e x + f = 0,\] cannot be written in closed form using these operations.
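The lack of a closed-form formula does not stop us from computing the roots numerically. As a small sketch, assuming NumPy, here is \(x^5 - x - 1 = 0\), a standard example of a quintic that cannot be solved in radicals; the choice of this particular polynomial and the tolerance used to pick out real roots are mine.

```python
import numpy as np

# Coefficients of x^5 - x - 1, listed from x^5 down to the constant term.
coeffs = [1, 0, 0, 0, -1, -1]
roots = np.roots(coeffs)

# Keep roots whose imaginary part is negligible (tolerance is an arbitrary choice).
real_roots = roots[np.abs(roots.imag) < 1e-9].real
print("all roots:", roots)
print("real root:", real_roots)

# Sanity check: each computed root should satisfy the polynomial to rounding error.
residuals = np.abs(np.polyval(coeffs, roots))
print("largest residual:", residuals.max())
```

This particular quintic has one real root and two complex-conjugate pairs.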

My PhD advisor, Ron Larson, had told me that this was one of the questions he was asked on his oral PhD qualifying exam. I knew the fact, but never understood the proof, since it involved math that I was not familiar with.

Fred Akalin presents a nice proof using plenty of interactive demos, visualizations, and not much advanced math.

Friday, January 12, 2018

David Brooks: Resume Virtues versus Eulogy Virtues

Last week, I heard an interview with David Brooks on Intelligence Squared. Even though I was a few years late to the party (the show was from 2015), I found the content riveting.

Here is a video of that interview:


I found his distinction between "resume virtues" and "eulogy virtues" helpful as a compass on how to lead the good life. Here is a relevant excerpt from an NYT article:

The résumé virtues are the skills you bring to the marketplace. The eulogy virtues are the ones that are talked about at your funeral — whether you were kind, brave, honest or faithful. Were you capable of deep love? 
We all know that the eulogy virtues are more important than the résumé ones. But our culture and our educational systems spend more time teaching the skills and strategies you need for career success than the qualities you need to radiate that sort of inner light. Many of us are clearer on how to build an external career than on how to build inner character. 
But if you live for external achievement, years pass and the deepest parts of you go unexplored and unstructured. You lack a moral vocabulary. It is easy to slip into a self-satisfied moral mediocrity. You grade yourself on a forgiving curve. You figure as long as you are not obviously hurting anybody and people seem to like you, you must be O.K. But you live with an unconscious boredom, separated from the deepest meaning of life and the highest moral joys. Gradually, a humiliating gap opens between your actual self and your desired self, between you and those incandescent souls you sometimes meet.

Monday, January 8, 2018

Multitasking Doesn't Work

Yesterday, I saw a YouTube video in which we are asked to complete two tasks in serial, and in parallel (multitasking). While I am not sure if the test is representative of multitasking in everyday life, it is obvious even from this simple exercise that multitasking is counterproductive.

Switching costs decrease efficiency, quality of experience, and accuracy, while raising stress levels. Multitasking while interacting with people degrades relationships.
[...] evidence suggests that the human "executive control" processes have two distinct, complementary stages. They call one stage "goal shifting" ("I want to do this now instead of that") and the other stage "rule activation" ("I'm turning off the rules for that and turning on the rules for this"). Both of these stages help people to, without awareness, switch between tasks.  
Although switch costs may be relatively small, sometimes just a few tenths of a second per switch, they can add up to large amounts when people switch repeatedly back and forth between tasks. Thus, multitasking may seem efficient on the surface but may actually take more time in the end and involve more error. Meyer has said that even brief mental blocks created by shifting between tasks can cost as much as 40 percent of someone's productive time.
It causes collateral damage beyond that inflicted on the multitasker. Maria Konnikova writes in the New Yorker,
When Strayer and his colleagues observed fifty-six thousand drivers approaching an intersection, they found that those on their cell phones were more than twice as likely to fail to heed the stop signs. In 2010, the National Safety Council estimated that twenty-eight per cent of all deaths and accidents on highways were the result of drivers on their phones.
The vast majority (~98%) of us cannot multitask well, and shouldn't delude ourselves.

I like the quote at the opening of Christine Rosen's essay,
In one of the many letters he wrote to his son in the 1740s, Lord Chesterfield offered the following advice: “There is time enough for everything in the course of the day, if you do but one thing at once, but there is not time enough in the year, if you will do two things at a time.” To Chesterfield, singular focus was not merely a practical way to structure one’s time; it was a mark of intelligence. “This steady and undissipated attention to one object, is a sure mark of a superior genius; as hurry, bustle, and agitation, are the never-failing symptoms of a weak and frivolous mind.”

Thursday, January 4, 2018

The Ritual

The place was bustling with activity. The annual ritual had begun.

Every January, lots of people don their workout gear, and hit the gym. If the past is anything to go by, the crowds will thin out in a month or so. A handful of regulars will persist.

All points on a circle are equally important. Yet, some points like January 1st are more important than others!

I observe all this, not with judgment or condescension. The optimism of a new year is infectious. 

People around, through their actions, seem to say, "Forget and forgive the past, for this year, I resolve to get into shape." It is hard not to be inspired by that.

Even if most of them will fail.

It is a recognition that though we are flawed, we will strive!

Happy New Year.

Wednesday, January 3, 2018

Some Links

1. A nice portrait of Maryam Mirzakhani (NYT)
Three years ago, Mirzakhani, 37, became the first woman to win the Fields Medal, the Nobel Prize of mathematics. News of the award, and the obvious symbolism (first woman, first Iranian, an immigrant from a Muslim country) sat uneasily with her. She was puzzled when she discovered that some people thought mathematics was not for women — it was not an idea that she or her friends encountered growing up in Iran — but she was not inclined, by personality, to tell others what to think.
2. Similar operations using sed and awk

3. Wikipedia and Fake Claims (neurologica)