Wednesday, June 28, 2017

Printing webpages as PDFs

PrintFriendly and PDF has a useful browser extension (tested on Chrome) that creates more readable PDFs from web content.

Here is a screenshot from a Matlab blog that I follow:

Notice that the webpage has lots of links, and a frame on the left.

When I use the "Print to File" feature directly from my Chrome browser, I get a PDF which looks like this:

It does the job, but it looks very amateurish. On more complicated websites, results can be horrendous.

Here is the same webpage, now using PrintFriendly.

Notice that the PDF is much cleaner, is well formatted, and contains all the relevant information.

Thursday, June 22, 2017

Joint from Marginals: Why?

In the previous blog post, we saw a special example in which we were able to sample random variables from a joint 2D Gaussian distribution, given only the marginals and the correlation coefficient.

I listed a simple method, which seemed to work like magic. It had two simple steps:

• Cholesky decomposition of the covariance matrix, C(Y)
• Y = LX, where X are independent random variables

The question is, why did the method work?

Note that the covariance matrix of random variables with zero mean and unit standard deviation can be written as $C(Y) = E(Y Y')$, where $E()$ denotes the expected value of a random variable. Thus, we can write the covariance of the $Y$ generated by the method as \begin{align*} E(Y Y') & = E\left(LX (LX)'\right)\\ & = L E(XX') L' \\ & = L I L'\\ & = LL' = C.\end{align*} Here we used the fact that the covariance of $X$ is an identity matrix by design.

Note that this method preserves the covariance matrix (and hence the standard deviation of the marginals).

Does it preserve the mean?

Yes. $E(Y) = E(LX) = L E(X) = 0.$
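The two identities above can be checked numerically. The following is a minimal sketch, assuming NumPy, standard normal $X$, and $\rho = 0.2$ (the value from the previous post); the variable names, sample size, and seed are illustrative choices, not part of the original method.

```python
import numpy as np

rng = np.random.default_rng(42)          # fixed seed, chosen arbitrarily

rho = 0.2                                # assumed correlation coefficient
C = np.array([[1.0, rho],
              [rho, 1.0]])               # target covariance matrix C(Y)
L = np.linalg.cholesky(C)                # lower triangular factor, C = L L'

# X: independent columns of samples with zero mean and unit variance
X = rng.standard_normal((2, 200_000))
Y = L @ X

exact_ok = np.allclose(L @ L.T, C)       # L L' reproduces C
sample_cov = (Y @ Y.T) / Y.shape[1]      # sample estimate of E(Y Y')
sample_mean = Y.mean(axis=1)             # sample estimate of E(Y)

print(exact_ok)
print(np.round(sample_cov, 3))
print(np.round(sample_mean, 3))
```

With this many samples, the estimated covariance should be close to $C$ and the estimated mean close to zero, up to sampling noise.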

Do the marginals have to be normal for this method to work? Would this work for any distribution (with zero mean, and unit standard deviation)?

We will explore this in a subsequent blog post.

Thursday, June 15, 2017

Joint Distribution From Marginals

Consider two dependent random variables, $y_1$ and $y_2$, with a correlation coefficient $\rho$.

Suppose you are given the marginal distributions $\pi(y_1)$ and $\pi(y_2)$ of the two random variables. Is it possible to construct the joint probability distribution $\pi(y_1, y_2)$ from the marginals?

In general, the answer is no. There is no unique answer. The marginals are like shadows of a hill from two orthogonal angles. The shadows are not sufficient to specify the full 3D shape (joint distribution) of the hill.

Let us simplify the problem a little, so that we can seek a solution.

Let us assume $y_1$ and $y_2$ have zero mean and unit standard deviation. We can always generalize later by shifting (different mean) and scaling (different standard deviation). Let us also stack them into a single random vector $Y = [y_1, y_2]$.

The covariance matrix of two such random variables is given by $C(Y) = \begin{bmatrix} E(y_1 y_1) - \mu_1 \mu_1 & E(y_1 y_2) - \mu_1 \mu_2 \\ E(y_2 y_1) - \mu_2 \mu_1 & E(y_2 y_2) - \mu_2 \mu_2 \end{bmatrix} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix},$ where $\mu_i$ is the mean of $y_i$. The last equality uses the fact that both variables have zero mean and unit standard deviation.

Method

A particular method for sampling from the joint distribution of correlated random variables $Y$ begins by drawing samples of independent random variables $X = [x_1, x_2]$ which have the same distribution as the desired marginal distributions.

Note that the covariance matrix in this case is an identity matrix, because the correlation between independent variables is zero: $C(X) = I$.

Now we recognize that the covariance matrix $C(Y)$ is symmetric and positive definite. We can use Cholesky decomposition $C(Y) = LL^T$ to find the lower triangular matrix $L$.

The recipe then says that we can draw the correlated random variables with the desired marginal distribution by simply setting $Y = L X$.
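The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming standard normal marginals; the function name `sample_correlated`, the sample size, and the seed are hypothetical choices made here, not part of the original recipe.

```python
import numpy as np

def sample_correlated(C, n_samples, seed=0):
    """Return samples of Y = L X, where C = L L^T (hypothetical helper).

    X holds independent standard normal samples, one row per variable.
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(C)                         # lower triangular factor
    X = rng.standard_normal((C.shape[0], n_samples))  # independent, C(X) = I
    return L @ X

rho = 0.2                                             # assumed correlation
C = np.array([[1.0, rho],
              [rho, 1.0]])
Y = sample_correlated(C, 100_000)
sample_corr = np.corrcoef(Y)[0, 1]                    # should be near rho
print(round(sample_corr, 3))
```

Each row of `Y` is one of the correlated variables, so a scatterplot of `Y[0]` against `Y[1]` shows the induced correlation.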

Example

Suppose we seek two random variables whose marginals are normal distributions (zero mean, unit standard deviation) with a correlation coefficient 0.2.

The method above asks us to start with independent random variables $X$ such as those below.

Cholesky decomposition with $\rho = 0.2$ gives us $L = \begin{bmatrix} 1 & 0 \\ 0.2 & 0.9798 \end{bmatrix}.$ If we generate $Y = LX$ using the same data-points used to create the scatterplot above, we get,

It has the same marginal distribution, and a non-zero correlation coefficient as is visible from the figure above.
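
For the $2 \times 2$ case, the Cholesky factor can be worked out by hand, which gives a quick check on the numbers in this example. Writing a generic lower triangular $L$ (the entries $l_{ij}$ are notation introduced here for illustration), matching $LL^T$ to $C(Y)$ term by term, and taking positive roots: \begin{align*} LL^T = \begin{bmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21} \\ 0 & l_{22} \end{bmatrix} & = \begin{bmatrix} l_{11}^2 & l_{11} l_{21} \\ l_{11} l_{21} & l_{21}^2 + l_{22}^2 \end{bmatrix} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \\ \Rightarrow \quad L & = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1 - \rho^2} \end{bmatrix}. \end{align*} With $\rho = 0.2$, the lower-right entry is $\sqrt{1 - 0.04} = \sqrt{0.96} \approx 0.9798$.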

Saturday, June 10, 2017

1. "The seven deadly sins of statistical misinterpretation, and how to avoid them" (H/T FlowingData)

2. Desirability Bias (Neurologica)
[...] defined confirmation bias as a bias toward a belief we already hold, while desirability bias is a bias toward a belief we want to be true.
3. H/T John D. Cook
“Teachers should prepare the student for the student’s future, not for the teacher’s past.” — Richard Hamming
4. This xkcd cartoon on survivorship bias

Thursday, June 8, 2017

Matplotlib Styles

I created a Jupyter notebook demonstrating the use of built-in or customized styles in matplotlib, mostly as a bookmark for myself.
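
As a rough illustration of what built-in and customized styles look like in code (this is not the notebook itself), here is a minimal sketch. The helper `linewidth_under_style` and the settings in `custom_style` are hypothetical and chosen only for this example; the Agg backend is used so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

def linewidth_under_style(style):
    """Plot one line while `style` is temporarily active; return its width."""
    with plt.style.context(style):        # settings revert after the block
        fig, ax = plt.subplots()
        (line,) = ax.plot([0, 1, 2], [0, 1, 4])
        width = line.get_linewidth()
        plt.close(fig)
    return width

builtin_styles = plt.style.available      # names of the built-in style sheets

# A customized style can be a plain dict of rcParams (hypothetical values).
custom_style = {"lines.linewidth": 3.0, "axes.grid": True}

print("ggplot" in builtin_styles)
print(linewidth_under_style("ggplot"))
print(linewidth_under_style(custom_style))
```

Using `plt.style.context` keeps the change local to one figure, whereas `plt.style.use` applies a style for the rest of the session.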

Monday, June 5, 2017

Jupyter Notebook Tricks

Some cool Jupyter notebook tricks from Alex Rogozhnikov. Here are some that I did not know:
• %run can execute python code from .py files and also execute other jupyter notebooks, which can be quite useful. (This is different from %load, which imports external python code.)
• The %store command lets you pass variables between two different notebooks.
• %%writefile magic saves the contents of that cell to an external file.
• %pycat does the opposite, and shows you (in a popup) the syntax highlighted contents of an external file.
• #19 on using different kernels in the same notebook, and #22 on writing Fortran code inside the notebook

Thursday, June 1, 2017

Annotating PDFs on Linux

Most of my working day is spent reading.

Usually, this means poring over some PDF document, and scribbling my thoughts - preferably on the PDF itself. I find these markups extremely helpful, when I want to recall the gist, or when it is time to synthesize "knowledge" from multiple sources.

I use Linux on both my desktops (home and work), and the usual applications for marking up PDFs (Evince, Okular, etc.) are inadequate in one way or another. Adobe Reader, while bloated, used to do the job, but Adobe no longer releases a Linux version.

The solution that best fits my needs currently is Foxit Reader. Although you can't use the standard software manager (e.g. apt-get on Ubuntu) to get it, you can easily download a 32- or 64-bit version from their website.

The "installation guide" tells you how to do the rest [unzip, cd, and run the executable installer].

On my Linux Mint systems it was easy, peasy!

The software itself is intuitive. You can highlight, add text, stick in comments, and draw basic shapes. The changes you make are permanently saved into the PDF, so that when you use another application to reopen, the changes persist.

It is cross-platform, so you can get a version on any OS (including iOS) you want.