
Monday, September 30, 2019

Learning Gaussian Processes

I've been studying up on Gaussian process modeling for machine learning.

For someone seeing these concepts for the first time, I would recommend the following sequence based on my experience:

1. A Visual Exploration of Gaussian Processes

It hits the key points of what makes multinormal distributions special (conditionals and marginals are normal too!), and the visuals help build intuition.

1a. Gaussian Processes for Dummies

You might not need this, but I like this essay because it is jargon-free and focuses on how to get things going. There is Python code at the end, which you can play with.

2. Chapter 2 of Gaussian Processes for Machine Learning

This "bible" is astonishingly well-written. If you are familiar with linear algebra and some statistics, this is a breezy read. Plus, all the important formulae and algorithms you see in different articles, are available here in one place!

3. If you like videos, then this YouTube lecture might be worth watching!
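
To make the key fact from item 1 concrete (conditioning a multivariate normal yields another normal), here is a minimal numpy sketch of GP posterior conditioning. The RBF kernel, the toy data, and all parameter values are my own illustrative choices, not taken from any of the references above.

import numpy as np

def rbf(x1, x2, ell=1.0):
    # squared-exponential kernel matrix between two sets of 1D points
    return np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / ell**2)

X  = np.array([-2.0, 0.0, 1.5])        # toy training inputs
y  = np.sin(X)                         # toy training targets
Xs = np.linspace(-3, 3, 101)           # test points

K   = rbf(X, X) + 1e-8*np.eye(len(X))  # jitter for numerical stability
Ks  = rbf(Xs, X)
Kss = rbf(Xs, Xs)

# conditional (posterior) mean and covariance of the joint Gaussian
mu  = Ks @ np.linalg.solve(K, y)
cov = Kss - Ks @ np.linalg.solve(K, Ks.T)

print(mu[::20])                        # posterior mean at a few test points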

Saturday, August 18, 2018

Mean Squared Displacement

The mean-squared displacement of a particle is defined simply as, \[\rho(t) = \langle r^2(t) \rangle = \int r^2\, c(r)\, p(r,t)\, dr,\] where \(c(r) = 1\), \(2 \pi r\), and \(4 \pi r^2\) in 1, 2, and 3 dimensions, respectively. This evaluates to \(\langle r^2(t) \rangle = 2dDt\), where \(d\) is the dimension (1, 2, or 3).

One can also compute the variance of the MSD, as \[\text{var}(\rho) = \langle \rho^2(t) \rangle - \left(\langle \rho(t) \rangle\right)^2.\] This can be evaluated as,
\begin{align}
\text{1D}:&\quad \dfrac{2}{1}\,(2Dt)^2 = 8D^2t^2\\
\text{2D}:&\quad \dfrac{2}{2}\,(4Dt)^2 = 16 D^2 t^2\\
\text{3D}:&\quad \dfrac{2}{3}\,(6Dt)^2 = 24 D^2 t^2
\end{align}
This can be simplified into a common expression as, \[\text{var}(\rho) = \dfrac{2}{d} \rho^2\]
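
As a sanity check, here is a quick numpy experiment; the values of \(D\), \(t\), and the sample size are arbitrary illustrative choices. Each coordinate of a Brownian walker at time \(t\) is drawn from \(\mathcal{N}(0, 2Dt)\), and the sample mean and variance of \(r^2\) are compared against \(2dDt\) and \((2/d)\rho^2\).

import numpy as np

rng = np.random.default_rng(0)
D, t, n = 1.0, 1.0, 200_000            # arbitrary illustrative values

for d in (1, 2, 3):
    # each coordinate at time t is N(0, 2Dt)
    x   = rng.normal(0.0, np.sqrt(2*D*t), size=(n, d))
    r2  = (x**2).sum(axis=1)
    rho = r2.mean()                    # should be close to 2*d*D*t
    print(d, rho, r2.var(), 2/d*rho**2)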

Sunday, April 15, 2018

Diffusion in Higher Dimensions

In the previous posts (1 and 2), we wrote down the probability or concentration distribution of a bunch of Brownian diffusors initially at \(x = 0\) (delta function), \[p_{1D}(x, t) = \dfrac{1}{\sqrt{4 \pi Dt}} \exp\left(-\dfrac{x^2}{4Dt}\right)\]
The PDF is normalized on the domain \(x \in (-\infty, \infty)\) so that, \[\int_{-\infty}^{\infty} p_{1D}(x,t)\, dx = 1.\] In 2D, \(\langle r^2(t) \rangle = \langle x^2(t) \rangle + \langle y^2(t) \rangle\). If diffusion is isotropic, then \(\langle r^2(t) \rangle = 2Dt + 2Dt = 4Dt\). In this case,
\begin{align}
p_{2D}(r, t) & = p_{1D}(x, t) \, p_{1D}(y, t)\\
& = \dfrac{1}{\sqrt{4 \pi Dt}} \dfrac{1}{\sqrt{4 \pi Dt}} \exp\left(-\dfrac{1}{2} \dfrac{x^2+y^2}{2Dt}\right)\\
& =\dfrac{1}{4 \pi Dt} \exp\left(-\dfrac{r^2}{4Dt}\right)
\end{align}

The PDF is normalized such that, \[\int_{0}^{\infty} (2\pi r) \, p_{2D}(r,t)\, dr = 1.\]
Finally, for isotropic 3D diffusion, \[p_{3D}(r, t) = \left(\dfrac{1}{4 \pi Dt}\right)^{3/2} \exp\left(-\dfrac{r^2}{4Dt}\right).\] The PDF is normalized such that, \[\int_{0}^{\infty} (4\pi r^2) \, p_{3D}(r,t)\, dr = 1.\] In summary, for \(d\) = 1, 2, or 3 dimensions
\[p_{dD}(r, t) = \left(\dfrac{1}{4 \pi Dt}\right)^{d/2} \exp\left(-\dfrac{r^2}{4Dt}\right).\]
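
These formulas can be checked numerically with scipy's quadrature; \(D\) and \(t\) below are arbitrary illustrative values, and the factor of 2 in the 1D weight folds the domain \((-\infty, \infty)\) onto \((0, \infty)\).

import numpy as np
from scipy.integrate import quad

D, t = 1.0, 1.0                                  # arbitrary illustrative values

for d, c in [(1, lambda r: 2.0),                 # folds (-inf, inf) onto (0, inf)
             (2, lambda r: 2*np.pi*r),
             (3, lambda r: 4*np.pi*r**2)]:
    p = lambda r, d=d: (4*np.pi*D*t)**(-d/2) * np.exp(-r**2/(4*D*t))
    norm, _ = quad(lambda r: c(r)*p(r), 0, np.inf)
    msd,  _ = quad(lambda r: r**2*c(r)*p(r), 0, np.inf)
    print(d, norm, msd)                          # expect 1 and 2*d*D*t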

Tuesday, April 3, 2018

Diffusion and Random Walks

In the previous post, we saw how the probability distribution \(p(x,N)\) after \(N\) random steps on a unit lattice is given by, \[p(x, N) = \dfrac{1}{\sqrt{2 \pi N}} \exp\left(-\dfrac{x^2}{2N}\right)\] If the step size is \(b\) instead of \(1\), then we can generalize, and write the formula as:
\[p(x, N) = \dfrac{1}{\sqrt{2 \pi Nb^2}} \exp\left(-\dfrac{x^2}{2Nb^2}\right)\] Now consider a Gaussian random walk in 1D. Suppose the step size at each step is drawn from a normal distribution \(\mathcal{N}(0, 1)\). While it has the same root-mean-squared step size as the walk on the unit lattice, an individual step may be shorter or longer than \(b = 1\).

In polymer physics, where a Gaussian coil is often used as a model for polymer conformations, \(b\) is called the Kuhn length, and \(N\) is proportional to the molecular weight.

Due to the connection between Brownian motion and random walks, the mean squared distance travelled by a particle in 1D with self-diffusivity \(D\) is \(\langle x^2(t) \rangle = 2Dt\). Similarly, the mean squared end-to-end distance of a Gaussian random walk is given by, \[\langle x^2(N) \rangle = \int_{-\infty}^{\infty} x^2 p(x, N) dx = Nb^2 \equiv 2Dt = \langle x^2(t) \rangle.\] This allows us to re-parameterize the equation for the position of a Brownian diffusor: \[p(x, t) = \dfrac{1}{\sqrt{4 \pi Dt}} \exp\left(-\dfrac{x^2}{4Dt}\right)\] Note the correspondence between \(t\) and \(N\), and between \(\sqrt{2D}\) and \(b\).
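
Here is a small numpy experiment illustrating the correspondence; the walk length, step size, and number of walkers are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(1)
N, b, walkers = 1000, 1.0, 10_000     # arbitrary illustrative values

# each step is drawn from N(0, b^2); sum the steps to get final positions
x = rng.normal(0.0, b, size=(walkers, N)).sum(axis=1)

print(x.var())                        # should be close to N*b**2 = 2*D*t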

Tuesday, March 27, 2018

A Primer on Diffusion: Random Walks in 1D

Consider a particle, initially at the origin, jumping around randomly on a 1D lattice. The particle tosses a fair coin, and decides to jump left or right.

A particular trajectory of the particle may look like the following:


Suppose the particle makes \(n_{+}\) hops to the right, and \(n_{-}\) hops to the left. Then, the total number of steps \(N = n_{+} + n_{-}\), and the position at the end is \(x = n_{+} - n_{-}\).

The process is probabilistic, and the outcome of any single trajectory is impossible to predict. However, let us enumerate the number of ways in which a random walk of \(N\) steps, results in \(n_{+}\) hops to the right. This is given by, \begin{align*}
W(x, N) & = {}^N C_{n_{+}}\\
& =  \dfrac{N!}{(N-n_{+})!\,n_{+}!}\\
& = \dfrac{N!}{n_{-}!n_{+}!}
\end{align*} The probability \(p(x, N)\) of ending up at \(x\) after \(N\) steps can be obtained by dividing \(W(x, N)\) by the total number of paths. Since we can make two potential choices at each step, the total number of paths is \(2^N\).
\[p(x, N) = \dfrac{W(x,N)}{2^N}.\]
For large \(N\), Stirling's approximation is \(N! \approx \sqrt{2 \pi N} (N/e)^N\). For \(x \ll N\), this implies, \[p(x, N) = \dfrac{1}{\sqrt{2 \pi N}} \exp\left(-\dfrac{x^2}{2N}\right)\]
Both distributions have the same shape. However, because the exact distribution is discrete while the approximation is continuous, they have different normalizations, and hence different actual values of \(p(x,N)\).
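
A short numpy simulation makes the normalization point explicit; \(N\) and the number of walkers are arbitrary illustrative choices. Since a walk with even \(N\) can only land on even lattice sites, the probability at an allowed site is twice the continuum density.

import numpy as np

rng = np.random.default_rng(2)
N, walkers = 100, 100_000             # arbitrary illustrative values

# +1/-1 coin flips on a unit lattice
x = rng.choice([-1, 1], size=(walkers, N)).sum(axis=1)

for xi in (0, 10, 20):                # even sites, since N is even
    p_sim   = np.mean(x == xi)
    p_gauss = 2/np.sqrt(2*np.pi*N) * np.exp(-xi**2/(2*N))
    print(xi, p_sim, p_gauss)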

Thursday, February 23, 2017

Split a Text File using csplit

Suppose you have a large "sectioned" text file which looks like the following

> cat bigfile.txt
data1
1
2
3

data2
11
22
33

...

csplit bigfile.txt /data/ {*}

splits it into a bunch of files xx00, xx01, xx02, ..., where each of the "xx" files holds a section (xx00 holds whatever precedes the first match, here nothing). Thus,

> cat xx01
data1
1
2
3

and so on. It can be just the quick tool you need at times.

Friday, February 17, 2017

Google N-Grams for Keywords in Journals

Google Ngram Viewer allows us to visualize the popularity of keywords as a function of time. It uses books archived on Google Books as its corpus.



However, it doesn't work quite as well on some domain-specific scientific keywords.

The best tool for this is Web of Science, if you are lucky enough to have institutional access to it.

Once you search a keyword, scroll to the bottom left and look for the "Analyze Results" button. Then choose "Publication Years" in the "Rank the records by this field" window.

Here I looked for "tube model", which is a popular model in polymer melt dynamics.


You can export the result and play with it.

Wednesday, February 1, 2017

Neat Little Integral Trick

John D. Cook writes about a useful integration trick by rewriting trigonometric functions as complex variables. He recasts the integral \[\int e^{-x} \sin(4x) dx,\] using \(e^{ix} = \cos x + i \sin x\) as the imaginary part of \[\int e^{-x} e^{4ix} dx.\]

The derivation is cleaner (no integration by parts), and you don't have to remember any trig formulae. You can do pretty much any trig integral; for example, \(\int \cos x \, dx\) is the real part of \[\begin{align} \int e^{ix} dx & = e^{ix}/i \\
& = -i e^{ix}. \end{align}\] The real part of the last expression, \(-i(\cos x + i \sin x)\), is \(\sin x\).
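
Here is a quick SymPy check of the trick (this snippet is mine, not from John's post):

import sympy as sp

x = sp.symbols('x', real=True)

# integrate the complex exponential, then pick off the imaginary part
F   = sp.integrate(sp.exp(-x)*sp.exp(4*sp.I*x), x)
imF = sp.simplify(sp.im(sp.expand_complex(F)))

print(imF)                                       # complex-variable route
print(sp.integrate(sp.exp(-x)*sp.sin(4*x), x))   # direct route; same answer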




Thursday, November 10, 2016

Virial Pressure and PBCs

The total energy of a system can be decomposed into kinetic and potential parts:
\[E = K + U.\] Using the thermodynamic relation, \[P = - \left(\dfrac{dE}{dV}\right)_T,\] this translates to, \[P = \dfrac{NkT}{V} - \left\langle\dfrac{dU}{dV}\right\rangle_T,\] where the angle brackets denote a statistical average. I will drop the subscript \(T\) for brevity from here on.

Molecular simulations are typically carried out in boxes. Let us assume a cubic box of volume \(V = L^3\), with \(N\) particles. Let the coordinates of the particles be \(\mathbf{r}^N = \{\mathbf{r}_1, \mathbf{r}_2, ..., \mathbf{r}_N\}\).

People often like to use scaled coordinates:\[\mathbf{r}_i = \mathbf{s}_i L.\] The potential energy describes interactions of the particles, and can be thought of as \(U(\mathbf{r}^N, L)\). The explicit dependence on \(L\) will become clear shortly.

Now one can write:
\[\dfrac{dU}{dV} = \dfrac{dL}{dV}\left[\dfrac{\partial U}{\partial L} + \sum_{i=1}^N \dfrac{\partial U}{\partial \mathbf{r}_i} \cdot  \dfrac{d\mathbf{r}_i}{dL} \right].\] Using \(V = L^3\) and \(d\mathbf{r}_i/dL = \mathbf{s}_i\), and identifying the force on particle \(i\) as \(\mathbf{f}_i = -\partial U/\partial \mathbf{r}_i\), one can simplify this relation to obtain:

\[P = \dfrac{NkT}{V} + \left\langle \dfrac{1}{3V} \sum_{i=1}^N \mathbf{r}_i \cdot \mathbf{f}_i - \dfrac{1}{3L^2} \dfrac{\partial U}{\partial L} \right\rangle.\]

Finite Non-periodic Systems

For finite, non-periodic systems, the potential energy depends only on atomic coordinates, \(U(\mathbf{r}^N)\). Hence, the partial with respect to \(L\) is equal to zero, and we obtain the familiar form:

\[P = \dfrac{NkT}{V} + \left\langle \dfrac{1}{3V} \sum_{i=1}^N \mathbf{r}_i \cdot \mathbf{f}_i \right\rangle.\] It is easy to show that in \(d\) dimensions, \(V = L^d\), this equation can be generalized to: \[P = \dfrac{NkT}{V} + \left\langle \dfrac{1}{dV} \sum_{i=1}^N \mathbf{r}_i \cdot \mathbf{f}_i \right\rangle.\] For pairwise forces, \(\mathbf{f}_i = \sum_{j \neq i} \mathbf{f}_{ij}\), where \(\mathbf{f}_{ij} = - \mathbf{f}_{ji}\), this equation can be further simplified to:

\[P = \dfrac{NkT}{V} + \left\langle \dfrac{1}{6V} \sum_{i=1}^N \sum_{j \neq i}^N \mathbf{r}_{ij} \cdot \mathbf{f}_{ij} \right\rangle.\]

Periodic Systems

Very often, periodic boundary conditions are used in molecular simulations.



Thus, a given particle interacts not only with particles in the given box, but also with all their periodic images. This includes, by the way, images of itself in other periodic boxes.

One now has to contend with the entire expression, including the partial with respect to box length:
\[P = \dfrac{NkT}{V} + \left\langle \dfrac{1}{3V} \sum_{i=1}^N \mathbf{r}_i \cdot \mathbf{f}_i - \dfrac{1}{3L^2} \dfrac{\partial U}{\partial L} \right\rangle.\] Note that the dependence of \(U\) on \(L\) is real. If we expand the box, we have to choose whether we scale all the coordinates, or just expand the box, leaving the particle coordinates in the simulation box unchanged.


In either case, the interaction energy between particle \(i\) and periodic images is mediated by \(L\). Thus, ignoring this dependence makes virial pressure calculations for general (especially multibody) potentials incorrect!

However, for pairwise potentials (which constitute the bulk of simulations in the literature), it turns out that one can write the total potential energy as, \[U = \dfrac{1}{2} \sum_{i=1}^N \sum_{j \neq i}^{MN} u(r_{ij}),\] where the inner sum runs over the \(MN\) particles in the \(M\) (potentially infinite) periodic boxes. Note that since we are summing over all the periodic boxes, we don't need an explicit dependence on \(L\); the only dependence on \(L\) is implicit, via the scaling of \(\mathbf{r}_i\).

For affine deformation, in which particle coordinates are scaled, it turns out that we indeed recover the expression:

\[P = \dfrac{NkT}{V} + \left\langle \dfrac{1}{6V} \sum_{i=1}^N \sum_{j \neq i}^N \mathbf{r}_{ij} \cdot \mathbf{f}_{ij} \right\rangle.\]

This is extraordinarily fortunate!
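
As a concrete illustration, here is a minimal Python sketch of the pairwise virial pressure with the minimum-image convention. The LJ potential, the reduced units (k = 1), and all parameter values are my own illustrative assumptions; a production code would also need a proper cutoff and tail corrections.

import numpy as np

def virial_pressure(r, L, T, eps=1.0, sig=1.0):
    # pairwise LJ virial pressure in a cubic periodic box of side L;
    # minimum image is valid for pairwise potentials with cutoff < L/2
    N, V = len(r), L**3
    virial = 0.0
    for i in range(N - 1):
        d = r[i+1:] - r[i]
        d -= L*np.round(d/L)                  # minimum-image separation
        s6 = (sig**2/(d**2).sum(axis=1))**3   # (sigma/r)^6
        # r_ij . f_ij = -r U'(r) = 24 eps (2 (sigma/r)^12 - (sigma/r)^6)
        virial += np.sum(24*eps*(2*s6**2 - s6))
    return N*T/V + virial/(3*V)               # sum is over pairs i < j

# usage on random (not equilibrated!) coordinates
rng = np.random.default_rng(3)
print(virial_pressure(rng.uniform(0, 5, (20, 3)), L=5.0, T=1.0))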

References

1. Manuel J. Louwerse and Evert Jan Baerends, "Calculation of pressure in case of periodic boundary conditions", Chemical Physics Letters 421, 138-141 (2006).

This paper, while not the first, is the most recent reminder that PBCs don't play well with multibody potentials.

2. Aidan P. Thompson, Steven J. Plimpton, and William Mattson, "General formulation of pressure and stress tensor for arbitrary many-body interaction potentials under periodic boundary conditions", Journal of Chemical Physics 131, 154107 (2009).

This paper offers a surprisingly straightforward solution for addressing PBCs and multibody potentials.

Wednesday, July 6, 2016

Force from Pairwise Potentials

Consider a pairwise potential, such as the Lennard-Jones potential, \[U(r) = 4 \epsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right].\] Question: How can we get the force (magnitude and direction) from such a potential?

For simplicity let us assume that we have only two particles A and B. Specifically, what is the force on particle A due to particle B?

We know that the force is the negative gradient of the potential; therefore, \[\begin{align*}
\mathbf{f}_{AB} & = -\dfrac{dU(r_{AB})}{d\mathbf{r}_A} \\
& = -\dfrac{dU(r_{AB})}{dr_{AB}} \color{blue} {\dfrac{dr_{AB}}{d\mathbf{r}_A}}
\end{align*}\] How do we evaluate the term in blue?

Let \(\mathbf{r}_A = x_A \mathbf{e}_x + y_A \mathbf{e}_y + z_A \mathbf{e}_z\), and \(\mathbf{r}_B = x_B \mathbf{e}_x + y_B \mathbf{e}_y + z_B \mathbf{e}_z\) be the positions of the two particles. Let \(\mathbf{r}_{AB} = \mathbf{r}_B - \mathbf{r}_A\) be the vector pointing from A to B. The distance between the two particles is:
\[r_{AB} = \sqrt{x_{AB}^2 + y_{AB}^2 + z_{AB}^2},\] where \(x_{AB}^2 = (x_B - x_A)^2\), etc.

We can now try to tackle the blue term: \[\begin{align*}
\dfrac{dr_{AB}}{d\mathbf{r}_A} & = \dfrac{dr_{AB}}{dx_A} \mathbf{e}_x + \dfrac{dr_{AB}}{dy_A} \mathbf{e}_y + \dfrac{dr_{AB}}{dz_A} \mathbf{e}_z\\
& = \dfrac{1}{2 r_{AB}} \left(\dfrac{dx_{AB}^2}{dx_A} \mathbf{e}_x + \dfrac{dy_{AB}^2}{dy_A} \mathbf{e}_y + \dfrac{dz_{AB}^2}{dz_A} \mathbf{e}_z \right) \\
& = \dfrac{1}{2 r_{AB}} \left(-2 x_{AB} \dfrac{dx_A}{dx_A} \mathbf{e}_x - 2 y_{AB} \dfrac{dy_{A}}{dy_A} \mathbf{e}_y - 2 z_{AB} \dfrac{dz_{A}}{dz_A} \mathbf{e}_z \right) \\
& = -\dfrac{\mathbf{r}_{AB}}{r_{AB}} \\
& = -\hat{\mathbf{r}}_{AB}
\end{align*}\] Thus, the force is, \[\mathbf{f}_{AB}(r) = \dfrac{dU(r)}{dr} \hat{\mathbf{r}}_{AB}.\] If \(U^{\prime}(r)\) is negative (repulsive regime of LJ, for instance), the force acts along \(-\hat{\mathbf{r}}_{AB} = \hat{\mathbf{r}}_{BA}\); this force points from B towards A, trying to push particle A away from particle B. If \(U^{\prime}(r)\) is positive (attractive part of LJ), the force acts along \(+\hat{\mathbf{r}}_{AB}\), pulling A towards B.
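
A minimal Python sketch of this result for the LJ potential (the function name and all parameter values are illustrative):

import numpy as np

def lj_force_on_A(rA, rB, eps=1.0, sig=1.0):
    # force on particle A due to B: f_AB = U'(r) * rhat_AB, r_AB = rB - rA
    rAB = rB - rA
    r   = np.linalg.norm(rAB)
    s6  = (sig/r)**6
    dUdr = 24*eps*(s6 - 2*s6**2)/r            # U'(r) for Lennard-Jones
    return dUdr*rAB/r                         # U'(r) * rhat_AB

# at r < 2^(1/6) sigma, U'(r) < 0 and the force pushes A away from B
print(lj_force_on_A(np.array([0., 0., 0.]), np.array([1., 0., 0.])))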

Thursday, February 11, 2016

Gerschgorin's Circle Theorem

Let \(\mathbf{A}\) be an \(n \times n\) matrix.

Define the disks (i.e., circles) in the complex plane, for \(i = 1, 2,...,n\), by
\[\mathcal{D}_i = \left\lbrace z : |a_{ii} - z | \leq \sum_{j \neq i} |a_{ij}| \right\rbrace\] Then all the eigenvalues of \(\mathbf{A}\) lie in the union of the disks
\[\bigcup_{i=1}^n \mathcal{D}_i\] Moreover, if \(k\) disks are disjoint then there are exactly \(k\) eigenvalues lying in the union of these \(k\) disks.

For a diagonal matrix, these discs coincide with the spectrum.

Example

The python program attached below can be used to visualize the Gerschgorin discs \(\mathcal{D}_i\) and the actual eigenvalues on a complex plane.

A = np.array([[5., 3., 2.], [4., 6., 5.], [-3., 1., 4.]])
demoGerschgorin(A)

produces the plot:

The discs are centered around the diagonal elements (5, 0), (6, 0), and (4, 0) with radii 5, 9, and 4, respectively. Since the matrix is real, the centers all line up on the real axis.

The eigenvalues of this matrix are approximately 0.92,  7.04+0.70j, and 7.04-0.70j, which are shown by the blue, green, and red dots respectively.

Python Program
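
The program originally embedded here has not survived, so below is a minimal sketch of what a demoGerschgorin function could look like; it uses matplotlib, and the styling choices are my own.

import numpy as np
import matplotlib.pyplot as plt

def demoGerschgorin(A):
    # plot the Gerschgorin discs of A and overlay its eigenvalues
    A = np.asarray(A, dtype=complex)
    fig, ax = plt.subplots()
    for i in range(A.shape[0]):
        center = A[i, i]
        radius = np.sum(np.abs(A[i, :])) - np.abs(A[i, i])
        ax.add_patch(plt.Circle((center.real, center.imag), radius, fill=False))
    lam = np.linalg.eigvals(A)
    ax.plot(lam.real, lam.imag, 'o')
    ax.set_xlabel('Re'); ax.set_ylabel('Im')
    ax.set_aspect('equal'); ax.autoscale_view()
    plt.show()

A = np.array([[5., 3., 2.], [4., 6., 5.], [-3., 1., 4.]])
demoGerschgorin(A)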

Thursday, January 28, 2016

Ornstein-Zernike Equation

I've been teaching myself some liquid state theory for a coarse-graining problem I am interested in.
  1. The classic "Theory of Simple Liquids" by Hansen and McDonald is of course a great textbook.
  2. Another freely available introduction, especially focused on the Ornstein-Zernike equation and its closures, is this readable chapter by Nagele called "Theories of Fluid Microstructure" (pdf). This presentation, which closely follows the material in the chapter, is also useful as a review.
  3. Finally, I found a very nice description of the equation, closures, a review of numerical techniques to solve the OZ equation, and (wait, I am not done) a python program, called pyOZ, to solve the OZ equations. While it is no longer actively maintained (the last release was in 2009), it is a very handy resource. Thank you Lubos Vrbka!
  4. There is also a Fortran 90 OZ equation solver that is maintained as part of a larger project here.
  5. Per Linse has a better-documented OZ solver in Fortran 95.
  6. A reasonably straightforward explanation of the computational method used is described here.

Wednesday, July 1, 2015

Random Number Generators

In preparation for a language- and platform-agnostic course on Markov Chain Monte Carlo, I compiled links to random number generators (for distributions beyond the uniform) for a bunch of different platforms.
  • Python: Numpy by itself has a formidable list of distributions it can sample from. The scipy.stats module adds even more firepower, not only increasing the number of standard distributions you can sample from, but also letting you do neat things like plotting the PDF, CDF, etc.
  • GNU Octave: A fairly extensive list that contains most of the usual suspects comes standard. The "Statistics" package at OctaveForge adds to this set, and like the scipy.stats module lets you do more with standard distributions.
  • Matlab: Core Matlab has only the barebones RNG capability - essentially uniform and normal distributions. You can enhance it by purchasing the Statistics and Machine Learning Toolbox. Also see John's implementation of RANLIB for Matlab below.
For compiled languages, my colleague John Burkardt has implementations of RANLIB/RNGLIB which allow you to sample from "Beta, Chi-square Exponential, F, Gamma, Multivariate normal, Noncentral chi-square, Noncentral F, Univariate normal, random permutations, Real uniform, Binomial, Negative Binomial, Multinomial, Poisson and Integer uniform".
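
For instance, here is a small Python sketch of the numpy/scipy.stats division of labor described above (the gamma distribution and its parameters are arbitrary examples):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# numpy: draw directly from a named distribution
print(rng.gamma(shape=2.0, scale=1.5, size=5))

# scipy.stats: the same distribution, plus pdf, cdf, and more
dist = stats.gamma(a=2.0, scale=1.5)
print(dist.pdf(1.0), dist.cdf(1.0))
print(dist.rvs(size=5, random_state=rng))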


Thursday, May 21, 2015

SymPy Cheat Sheet

Finally, I decided to compile a set of commonly used (by me) SymPy commands.

Since Blogger's platform is somewhat unfriendly, I'm sharing it via:

(i) an IPython notebook @ nbviewer
(ii) a PDF document

Hopefully they work. Enjoy.

Monday, May 4, 2015

Subplots in Veusz

Continuing from the previous post, let us look at how to do subplots in Veusz. It's elementary, really!

In this tutorial, I imported the dataset near the end. You could, of course, load it in anytime.

1. By default, there is a single graph "graph1" under "page1". Let us first get rid of it.


2. Insert a grid in its place. Specify the number of rows and columns. By default it is 2 by 2. Here I want one graph stacked on top of another, so I choose two rows and one column.


3. Under "grid1", insert two graphs, by clicking on the graph icon twice. The graphs go into their corresponding places.


4. Choosing one graph at a time, insert the type of plot (here "xy"), and decorate it, as you will.


Tuesday, April 7, 2015

Inset Plots in Veusz

Over the past six months, I've pretty much completed the transition from Grace to Veusz as my primary graphing software for journal-quality plots.

As I have grown familiar with the software, I've come to realize that it is powerful, and has much better documentation and support on the web than Grace (which is somewhat surprising, since Veusz is just as specialized, and much newer).

Anyway, thanks to Google Analytics, I know that this post on "how to create inset plots or subplots with Grace" is quite popular. I thought I ought to do a short series where I show how to do the same common stuff with Veusz.

Here is the first installment. It discusses how to do inset plots.

Really, it is a cinch!

1. Load in the appropriate datafile. Here my datafile has three columns. Click on the images if you want to see a larger version.


2. Choose a scatter plot, and plot one of the datasets (here x v/s z).


3. Add another graph by clicking the graph icon.


4. You can right-click on "graph1" and choose to "hide" or "show" it. Let's hide it for now.


5. With graph1 out of the way, resize, move, and adjust graph2.


6. "Show" graph1 by right-clicking. Notice that graph2 might be under graph1 and hence invisible. Right click on graph2 and move it up or down.


7. That's it. You can now go in and decorate the two graphs to your heart's content.



Tuesday, February 3, 2015

Computing Weighted Averages With Tiny Weights

Consider a set of \(N\) data-points \(\{ (w_1, R_1), (w_2, R_2), ..., (w_N, R_N) \}\), where w is the weight, and R is the observed data. We can compute a weighted mean by, \[\bar{R} = \frac{\sum\limits_{i=1}^N w_i R_i}{\sum\limits_{i=1}^N w_i}.\]

Suppose that the weights were super-tiny so that instead of the actual weights, you kept track of the logarithm of the weights \(L_i = \log w_i\). If the weights are small and disparate, a naive computer implementation of the formula, \[\bar{R} = \frac{\sum\limits_{i=1}^N e^{L_i} R_i}{\sum\limits_{i=1}^N e^{L_i}},\] would run into trouble.

This can be illustrated by a simple python program:

import numpy as np
L = np.array([-1500, -999, -1001, -1002], float)
R = np.array([800, 50, 100, 66], float)

sumNumerator   = sum(np.exp(L)*R)
sumDenominator = sum(np.exp(L))

print(sumNumerator, sumDenominator)

This runs into underflow problems and yields the output:

0.0 0.0

We note that the numerator and denominator can be multiplied by any common number without changing the ratio. If all the data is available, then this observation informs a simple fix. We can rescale the logWeights by subtracting out a common scale factor, \(L_{i}^{'} = L_i - L_s\), since \[\bar{R} = \frac{e^{L_s} \sum\limits_{i=1}^N e^{L_{i}^{'}} R_i}{e^{L_s} \sum\limits_{i=1}^N e^{L_i^{'}}}.\] Continuing the python example from above, we choose the maximum of the logWeights as the scale factor:

Lscale = max(L)
Lp = L - Lscale
sumNumerator   = sum(np.exp(Lp)*R)
sumDenominator = sum(np.exp(Lp))

print(sumNumerator, sumDenominator)

#
# If denominator not too small
#
if sumDenominator > np.finfo(float).eps:
    weightAverage = sumNumerator/sumDenominator
    print("Scaled Weighted Average", weightAverage)

yields the output,

66.8194748359 1.1851223516
Scaled Weighted Average 56.3819210274

Suppose the data were not all available at once, but trickled in one datapoint at a time. The principal complication in this case is that we don't know all the weights a priori (note: if we did, then this would be a simpler problem). We could start by setting the scale factor using the first data-point, and then appropriately weighting or de-weighting when a better (larger) candidate for the scale factor is found. This Python function does the trick:
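
(The function originally embedded here has not survived; the following is a minimal sketch of the idea, with the function name and details my own.)

import numpy as np

def streaming_weighted_mean(stream):
    # weighted mean of (logWeight, value) pairs seen one at a time;
    # running sums are kept scaled by exp(-Lmax), and are de-weighted
    # whenever a larger logWeight arrives
    Lmax = -np.inf
    num = den = 0.0
    for L, R in stream:
        if L > Lmax:
            scale = np.exp(Lmax - L)      # de-weight the old sums
            num, den, Lmax = num*scale, den*scale, L
        w = np.exp(L - Lmax)
        num += w*R
        den += w
    return num/den

data = [(-1500, 800), (-999, 50), (-1001, 100), (-1002, 66)]
print(streaming_weighted_mean(data))      # ~56.38, as before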

Tuesday, October 14, 2014

An underflow problem in slice sampling

In an MCMC seminar that Peter and I run, we were recently discussing slice sampling. Here is an overview from Radford Neal himself (open access). Let me skip the details of the algorithm, and pose the particular problem we encountered.

Problem: Compute the energy \(U^*\) of a system. We don't have any idea of the magnitude of this energy, or its sign. Suppose the probability for the system to be in that state is \(\exp(-U^*)\).

This number is going to be extremely small if \(U^*\) is large (which is probable, if you don't work hard at estimating a good initial guess).

The slice sampling algorithm asks us to draw a number \[y \sim U(0, \exp(-U^*)),\] and compute the difference \[\Delta U_y = -\log(y) - U^*.\]
Straightforward Implementation:

The following Octave program computes 10,000 samples of \(\Delta U_y\) and produces the histogram below.

%
% Small Ustar = 10, no problems
%
Us  = 10;
num = 10000;
y   = rand(num,1) * exp(-Us);
Uy  = -log(y);
dUy = Uy - Us;
[hx, x] = hist(dUy,25);
hx = hx/num;
bar(x, hx)
[Figure: histogram of \(\Delta U_y\) with \(U^* = 10\); it is an exponential distribution.]
Unfortunately, if I set Us = 1000 in the program above, it fails. The failure is simple to parse: if Us is large, exp(-Us) falls below "realmin" in double precision and underflows to zero, which spawns a cascade of disastrous events, ending in the logarithm of zero.

There are multiple ways of fixing this problem.

Solution 1:

Since we are really interested in \(\Delta U_y\), we can draw it directly from the correct distribution. In this case, a little analysis suggests that the distribution of \(\Delta U_y\) is an exponential distribution with parameter 1. Hence, we can skip the auxiliary variables \(y\) and \(U_y\), and go directly to the meat of the problem:

Us  = 1000;
num = 10000;
dUy = exprnd(1,num,1);

Solution 2:

Since we observe that \(U^*\) doesn't participate in the distribution, we can think of another method that does not directly involve sampling from an exponential random number generator. We define \(y' \sim U(0,1)\), and \(y = y' \exp(-U^*)\).

We can write \(-\log(y) = -\log(y') + U^*\).

This implies  that  \(\Delta U_y = -\log(y) - U^* = -\log y'\).

Thus,

yp  = rand(num,1);
dUy = -log(yp);

also works well. The advantage is that one doesn't need a special exponential random number generator.

On my computer, solution 2 was more efficient than solution 1 for num < 50000, after which solution 1 became more efficient.

Saturday, September 14, 2013

Molecular Weight Distribution Posts

I wanted to put together four posts I did on molecular weight distributions (MWD) just for future reference:
  1. The relationship between size-exclusion chromatography (SEC) and MWD: The first post explored the relationship between the typical SEC plot (w(log M) v/s log M) and the underlying MWD.
  2. The second post looked at a particular example to "particularize" some of the equations above.
  3. I also looked at two common parametric forms for MWD, namely the log-normal distribution, and the generalized exponential or GEX functions.


Wednesday, June 26, 2013

Vector Representation in Different Basis

Teacher: We now know how to rotate a vector \(u\) counter-clockwise by an angle \(\theta\) using a rotation matrix \(Q\). In this lesson, we are not going to transform the vector \(u\) - instead we are going to investigate how the matrix representation changes when we move from the standard basis vectors to  some other basis.

Student: Right, last time around you did remark that in the matrix representation of a vector \(u = (1,1)\) the basis was tacitly assumed. So I guess we have to first talk about a new basis.

Teacher: Yep. Let's again consider a vector \(u = (u_1, u_2)\) and the standard basis vectors \(e_1 = (1,0)\) and \(e_2 = (0,1)\) in 2D. Thus,
\[u = u_1 e_1 + u_2 e_2 = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}.\]
Student: I notice you dropped the subscript "o" from last time, because we are not going to touch the vector per se. Also, I notice that we are using a general representation \(u = (u_1, u_2)\) instead of something specific.

Teacher: Good observation. We will toggle between a general and a specific \(u\), depending on the situation. Now let's pick two new basis vectors \(E_1\) and \(E_2\). Just to reiterate lessons from last time, let us generate this new basis by rotating the standard basis by 90 degrees.

Student: Okay. Let me figure this part out. I set  \(\theta = \pi/2\). I can compute \(Q(\pi/2)\) and get:
\[E_1 =Q(\pi/2) e_1 = \begin{bmatrix} 0 & -1\\ 1 & 0 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} .\]

Similarly, \(E_2 = Q(\pi/2) e_2 = (-1, 0)\). I guess that means I can draw a picture such as:


Teacher: Great. Now let's consider the point \(u = (1,1)\) as before, and ask ourselves what its representation \(U = (U_1, U_2)\) in the new basis looks like.

Student: Hang on. I thought we were not going to do anything to the vector \(u\).

Teacher: We aren't! We are simply looking at the same geometrical object \(u\) with a different lens (basis). It is similar to saying "Texas is to the west" when you are in Florida, and "Texas is to the east" when you are in California. Texas hasn't moved. You have.

Student: Aha! I see what you mean. We are trying to represent the same geometrical object \(u_1 e_1 + u_2 e_2 = U_1 E_1 + U_2 E_2\). 

Teacher: Great. Since we know the relationship between the old and new basis, we should be able to figure out the coordinates in the new basis.
\[ U_1 E_1 + U_2 E_2 = U_1 Q e_1 + U_2 Q e_2 = Q (U_1 e_1 + U_2 e_2) = Q U.\]
That is, \(QU = u\), or \(U = Q^{-1} u = Q^T u\), since \(Q\) is orthogonal.

Student: Okay, let me try to keep things straight. \(Q\) codifies the relationship between the old and new axes. You are telling me that a point \(u\) in the old basis is the same as the point \(Q^T u\) in the new basis.

Teacher: Yes! Why don't you try out the example?

Student: Okay. I already know what \(Q\) is. I can get:
\[U = Q^T u =  \begin{bmatrix} 0 & 1\\ -1 & 0 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} .\]

Teacher: Does that look like the right answer?

Student: \(U = 1 E_1 + (-1) E_2 \). That looks about right from this figure!


Teacher: Right. If you want to rotate a vector, you multiply it by \(Q\). If you want to represent the same vector in a different basis, you multiply it by \(Q^T\).

Student: Question. Here the new basis was a simple rotation of the old basis given by \(Q\). What about a general new basis?

Teacher: Good question. We can figure this out using linear algebra for an n-dimensional space. We know what the standard basis for such a space is: \(e_i = [0, \ldots, 0, 1, 0, \ldots, 0]^T\), where the 1 is in the i-th row. Consider an alternative basis packed together as columns in the matrix C:
\[C = (v_1~v_2~...v_n)\]
Student: Each of the \(v_i\) is an n-dimensional vector, and since they form a basis for \(R^n\), they are linearly independent. And \(C\) is really an n by n matrix that is invertible.

Teacher: Someone here remembers their linear algebra! Okay, now we can write a point \(u \in R^n\) in terms of the new basis:
\[u = U_1 v_1 + ... + U_n v_n = C \begin{bmatrix} U_1 \\ \vdots \\ U_n \end{bmatrix} = C U\]
And hence, \(U = C^{-1} u\).

Student: I get it! In our case, the basis transformation was simply a rotation, \(C = Q\). Therefore \(U = C^{-1} u = Q^T u\). That really does tie it all together.

Teacher: I am so glad that it does. You know what else? Khan Academy has a bunch of nice videos on this topic.

Student: I sure will check them out.
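
Postscript: here is a quick numpy check of the worked example from the dialogue (\(Q\) is the 90-degree rotation used above):

import numpy as np

# 90-degree counter-clockwise rotation
Q = np.array([[0., -1.],
              [1.,  0.]])

e1, e2 = np.array([1., 0.]), np.array([0., 1.])
E1, E2 = Q @ e1, Q @ e2                  # new basis: (0, 1) and (-1, 0)

u = np.array([1., 1.])
U = Q.T @ u                              # coordinates in the new basis
print(U)                                 # (1, -1), as in the dialogue

# same geometric point, either way you write it
print(np.allclose(u, U[0]*E1 + U[1]*E2))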