Wednesday, June 15, 2016

The Savitzky-Golay Filter

In 1964, Abraham Savitzky and Marcel Golay published a paper "Smoothing and differentiation of data by simplified least squares procedures" in Analytical Chemistry, which has been heralded as one of the 10 most influential papers in the journal's history.

The Savitzky-Golay filter (SGF) is a digital filter used to smooth noisy data. The basic idea is to chop the dataset into subsets, and then use a low order polynomial to fit successive subsets.

Implementations are available in Octave/Matlab and in recent versions (>0.16) of scipy for python.

Here is a potential use case. I did a Lennard-Jones melt simulation using LAMMPS, and obtained the following pair correlation function g(r) [click to enlarge].

If you look closely, there is a fair amount of noise due to binning.
Let us use SciPy to smooth the noise.

from scipy.signal import savgol_filter
r, gr = np.loadtxt('gr.dat', unpack=True) # read file from disk
gsm = savgol_filter(gr, 15, 4) # smooth it

The second argument (15) has to be an odd number and is the window or subset size, and the third (4) argument is the degree of polynomial to regress. When I plot the smoothed curve:

plt.plot(rFG,grFG,'.')
plt.plot(rFG,gsm1,label='SG')

If you look closely, again:

One can experiment with the window size and the degree of polynomial. In general, a larger window, and a higher degree polynomial make the curve smoother. The figure below shows a window of size 7 and 31 with a degree 4 polynomial.


1 comment:

Unknown said...

Hello,
I want your help regarding xmgrace rdf plot.