Tuesday, August 20, 2024

LaTeX and arxiv

Posting a LaTeX manuscript on arxiv is straightforward.

  1. Compile your document (say, main.tex). It is okay to leave all your figures in a "figs/" subfolder. Unlike some outlets, you don't have to flatten your directory structure.
  2. Apart from main.tex, main.bbl [bibliography], and figures, you may delete all other files. You need the bbl file because arxiv does not run bibtex.
  3. Zip the folder, and upload on arxiv.
If you have a supplementary information document (say, si.tex) and you use the "xr" package to cross-reference between main.tex and si.tex, then a few extra steps are required.

arxiv compiles all tex files in the zipped folder in alphabetical order, so it is important that "main.tex" sorts before "si.tex". Keep this in mind if your tex files have different names.
  1. Compile main.tex and si.tex several times on your machine, so that inter-document cross-references work as desired.
  2. Apart from main.tex, si.tex, main.bbl, si.bbl, main.aux, si.aux, and figures, you may delete all other files. [xr uses main.aux and si.aux.]
  3. Rename main.tex to main_ren.tex, main.bbl to main_ren.bbl, si.tex to si_ren.tex, and si.bbl to si_ren.bbl. Do not rename the *.aux files. Do not compile.
  4. Zip the folder, and upload on arxiv.
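
The pruning and zipping steps above can be scripted. Here is a minimal sketch, assuming the layout described in this post (tex, bbl, and aux files at the top level, figures in a "figs/" subfolder). The function name bundle_for_arxiv and its arguments are my own choices for illustration, not anything arxiv prescribes.

```python
import zipfile
from pathlib import Path


def bundle_for_arxiv(folder, zip_path,
                     top_level=(".tex", ".bbl", ".aux"), fig_dir="figs"):
    """Zip only the files worth uploading and return their archive names.

    Keeps top-level files whose extension is in `top_level`, plus everything
    under `fig_dir`. Paths in the archive are relative to `folder`, so the
    figs/ subfolder is preserved rather than flattened.
    """
    folder = Path(folder)
    wanted = [p for p in folder.iterdir()
              if p.is_file() and p.suffix in top_level]
    fig_root = folder / fig_dir
    if fig_root.is_dir():
        wanted += [p for p in fig_root.rglob("*") if p.is_file()]
    names = sorted(p.relative_to(folder).as_posix() for p in wanted)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(folder / name, name)
    return names
```

For a manuscript without xr, passing top_level=(".tex", ".bbl") leaves the aux files out. The renaming in step 3 is not handled here and still has to be done before bundling.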

Thursday, February 22, 2024

Large PDFs with Matplotlib

Vector graphics (SVG/PDF) output of scatterplots with thousands of points leads to bloated files, unlike raster formats such as PNG. Scrolling through a PDF document that includes such figures is a painful affair.

The reason is fairly obvious: vector files scale with the number of data-points, while raster files scale with the number of pixels.

There are many potential solutions. The simplest is to rasterize only the large dataset of scatter points using the rasterized=True flag. Thus,

plt.plot(x, y, 'o', alpha=0.1, rasterized=True)

The resulting PDF is much lighter.
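
To see the effect on your own data, you can save the same scatterplot with and without the flag and compare file sizes. This is a minimal sketch: the helper scatter_pdf_size is a made-up name, the data are random, and I use the Agg backend so that it runs without a display. The actual sizes depend on the number of points and the dpi, so I do not quote any numbers here.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def scatter_pdf_size(n_points, rasterized, seed=0):
    """Save a scatter of n_points to a temporary PDF; return its size in bytes."""
    rng = np.random.default_rng(seed)
    x, y = rng.normal(size=(2, n_points))
    fig, ax = plt.subplots()
    ax.plot(x, y, "o", alpha=0.1, rasterized=rasterized)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scatter.pdf")
        # dpi controls the resolution of the rasterized layer;
        # axes, ticks, and labels remain vector objects.
        fig.savefig(path, dpi=150)
        size = os.path.getsize(path)
    plt.close(fig)
    return size


for flag in (False, True):
    print(f"rasterized={flag}: {scatter_pdf_size(20000, flag)} bytes")
```

Only the markers are rasterized, so text in the figure stays sharp when you zoom in.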

Friday, September 8, 2023

Merging BibTeX bibliography files

Suppose you want to merge two bib files (f1.bib and f2.bib) that have considerable overlap. One easy solution using Jabref works as described below.

Suppose the target bibliography file without duplicates is merge.bib.

1. Copy f1.bib to merge.bib [cp f1.bib merge.bib]

2. Open merge.bib with Jabref

3. Then click File > Import into current database and select the other file [f2.bib]

4. You get a dialog box which allows you to manually decide what entries/versions you want to retain. If both f1.bib and f2.bib are of comparable quality, you can select "Deselect all duplicates" which automatically unselects duplicated entries.

5. Hit "OK" and save the modified database [Ctrl-S]
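
If you prefer a script, a much cruder alternative is to merge on citation keys alone. The sketch below is not what Jabref does (Jabref compares entry contents); it assumes that each entry starts on its own line as "@type{key," and that identical keys mean identical references. It ignores @string and @preamble blocks. The function names are hypothetical.

```python
import re

# Matches the start of an entry such as "@article{smith2020," at line start.
ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)


def split_entries(bib_text):
    """Return {citation_key: entry_text}, in order of first appearance."""
    starts = list(ENTRY_START.finditer(bib_text))
    entries = {}
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bib_text)
        # setdefault keeps the first occurrence of a repeated key
        entries.setdefault(match.group(2), bib_text[match.start():end].strip())
    return entries


def merge_bib(text1, text2):
    """Entries of text1, followed by entries of text2 with unseen keys."""
    merged = split_entries(text1)
    for key, entry in split_entries(text2).items():
        merged.setdefault(key, entry)
    return "\n\n".join(merged.values()) + "\n"
```

Reading f1.bib and f2.bib into strings and writing merge_bib(text1, text2) to merge.bib gives a file in which the f1.bib version wins whenever a key appears in both.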

Sunday, July 9, 2023

Two useful Matplotlib utilities

 1. Latexify_py

latexify is a Python package to compile a fragment of Python source code to a corresponding LaTeX expression.



 2. Pylustrator

Pylustrator offers an interactive interface to find the best way to present your data in a figure for publication. Added formatting and styling can be saved as automatically generated code. To compose multiple figures into panels, pylustrator can combine different subfigures into a single figure.

See Youtube demo.


Tuesday, November 8, 2022

LaTeX to Word

Often I have a document in LaTeX, and somebody else needs an editable copy in Word. Here is a list of hacks I have learnt to use:

1. If the document is relatively free of math and figures then the simplest course is often to compile a PDF, and "import" the PDF into MS Word. This works out remarkably well in many cases.

2. The same thing above applies to figures. You can now directly drop PDF images into a Word doc.

3. If you have lots of equations, then it is worthwhile to use pandoc

pandoc mydoc.tex -o mydoc.docx

More sophisticated options to copy cross-references, and bibliography exist. See this as well.

4. Many journals accept PDF figures. If they need TIFF, then you can use Adobe Acrobat online to do this conversion. In my experience, this produces smaller files compared to other automatic converters including ImageMagick.

Wednesday, August 17, 2022

Recursively Clean LaTeX Debris in all Sub-Folders

 Often, I have a big folder like Lectures/ which may have sub-folders based on topics, and each topic might have additional folders. To clean auxiliary LaTeX files in one fell swoop use,

find ./ \( -iname "*.bbl" -o -iname "*.aux" -o -iname "*.log" -o -iname "*.blg" -o -iname "*.nav" -o -iname "*.snm" -o -iname "*.toc" -o -iname "*.vrb" -o -iname "*.out" -o -iname "*.synctex.gz" -o -iname "_minted*" \) -delete
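
Since -delete is unforgiving, a dry run is sometimes reassuring. Here is a rough Python equivalent, with the same patterns, that lists the matches first and deletes only when asked. The function name clean_latex_debris is my own. Unlike the find command, it also removes non-empty _minted* folders.

```python
import fnmatch
import shutil
from pathlib import Path

PATTERNS = ["*.bbl", "*.aux", "*.log", "*.blg", "*.nav", "*.snm",
            "*.toc", "*.vrb", "*.out", "*.synctex.gz", "_minted*"]


def clean_latex_debris(root, dry_run=True):
    """List (and, if dry_run is False, delete) LaTeX debris under root."""
    # Lower-casing the name mimics the case-insensitive -iname test.
    matches = [p for p in Path(root).rglob("*")
               if any(fnmatch.fnmatch(p.name.lower(), pat) for pat in PATTERNS)]
    if not dry_run:
        # Deepest paths first, so files go before their parent folders.
        for p in sorted(matches, key=lambda q: len(q.parts), reverse=True):
            if p.is_dir():
                shutil.rmtree(p)  # e.g. _minted* cache folders
            elif p.exists():
                p.unlink()
    return sorted(matches)
```

Calling clean_latex_debris("Lectures") only reports what would go; clean_latex_debris("Lectures", dry_run=False) actually deletes.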


Monday, July 18, 2022

RegEx Help

This ML based regex generator is quite handy! 

https://www.autoregex.xyz/home

Wednesday, June 22, 2022

Lectures on Graphical Models

Christopher Bishop has an excellent set (1, 2, and 3) of introductory lectures on "Probabilistic Graphical Models". They are well-motivated and cover topics that include:

  • directed and undirected graphs
  • conditional independence
  • factor graphs
  • inference using factor graphs and sum/product rules

Tuesday, March 22, 2022

QuickTip: Extracting pages from PDF on Linux

On a Mac OSX system, the default app Preview allows you to cut and paste pages from a PDF.

On Linux you can use PDFChain to manipulate PDFs. If you simply want to extract a certain range, then qpdf is quite handy.

A CLI solution is to use ghostscript as described here:

gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
       -dFirstPage=1 -dLastPage=15 -sOutputFile=outfile.pdf inpfile.pdf

You can make the interface friendlier by saving a function in your bashrc as described in the article.
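
The linked article wraps the command in a bash function. The same idea in Python might look like the sketch below. The function names are hypothetical, and it assumes the gs executable is on your path.

```python
import subprocess


def gs_extract_command(infile, outfile, first, last):
    """Build the ghostscript argument list for extracting pages first..last."""
    if first < 1 or last < first:
        raise ValueError("need 1 <= first <= last")
    return ["gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH", "-dSAFER",
            f"-dFirstPage={first}", f"-dLastPage={last}",
            f"-sOutputFile={outfile}", infile]


def extract_pages(infile, outfile, first, last):
    """Run ghostscript; raises CalledProcessError if gs reports a failure."""
    subprocess.run(gs_extract_command(infile, outfile, first, last), check=True)
```

With this, extract_pages("inpfile.pdf", "outfile.pdf", 1, 15) runs the same command as above.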


Friday, April 2, 2021

Matplotlib: Lines Connecting Points and Boxes

This gist has python functions that help Matplotlib draw lines connecting points, and to draw boxes.

def drawBox(xlim, ylim):
    pts = [[xlim[0], ylim[0]], [xlim[1], ylim[0]], 
           [xlim[1], ylim[1]], [xlim[0], ylim[1]], 
           [xlim[0], ylim[0]]]
    x, y = zip(*pts)
    return x, y

def connectPoints(pts):
    x, y = zip(*pts)
    return x, y
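
Both functions simply return x and y sequences that can be handed to plt.plot. A short usage sketch follows. It repeats the two functions so that it runs on its own; the coordinates and the output file name are arbitrary, and the Agg backend is my choice so that no display is needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt


def drawBox(xlim, ylim):
    # Corners in order, with the first corner repeated to close the box
    pts = [[xlim[0], ylim[0]], [xlim[1], ylim[0]],
           [xlim[1], ylim[1]], [xlim[0], ylim[1]],
           [xlim[0], ylim[0]]]
    x, y = zip(*pts)
    return x, y


def connectPoints(pts):
    # Turn [(x1, y1), (x2, y2), ...] into separate x and y tuples
    x, y = zip(*pts)
    return x, y


fig, ax = plt.subplots()
bx, by = drawBox([0, 2], [0, 1])
ax.plot(bx, by, "k-")                      # rectangle outline
px, py = connectPoints([(0.5, 0.2), (1.0, 0.8), (1.5, 0.4)])
ax.plot(px, py, "o-")                      # polyline through the points
fig.savefig("box_and_lines.png")
plt.close(fig)
```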

Tuesday, March 30, 2021

Quicktip: Batch convert LibreOffice documents to PDF

To convert all the DOCX files in the current working directory to PDF,

 lowriter --headless --convert-to pdf *.docx

Similarly, to convert ODT files,

 lowriter --headless --convert-to pdf *.odt

Wednesday, February 3, 2021

QuickTip: LaTeX multiline equations with explanations

Sometimes you want to write a sequence of steps, and write the explanation for each step next to it.

abc = xyz    pythagoras rule

    = uvw    triangle inequality

    = ABC    

It is easy to do this with the amsmath package as detailed in this StackOverflow question.
\usepackage{amsmath}

\begin{align*}
abc &= xyz && \text{pythagoras rule} \\
    &= uvw && \text{triangle inequality} \\
    &= ABC
\end{align*}



Monday, December 7, 2020

Smooth Transition Between Functions

 Stitching together two functions is sometimes required as a way to transition from one dependence to another. The following schematic describes the idea pictorially:


Two different approaches are considered in this PDF (or this Jupyter Notebook).
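
For a quick feel of the idea, one generic way to stitch two functions is to weight them with a smooth switch. The sketch below uses a tanh weight; it is a common textbook construction and not necessarily either of the two approaches in the linked PDF. The names blend, x0, and width are my own.

```python
import numpy as np


def blend(x, f_left, f_right, x0, width):
    """Smoothly switch from f_left (x << x0) to f_right (x >> x0).

    The weight w rises from 0 to 1 around x0 over a scale set by width.
    """
    x = np.asarray(x, dtype=float)
    w = 0.5 * (1.0 + np.tanh((x - x0) / width))
    return (1.0 - w) * f_left(x) + w * f_right(x)


# Example: constant behavior on the left, linear growth on the right
x = np.linspace(-5, 5, 11)
y = blend(x, lambda t: np.ones_like(t), lambda t: t, x0=0.0, width=0.5)
```

A smaller width gives a sharper transition. Exactly at x0 the result is the average of the two functions.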


Monday, October 26, 2020

Trapezoidal rule in log-log space

Consider the problem described in this Stack Overflow post. You have a function with certain smoothness properties that are apparent on a log-log plot. This is often accompanied by a large domain of integration. It seems worthwhile to "integrate in logspace", whatever that means.

This Jupyter notebook probes this question and makes some recommendations.
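
One possible reading of "integrating in logspace" is to assume that the function is a power law between neighboring nodes (a straight line on the log-log plot) and to integrate each segment analytically. The sketch below implements that reading. It is my own illustration, not necessarily what the notebook recommends, and it assumes x and y are strictly positive.

```python
import numpy as np


def loglog_trapz(x, y):
    """Integrate y(x), assuming y = a * x**m on each segment [x_i, x_{i+1}].

    For a power law, the segment integral is (x1*y1 - x0*y0) / (m + 1),
    or x0*y0*log(x1/x0) in the special case m = -1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0, x1 = x[:-1], x[1:]
    y0, y1 = y[:-1], y[1:]
    log_ratio = np.log(x1 / x0)
    m = np.log(y1 / y0) / log_ratio           # local slope on the log-log plot
    near_minus_one = np.abs(m + 1.0) < 1e-12
    safe = np.where(near_minus_one, 1.0, m + 1.0)   # avoid dividing by zero
    power_case = (x1 * y1 - x0 * y0) / safe
    log_case = x0 * y0 * log_ratio
    return float(np.sum(np.where(near_minus_one, log_case, power_case)))


# Compare with the ordinary trapezoidal rule on y = x**-2 over [1, 1000]
x = np.logspace(0, 3, 7)
y = x ** -2.0
linear = float(np.sum(0.5 * (y[:-1] + y[1:]) * np.diff(x)))
print(loglog_trapz(x, y), linear)   # analytical value: 1 - 1/1000
```

For an exact power law the log-log rule is exact up to round-off, whatever the node spacing, while the ordinary rule on the same coarse grid overestimates this convex integrand.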

Wednesday, May 20, 2020

Quicktip: Reindent Python Scripts

Suppose part of a python file uses spaces for indentation, while another part uses tabs. This will throw up exceptions at runtime. So the question is how to fix it.

One answer is to use the python script reindent.py. Stick it in some folder (~/bin/) in the default path and make it executable (chmod +x reindent.py).

The usage is straightforward:

reindent.py -n file.py

modifies the original file in place. The -n flag suppresses the .bak backup copy that reindent.py would otherwise write.
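
Before reindenting, it can help to see which lines are the culprits. The helper below is a quick diagnostic of my own (the name indentation_report is hypothetical). It looks only at leading whitespace, so lines inside multi-line strings are reported too.

```python
def indentation_report(source):
    """Return (tab_lines, space_lines) as lists of 1-based line numbers.

    tab_lines: indented lines whose leading whitespace contains a tab.
    space_lines: indented lines whose leading whitespace is spaces only.
    Blank lines and unindented lines are skipped.
    """
    tab_lines, space_lines = [], []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if not indent or not line.strip():
            continue
        (tab_lines if "\t" in indent else space_lines).append(number)
    return tab_lines, space_lines
```

If both lists are non-empty, the file mixes the two styles.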

Sunday, May 17, 2020

Matplotlib: Saving TIFF and JPG formats

With pillow installed, on my LinuxMint installation:

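import numpy as np
import matplotlib
matplotlib.use('TkAgg')  # backend; select it before importing pyplot
import matplotlib.pyplot as plt

x = np.linspace(0, 1)
plt.plot(x, x**2)
# savefig expects `format` (not `fmt`); pil_kwargs is passed on to pillow
plt.savefig('test.tiff', dpi=300, format="tiff", pil_kwargs={"compression": "tiff_lzw"})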
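
Saving a JPG works the same way. A minimal sketch follows, assuming Matplotlib 3.1 or newer (for pil_kwargs) with pillow installed. I use the Agg backend here instead of TkAgg so that the snippet also runs without a display; the quality value of 90 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1)
fig, ax = plt.subplots()
ax.plot(x, x**2)
# The file extension selects the JPEG writer; "quality" is passed to pillow
fig.savefig("test.jpg", dpi=300, pil_kwargs={"quality": 90})
plt.close(fig)
```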


Monday, January 20, 2020

QuickTip: Catching array bounds violations in Fortran 90

With gfortran, you can check whether array bounds are violated at runtime by compiling with -fcheck=bounds (the older -fbounds-check is a deprecated alias for it),

gfortran -fcheck=bounds myProg.f90



Friday, October 18, 2019

LaTeX: Cross-referencing between Different Documents

Problem: I have a manuscript TeX file (main.tex), and an independent supporting information file (si.tex). I want to cross-reference (using \label and \ref) items across the two files.

For example, I might want to reference figure 1 from si.tex in main.tex.

Solution: As this SO answer suggests, the answer lies in the CTAN package xr.

In main.tex, just declare "si.tex" as an external document, and all its labels become visible!

\usepackage{xr}
\externaldocument{si}

Thursday, October 17, 2019

Parameter Uncertainty in Numpy Polyfit

Say you want to fit a line to (x,y) data. With polyfit, you can say,

coeff = np.polyfit(x, y, 1)

With numpy 1.7 and greater, you can also request the estimated covariance matrix,

coeff, cov = np.polyfit(x, y, 1, cov=True)

The standard error on each parameter is the square root of the corresponding diagonal element,

print(np.sqrt(np.diag(cov)))
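
Putting the pieces together on synthetic data gives a self-contained sketch. The true slope (2), intercept (1), and noise level (0.5) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)  # noisy straight line

# coeff is ordered from highest power to lowest: [slope, intercept]
coeff, cov = np.polyfit(x, y, 1, cov=True)
stderr = np.sqrt(np.diag(cov))

for name, value, err in zip(("slope", "intercept"), coeff, stderr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

The off-diagonal element cov[0, 1] tells you how strongly the slope and intercept estimates are correlated.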

This report referenced in the SO page is quite useful!

Monday, September 30, 2019

Learning Gaussian Processes

I've been studying up on Gaussian process modeling for machine learning.

For someone seeing these concepts for the first time, I would recommend the following sequence based on my experience:

1. A Visual Exploration of Gaussian Processes

It hits the key points of what makes multinormal distributions special (conditionals and marginals are normal too!), and the visuals help build intuition.

1a. Gaussian Processes for Dummies

You might not need this, but I like this essay because it is jargon-free, and focuses on how to get things going. There is python code at the end, which you can play with.

2. Chapter 2 of Gaussian Processes for Machine Learning

This "bible" is astonishingly well-written. If you are familiar with linear algebra and some statistics, this is a breezy read. Plus, all the important formulae and algorithms you see in different articles, are available here in one place!

3. If you like videos, then this YouTube lecture might be worth watching!
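
The conditioning property mentioned in item 1 is what makes GP regression a few lines of linear algebra. Here is a minimal sketch, assuming a zero-mean prior and a squared-exponential (RBF) kernel with unit variance. The function names, the training points, and the jitter value are my own choices for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-8):
    """Mean and covariance of the GP at x_test, conditioned on training data."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    # Conditional of a joint Gaussian is again Gaussian:
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov


x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3, 3, 7)
mean, cov = gp_posterior(x_train, y_train, x_test)
```

The diagonal of cov is the predictive variance. It shrinks to nearly zero at the training points and grows back toward the prior variance away from them.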