Thursday, February 22, 2024

Large PDFs with Matplotlib

Saving a scatterplot with thousands of points in a vector format (SVG or PDF) produces a bloated file, unlike raster formats such as PNG. Scrolling through a PDF document that embeds such figures is a painful affair.

The reason is fairly obvious: the size of a vector file scales with the number of data points, while the size of a raster file scales with the number of pixels.

There are many potential solutions. The simplest is to rasterize only the large set of scatter points by passing the rasterized=True flag, which leaves the axes, tick labels, and other text as vector elements. Thus,

plt.plot(x, y, 'o', alpha=0.1, rasterized=True)

The resulting PDF is much lighter.
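
To see the difference on your own machine, here is a minimal sketch that saves the same plot twice to in-memory buffers and compares the sizes. The helper name pdf_size_bytes, the random test data, the figure size, and the dpi value are all assumptions made for illustration; the actual savings depend on the number of points, the figure dimensions, and the resolution.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def pdf_size_bytes(x, y, rasterized):
    """Hypothetical helper: render the scatter to an in-memory PDF, return its size."""
    fig, ax = plt.subplots(figsize=(4, 3))
    # Only the markers are rasterized; axes, ticks, and labels stay vector.
    ax.plot(x, y, "o", alpha=0.1, rasterized=rasterized)
    buf = io.BytesIO()
    # dpi sets the resolution of the rasterized layer inside the PDF.
    fig.savefig(buf, format="pdf", dpi=100)
    plt.close(fig)
    return buf.getbuffer().nbytes


# Illustrative data: 100,000 normally distributed points.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

vector_size = pdf_size_bytes(x, y, rasterized=False)
raster_size = pdf_size_bytes(x, y, rasterized=True)
print(f"vector PDF:     {vector_size / 1024:.0f} KiB")
print(f"rasterized PDF: {raster_size / 1024:.0f} KiB")
```

If the rasterized points look blurry when you zoom in, raising dpi in savefig should sharpen them, at the cost of a larger file.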