Wednesday, February 2, 2011

Engauge Digitizer: Extract Freely

Engauge Digitizer is an "open source, digitizing software which converts an image file showing a graph or map, into numbers. The image file can come from a scanner, digital camera or screenshot." If you are using Linux and have the figure you want to digitize in a PDF, you can display it on screen and capture it from the command line with ImageMagick's import tool by running

import pic.png

and then dragging a rectangle around the figure to set the picture bounds. To digitize the figure:

  1. Start Engauge.
  2. "Import" the PNG picture.
  3. Define the axes (log or linear) using three points.
  4. Scan in points or lines, either manually or automatically.

Once you have selected all the points you need, you can "Export" them to a CSV file such as "data.csv".

Typically this file looks something like:

x,Curve5
21.4764,59.9803
26.2048,92.1999
31.9743,163.566
45.5445,319.264
58.0845,527.22
...

To remove the header line, replace the commas with tabs, and do some other superficial dressing up, we use a simple terminal command:

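tail -n +2 data.csv | awk -F, '{printf("%e\t%e\n", $1, $2)}' > data.dat

The +2 tells tail to start printing at line 2, which skips the header; -F, tells awk that the field separator is a comma; and printf writes each x,y pair in scientific notation (%e), separated by a tab. The final > data.dat saves the result to a new file (data.dat is just an example name), which looks like: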

2.147640e+01    5.998030e+01
2.620480e+01    9.219990e+01
3.197430e+01    1.635660e+02
4.554450e+01    3.192640e+02
5.808450e+01    5.272200e+02
...

Done!
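If an exported file ever has more than two columns (for instance several curves sharing one x column; that layout is an assumption, not something shown above), awk alone can both skip the header and format every field. This is a minimal sketch: NR > 1 drops the first line, and a loop over NF prints each field with %e, tab-separated. The file names sample.csv and sample.dat are hypothetical, and the two sample rows are copied from the data.csv shown above.

```shell
# Hypothetical sample input: the header and first two rows of data.csv above.
cat > sample.csv <<'EOF'
x,Curve5
21.4764,59.9803
26.2048,92.1999
EOF

# NR > 1 skips the header line; the loop formats every comma-separated
# field in scientific notation, so extra curve columns are handled too.
awk -F, 'NR > 1 {
    for (i = 1; i <= NF; i++)
        printf("%e%s", $i, (i < NF) ? "\t" : "\n")
}' sample.csv > sample.dat

cat sample.dat
```

For the two sample rows this prints 2.147640e+01 and 5.998030e+01 on the first line and 2.620480e+01 and 9.219990e+01 on the second, matching the first two lines of the output above.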

This is especially useful if you extract several curves from the same graph and want to clean them all up at once.
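For that batch case, a plain shell loop can apply the same cleanup (tail -n +2 to skip the header, awk to reformat) to every export and write NAME.dat next to each NAME.csv. This is a sketch under assumptions: curve1.csv and curve2.csv are hypothetical stand-ins for your own exports, and in practice you would change the curve*.csv glob to match your file names.

```shell
# Two small hypothetical exports standing in for curves digitized
# from the same graph.
printf 'x,Curve1\n1.5,2.25\n2,4\n' > curve1.csv
printf 'x,Curve2\n1.5,3.375\n2,8\n' > curve2.csv

# For each CSV: drop the header, convert commas to tab-separated
# scientific notation, and write the result to NAME.dat.
for f in curve*.csv; do
    tail -n +2 "$f" | awk -F, '{printf("%e\t%e\n", $1, $2)}' > "${f%.csv}.dat"
done

cat curve1.dat curve2.dat
```

The ${f%.csv} expansion strips the .csv suffix, so curve1.csv becomes curve1.dat without touching the original export.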

PS: In the past I used a Mac program called DataThief for similar work. I notice that it has since become shareware.
