Because TeX is a markup language, counting the words in a document is tricky. You don't want to count the markup itself, such as \chapter{}, \begin{center}, or \cite{reference1, reference2}.
You may have macros, which need to be interpreted before their text can be counted.
You may have external files that you pull into a master document with \input{} and similar commands.
In short, it is not as simple as it seems.
You could try a front-end program such as Kile or TeXShop, which will give you a simple total count. My front-end of choice, Texmaker, does not do this for me.
If your document is very simple, you could strip the LaTeX markup with "detex" and count what is left with a standard Unix utility like "wc".
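To make the strip-then-count idea concrete, here is a minimal Python sketch. It is not how detex itself works; the function name crude_word_count and the list of commands it discards are my own assumptions, and it makes no attempt to handle macros, math, or \input{} files.

```python
import re

def crude_word_count(tex: str) -> int:
    """Very rough word count: strip common LaTeX markup, then count tokens."""
    # Drop comments: an unescaped % through the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", tex)
    # Drop commands whose arguments are not prose (environment names, keys).
    # This list is an illustrative assumption, not a complete one.
    text = re.sub(
        r"\\(begin|end|cite|ref|label|input|include)\*?(\[[^\]]*\])?\{[^}]*\}",
        " ",
        text,
    )
    # Remove remaining command names but keep their braced arguments,
    # so \chapter{Introduction} still contributes "Introduction".
    text = re.sub(r"\\[A-Za-z]+\*?", " ", text)
    # Remove braces, brackets, and other leftover markup characters.
    text = re.sub(r"[{}\[\]$&~^_\\]", " ", text)
    # Count only tokens containing a letter or digit, so stray punctuation
    # left behind by a removed \cite{} is not counted as a word.
    return sum(1 for tok in text.split() if any(c.isalnum() for c in tok))

sample = r"""\chapter{Introduction}
\begin{center}
Polymers are long molecules \cite{reference1, reference2}. % a comment
\end{center}
"""

print(crude_word_count(sample))  # 5
```

Anything beyond a very simple document will trip up a sketch like this, which is the reason to reach for a dedicated tool.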
The best solution seems to be TeXcount.
There is a web interface that lets you paste your TeX document into a web form.
Alternatively, you can download the script. It is essentially a small Perl program called texcount.pl (about 400kB for the whole download; the script itself is about 90kB), which you can run quite simply as
perl texcount.pl filename.tex
Here's the default output it produces:
Encoding: ascii
Words in text: 10324
Words in headers: 81
Words in float captions: 219
Number of headers: 30
Number of floats: 5
Number of math inlines: 198
Number of math displayed: 18
Subcounts:
text+headers+captions (#headers/#floats/#inlines/#displayed)
14+9+0 (1/0/0/0) _top_
89+1+0 (1/0/0/0) Section: Introduction
419+2+44 (1/1/0/0) Subsection: Analytical Rheology
646+2+38 (2/1/3/0) Subsection: Polymers
236+3+0 (1/0/0/0) Subsection: Scope and Organization
35+3+0 (1/0/0/0) Section: Motivation and Background
355+2+19 (1/0/2/0) Subsection: Linear Polymers
657+2+24 (1/1/17/1) Subsection: Branched Polymers
205+3+45 (1/1/0/0) Subsection: Model-driven Analytical Rheology
97+6+0 (1/0/0/0) Section: Models for Polymer Dynamics and Rheology
597+2+0 (1/0/2/0) Subsection: Historical Development
872+5+25 (2/1/8/0) Subsection: The Tube Model
211+4+0 (1/0/0/0) Subsection: State of the Art
273+2+0 (1/0/5/0) Subsection: Computational Models
162+3+0 (1/0/0/0) Section: Methods and Progress
1774+5+0 (3/0/108/15) Subsection: Linear Polymers
2955+24+24 (9/0/51/2) Subsection: Branched Polymers
727+3+0 (1/0/2/0) Section: Summary and Perspective
You can exercise significant control over the way it parses the document and reports the results by using the options described in the manual. For example, the -inc option tells it to also parse files pulled in with \input{}.
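Each subcount line in the output above follows the pattern text+headers+captions (#headers/#floats/#inlines/#displayed), so it is easy to post-process. The following is a small Python sketch that assumes the report looks exactly like the listing shown here (other versions or options may format it differently); parse_subcounts and SUBCOUNT are names I made up for illustration.

```python
import re

# Matches lines such as "419+2+44 (1/1/0/0) Subsection: Analytical Rheology".
SUBCOUNT = re.compile(
    r"^\s*(\d+)\+(\d+)\+(\d+)\s+\((\d+)/(\d+)/(\d+)/(\d+)\)\s+(.*)$"
)

def parse_subcounts(report: str):
    """Return a list of (title, text, headers, captions) tuples."""
    rows = []
    for line in report.splitlines():
        m = SUBCOUNT.match(line)
        if m:  # non-matching lines (totals, column legend) are skipped
            text, headers, captions = (int(m.group(i)) for i in (1, 2, 3))
            rows.append((m.group(8).strip(), text, headers, captions))
    return rows

# First few lines of the report shown above.
report = """\
Subcounts:
text+headers+captions (#headers/#floats/#inlines/#displayed)
14+9+0 (1/0/0/0) _top_
89+1+0 (1/0/0/0) Section: Introduction
419+2+44 (1/1/0/0) Subsection: Analytical Rheology
"""

rows = parse_subcounts(report)
total_text = sum(r[1] for r in rows)
print(total_text)  # 522
```

Fed the full listing, the per-section text counts add up to 10324, which matches the "Words in text" line, so the subcounts are a breakdown of that total.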