pocount

pocount will count the number of strings and words in translatable files.

Supported formates include: PO and XLIFF. Almost all bilingual file formats supported by the Translate Toolkit will work with pocount, including: TMX, TBX, Gettext .mo, Qt .qm, Wordfast .txt TM.

A number of other formats should be countable as the toolkit develops. Note that only multilingual formats based the storage base class are supported, but that includes almost all storage formats.

Usage

pocount [options] <directory|file(s)>

Where:

directory

will recurse and count all files in the specified directory

file(s)

will count all files specified

Options:

-h, --help

show this help message and exit

--incomplete

skip 100% translated files

Output format:

--full

(default) statistics in full, verbose format

--csv

statistics in CSV format

--short

same as –short-strings

--short-strings

statistics of strings in short format – one line per file

--short-words

statistics of words in short format – one line per file

Examples

pocount makes it easy to count the current state of a body of translations. The most interesting options are those that adjust the output style and decide what to count.

Easy counting

To count how much work is to be done in you project:

pocount project/

This will count all translatable files found in the directory project/ and output the results in --full format.

You might want to be more specific and only count certain files:

pocount *.po

This will count all PO files in the current directory but will ignore any other files that ‘pocount’ can count.

You can have full control of the files to count by using some of the abilities of the Unix commandline, these may work on Mac OS X but are unlikely to work on Windows.:

pocount $(find . -name "*.properties.po")

This will first find all files that match *.properties.po and then count them. That would make it easy to count the state of your Mozilla translations of .properties files.

Incomplete work

To count what still needs to be done, ignoring what is 100% complete you can use the --incomplete option.:

pocount --incomplete --short *.xlf

We are now counting all XLIFF files by using the *.xlf expansion. We are only counting files that are not 100% complete and we’re outputting string counts using the --short option.

Output formats

The output options provide the following types of output

–full

This is the normal, or default, mode. It produces the most comprehensive and easy to read data, although the amount of data may overwhelm the user. It produces the following output:

avmedia/source/viewer.po
type              strings      words (source)    words (translation)
translated:   73465 ( 99%)     538598 ( 99%)          513296
fuzzy:           13 (  0%)        141 (  0%)             n/a
untranslated:    53 (  0%)        602 (  0%)             n/a
Total:        73531            539341                 513296

A grand total and file count is provided if the number of files is greater than one.

–csv

This format is useful if you want to reuse the data in a spreadsheet. In CSV mode the following output is shown:

Filename, Translated Messages, Translated Source Words, Translated Target Words, Fuzzy Messages, Fuzzy Source Words, Untranslated Messages, Untranslated Source Words, Review Messages, Review Source Words
avmedia/source/viewer.po,  1, 3, 3, 0, 0, 4, 22, 1, 3

Totals are not provided in CSV mode.

–short-strings (alias –short)

The focus is on easily accessible data in a compact form. This will only count strings and uses a short syntax to make it easy for an experienced localiser to read.:

test-po/fuzzy.po strings: total: 1    | 0t    1f      0u      | 0%t   100%f   0%u

The filename is followed by a word indicating the type of count, here we are counting strings. The total give the total string count. While the letters t, f and u represent ‘translated’, ‘fuzzy’ and ‘untranslated’ and here indicate the string counts for each of those categories. The counts are followed by a percentage representation of the same categories.

–short-words

The output is very similar to --short-strings above:

test-po/fuzzy.po source words: total: 3       | 0t    3f      0u      | 0%t   100%f   0%u

But instead of counting string we are now counting words as indicated by the term ‘source words’

Bugs

  • There are some miscounts related to word breaks.

  • When using the short output formats the columns may not be exactly aligned. This is because the number of digits in different columns is unknown before all input files are processed. The chosen tradeoff here was instanteous output (after each processed file) instead of waiting for the last file to be processed.