filters

Filters that can be used on translations…

autocorrect

A set of autocorrect functions that fix common punctuation and space problems automatically.

translate.filters.autocorrect.correct(source, target)

Runs a set of easy and automatic corrections.

Current corrections include:
  • Ellipses - align target to use source form of ellipses (either three dots or the Unicode ellipses characters)

  • Missing whitespace at the start or end of the target

  • Missing punctuation (.:?) at the end of the target
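The three corrections listed above can be pictured with a small standalone sketch. It is not the code behind translate.filters.autocorrect.correct; the names autocorrect_sketch and _trailing_punctuation are hypothetical, and the sketch always returns a string, which may differ from what the real function returns.

```python
PUNCTUATION = ".:?"  # the end punctuation named in the list above


def _trailing_punctuation(text):
    """Return the run of . : ? characters at the end of text, ignoring whitespace."""
    stripped = text.rstrip()
    end = len(stripped)
    while end > 0 and stripped[end - 1] in PUNCTUATION:
        end -= 1
    return stripped[end:]


def autocorrect_sketch(source, target):
    """Apply three simple source-guided fixes to target (illustrative only)."""
    if not target.strip() or not source.strip():
        return target
    # 1. Ellipses: use whichever form the source uses.
    if "…" in source and "..." in target:
        target = target.replace("...", "…")
    elif "..." in source and "…" in target:
        target = target.replace("…", "...")
    # 2. Whitespace: copy the source's leading and trailing whitespace.
    lead = source[: len(source) - len(source.lstrip())]
    trail = source[len(source.rstrip()):]
    target = lead + target.strip() + trail
    # 3. End punctuation: add it only when the target has none at all.
    wanted = _trailing_punctuation(source)
    if wanted and not _trailing_punctuation(target):
        target = target.rstrip() + wanted + trail
    return target
```

For example, autocorrect_sketch("Name: ", "Naam") restores both the colon and the trailing space.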

checks

This is a set of validation checks that can be performed on translation units.

Derivatives of UnitChecker (like StandardUnitChecker) check translation units, and derivatives of TranslationChecker (like StandardChecker) check (source, target) translation pairs.

When adding a new test here, please document and explain its behaviour on the pofilter tests page.

class translate.filters.checks.CCLicenseChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different types of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description on accelerators.
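As a rough illustration of the idea, the sketch below counts accelerator keys marked with a single marker character. It is a standalone approximation, not the checker's implementation: the function names and the default "&" marker are assumptions, and it would also count things such as HTML entities.

```python
import re


def accelerator_keys(text, marker="&"):
    """Return the characters that directly follow the accelerator marker."""
    pattern = re.escape(marker) + r"(\w)"
    return re.findall(pattern, text)


def accelerators_consistent(source, target, marker="&"):
    """True when both strings carry the same number of accelerators."""
    return len(accelerator_keys(source, marker)) == len(accelerator_keys(target, marker))
```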

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ ” it will appear to most tools as if it is translated.
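The distinction between blank and untranslated can be sketched as follows. This is a minimal illustration with a hypothetical function name, not the toolkit's code.

```python
def is_blank_translation(source, target):
    """True when target is non-empty but contains only whitespace."""
    return bool(source.strip()) and len(target) > 0 and not target.strip()
```

An empty target is not flagged here, because that case is untranslated rather than blank.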

brackets(str1, str2)

Checks that the number of brackets in both strings matches.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.
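A per-character count is one simple way to picture this comparison. The sketch below is standalone and the name bracket_mismatches is hypothetical; the real check may count differently.

```python
from collections import Counter

BRACKETS = "([{}])"


def bracket_mismatches(source, target):
    """Map each bracket whose count differs to its (source, target) counts."""
    src = Counter(ch for ch in source if ch in BRACKETS)
    tgt = Counter(ch for ch in target if ch in BRACKETS)
    return {ch: (src[ch], tgt[ch]) for ch in BRACKETS if src[ch] != tgt[ch]}
```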

categories

Categories that each checking function falls into. Function names are used as keys, categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case either ignore those instances or switch this test off.
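Repeated words of this kind can be found with a back-reference in a regular expression. The sketch below is a standalone illustration with a hypothetical function name; it does not reflect how the checker handles languages with valid repetitions.

```python
import re

# A word, whitespace, then the same word again as a whole word.
_REPEAT = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def repeated_words(target):
    """Return each word that is immediately repeated in target."""
    return [match.group(1) for match in _REPEAT.finditer(target)]
```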

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.

endpunc(str1, str2)

Checks whether punctuation at the end of the strings matches.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation was often chosen for a reason, e.g. a list where full stops make it look cluttered. So, initially match them with the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.
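The comparison, including a language-specific substitution, might look roughly like the sketch below. It is standalone and simplified: the function names and the equivalents mapping are assumptions made for illustration, and the toolkit's own language data is not used.

```python
import string

PUNCT = string.punctuation + "…"


def end_punctuation(text):
    """Return the run of punctuation at the end of text, ignoring whitespace."""
    stripped = text.rstrip()
    end = len(stripped)
    while end > 0 and stripped[end - 1] in PUNCT:
        end -= 1
    return stripped[end:]


def endpunc_matches(source, target, equivalents=None):
    """Compare end punctuation after mapping source marks to target-language marks."""
    expected = end_punctuation(source)
    for src_mark, tgt_mark in (equivalents or {}).items():
        expected = expected.replace(src_mark, tgt_mark)
    return expected == end_punctuation(target)
```

Passing {"?": ";"} models the Greek question mark, so "Continue?" and "Συνέχεια;" are treated as matching.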

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \\n \u0000 to ensure that if they exist in the original string you also have them in the translation.
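One way to picture this check is to collect backslash escapes as they appear literally in the message text and see which ones the translation lacks. The sketch is standalone, and both the pattern and the function name are assumptions.

```python
import re

# A literal backslash followed by uXXXX or by any single character.
_ESCAPE = re.compile(r"\\(?:u[0-9a-fA-F]{4}|.)")


def missing_escapes(source, target):
    """Return escapes found in source that do not occur in target."""
    found = _ESCAPE.findall(target)
    return [escape for escape in _ESCAPE.findall(source) if escape not in found]
```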

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \\n newlines (and variants such as \\r\\n) and reports an error if they differ.
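Counting each newline variant separately could be sketched like this. The helper names are hypothetical and the code works on real newline characters, which is an assumption about how the strings have been decoded.

```python
def newline_counts(text):
    """Return (LF, CRLF, CR) counts, with each CRLF pair counted once."""
    crlf = text.count("\r\n")
    lf = text.count("\n") - crlf
    cr = text.count("\r") - crlf
    return (lf, crlf, cr)


def newlines_consistent(source, target):
    """True when both strings contain the same newline variants and counts."""
    return newline_counts(source) == newline_counts(target)
```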

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no translate words using --notranslatefile.
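The logic can be sketched as a whole-word lookup. This is a standalone illustration: the function name is hypothetical, and the word list is passed in directly instead of being read from the file given with --notranslatefile.

```python
import re


def missing_notranslate_words(source, target, words):
    """Return listed words present in source but absent from target."""
    def contains(text, word):
        return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

    return [w for w in words if contains(source, w) and not contains(target, w)]
```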

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Also changes in order will trigger this error.

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.
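The reordering rule can be illustrated by pairing each format specifier with its argument position and comparing the sorted results. The sketch is standalone; the regular expression covers only common conversions and the function names are hypothetical.

```python
import re

# Either a literal %% or a specifier with an optional N$ position prefix.
_PRINTF = re.compile(r"%%|%(?:(\d+)\$)?([-+ #0]*\d*(?:\.\d+)?[hlL]*[diouxXeEfgGcs])")


def printf_specs(text):
    """Return sorted (argument position, specifier) pairs found in text."""
    specs = []
    position = 0
    for match in _PRINTF.finditer(text):
        if match.group(0) == "%%":
            continue  # an escaped percent sign consumes no argument
        position += 1
        index = int(match.group(1)) if match.group(1) else position
        specs.append((index, match.group(2)))
    return sorted(specs)


def printf_matches(source, target):
    """True when both strings use identical specifiers for each argument."""
    return printf_specs(source) == printf_specs(target)
```

With this approach "%s has %d files" and "%2$d lêers in %1$s" compare as equal, while swapping %d for %s does not.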

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.
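A comparable comparison can be sketched with the standard library's string.Formatter, which parses brace-style format strings. The helper names below are hypothetical and this is not the checker's own code.

```python
from string import Formatter


def brace_fields(text):
    """Return the sorted field names used in a brace format string.

    Formatter.parse raises ValueError on unbalanced braces.
    """
    return sorted(
        name for _, name, _, _ in Formatter().parse(text) if name is not None
    )


def brace_format_matches(source, target):
    """True when both strings reference the same fields, in any order."""
    return brace_fields(source) == brace_fields(target)
```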

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings matches.

Counts the number of sentences to see that the sentence count is the same between the original and translated string. You may not always want to use this test, if you find you often need to reformat your translation, because the original is badly-expressed, or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.
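Spotting the “(s)” pattern is a simple text search, sketched below with a hypothetical function name. It only detects the pattern and does not judge whether the plural is correct.

```python
import re

_SIMPLE_PLURAL = re.compile(r"\w+\(s\)")


def has_simple_plural(text):
    """True when text contains a word ending in a literal (s)."""
    return _SIMPLE_PLURAL.search(text) is not None
```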

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings matches.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \\t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.
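A basic version of this comparison could look like the sketch below. It is standalone, the pattern covers only the schemes named above, and the function name is hypothetical.

```python
import re

_URL = re.compile(r"(?:https?|ftp)://[^\s<>]+|mailto:[^\s<>]+")


def changed_urls(source, target):
    """Return URLs from source that do not appear unchanged in target."""
    urls = [url.rstrip(".,;:!?)") for url in _URL.findall(source)]
    return [url for url in urls if url not in target]
```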

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).
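The reporting format described above, a character together with its Unicode value, can be reproduced with a short sketch. The function name is hypothetical, and the valid characters are passed as a string instead of being loaded from the --validcharsfile file.

```python
def invalid_characters(target, valid_chars):
    """Return (character, four-digit hex code point) for each invalid character."""
    seen = []
    for ch in target:
        if ch not in valid_chars and ch not in seen:
            seen.append(ch)
    return [(ch, f"{ord(ch):04X}") for ch in seen]
```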

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.
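The counting step can be sketched with a simple tag pattern. This standalone example uses hypothetical names and does not model the special cases described above, such as whole-string tags or translatable attributes.

```python
import re

# An opening or closing tag whose name starts with a letter.
_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def tag_count_matches(source, target):
    """True when both strings contain the same number of tags."""
    return len(_TAG.findall(source)) == len(_TAG.findall(target))
```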

class translate.filters.checks.CheckerConfig(targetlanguage=None, accelmarkers=None, varmatches=None, notranslatewords=None, musttranslatewords=None, validchars=None, punctuation=None, endpunctuation=None, ignoretags=None, canchangetags=None, criticaltests=None, credit_sources=None)

Object representing the configuration of a checker.

update(otherconfig)

Combines the info in otherconfig into this config object.

updatetargetlanguage(langcode)

Updates the target language in the config to the given target language and sets its script.

updatevalidchars(validchars)

Updates the map that eliminates valid characters.

class translate.filters.checks.DrupalChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different types of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description on accelerators.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.
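As a rough standalone sketch, acronyms can be approximated as runs of two or more capital letters. That definition and the function name are assumptions for illustration only.

```python
import re

_ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def missing_acronyms(source, target):
    """Return acronyms from source that do not occur in target."""
    return [acronym for acronym in _ACRONYM.findall(source) if acronym not in target]
```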

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ ” it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings matches.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.

categories

Categories that each checking function falls into. Function names are used as keys, categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case either ignore those instances or switch this test off.

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.
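A simple way to picture this is to extract addresses from the source and look for them verbatim in the translation. The pattern below is deliberately loose and the function name is hypothetical.

```python
import re

_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def missing_emails(source, target):
    """Return email addresses from source that are not present in target."""
    return [address for address in _EMAIL.findall(source) if address not in target]
```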

endpunc(str1, str2)

Checks whether punctuation at the end of the strings matches.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation was often chosen for a reason, e.g. a list where full stops make it look cluttered. So, initially match them with the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \\n \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.
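Function names of the kind shown above can be approximated as an identifier, optionally dotted, followed by empty parentheses. The sketch is standalone and both the pattern and the function name are assumptions.

```python
import re

_FUNCTION = re.compile(r"[A-Za-z_][\w.]*\(\)")


def missing_functions(source, target):
    """Return function names from source that are not present in target."""
    return [name for name in _FUNCTION.findall(source) if name not in target]
```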

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.
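The ratio idea can be sketched as below. The max_ratio value of 10.0 is a hypothetical threshold chosen for illustration, not the value the checker uses, and the function name is also an assumption.

```python
def much_longer(source, target, max_ratio=10.0):
    """True when target is more than max_ratio times as long as source."""
    source_length = len(source.strip())
    target_length = len(target.strip())
    if source_length == 0 or target_length == 0:
        return False  # nothing meaningful to compare
    return target_length / source_length > max_ratio
```

A one-character source with a multi-word translation exceeds the ratio easily, which is the special case described above.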

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \\n newlines (and variants such as \\r\\n) and reports an error if they differ.

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no translate words using --notranslatefile.

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Also changes in order will trigger this error.
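Both failure modes mentioned above, spelled-out numbers and changed order, fall out of an order-sensitive comparison of the digits found in each string. The sketch is standalone and the names are hypothetical.

```python
import re

# Digits with optional . or , group or decimal separators.
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def numbers_match(source, target):
    """True when both strings contain the same numbers in the same order."""
    return _NUMBER.findall(source) == _NUMBER.findall(target)
```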

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.
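A minimal sketch of this rule, assuming ASCII punctuation only and using a hypothetical function name:

```python
import string


def purepunc_changed(source, target):
    """True when source is only punctuation and target differs from it."""
    stripped = source.strip()
    is_pure = bool(stripped) and all(ch in string.punctuation for ch in stripped)
    return is_pure and stripped != target.strip()
```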

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings matches.

Counts the number of sentences to see that the sentence count is the same between the original and translated string. You may not always want to use this test, if you find you often need to reformat your translation, because the original is badly-expressed, or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.
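
The comparison can be sketched roughly as follows. The helper names are hypothetical, and the set of characters stripped here (Python's ASCII whitespace and punctuation) is an assumption that is narrower than what a real check would need.

```python
import string

# Characters skipped before looking at the first letter (assumed set).
_SKIP = string.whitespace + string.punctuation


def first_significant_char(text):
    """Return the first character after leading whitespace and punctuation."""
    return text.lstrip(_SKIP)[:1]


def startcaps_ok(source, target):
    """Return True if both strings start with the same kind of capitalisation."""
    src = first_significant_char(source)
    tgt = first_significant_char(target)
    if not src or not tgt:
        return True
    # Digits and characters from caseless scripts have no upper/lower form.
    if src.upper() == src.lower() or tgt.upper() == tgt.lower():
        return True
    return src.isupper() == tgt.isupper()
```

The caseless guard mirrors the reason the check is disabled for languages that make no distinction between upper and lower case.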

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings matches.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.
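
The idea can be sketched as below. The regular expression is an assumption that covers only the http, https, ftp and mailto schemes named above, and urls_preserved is a hypothetical helper, not the toolkit's own function.

```python
import re

# Basic URLs only; other URI schemes (afp, smb, file) are deliberately ignored.
URL = re.compile(r"(?:https?://|ftp://|mailto:)[^\s<>()]+")


def urls_preserved(source, target):
    """Return True if every basic URL in source appears unchanged in target."""
    return all(url in target for url in URL.findall(source))
```

A translated example URL would be reported by this sketch, so such a result still needs a human decision.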

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).
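
A small sketch of the reporting step follows, assuming the valid characters have already been read from the file into a string. The function name invalid_chars and the exact output format are illustrative assumptions.

```python
def invalid_chars(target, valid_chars):
    """List each character of target that is not in valid_chars, with its code point."""
    allowed = set(valid_chars)
    seen = []
    for ch in target:
        # Report every offending character only once, in order of appearance.
        if ch not in allowed and ch not in seen:
            seen.append(ch)
    return ["%s (%04X)" % (ch, ord(ch)) for ch in seen]
```

For example, a plus sign that is missing from the valid characters is reported with the value 002B.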

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.
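
The counting rule and the whole-string exception can be sketched as follows. The tag pattern is a simplification and tag_counts_match is a hypothetical name; attribute handling such as alt is not modelled here.

```python
import re

# Anything between angle brackets is treated as a tag (simplified assumption).
TAG = re.compile(r"<[^>]+>")


def tag_counts_match(source, target):
    """Return True if source and target contain the same number of tags."""
    source_tags = TAG.findall(source)
    # A source that is nothing but one tag-like token is ignored.
    if len(source_tags) == 1 and source_tags[0] == source.strip():
        return True
    return len(source_tags) == len(TAG.findall(target))
```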

exception translate.filters.checks.FilterFailure(messages)

This exception signals that a Filter didn’t pass, and gives an explanation or a comment.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class translate.filters.checks.GnomeChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different types of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description of accelerators.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ “ it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings matches.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.
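
A minimal sketch of the counting comparison, using a hypothetical helper name, might look like this:

```python
BRACKETS = "([{}])"


def bracket_mismatches(source, target):
    """Return the bracket characters whose counts differ between the strings."""
    return [b for b in BRACKETS if source.count(b) != target.count(b)]
```

An empty list means the counts agree. Note that this only compares counts; it does not verify that brackets are correctly nested.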

categories

Categories that each checking function falls into. Function names are used as keys, categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.).

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case, either ignore those instances or switch this test off.
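
One way to spot such repetitions is a back-referencing regular expression, as in the sketch below. The pattern and the name repeated_words are assumptions for illustration, not the toolkit's implementation.

```python
import re

# A word, some whitespace, then the same word again (case-insensitive).
REPEATED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def repeated_words(text):
    """Return the words that are immediately repeated in text."""
    return [match.group(1) for match in REPEATED.finditer(text)]
```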

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.

endpunc(str1, str2)

Checks whether punctuation at the end of the strings matches.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation has often been chosen for a reason, e.g. a list where full stops make it look cluttered. So, initially match them with the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.
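
The comparison of trailing whitespace can be sketched as follows. Both helper names are hypothetical, and the sketch makes no allowance for full-width punctuation.

```python
def trailing_whitespace(text):
    """Return the run of whitespace at the end of text."""
    return text[len(text.rstrip()):]


def endwhitespace_ok(source, target):
    """Return True if both strings end with exactly the same whitespace."""
    return trailing_whitespace(source) == trailing_whitespace(target)
```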

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \n and \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

gconf(str1, str2)

Checks if we have any gconf config settings translated.

GConf settings should not be translated, so this check verifies that settings such as “name” or “modification_date” remain unchanged in the translation. It allows you to change the surrounding quotes but will ensure that the setting values remain untranslated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \n newlines (and variants such as \r\n) and reports an error if they differ.
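
A counting comparison along these lines is sketched below. It treats \r\n, \r and \n as interchangeable, which is a simplifying assumption of this example; the helper name is hypothetical.

```python
import re

# Match "\r\n" first so that it counts as one newline rather than two.
NEWLINE = re.compile(r"\r\n|\r|\n")


def newline_counts_match(source, target):
    """Return True if both strings contain the same number of newlines."""
    return len(NEWLINE.findall(source)) == len(NEWLINE.findall(target))
```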

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no translate words using --notranslatefile.

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Also changes in order will trigger this error.

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.
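
The matching rule, including reordering with the %1$s syntax, can be sketched as below. The regular expression is a simplified assumption (it does not handle a literal %% or length modifiers), and both function names are hypothetical.

```python
import re

# Optional position ("1$"), then flags, width, precision and a type letter.
PRINTF = re.compile(r"%(?:(\d+)\$)?([-+ #0]*\d*(?:\.\d+)?[a-zA-Z])")


def printf_signature(text):
    """Map each argument position to its format details."""
    signature = {}
    for index, match in enumerate(PRINTF.finditer(text), start=1):
        position, spec = match.groups()
        # Without an explicit position, the order of appearance is used.
        signature[int(position) if position else index] = spec
    return signature


def printf_matches(source, target):
    """Return True if both strings use identical format variables."""
    return printf_signature(source) == printf_signature(target)
```

Because positions are used as keys, "%2$d items in %1$s" is accepted as a translation of "%1$s has %2$d items", while swapping %d for %s is not.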

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings matches.

Counts the sentences in each string to check that the sentence count is the same between the original and translated string. You may not always want to use this test, if you find you often need to reformat your translation, because the original is badly expressed, or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, translators often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings matches.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.

class translate.filters.checks.IOSChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different types of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description of accelerators.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ “ it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings matches.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.

categories

Categories that each checking function falls into. Function names are used as keys, categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.).

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case, either ignore those instances or switch this test off.

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.
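
A rough sketch of the comparison is given below. The address pattern is a deliberately loose assumption and emails_preserved is a hypothetical helper; real address syntax is considerably broader.

```python
import re

# Loose pattern: local part, "@", then a dotted domain name.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def emails_preserved(source, target):
    """Return True if every email address in source appears unchanged in target."""
    return all(address in target for address in EMAIL.findall(source))
```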

endpunc(str1, str2)

Checks whether punctuation at the end of the strings matches.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation has often been chosen for a reason, e.g. a list where full stops make it look cluttered. So, initially match them with the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \n and \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \n newlines (and variants such as \r\n) and reports an error if they differ.

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no translate words using --notranslatefile.

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Also changes in order will trigger this error.
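
The sketch below shows why both rewriting a number as a word and changing the order are reported: the digit sequences are extracted and compared in order. The pattern and the name numbers_match are assumptions for illustration.

```python
import re

# Digits, optionally grouped with "." or "," (e.g. 1,000 or 3.14).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def numbers_match(source, target):
    """Return True if both strings contain the same numbers in the same order."""
    return NUMBER.findall(source) == NUMBER.findall(target)
```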

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.
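
A simplified sketch of the option comparison follows. It looks only at the option name and ignores any parameter after the equals sign; the pattern and the helper name are assumptions made for this example.

```python
import re

# A long option such as --help or --file, not preceded by a word character.
OPTION = re.compile(r"(?<![\w-])--[a-zA-Z][\w-]*")


def options_preserved(source, target):
    """Return True if every long option in source appears unchanged in target."""
    return all(option in target for option in OPTION.findall(source))
```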

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings match.

Counts the number of sentences to check that the sentence count is the same in the original and the translated string. You may not always want to use this test, if you find you often need to reformat your translation, because the original is badly expressed, or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings match.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).
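The reporting described above can be sketched like this. The function name invalid_chars and the exact output format are assumptions for illustration; the valid characters are passed in as a string here, whereas pofilter reads them from the file given with --validcharsfile.

```python
def invalid_chars(target, valid):
    """List characters in target that are not in valid, with code points."""
    allowed = set(valid)
    seen = []
    for char in target:
        # Report each offending character only once, in order of appearance.
        if char not in allowed and char not in seen:
            seen.append(char)
    # Format the code point as four upper-case hex digits, e.g. 002B for "+".
    return [f"{char} ({ord(char):04X})" for char in seen]
```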

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.
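A minimal sketch of the counting approach, including the rule that a tag covering the whole source string is ignored. The pattern TAG_RE and the function name are assumptions; the sketch does not model translatable attributes such as alt.

```python
import re

# Assumed, simplified notion of a tag: "<", anything except ">", then ">".
TAG_RE = re.compile(r"<[^>]+>")


def tag_counts_match(source, target):
    """True if source and target contain the same number of tags."""
    # A source that is nothing but one tag-like token, e.g. "<Error>",
    # is treated as translatable text and skipped.
    if TAG_RE.fullmatch(source.strip()):
        return True
    return len(TAG_RE.findall(source)) == len(TAG_RE.findall(target))
```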

class translate.filters.checks.KdeChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different type of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description on accelerators.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ “ it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings match.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.
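As an illustration only, a per-character count comparison might look like the sketch below. The name bracket_mismatches is hypothetical, and the toolkit may report failures differently.

```python
def bracket_mismatches(source, target):
    """Return the bracket characters whose counts differ between strings."""
    return [b for b in "([{}])" if source.count(b) != target.count(b)]
```

An empty list means the bracket counts agree; otherwise the list names the brackets to look at.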

categories

Categories that each checking function falls into. Function names are used as keys; categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case either ignore those instances or switch this test off.
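A rough sketch of how repeated words could be detected, assuming that words are runs of word characters and that case is ignored. The function name repeated_words is hypothetical. Note that this simplification also ignores punctuation between the two words, so "that, that" would be reported.

```python
import re


def repeated_words(target):
    """Return each word that immediately repeats the word before it."""
    words = re.findall(r"\w+", target.lower())
    # Pair every word with its successor and keep the equal pairs.
    return [word for previous, word in zip(words, words[1:]) if previous == word]
```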

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.

endpunc(str1, str2)

Checks whether punctuation at the end of the strings match.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation was often chosen for a reason, e.g. a list where full stops would make it look cluttered. So, initially match them with the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.
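The basic comparison can be sketched as below. This is an assumption-laden simplification: the helper names are hypothetical, and the sketch does not implement the full-width punctuation allowance mentioned above.

```python
def trailing_whitespace(text):
    """Return the run of whitespace at the end of text (possibly empty)."""
    return text[len(text.rstrip()):]


def endwhitespace_matches(source, target):
    """True if both strings end with exactly the same whitespace."""
    return trailing_whitespace(source) == trailing_whitespace(target)
```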

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \n \u0000 to ensure that if they exist in the original string you also have them in the translation.
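One way to picture this check is to collect the escape sequences from each string and compare them regardless of order. The pattern ESCAPE_RE and the function name are assumptions for this sketch; the strings are expected to contain a literal backslash followed by a letter, as they appear in a PO file.

```python
import re

# Assumed pattern: a backslash followed by "uXXXX" or any single character.
ESCAPE_RE = re.compile(r"\\(?:u[0-9a-fA-F]{4}|.)")


def escapes_match(source, target):
    """True if both strings contain the same escapes, in any order."""
    return sorted(ESCAPE_RE.findall(source)) == sorted(ESCAPE_RE.findall(target))
```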

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \n newlines (and variants such as \r\n) and reports an error if they differ.

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test lets you easily make sure that words such as Word, Excel, Impress, and Calc are not translated. You must specify a file containing all of the no-translate words using --notranslatefile.

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Changes in the order of the numbers will also trigger this error.

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings match.

Counts the number of sentences to check that the sentence count is the same in the original and the translated string. You may not always want to use this test, if you find you often need to reformat your translation, because the original is badly expressed, or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings match.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.

class translate.filters.checks.L20nChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

For Mozilla we lower the severity to cosmetic, and for some languages it also ensures accelerators are absent in the target string since some languages do not use accelerators, for example Indic languages.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ “ it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings match.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.

categories

Categories that each checking function falls into. Function names are used as keys; categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

dialogsizes(str1, str2)

Checks that dialog sizes are not translated.

This is a Mozilla specific test. Mozilla uses a language called XUL to define dialogues and screens. This can make use of CSS to specify properties of the dialogue. These properties include things such as the width and height of the box. The size might need to be changed if the dialogue size changes due to longer translations. Thus translators can change these settings. But you are only meant to change the number, not translate the words ‘width’ or ‘height’. This check captures instances where these are translated. It will also catch other types of errors in these units.

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case either ignore those instances or switch this test off.

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.

endpunc(str1, str2)

Checks whether punctuation at the end of the strings match.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation was often chosen for a reason, e.g. a list where full stops would make it look cluttered. So, initially match them with the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \\n \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \\n newlines (and variants such as \\r\\n) and reports an error if they differ.
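
A minimal sketch of such a count comparison, assuming that line feeds and carriage returns are simply counted separately (the actual check may treat the variants differently; newline_counts and newlines_consistent are hypothetical names):

```python
def newline_counts(text: str) -> tuple:
    """Return (line feed count, carriage return count) for text."""
    return (text.count("\n"), text.count("\r"))


def newlines_consistent(source: str, target: str) -> bool:
    """True when both strings contain the same number of each newline character."""
    return newline_counts(source) == newline_counts(target)


print(newlines_consistent("Line one\nLine two", "Reël een\nReël twee"))  # True
print(newlines_consistent("Line one\nLine two", "Reël een Reël twee"))   # False
```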

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no-translate words using --notranslatefile.

numbers(str1, str2)

Checks that numbers are not translated.

Special handling for Mozilla to ignore entries that are dialog sizes.

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.
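
The reordering rule can be illustrated with a simplified sketch. The regular expression below is an assumption made for this example: it covers only basic specifiers (optional N$ position, flags, width, precision and a single conversion letter) and is not the pattern the toolkit uses. The names PRINTF_RE, printf_specs and printf_matches are hypothetical.

```python
import re

# Simplified pattern: optional "N$" position, then flags, width, precision
# and one conversion letter. Length modifiers such as "ld" are not handled.
PRINTF_RE = re.compile(r"%(?:(\d+)\$)?([-+ 0#]*\d*(?:\.\d+)?[a-zA-Z])")


def printf_specs(text: str) -> dict:
    """Map argument position to specifier; unnumbered ones count from 1."""
    text = text.replace("%%", "")  # drop literal percent signs
    specs = {}
    for index, match in enumerate(PRINTF_RE.finditer(text), start=1):
        position = int(match.group(1)) if match.group(1) else index
        specs[position] = match.group(2)
    return specs


def printf_matches(source: str, target: str) -> bool:
    """True when every position carries an identical specifier."""
    return printf_specs(source) == printf_specs(target)


print(printf_matches("%s has %d files", "%2$d lêers in %1$s"))  # True
print(printf_matches("%d items", "%s items"))                   # False
```

Keying the specifiers by argument position is what lets a reordered translation pass while a changed type or precision still fails.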

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings matches.

Counts the number of sentences to check that the sentence count is the same in the original and the translated string. You may not always want to use this test if you find you often need to reformat your translation, because the original is badly expressed or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings matches.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \\t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

Special handling for Mozilla to ignore entries that are dialog sizes.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).
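
The reporting step can be sketched as follows. This is an illustration only: the function name invalid_chars and the way the valid set is built are assumptions, and reading the --validcharsfile file is left out.

```python
def invalid_chars(target: str, valid_chars: set) -> list:
    """Return (character, hex code point) pairs for characters not in valid_chars."""
    seen = []
    for char in target:
        if char not in valid_chars and char not in seen:
            seen.append(char)  # report each offending character once
    return [(char, f"{ord(char):04X}") for char in seen]


# Hypothetical valid set: lower-case ASCII letters and the space character.
valid = set("abcdefghijklmnopqrstuvwxyz ")
print(invalid_chars("a+b", valid))  # [('+', '002B')]
```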

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.

class translate.filters.checks.LibreOfficeChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different types of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description on accelerators.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ “ it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings matches.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.
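
A minimal sketch of the counting idea, assuming each bracket character is counted on its own (bracket_counts and brackets_match are hypothetical names, not toolkit functions):

```python
from collections import Counter

BRACKETS = "([{}])"


def bracket_counts(text: str) -> dict:
    """Count each bracket character in text; absent brackets count as 0."""
    counts = Counter(text)
    return {bracket: counts[bracket] for bracket in BRACKETS}


def brackets_match(source: str, target: str) -> bool:
    """True when both strings contain the same number of each bracket."""
    return bracket_counts(source) == bracket_counts(target)


print(brackets_match("Open (file)", "Maak (lêer) oop"))  # True
print(brackets_match("Open (file)", "Maak lêer) oop"))   # False
```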

categories

Categories that each checking function falls into. Function names are used as keys; categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.
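
Detecting the marker amounts to a substring test, as in this sketch. The function name has_compendium_conflict and the file names in the sample string are made up for illustration.

```python
CONFLICT_MARKER = "#-#-#-#-#"


def has_compendium_conflict(target: str) -> bool:
    """True when the msgcat conflict marker is present in the translation."""
    return CONFLICT_MARKER in target


# Hypothetical merged entry with two competing translations.
merged = (
    "#-#-#-#-#  first.po  #-#-#-#-#\n"
    "Lêer\n"
    "#-#-#-#-#  second.po  #-#-#-#-#\n"
    "Leer"
)
print(has_compendium_conflict(merged))  # True
print(has_compendium_conflict("Lêer"))  # False
```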

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when it is not in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case either ignore those instances or switch this test off.
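
One simple way to find such repetitions is to compare each word with the one before it. The sketch below assumes a case-insensitive comparison and a naive word split; repeated_words is a hypothetical helper, not the toolkit's code.

```python
import re


def repeated_words(text: str) -> list:
    """Return every word that is immediately followed by itself."""
    words = re.findall(r"\w+", text.lower())
    return [cur for prev, cur in zip(words, words[1:]) if prev == cur]


print(repeated_words("Save the the file"))  # ['the']
print(repeated_words("Save the file"))      # []
```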

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.

endpunc(str1, str2)

Checks whether punctuation at the end of the strings matches.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation was often chosen for a reason, e.g. a list where full stops make it look cluttered. So, initially match the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \\n \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.
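
The general ratio test might look like the sketch below. The threshold of 10 is purely illustrative and is not the value the toolkit uses; looks_too_long and max_ratio are hypothetical names.

```python
def looks_too_long(source: str, target: str, max_ratio: float = 10.0) -> bool:
    """Flag a target whose length exceeds max_ratio times the source length."""
    if not source or not target:
        return False  # nothing to compare
    return len(target) / len(source) > max_ratio


# A one-character source with a multi-character target is the typical hit.
print(looks_too_long("Y", "Yes, definitely so"))    # True
print(looks_too_long("Open file", "Maak lêer oop"))  # False
```

A deliberately high threshold keeps false positives down, at the cost of missing moderately over-long translations.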

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \\n newlines (and variants such as \\r\\n) and reports an error if they differ.

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no-translate words using --notranslatefile.

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Also changes in order will trigger this error.

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Not used in LibreOffice.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings matches.

Counts the number of sentences to check that the sentence count is the same in the original and the translated string. You may not always want to use this test if you find you often need to reformat your translation, because the original is badly expressed or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings matches.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \\t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).

validxml(str1, str2)

Checks that every opening XML/HTML tag in the translation has a matching closing tag, and vice versa.
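
Pairing of opening and closing tags is commonly verified with a stack. The sketch below shows that general technique under simplifying assumptions (a basic tag pattern, no special treatment of HTML void elements such as an unclosed img); it is not taken from the toolkit, and TAG_RE and tags_balanced are hypothetical names.

```python
import re

# Captures: optional leading "/", the tag name, optional trailing "/".
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w:-]*)[^<>]*?(/?)>")


def tags_balanced(text: str) -> bool:
    """True when every opening tag is closed in the right order."""
    stack = []
    for closing, name, self_closing in TAG_RE.findall(text):
        if self_closing:
            continue  # e.g. <br/> needs no partner
        if closing:
            if not stack or stack.pop() != name:
                return False  # stray or mismatched closing tag
        else:
            stack.append(name)
    return not stack  # leftover entries are unclosed tags


print(tags_balanced("Klik <b>hier</b>"))  # True
print(tags_balanced("Klik <b>hier"))      # False
```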

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.

class translate.filters.checks.MinimalChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different types of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description on accelerators.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ “ it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings matches.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.

categories

Categories that each checking function falls into. Function names are used as keys; categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case either ignore those instances or switch this test off.
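
The idea of spotting adjacent repeats can be sketched like this. It is a simplified illustration that compares lower-cased words and ignores punctuation, so it would also flag a repeat across a sentence boundary; repeated_words is a hypothetical name.

```python
import re


def repeated_words(target: str) -> list:
    """Return each word that is immediately followed by itself.

    Hypothetical helper for illustration; not the toolkit's implementation.
    """
    words = re.findall(r"\w+", target.lower())
    # Pair every word with its successor and keep the pairs that are equal.
    return [first for first, second in zip(words, words[1:]) if first == second]


print(repeated_words("Save the the file"))
```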

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.

endpunc(str1, str2)

Checks whether punctuation at the end of the strings match.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation was often chosen for a reason, e.g. a list where full stops make it look cluttered. So, initially match them with the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.
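
The comparison of trailing whitespace can be sketched as below. This is an assumption-laden illustration with hypothetical helper names, and it does not model the full-width punctuation exception mentioned above.

```python
def trailing_whitespace(text: str) -> str:
    """Return the run of whitespace at the end of text (possibly empty)."""
    return text[len(text.rstrip()):]


def endwhitespace_matches(source: str, target: str) -> bool:
    """Hypothetical helper: True when both strings end in the same whitespace."""
    return trailing_whitespace(source) == trailing_whitespace(target)


# The translation dropped the space that separates the label from the input.
print(endwhitespace_matches("Password: ", "Wagwoord:"))
```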

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \\n \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \\n newlines (and variants such as \\r\\n) and reports an error if they differ.
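
A rough sketch of the counting, assuming the strings have already been unescaped so that they contain real newline characters. The names below are hypothetical.

```python
import re

# Try \r\n first so a Windows line ending counts once, then a lone \r or \n.
NEWLINE = re.compile(r"\r\n|\r|\n")


def newline_count(text: str) -> int:
    """Count newlines, treating \\r\\n as a single newline."""
    return len(NEWLINE.findall(text))


def newlines_match(source: str, target: str) -> bool:
    """Hypothetical helper: True when both strings hold the same number of newlines."""
    return newline_count(source) == newline_count(target)


print(newlines_match("Line 1\nLine 2", "Reël 1 Reël 2"))
```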

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated; this test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no-translate words using --notranslatefile.

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Also changes in order will trigger this error.

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.
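
The reordering rule can be illustrated with a small sketch. It assumes a simplified pattern for printf specifiers that covers the common C conversions only; PRINTF, printf_specs and printf_matches are hypothetical names and the toolkit's own matching is more thorough.

```python
import re

# Simplified printf pattern: optional "n$" position, flags, width,
# precision, length modifier and a conversion character.
PRINTF = re.compile(
    r"%(?:(?P<pos>\d+)\$)?"
    r"(?P<body>[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*))?"
    r"(?:hh|h|ll|l|L|z|j|t)?[diouxXeEfFgGcsp])"
)


def printf_specs(text: str) -> dict:
    """Map argument position to specifier body, e.g. {1: 'd', 2: 's'}."""
    specs = {}
    # Drop literal "%%" first so it is not mistaken for a specifier.
    matches = PRINTF.finditer(text.replace("%%", ""))
    for index, match in enumerate(matches, start=1):
        # An explicit "n$" wins; otherwise the order of appearance is used.
        position = int(match.group("pos") or index)
        specs[position] = match.group("body")
    return specs


def printf_matches(source: str, target: str) -> bool:
    """True when both strings use identical specifiers, in any order."""
    return printf_specs(source) == printf_specs(target)


# Reordered with %2$s and %1$d, but the types still line up.
print(printf_matches("%d files in %s", "In %2$s: %1$d lêers"))
```

Keying the specifiers by argument position is what lets a reordered translation pass while a changed type still fails.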

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings match.

Counts the number of sentences to check that the sentence count is the same between the original and translated string. You may not always want to use this test, if you find you often need to reformat your translation, because the original is badly-expressed, or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.
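
A minimal sketch of the comparison, assuming ASCII punctuation is what gets stripped and that languages without case are handled elsewhere. The helper names are hypothetical.

```python
import string

# Characters stripped from the start before looking at capitalisation.
LEADING_NOISE = string.whitespace + string.punctuation


def first_letter(text: str) -> str:
    """Return the first character after leading whitespace and punctuation."""
    return text.lstrip(LEADING_NOISE)[:1]


def startcaps_matches(source: str, target: str) -> bool:
    """Hypothetical helper: True when both strings start with the same case."""
    src, tgt = first_letter(source), first_letter(target)
    if not src or not tgt:
        # Nothing left to compare, so do not report an error.
        return True
    return src.isupper() == tgt.isupper()


print(startcaps_matches('  "Open file', '"open lêer'))
```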

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings match.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \\t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).
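
The reporting format described above can be sketched as follows. This assumes the valid characters have already been read from the file into a set; invalid_chars is a hypothetical name and the exact output of the real test may differ.

```python
def invalid_chars(target: str, valid_chars: set) -> list:
    """List each character outside valid_chars with its Unicode value in hex.

    Hypothetical helper for illustration; not the toolkit's implementation.
    """
    seen = []
    for ch in target:
        # Report every offending character once, in order of appearance.
        if ch not in valid_chars and ch not in seen:
            seen.append(ch)
    return [f"{ch} ({ord(ch):04X})" for ch in seen]


print(invalid_chars("a+b", set("ab")))
```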

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags, or things that look like tags, that cover the whole string, e.g. <Error>, but will produce false positives for things like An <Error> occurred, as here Error should be translated. It will also allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.
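
The tag counting and the whole-string exception can be sketched as below. This is a simplified illustration with a naive tag pattern and hypothetical names; it does not model the handling of translatable attributes such as alt.

```python
import re

# Naive pattern: anything between "<" and the next ">".
TAG = re.compile(r"<[^>]+>")


def tag_count_matches(source: str, target: str) -> bool:
    """Hypothetical helper: True when both strings hold the same number of tags."""
    if TAG.fullmatch(source.strip()):
        # A tag lookalike covering the whole source string is ignored.
        return True
    return len(TAG.findall(source)) == len(TAG.findall(target))


print(tag_count_matches("Click <b>Save</b>", "Klik Stoor"))
```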

class translate.filters.checks.MozillaChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

For Mozilla we lower the severity to cosmetic, and for some languages it also ensures accelerators are absent in the target string since some languages do not use accelerators, for example Indic languages.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ “ it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings match.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.

categories

Categories that each checking function falls into. Function names are used as keys, categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

dialogsizes(str1, str2)

Checks that dialog sizes are not translated.

This is a Mozilla specific test. Mozilla uses a language called XUL to define dialogues and screens. This can make use of CSS to specify properties of the dialogue. These properties include things such as the width and height of the box. The size might need to be changed if the dialogue size changes due to longer translations. Thus translators can change these settings. But you are only meant to change the number, not translate the words ‘width’ or ‘height’. This check captures instances where these are translated. It will also catch other types of errors in these units.

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case either ignore those instances or switch this test off.

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.

endpunc(str1, str2)

Checks whether punctuation at the end of the strings match.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation was often chosen for a reason, e.g. a list where full stops make it look cluttered. So, initially match them with the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \\n \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \\n newlines (and variants such as \\r\\n) and reports an error if they differ.

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated; this test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no-translate words using --notranslatefile.

numbers(str1, str2)

Checks that numbers are not translated.

Special handling for Mozilla to ignore entries that are dialog sizes.

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings match.

Counts the number of sentences to check that the sentence count is the same between the original and translated string. You may not always want to use this test, if you find you often need to reformat your translation, because the original is badly-expressed, or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings match.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \\t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

Special handling for Mozilla to ignore entries that are dialog sizes.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.
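The tag-counting step can be approximated as follows. This is a simplified sketch under stated assumptions: the regular expression and the name `tag_counts_match` are hypothetical, and the special cases mentioned above (tags covering the whole string, translatable attributes such as alt) are not handled.

```python
import re

# Anything between "<" and the next ">" is treated as a tag (a simplification).
TAG_RE = re.compile(r"<[^>]+>")


def tag_counts_match(source, target):
    """Return True when source and target contain the same number of tags."""
    return len(TAG_RE.findall(source)) == len(TAG_RE.findall(target))


print(tag_counts_match("Click <b>here</b>", "Klik <b>hier</b>"))  # True
print(tag_counts_match("Click <b>here</b>", "Klik <b>hier"))      # False
```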

class translate.filters.checks.OpenOfficeChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different types of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description of accelerators.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ “ it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings matches.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.
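A minimal sketch of this comparison, assuming each bracket character is counted independently; `bracket_mismatches` is a hypothetical name and the toolkit's own logic may differ.

```python
BRACKETS = "([{}])"


def bracket_mismatches(source, target):
    """Return the bracket characters whose counts differ between the two strings."""
    return [b for b in BRACKETS if source.count(b) != target.count(b)]


print(bracket_mismatches("Open (file)", "Maak (lêer oop"))  # [')']
print(bracket_mismatches("Open (file)", "Maak (lêer) oop"))  # []
```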

categories

Categories that each checking function falls into. Function names are used as keys; categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case either ignore those instances or switch this test off.
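One way to picture the detection is to split the translation into words and compare neighbours. The sketch below is an assumption-laden illustration (hypothetical name `repeated_words`, case-insensitive comparison chosen for the example), not the toolkit's code.

```python
import re


def repeated_words(target):
    """Return words that occur twice in a row, compared case-insensitively."""
    words = re.findall(r"\w+", target.lower())
    return [a for a, b in zip(words, words[1:]) if a == b]


print(repeated_words("Save the the file"))  # ['the']
print(repeated_words("Save the file"))      # []
```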

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.

endpunc(str1, str2)

Checks whether punctuation at the end of the strings matches.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation was often chosen for a reason, e.g. a list where full stops make it look cluttered. So, initially match the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.
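The comparison of trailing whitespace can be sketched like this. The helper names are hypothetical, and the language-specific allowance for full-width punctuation mentioned above is deliberately left out.

```python
def trailing_whitespace(text):
    """Return the run of whitespace at the end of text (possibly empty)."""
    return text[len(text.rstrip()):]


def endwhitespace_matches(source, target):
    """Return True when both strings end with exactly the same whitespace."""
    return trailing_whitespace(source) == trailing_whitespace(target)


print(endwhitespace_matches("Password: ", "Wagwoord: "))  # True
print(endwhitespace_matches("Password: ", "Wagwoord:"))   # False
```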

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \n and \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \n newlines (and variants such as \r\n) and reports an error if they differ.
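A small sketch of the counting, assuming the variants are normalised to a single kind of line break before comparison. That normalisation is a choice made for this example; the actual check may count each variant separately. `count_newlines` and `newlines_consistent` are hypothetical names.

```python
def count_newlines(text):
    """Count line breaks, treating CRLF, CR and LF as one break each."""
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.count("\n")


def newlines_consistent(source, target):
    """Return True when both strings contain the same number of line breaks."""
    return count_newlines(source) == count_newlines(target)


print(count_newlines("a\r\nb\nc"))                          # 2
print(newlines_consistent("One\nTwo", "Een\r\nTwee"))       # True
print(newlines_consistent("One\nTwo\n", "Een\nTwee"))       # False
```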

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no translate words using --notranslatefile.

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Also changes in order will trigger this error.
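To see why reordering triggers an error, consider a sketch that extracts the numbers from each string in order and compares the two lists. The regular expression and function names are assumptions for illustration; the toolkit recognises more number forms than this.

```python
import re


def numbers_in(text):
    """Return the digit sequences in text, in order of appearance."""
    return re.findall(r"\d+(?:[.,]\d+)*", text)


def numbers_match(source, target):
    """Return True when both strings contain the same numbers in the same order."""
    return numbers_in(source) == numbers_in(target)


print(numbers_match("Page 2 of 10", "Bladsy 2 van 10"))       # True
print(numbers_match("Page 2 of 10", "10 bladsye, bladsy 2"))  # False
print(numbers_match("2 files", "twee lêers"))                 # False
```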

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.
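The reordering rule can be illustrated with a simplified sketch: each placeholder is paired with its position (explicit via N$, otherwise its order of appearance), and the sorted pairs of both strings are compared. The regular expression covers only common forms, and `printf_specs` and `printf_matches` are hypothetical names, so treat this as an approximation rather than the toolkit's algorithm.

```python
import re

# Optional "N$" position, optional flags, width and precision, then a conversion letter.
PRINTF_RE = re.compile(r"%(?:(\d+)\$)?([-+#0]*\d*(?:\.\d+)?[a-zA-Z])")


def printf_specs(text):
    """Return sorted (position, spec) pairs for each printf placeholder in text."""
    text = text.replace("%%", "")  # drop literal percent signs
    specs = []
    for index, match in enumerate(PRINTF_RE.finditer(text), start=1):
        position = int(match.group(1)) if match.group(1) else index
        specs.append((position, match.group(2)))
    return sorted(specs)


def printf_matches(source, target):
    """Return True when the placeholders agree, allowing reordering."""
    return printf_specs(source) == printf_specs(target)


print(printf_matches("%s has %d files", "%2$d files in %1$s"))  # True
print(printf_matches("%d files", "%s files"))                   # False
```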

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings matches.

Counts the sentences in each string to check that the sentence count is the same in the original and the translation. You may not always want to use this test if you find you often need to reformat your translation, because the original is badly expressed or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.
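The strip-then-compare step might look roughly like the sketch below. It strips only ASCII punctuation via the standard `string` module, which is an assumption for brevity, and `startcaps_match` is a hypothetical name.

```python
import string

# Characters removed from the start of each string before comparing (ASCII only here).
STRIP_CHARS = string.whitespace + string.punctuation


def startcaps_match(source, target):
    """Compare the case of the first character left after stripping."""
    src = source.lstrip(STRIP_CHARS)
    tgt = target.lstrip(STRIP_CHARS)
    if not src or not tgt:
        return True  # nothing left to compare
    return src[0].isupper() == tgt[0].isupper()


print(startcaps_match('"Open file', '"maak oop'))  # False
print(startcaps_match("  Open", "Maak"))           # True
```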

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings matches.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.
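A hedged sketch of the idea: find basic URLs in the source and report those that no longer appear verbatim in the translation. The pattern below covers only http, https, ftp and mailto, and both it and the name `changed_urls` are assumptions for this example.

```python
import re

# Basic schemes only; other URI schemes such as smb or file are ignored on purpose.
URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>]+|mailto:[^\s<>]+")


def changed_urls(source, target):
    """Return URLs found in source that do not appear unchanged in target."""
    return [url for url in URL_RE.findall(source) if url not in target]


print(changed_urls("See http://example.com/help for details",
                   "Sien http://voorbeeld.com/hulp vir besonderhede"))
# ['http://example.com/help']
```

An empty list means every source URL survived translation unchanged.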

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.

class translate.filters.checks.ReducedChecker(**kwargs)
accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different types of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description of accelerators.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ “ it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings matches.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.

categories

Categories that each checking function falls into. Function names are used as keys; categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.
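Detecting the marker amounts to a substring test, as in this small sketch. `has_compendium_conflict` is a hypothetical name, and the sample string only imitates the general shape of a msgcat conflict entry.

```python
CONFLICT_MARKER = "#-#-#-#-#"


def has_compendium_conflict(target):
    """Return True when the translation still contains a msgcat conflict marker."""
    return CONFLICT_MARKER in target


sample = "#-#-#-#-#  af.po  #-#-#-#-#\nLêer"
print(has_compendium_conflict(sample))  # True
print(has_compendium_conflict("Lêer"))  # False
```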

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case either ignore those instances or switch this test off.

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.
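As a rough illustration, the sketch below extracts email-like strings from the source and lists those missing from the translation. The pattern is intentionally loose and, like the name `missing_emails`, is an assumption rather than the toolkit's own definition of an email address.

```python
import re

# A loose pattern: local part, "@", then at least two dot-separated domain labels.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def missing_emails(source, target):
    """Return email addresses from source that do not appear in target."""
    return [email for email in EMAIL_RE.findall(source) if email not in target]


print(missing_emails("Mail info@example.com for help",
                     "Stuur pos na info@voorbeeld.com"))
# ['info@example.com']
```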

endpunc(str1, str2)

Checks whether punctuation at the end of the strings matches.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation was often chosen for a reason, e.g. a list where full stops make it look cluttered. So, initially match the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \n and \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.
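The ratio test can be pictured with the sketch below. The threshold `max_ratio=5.0` is an arbitrary placeholder chosen for this example, not the value the toolkit uses, and `much_longer` is a hypothetical name.

```python
def much_longer(source, target, max_ratio=5.0):
    """Flag a target far longer than its source.

    max_ratio is an illustrative placeholder, not the toolkit's real threshold.
    """
    if not source or not target:
        return False  # empty strings are left to other checks
    return len(target) / len(source) > max_ratio


print(much_longer("A", "Alle lêers"))              # True
print(much_longer("Open file", "Maak lêer oop"))   # False
```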

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \n newlines (and variants such as \r\n) and reports an error if they differ.

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no translate words using --notranslatefile.

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Also changes in order will trigger this error.

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings matches.

Counts the sentences in each string to check that the sentence count is the same in the original and the translation. You may not always want to use this test if you find you often need to reformat your translation, because the original is badly expressed or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The resulting pofilter error lists the misspelled word together with the suggestions returned by the spell checker, which makes it easy to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, the test checks that the first remaining character is correctly capitalised. So, if the source sentence starts with an upper-case letter and the translation does not, an error is produced.
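A minimal sketch of this comparison, assuming that only ASCII punctuation is stripped and that characters without a case distinction are always accepted. The helper names are hypothetical and the real check handles more languages and punctuation.

```python
import string

STRIP_CHARS = string.whitespace + string.punctuation


def first_letter(text: str) -> str:
    """Return the first character left after stripping whitespace and punctuation."""
    return text.lstrip(STRIP_CHARS)[:1]


def startcaps_match(source: str, target: str) -> bool:
    """Return True when both strings start with the same kind of capitalisation."""
    s, t = first_letter(source), first_letter(target)
    if not s or not t:
        return True  # nothing to compare
    # Characters with no upper/lower distinction (digits, CJK, ...) are accepted.
    if s.lower() == s.upper() or t.lower() == t.upper():
        return True
    return s.isupper() == t.isupper()


print(startcaps_match('"Open file"', "maak oop"))
print(startcaps_match("...Open file", "Maak oop"))
```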

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings matches.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \t tab markers and reports an error if they differ.
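The counting approach can be sketched in a few lines. This illustrates the idea only; the function name `tabs_match` is hypothetical and is not part of the toolkit.

```python
def tabs_match(source: str, target: str) -> bool:
    """Return True when both strings contain the same number of tab characters."""
    return source.count("\t") == target.count("\t")


print(tabs_match("Name:\tValue", "Naam:\tWaarde"))  # tab kept in the translation
print(tabs_match("Name:\tValue", "Naam: Waarde"))   # tab replaced by a space
```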

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.
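As a rough sketch of the idea, assuming a deliberately simple pattern that only recognises http, https, ftp and mailto URLs. The pattern and the name `missing_urls` are illustrative assumptions, not the toolkit's own code.

```python
import re

# Simplified pattern: a scheme followed by any run of non-whitespace characters.
URL = re.compile(r"(?:https?|ftp)://\S+|mailto:\S+")


def missing_urls(source: str, target: str) -> list[str]:
    """Return the URLs found in the source that do not appear unchanged in the target."""
    return [url for url in URL.findall(source) if url not in target]


print(missing_urls("Visit http://example.com/help today",
                   "Besoek http://example.com/hulp vandag"))
```

An empty list means every URL in the source survived translation unchanged.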

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).
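The reporting behaviour described above could look roughly like this. It is a sketch under the assumption that the valid characters are already loaded into a string; `invalid_chars` is a hypothetical name.

```python
def invalid_chars(target: str, valid: str) -> list[str]:
    """List each character of target missing from valid, with its Unicode value."""
    allowed = set(valid)
    found = []
    for char in target:
        if char not in allowed and char not in found:
            found.append(char)  # keep first-seen order, report each character once
    return [f"{char} ({ord(char):04X})" for char in found]


print(invalid_chars("a+b", "ab"))
```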

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.
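The tag-counting step can be sketched as below. This simplified version only compares counts; it does not implement the whole-string exception or the handling of translatable attributes mentioned above, and the names are assumptions for this example.

```python
import re

# Anything between "<" and the next ">" is treated as a tag.
TAG = re.compile(r"<[^>]+>")


def tag_counts_match(source: str, target: str) -> bool:
    """Return True when both strings contain the same number of tags."""
    return len(TAG.findall(source)) == len(TAG.findall(target))


print(tag_counts_match("Click <b>here</b>", "Klik <b>hier</b>"))
print(tag_counts_match("Click <b>here</b>", "Klik hier"))
```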

exception translate.filters.checks.SeriousFilterFailure(messages)

This exception signals that a Filter didn’t pass, and the bad translation might break an application (so the string will be marked fuzzy).

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class translate.filters.checks.StandardChecker(checkerconfig=None, excludefilters=None, limitfilters=None, errorhandler=None)

The basic test suite for source -> target translations.

accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different types of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description of accelerators.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank, i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ ” it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings matches.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.
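One way to picture this check, assuming that each bracket character is counted separately (the text above does not say whether counts are compared per character or in total). The function name is hypothetical.

```python
def bracket_mismatches(source: str, target: str) -> list[str]:
    """Return the bracket characters whose counts differ between the two strings."""
    return [b for b in "([{}])" if source.count(b) != target.count(b)]


print(bracket_mismatches("Open (file)", "Maak (lêer oop"))
```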

categories

Categories that each checking function falls into. Function names are used as keys; categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test, e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case, either ignore those instances or switch this test off.
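Detecting an immediately repeated word can be sketched with a backreference. This is an illustrative approach, not necessarily how the toolkit does it; `repeated_words` is a hypothetical name.

```python
import re

# A whole word, whitespace, then the same whole word again (case-insensitive).
REPEAT = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def repeated_words(target: str) -> list[str]:
    """Return each word that is immediately repeated in the translation."""
    return [match.group(1) for match in REPEAT.finditer(target)]


print(repeated_words("Save the the file"))
```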

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.

endpunc(str1, str2)

Checks whether punctuation at the end of the strings matches.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation has often been chosen for a reason, e.g. a list where full stops would make it look cluttered. So, initially match the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.
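A small sketch of comparing trailing whitespace, using the [Password: ] example from above. The helper names are assumptions, and the full-width punctuation allowance is not modelled here.

```python
def trailing_whitespace(text: str) -> str:
    """Return the run of whitespace at the end of the text."""
    return text[len(text.rstrip()):]


def endwhitespace_match(source: str, target: str) -> bool:
    """Return True when both strings end with the same whitespace."""
    return trailing_whitespace(source) == trailing_whitespace(target)


print(endwhitespace_match("Password: ", "Wagwoord: "))
print(endwhitespace_match("Password: ", "Wagwoord:"))
```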

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \n and \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \n newlines (and variants such as \r\n) and reports an error if they differ.
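A sketch of the counting, assuming that \r\n, \r and \n each count as one newline and are treated as interchangeable. That equivalence is an assumption of this example, as is the function name.

```python
import re

# Match "\r\n" first so that it counts as a single newline.
NEWLINE = re.compile(r"\r\n|\r|\n")


def newline_counts_match(source: str, target: str) -> bool:
    """Return True when both strings contain the same number of newlines."""
    return len(NEWLINE.findall(source)) == len(NEWLINE.findall(target))


print(newline_counts_match("Line one\nLine two", "Reël een\nReël twee"))
print(newline_counts_match("Line one\nLine two", "Reël een Reël twee"))
```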

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no-translate words using --notranslatefile.

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Also changes in order will trigger this error.

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.
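The comparison with reordering can be sketched as follows. This is a simplified illustration rather than the toolkit's implementation: the pattern covers only common conversions, does not treat %% specially, and all names are hypothetical.

```python
import re

# Optional "N$" position, then flags, width, precision, length modifier and conversion.
PRINTF = re.compile(
    r"%(?:(\d+)\$)?([-+ 0#]*\d*(?:\.\d+)?(?:hh|h|ll|l|L|z|j|t)?[diouxXeEfgGcs])"
)


def printf_specs(text: str) -> list[tuple[int, str]]:
    """Return (position, specifier) pairs, sorted by argument position."""
    specs = []
    for index, match in enumerate(PRINTF.finditer(text), start=1):
        # An explicit "N$" wins; otherwise the position is the order of appearance.
        position = int(match.group(1)) if match.group(1) else index
        specs.append((position, match.group(2)))
    return sorted(specs)


def printf_match(source: str, target: str) -> bool:
    """Return True when both strings use identical specifiers for each argument."""
    return printf_specs(source) == printf_specs(target)


print(printf_match("%s has %d files", "%2$d lêers in %1$s"))  # reordered, same types
print(printf_match("%d files", "%s lêers"))                   # type changed
```

Sorting by argument position is what lets a reordered translation compare equal to the original while a changed type or precision still fails.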

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings matches.

Counts the sentences in the original and in the translated string and checks that the two counts are the same. You may not always want to use this test if you find you often need to reformat your translation, because the original is badly expressed or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, which makes the English style messy; in that case, translators often revert to the plain plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The resulting pofilter error lists the misspelled word together with the suggestions returned by the spell checker, which makes it easy to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, the test checks that the first remaining character is correctly capitalised. So, if the source sentence starts with an upper-case letter and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings matches.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.

class translate.filters.checks.StandardUnitChecker(checkerconfig=None, excludefilters=None, limitfilters=None, errorhandler=None)

The standard suite of common checks on translation units.

categories

Categories that each checking function falls into. Function names are used as keys; categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

hassuggestion(unit)

Checks if there is at least one suggested translation for this unit.

If a message has a suggestion (an alternate translation stored in alt-trans units in XLIFF and .pending files in PO) then these will be extracted. This is used by Pootle and is probably only useful in pofilter when using XLIFF files.

isfuzzy(unit)

Check if the unit has been marked fuzzy.

If a message is marked fuzzy in the PO file then it is extracted. Note this is different from --fuzzy and --nofuzzy options which specify whether tests should be performed against messages marked fuzzy.

isreview(unit)

Check if the unit has been marked review.

If you have made use of the ‘review’ flags in your translations:

# (review) reason for review
# (pofilter) testname: explanation for translator

Then if a message is marked for review in the PO file it will be extracted. Note this is different from --review and --noreview options which specify whether tests should be performed against messages already marked as under review.

nplurals(unit)

Checks for the correct number of noun forms for plural translations.

This uses the plural information in the language module of the Translate Toolkit. This is the same as the Gettext nplural value. It will check that the number of plurals required is the same as the number supplied in your translation.

run_filters(unit, categorised=False)

Run all the tests in this suite.

Return type:

Dictionary

Returns:

Content of the dictionary is as follows:

{'testname': { 'message': message_or_exception, 'category': failure_category } }
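To illustrate how a dictionary of this shape might be consumed, here is a short sketch. The test names, messages and category values in `failures` are invented for this example and are not real output of the toolkit; `summarise` is a hypothetical helper.

```python
# Hypothetical result in the documented shape; every value below is invented.
failures = {
    "endpunc": {"message": "Different punctuation at the end", "category": "functional"},
    "startcaps": {"message": "Different capital at the start", "category": "cosmetic"},
}


def summarise(results: dict) -> list[str]:
    """Format one line per failed test, sorted by test name."""
    return [
        f"{name} [{info['category']}]: {info['message']}"
        for name, info in sorted(results.items())
    ]


for line in summarise(failures):
    print(line)
```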

static run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

class translate.filters.checks.TeeChecker(checkerconfig=None, excludefilters=None, limitfilters=None, checkerclasses=None, errorhandler=None, languagecode=None)

A Checker that controls multiple checkers.

categories = {}

Categories that each checking function falls into. Function names are used as keys; categories are the values.

getfilters(excludefilters=None, limitfilters=None)

Returns a dictionary of available filters, including/excluding those in the given lists.

run_filters(unit, categorised=False)

Run all the tests in the checker’s suites.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

class translate.filters.checks.TermChecker(**kwargs)

accelerators(str1, str2)

Checks whether accelerators are consistent between the two strings.

This test is capable of checking the different types of accelerators that are used in different projects, like Mozilla or KDE. The test will pick up accelerators that are missing and ones that shouldn’t be there.

See accelerators on the localization guide for a full description of accelerators.

acronyms(str1, str2)

Checks that acronyms that appear are unchanged.

If an acronym appears in the original this test will check that it appears in the translation. Translating acronyms is a language decision but many languages leave them unchanged. In that case this test is useful for tracking down translations of the acronym and correcting them.

blank(str1, str2)

Checks whether a translation is totally blank.

This will check to see if a translation has inadvertently been translated as blank, i.e. as spaces. This is different from untranslated which is completely empty. This test is useful in that if something is translated as “ ” it will appear to most tools as if it is translated.

brackets(str1, str2)

Checks that the number of brackets in both strings matches.

If ([{ or }]) appear in the original this will check that the same number appear in the translation.

categories

Categories that each checking function falls into. Function names are used as keys; categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

compendiumconflicts(str1, str2)

Checks for Gettext compendium conflicts (#-#-#-#-#).

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge then these conflicts will appear in your translations. This test quickly extracts those for correction.

credits(str1, str2)

Checks for messages containing translation credits instead of normal translations.

Some projects have consistent ways of giving credit to translators by having a unit or two where translators can fill in their name and possibly their contact details. This test allows you to find these units easily to check that they are completed correctly and also disables other tests that might incorrectly get triggered for these units (such as urls, emails, etc.)

doublequoting(str1, str2)

Checks whether doublequoting is consistent between the two strings.

Checks on double quotes " to ensure that you have the same number in both the original and the translated string. This test takes into account that several languages use different quoting characters, and will test for them instead.

doublespacing(str1, str2)

Checks for bad double-spaces by comparing to original.

This will identify if you have [space][space] in your translation when you don’t have it in the original, or if it appears in the original but not in your translation. Some of these are spurious and how you correct them depends on the conventions of your language.

doublewords(str1, str2)

Checks for repeated words in the translation.

Words that have been repeated in a translation will be highlighted with this test, e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure; in that case, either ignore those instances or switch this test off.

emails(str1, str2)

Checks that emails are not translated.

Generally you should not be translating email addresses. This check will look to see that email addresses e.g. info@example.com are not translated. In some cases of course you should translate the address but generally you shouldn’t.

endpunc(str1, str2)

Checks whether punctuation at the end of the strings matches.

This will ensure that the ending of your translation has the same punctuation as the original. E.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses […] in all your translations, not simply three separate full-stops. You may pick up some errors in the original: feel free to keep your translation and notify the programmers. In some languages, characters such as ? or ! are always preceded by a space e.g. [space]? — do what your language customs dictate. Other false positives you will notice are, for example, if through changes in word-order you add “), etc. at the end of the sentence. Do not change these: your language word-order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation has often been chosen for a reason, e.g. a list where full stops would make it look cluttered. So, initially match the English, and make changes once the program is being used.

This check is aware of several language conventions for punctuation characters, such as the custom question marks for Greek and Arabic, Devanagari Danda, full-width punctuation for CJK languages, etc. Support for your language can be added easily if it is not there yet.

endwhitespace(str1, str2)

Checks whether whitespace at the end of the strings matches.

Operates the same as endpunc but is only concerned with whitespace. This filter is particularly useful for those strings which will evidently be followed by another string in the program, e.g. [Password: ] or [Enter your username: ]. The whitespace is an inherent part of the string. This filter makes sure you don’t miss those important but otherwise invisible spaces!

If your language uses full-width punctuation (like Chinese), the visual spacing in the character might be enough without an added extra space.

escapes(str1, str2)

Checks whether escaping is consistent between the two strings.

Checks escapes such as \n and \u0000 to ensure that if they exist in the original string you also have them in the translation.

filepaths(str1, str2)

Checks that file paths have not been translated.

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path, unless it is being used as an example, e.g. your_user_name/path/to/filename.conf.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

functions(str1, str2)

Checks that function names are not translated.

Checks that function names e.g. rgb() or getEntity.Name() are not translated.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

kdecomments(str1, str2)

Checks to ensure that no KDE style comments appear in the translation.

KDE style translator comments appear in PO files as "_: comment\\n". New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long(str1, str2)

Checks whether a translation is much longer than the original string.

This is most useful in the special case where the translation is multiple characters long while the source text is only 1 character long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

musttranslatewords(str1, str2)

Checks that words configured as definitely translatable don’t appear in the translation.

If for instance in your language you decide that you must translate ‘OK’ then this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must translate words using --musttranslatefile.

newlines(str1, str2)

Checks whether newlines are consistent between the two strings.

Counts the number of \\n newlines (and variants such as \\r\\n) and reports an error if they differ.
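
A minimal stand-alone sketch of such a count, assuming that a \r\n pair should count as a single newline. The names `NEWLINE`, `count_newlines` and `newlines_match` are hypothetical and not part of the toolkit.

```python
import re

# \r\n is listed first so a Windows line ending is counted once.
NEWLINE = re.compile(r"\r\n|\n|\r")


def count_newlines(text):
    """Count line endings of any of the three common forms."""
    return len(NEWLINE.findall(text))


def newlines_match(source, target):
    """True when both strings contain the same number of newlines."""
    return count_newlines(source) == count_newlines(target)


print(count_newlines("one\r\ntwo\nthree"))           # 2
print(newlines_match("Line 1\nLine 2", "Reël 1 Reël 2"))  # False
```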

notranslatewords(str1, str2)

Checks that words configured as untranslatable appear in the translation too.

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no translate words using --notranslatefile.

numbers(str1, str2)

Checks whether numbers of various forms are consistent between the two strings.

You will see some errors where you have either written the number in full or converted it to the digit in your translation. Also changes in order will trigger this error.

options(str1, str2)

Checks that command line options are not translated.

In messages that contain command line options, such as --help, this test will check that these remain untranslated. These could be translated in the future if programs can create a mechanism to allow this, but currently they are not translated. If the option has a parameter, e.g. --file=FILE, then the test will check that the parameter has been translated.
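
The two rules above (option name unchanged, parameter translated) might be approximated as follows. This is a rough sketch with a simplified option pattern; `OPTION` and `option_problems` are hypothetical names, and the plain substring test is an assumption that will miss some edge cases.

```python
import re

# --name optionally followed by =PARAM (simplified on purpose).
OPTION = re.compile(r"--([a-zA-Z][\w-]*)(?:=(\w+))?")


def option_problems(source, target):
    """List problems with command line options found in source."""
    problems = []
    for match in OPTION.finditer(source):
        name, param = match.group(1), match.group(2)
        if f"--{name}" not in target:
            problems.append(f"option --{name} missing or translated")
        elif param and f"--{name}={param}" in target:
            problems.append(f"parameter {param} of --{name} left untranslated")
    return problems


print(option_problems("Use --file=FILE to load", "Gebruik --file=LEER om te laai"))
print(option_problems("Use --help", "Gebruik --hulp"))
```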

printf(str1, str2)

Checks whether printf format strings match.

If the printf formatting variables are not identical, then this will indicate an error. Printf statements are used by programs to format output in a human readable form (they are placeholders for variable data). They allow you to specify lengths of string variables, string padding, number padding, precision, etc. Generally they will look like this: %d, %5.2f, %100s, etc. The test can also manage variable reordering using the %1$s syntax. The type and formatting details of each variable are tested to ensure that they are strictly identical, but the variables may be reordered.
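
The idea of "identical details, any order" can be illustrated with a small stand-alone sketch. It is not the toolkit's implementation: the pattern covers only a subset of printf syntax, and `PRINTF`, `printf_vars` and `printf_matches` are hypothetical names.

```python
import re

# Simplified pattern for printf-style placeholders (an assumption of this
# sketch; real printf syntax has more flags and length modifiers).
PRINTF = re.compile(
    r"%(?:(?P<pos>\d+)\$)?"             # optional reordering index, e.g. 1$
    r"(?P<spec>[-+#0]*\d*(?:\.\d+)?"    # flags, width, precision
    r"[diouxXeEfFgGcs])"                # conversion type
)


def printf_vars(text):
    """Map each argument position to its format details."""
    found = {}
    literal_free = text.replace("%%", "")  # drop escaped percent signs
    for n, match in enumerate(PRINTF.finditer(literal_free), start=1):
        index = int(match.group("pos")) if match.group("pos") else n
        found[index] = match.group("spec")
    return found


def printf_matches(source, target):
    """True when both strings use the same variables, in any order."""
    return printf_vars(source) == printf_vars(target)


print(printf_matches("%d files in %s", "%2$s contains %1$d files"))  # True
print(printf_matches("%5.2f", "%5.1f"))                              # False
```

Keying the variables by argument position is what lets a reordered translation pass while a changed width, precision or type still fails.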

puncspacing(str1, str2)

Checks for bad spacing after punctuation.

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It checks also for [comma], [colon], etc.

Some languages don’t use spaces after common punctuation marks, especially where full-width punctuation marks are used. This check will take that into account.

purepunc(str1, str2)

Checks that strings that are purely punctuation are not changed.

This extracts strings like + or - as these usually should not be changed.

pythonbraceformat(str1, str2)

Checks whether python brace format strings match.
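
As an illustration of what matching brace format strings can mean, the sketch below compares the replacement field names found by the standard library's `string.Formatter`. It is a simplified stand-in, not the toolkit's code; `brace_fields` and `brace_format_matches` are hypothetical names, and malformed templates will raise `ValueError` here.

```python
from string import Formatter


def brace_fields(text):
    """Return the set of replacement field names in a str.format template."""
    return {name for _, name, _, _ in Formatter().parse(text) if name is not None}


def brace_format_matches(source, target):
    """True when both templates use the same set of field names."""
    return brace_fields(source) == brace_fields(target)


print(brace_format_matches("{a} and {b}", "{b} en {a}"))  # True
print(brace_format_matches("{a} and {b}", "{a} en {c}"))  # False
```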

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

sentencecount(str1, str2)

Checks that the number of sentences in both strings match.

Counts the sentences in the original and in the translation and checks that the two counts are the same. You may not always want to use this test if you find you often need to restructure your translation, either because the original is badly expressed or because the structure of your language works better that way. Do what works best for your language: it’s the meaning of the original you want to convey, not the exact way it was written in the English.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

short(str1, str2)

Checks whether a translation is much shorter than the original string.

This is most useful in the special case where the translation is 1 character long while the source text is multiple characters long. Otherwise, we use a general ratio that will catch very big differences but is set conservatively to limit the number of false positives.

simplecaps(str1, str2)

Checks the capitalisation of two strings isn’t wildly different.

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that don’t start with a capital letter (upper-case letter) when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals; depending on the capitalisation convention of your language, you might want to change these to Title Case, or change them all to normal sentence case.

simpleplurals(str1, str2)

Checks for English style plural(s) for you to review.

This test will extract any message that contains words with a final “(s)” in the source text. You can then inspect the message, to check that the correct plural form has been used for your language. In some languages, plurals are made by adding text at the beginning of words, making the English style messy. In this case, they often revert to the plural form. This test allows an editor to check that the plurals used are correct. Be aware that this test may create a number of false positives.

For languages with no plural forms (only one noun form) this test will simply test that nothing like “(s)” was used in the translation.

singlequoting(str1, str2)

Checks whether singlequoting is consistent between the two strings.

The same as doublequoting but checks for the ' character. Because this is used in contractions like it’s and in possessive forms like user’s, this test can output spurious errors if your language doesn’t use such forms. If a quote appears at the end of a sentence in the translation, i.e. '., this might not be detected properly by the check.

spellcheck(str1, str2)

Checks words that don’t pass a spell check.

This test will check for misspelled words in your translation. The test first checks for misspelled words in the original (usually English) text, and adds those to an exclusion list. The advantage of this exclusion is that many words that are specific to the application will not raise errors e.g. program names, brand names, function names.

The checker works with PyEnchant. You need to have PyEnchant installed as well as a dictionary for your language (for example, one of the Hunspell or aspell dictionaries). This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word, plus suggestions returned from the spell checker. That makes it easy for you to identify the word and select a replacement.

startcaps(str1, str2)

Checks that the message starts with the correct capitalisation.

After stripping whitespace and common punctuation characters, it then checks to see that the first remaining character is correctly capitalised. So, if the sentence starts with an upper-case letter, and the translation does not, an error is produced.

This check is entirely disabled for many languages that don’t make a distinction between upper and lower case. Contact us if this is not yet disabled for your language.

startpunc(str1, str2)

Checks whether punctuation at the beginning of the strings match.

Operates as endpunc but you will probably see fewer errors.

startwhitespace(str1, str2)

Checks whether whitespace at the beginning of the strings matches.

As in endwhitespace but you will see fewer errors.

tabs(str1, str2)

Checks whether tabs are consistent between the two strings.

Counts the number of \\t tab markers and reports an error if they differ.

unchanged(str1, str2)

Checks whether a translation is basically identical to the original string.

This checks to see if the translation isn’t just a copy of the English original. Sometimes, this is what you want, but other times you will detect words that should have been translated.

untranslated(str1, str2)

Checks whether a string has been translated at all.

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls(str1, str2)

Checks that URLs are not translated.

This checks only basic URLs (http, ftp, mailto etc.) not all URIs (e.g. afp, smb, file). Generally, you don’t want to translate URLs, unless they are example URLs (http://your_server.com/filename.html). If the URL is for configuration information, then you need to query the developers about placing configuration information in PO files. It shouldn’t really be there, unless it is very clearly marked: such information should go into a configuration file.

validchars(str1, str2)

Checks that only characters specified as valid appear in the translation.

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value (e.g. 002B).
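
The reporting format mentioned above (character plus four-digit hexadecimal code point) can be reproduced with a short sketch. The function `invalid_chars` is hypothetical, and the valid set is passed in directly rather than read from a file.

```python
def invalid_chars(target, valid):
    """Return (character, code point) pairs for characters not in valid."""
    seen = []
    for ch in target:
        if ch not in valid and ch not in seen:
            seen.append(ch)  # report each offending character once
    return [(ch, f"{ord(ch):04X}") for ch in seen]


print(invalid_chars("a+b", set("ab")))  # [('+', '002B')]
```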

variables(str1, str2)

Checks whether variables of various forms are consistent between the two strings.

This checks to make sure that variables that appear in the original also appear in the translation. It can handle variables from projects like KDE or OpenOffice. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.

xmltags(str1, str2)

Checks that XML/HTML tags have not been translated.

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either the tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags or things that look like tags that cover the whole string e.g. <Error> but will produce false positives for things like An <Error> occurred as here Error should be translated. It also will allow translation of the alt attribute in e.g. <img src="bob.png" alt="Image description"> or similar translatable attributes in OpenOffice.org help files.
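
A stand-alone sketch of the counting approach, including the whole-string exception and the false positive described above. The tag pattern is deliberately naive, attribute translation is not modelled, and `TAG`, `tag_count` and `xmltags_match` are hypothetical names.

```python
import re

# Anything between angle brackets is treated as a tag (naive on purpose).
TAG = re.compile(r"<[^<>]+>")


def tag_count(text):
    """Count tags, ignoring a single tag that covers the whole string."""
    if TAG.fullmatch(text.strip()):
        return 0
    return len(TAG.findall(text))


def xmltags_match(source, target):
    """True when source and target contain the same number of tags."""
    return tag_count(source) == tag_count(target)


print(xmltags_match("Click <b>here</b>", "Klik <b>hier</b>"))    # True
print(xmltags_match("An <Error> occurred", "'n <Fout> het voorgekom"))  # True
print(tag_count("<Error>"))                                      # 0
```

Because only counts are compared, a translated tag-like word such as `<Fout>` is not flagged by this sketch.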

class translate.filters.checks.TranslationChecker(checkerconfig=None, excludefilters=None, limitfilters=None, errorhandler=None)

A checker that passes source and target strings to the checks, not the whole unit.

This provides some speedup and simplifies testing.

categories

Categories into which each checking function falls. Function names are used as keys, categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

run_filters(unit, categorised=False)

Do some optimisation by caching some data of the unit for the benefit of run_test().

run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

class translate.filters.checks.UnitChecker(checkerconfig=None, excludefilters=None, limitfilters=None, errorhandler=None)

Parent Checker class which does the checking based on functions available in derived classes.

categories

Categories into which each checking function falls. Function names are used as keys, categories are the values.

property checker_name

Extract checker name, for example ‘mozilla’ from MozillaChecker.

filteraccelerators_by_list(str1, acceptlist=None)

Filter out accelerators from str1.

get_ignored_filters()

Return checker’s additional filters for current language.

getfilters(excludefilters=None, limitfilters=None)

Returns dictionary of available filters, including/excluding those in the given lists.

run_filters(unit, categorised=False)

Run all the tests in this suite.

Return type:

Dictionary

Returns:

Content of the dictionary is as follows:

{'testname': { 'message': message_or_exception, 'category': failure_category } }
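
A result of this shape can be consumed as sketched below. The dictionary is hand-written for illustration: the messages and category values are placeholders, and the helper `summarise_failures` is hypothetical.

```python
def summarise_failures(failures):
    """Turn a {'testname': {'message': ..., 'category': ...}} dict into lines."""
    return [
        f"{name} [{info['category']}]: {info['message']}"
        for name, info in sorted(failures.items())
    ]


# Hand-written sample in the documented shape (values are placeholders).
sample = {
    "endpunc": {"message": "punctuation differs", "category": "placeholder-a"},
    "accelerators": {"message": "accelerator missing", "category": "placeholder-b"},
}

for line in summarise_failures(sample):
    print(line)
```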

static run_test(test, unit)

Runs the given test on the given unit.

Note that this can raise a FilterFailure as part of normal operation.

setconfig(config)

Sets the accelerator list.

setsuggestionstore(store)

Sets the filename that a checker should use for evaluating suggestions.

translate.filters.checks.batchruntests(pairs)

Runs test on a batch of string pairs.

translate.filters.checks.intuplelist(pair, list)

Tests to see if pair == (a,b,c) is in list, but handles None entries in list as wildcards (only allowed in positions “a” and “c”). We take a shortcut by only considering “c” if “b” has already matched.
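
One reading of the matching rule above, written as a stand-alone sketch. It assumes that "b" must match exactly and that None in positions "a" or "c" matches anything; `in_tuple_list` is a hypothetical name and the real function may differ in detail.

```python
def in_tuple_list(triple, entries):
    """True if triple matches an entry, treating None in a/c as a wildcard."""
    a, b, c = triple
    for entry_a, entry_b, entry_c in entries:
        if entry_b != b:
            continue  # "b" must always match exactly
        if entry_a is not None and entry_a != a:
            continue
        if entry_c is None or entry_c == c:  # "c" is only looked at last
            return True
    return False


ignore = [(None, "alt", None), ("a", "title", None)]
print(in_tuple_list(("img", "alt", "Image description"), ignore))  # True
print(in_tuple_list(("img", "src", "bob.png"), ignore))            # False
```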

translate.filters.checks.runtests(str1, str2, ignorelist=())

Verifies that the tests pass for a pair of strings.

translate.filters.checks.tagname(string)

Returns the name of the XML/HTML tag in string.

translate.filters.checks.tagproperties(strings, ignore)

Returns all the properties in the XML/HTML tag string as (tagname, propertyname, propertyvalue), but ignores those combinations specified in ignore.

decoration

Functions to get decorative/informative text out of strings…

translate.filters.decoration.countaccelerators(accelmarker, acceptlist=None)

Returns a function that counts the number of accelerators marked with the given marker.

translate.filters.decoration.findaccelerators(str1, accelmarker, acceptlist=None)

Returns all the accelerators and locations in str1 marked with a given marker.

translate.filters.decoration.findmarkedvariables(str1, startmarker, endmarker, ignorelist=[])

Returns all the variables and locations in str1 marked with a given marker.

translate.filters.decoration.getaccelerators(accelmarker, acceptlist=None)

Returns a function that gets a list of accelerators marked using accelmarker.

translate.filters.decoration.getemails(str1)

Returns the email addresses that are in a string.

translate.filters.decoration.getfunctions(str1)

Returns the functions() that are in a string, while ignoring the trailing punctuation in the given parameter.

translate.filters.decoration.getnumbers(str1)

Returns any numbers that are in the string.

translate.filters.decoration.geturls(str1)

Returns the URIs in a string.

translate.filters.decoration.getvariables(startmarker, endmarker)

Returns a function that gets a list of variables marked using startmarker and endmarker.

translate.filters.decoration.ispurepunctuation(str1)

Checks whether the string is entirely punctuation.

translate.filters.decoration.isvalidaccelerator(accelerator, acceptlist=None)

Returns whether the given accelerator character is valid.

Parameters:
  • accelerator (character) – A character to be checked for accelerator validity

  • acceptlist (String) – A list of characters that are permissible as accelerators

Return type:

Boolean

Returns:

True if the supplied character is an acceptable accelerator

translate.filters.decoration.puncend(str1, punctuation)

Returns all the punctuation from the end of the string.

translate.filters.decoration.puncstart(str1, punctuation)

Returns all the punctuation from the start of the string.

translate.filters.decoration.spaceend(str1)

Returns all the whitespace from the end of the string.

translate.filters.decoration.spacestart(str1)

Returns all the whitespace from the start of the string.

helpers

A set of helper functions for filters…

translate.filters.helpers.countmatch(str1, str2, countstr)

Checks whether countstr occurs the same number of times in str1 and str2.

translate.filters.helpers.countsmatch(str1, str2, countlist)

Checks whether each element in countlist occurs the same number of times in str1 and str2.
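
The behaviour described for countmatch and countsmatch can be sketched with `str.count`. These are illustrative re-implementations under hypothetical names (`count_match`, `counts_match`), not the toolkit's functions.

```python
def count_match(str1, str2, countstr):
    """True when countstr occurs equally often in str1 and str2."""
    return str1.count(countstr) == str2.count(countstr)


def counts_match(str1, str2, countlist):
    """True when every element of countlist occurs equally often in both."""
    return all(count_match(str1, str2, item) for item in countlist)


print(counts_match("a\tb\n", "x\ty\n", ["\t", "\n"]))  # True
print(counts_match("a\tb", "x y", ["\t"]))             # False
```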

translate.filters.helpers.filtercount(str1, func)

Returns the number of characters in str1 that pass func.

translate.filters.helpers.filtertestmethod(testmethod, strfilter)

Returns a version of the testmethod that operates on filtered strings using strfilter.

translate.filters.helpers.funcmatch(str1, str2, func, *args)

Returns whether the result of func is the same for str1 and str2.

translate.filters.helpers.funcsmatch(str1, str2, funclist)

Checks whether the results of each func in funclist match for str1 and str2.

translate.filters.helpers.multifilter(str1, strfilters, *args)

Passes str1 through a list of filters.
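
Chaining filters in this way amounts to a simple loop. The sketch below uses the hypothetical name `multi_filter` and assumes each filter takes the string plus any extra arguments and returns a new string.

```python
def multi_filter(text, filters, *args):
    """Apply each filter in turn, feeding the result into the next one."""
    for string_filter in filters:
        text = string_filter(text, *args)
    return text


def drop_ampersand(text):
    """Example filter: remove an accelerator-style ampersand."""
    return text.replace("&", "")


print(multi_filter("  &File  ", [str.strip, drop_ampersand]))  # File
```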

translate.filters.helpers.multifiltertestmethod(testmethod, strfilters)

Returns a version of the testmethod that operates on strings filtered using strfilters.

pofilter

Perform quality checks on Gettext PO, XLIFF and TMX localization files.

Snippet files are created whenever a test fails. These can be examined, corrected and merged back into the originals using pomerge.

See: http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/pofilter.html for examples and usage instructions and http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/pofilter_tests.html for full descriptions of all tests.

class translate.filters.pofilter.FilterOptionParser(formats)

A specialized Option Parser for filter tools…

add_option(Option)
add_option(opt_str, ..., kwarg=val, ...) None

build_checkerconfig(options)

Prepare the checker config from the given options. This is mainly factored out for the sake of unit tests.

check_values(values: Values, args: [string])

-> (values : Values, args : [string])

Check that the supplied option values and leftover arguments are valid. Returns the option values and leftover arguments (possibly adjusted, possibly completely new – whatever you like). Default implementation just returns the passed-in values; subclasses may override as desired.

checkoutputsubdir(options, subdir)

Checks to see if subdir under options.output needs to be created, creates if necessary.

define_option(option)

Defines the given option, replacing an existing one of the same short name if necessary…

destroy()

Declare that you are done with this OptionParser. This cleans up reference cycles so the OptionParser (and all objects referenced by it) can be garbage-collected promptly. After calling destroy(), the OptionParser is unusable.

disable_interspersed_args()

Set parsing to stop on the first non-option. Use this if you have a command processor which runs another command that has options of its own and you want to make sure these options don’t get confused.

enable_interspersed_args()

Set parsing to not stop on the first non-option, allowing interspersing switches with command arguments. This is the default behavior. See also disable_interspersed_args() and the class documentation description of the attribute allow_interspersed_args.

error(msg: string)

Print a usage message incorporating ‘msg’ to stderr and exit. If you override this in a subclass, it should not return – it should either exit or raise an exception.

finalizetempoutputfile(options, outputfile, fulloutputpath)

Write the temp outputfile to its final destination.

format_manpage()

Returns a formatted manpage.

static getformathelp(formats)

Make a nice help string for describing formats…

static getfullinputpath(options, inputpath)

Gets the full path to an input file.

static getfulloutputpath(options, outputpath)

Gets the full path to an output file.

getfulltemplatepath(options, templatepath)

Gets the full path to a template file.

getoutputname(options, inputname, outputformat)

Gets an output filename based on the input filename.

getoutputoptions(options, inputpath, templatepath)

Works out which output format and processor method to use…

getpassthroughoptions(options)

Get the options required to pass to the filtermethod…

gettemplatename(options, inputname)

Gets a template filename based on the input filename.

static getusageman(option)

Returns the usage string for the given option.

static getusagestring(option)

Returns the usage string for the given option.

static isexcluded(options, inputpath)

Checks if this path has been excluded.

static isrecursive(fileoption, filepurpose='input')

Checks if fileoption is a recursive file.

isvalidinputname(inputname)

Checks if this is a valid input filename.

static mkdir(parent, subdir)

Makes a subdirectory (recursively if necessary).

static openinputfile(options, fullinputpath)

Opens the input file.

static openoutputfile(options, fulloutputpath)

Opens the output file.

opentemplatefile(options, fulltemplatepath)

Opens the template file (if required).

static opentempoutputfile(options, fulloutputpath)

Opens a temporary output file.

parse_args(args=None, values=None)

Parses the command line options, handling implicit input/output args.

static parse_noinput(option, opt, value, parser, *args, **kwargs)

This sets an option to True, but also sets input to - to prevent an error.

print_help(file: file = stdout)

Print an extended help message, listing all options and any help text provided with them, to ‘file’ (default stdout).

print_manpage(file=None)

Outputs a manpage for the program using the help information.

print_usage(file: file = stdout)

Print the usage message for the current program (self.usage) to ‘file’ (default stdout). Any occurrence of the string “%prog” in self.usage is replaced with the name of the current program (basename of sys.argv[0]). Does nothing if self.usage is empty or not defined.

print_version(file: file = stdout)

Print the version message for this program (self.version) to ‘file’ (default stdout). As with print_usage(), any occurrence of “%prog” in self.version is replaced by the current program’s name. Does nothing if self.version is empty or undefined.

processfile(fileprocessor, options, fullinputpath, fulloutputpath, fulltemplatepath)

Process an individual file.

recurseinputfilelist(options)

Use a list of files, and find a common base directory for them.

recurseinputfiles(options)

Recurse through directories and return files to be processed.

recursiveprocess(options)

Recurse through directories and process files.

run()

Parses the arguments, and runs recursiveprocess with the resulting options.

set_usage(usage=None)

Sets the usage string - if usage not given, uses getusagestring for each option.

seterrorleveloptions()

Sets the errorlevel options.

setformats(formats, usetemplates)

Sets the format options using the given format dictionary.

Parameters:

formats (Dictionary or iterable) –

The dictionary keys should be:

  • Single strings (or 1-tuples) containing an input format (if not usetemplates)

  • Tuples containing an input format and template format (if usetemplates)

  • Formats can be None to indicate what to do with standard input

The dictionary values should be tuples of outputformat (string) and processor method.

setmanpageoption()

Creates a manpage option that allows the optionparser to generate a manpage.

setprogressoptions()

Sets the progress options.

static splitext(pathname)

Splits pathname into name and ext, and removes the extsep.

Parameters:

pathname (string) – A file path

Returns:

root, ext

Return type:

tuple
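
The described behaviour resembles `os.path.splitext` from the standard library with the leading separator dropped from the extension. A minimal sketch under that assumption, using the hypothetical name `split_ext`:

```python
import os


def split_ext(pathname):
    """Split pathname into (root, ext), with ext lacking the leading extsep."""
    root, ext = os.path.splitext(pathname)
    if ext.startswith(os.extsep):
        ext = ext[len(os.extsep):]
    return root, ext


print(split_ext("locale/af/messages.po"))  # ('locale/af/messages', 'po')
print(split_ext("README"))                 # ('README', '')
```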

splitinputext(inputpath)

Splits an inputpath into name and extension.

splittemplateext(templatepath)

Splits a templatepath into name and extension.

templateexists(options, templatepath)

Returns whether the given template exists…

warning(msg, options=None, exc_info=None)

Print a warning message incorporating ‘msg’ to stderr.

translate.filters.pofilter.runfilter(inputfile, outputfile, templatefile, checkfilter=None)

Reads in inputfile, filters using checkfilter, writes to outputfile.

prefilters

Filters that strings can be passed through before certain tests.

translate.filters.prefilters.filteraccelerators(accelmarker)

Returns a function that filters accelerators marked using accelmarker from a string.

Parameters:

accelmarker (string) – Accelerator marker character

Return type:

Function

Returns:

fn(str1, acceptlist=None)

translate.filters.prefilters.filtervariables(startmarker, endmarker, varfilter)

Returns a function that filters variables marked using startmarker and endmarker from a string.

Parameters:
  • startmarker (string) – Start of variable marker

  • endmarker (string) – End of variable marker

  • varfilter (Function) – fn(variable, startmarker, endmarker)

Return type:

Function

Returns:

fn(str1)
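
The factory pattern described here (build a filter once, then apply it to many strings) can be sketched with a closure. This is an illustration only: it assumes both markers are non-empty strings and that variable names consist of word characters, and `make_variable_filter`, `keep_name` and `drop` are hypothetical names.

```python
import re


def make_variable_filter(startmarker, endmarker, varfilter):
    """Return fn(str1) that replaces each marked variable via varfilter."""
    pattern = re.compile(
        re.escape(startmarker) + r"(\w+)" + re.escape(endmarker)
    )

    def filter_text(str1):
        return pattern.sub(
            lambda m: varfilter(m.group(1), startmarker, endmarker), str1
        )

    return filter_text


def keep_name(variable, startmarker, endmarker):
    """Variable filter that keeps only the bare variable name."""
    return variable


def drop(variable, startmarker, endmarker):
    """Variable filter that removes the variable entirely."""
    return ""


print(make_variable_filter("%(", ")s", keep_name)("Hello %(user)s"))  # Hello user
print(make_variable_filter("{", "}", drop)("Hello {user}!"))          # Hello !
```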

translate.filters.prefilters.filterwordswithpunctuation(str1)

Goes through a list of known words that have punctuation and removes the punctuation from them.

translate.filters.prefilters.removekdecomments(str1)

Remove KDE-style PO comments.

KDE comments start with _:[space] and end with a literal \n. Example:

"_: comment\n"

translate.filters.prefilters.varname(variable, startmarker, endmarker)

Variable filter that returns the variable name without the marking punctuation.

Note

Currently this function simply returns variable unchanged, no matter what the *marker arguments are set to.

Return type:

String

Returns:

Variable name with the supplied startmarker and endmarker removed.

translate.filters.prefilters.varnone(variable, startmarker, endmarker)

Variable filter that returns an empty string.

Return type:

String

Returns:

Empty string

spelling

An API to provide spell checking for use in checks or elsewhere.