search

Services for searching and matching of text.

lshtein

A class to calculate a similarity based on the Levenshtein distance.

If the python-Levenshtein package is available, it is used instead of the pure-Python implementation; it performs better because it is implemented natively.

translate.search.lshtein.distance(a, b, stopvalue=0)

Same as python_distance in functionality. This uses the fast C version if we detected it earlier.

Note that this does not support arbitrary sequence types, but only string types.

translate.search.lshtein.native_distance(a, b, stopvalue=0)

Same as python_distance in functionality. This uses the fast C version if we detected it earlier.

Note that this does not support arbitrary sequence types, but only string types.

translate.search.lshtein.python_distance(a, b, stopvalue=-1)

Calculates the distance for use in similarity calculation. Python version.
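To illustrate how a stop value can shorten a distance calculation, here is a minimal sketch of a Levenshtein distance with early exit. It is not the library's implementation: the function name `levenshtein` and the exact stop semantics (give up once the smallest value in the current row exceeds the limit) are assumptions made for this example.

```python
def levenshtein(a, b, stop=None):
    """Edit distance between sequences a and b.

    If stop is given and the smallest value in the current row exceeds it,
    return that row minimum early. Row minima never decrease, so the value
    returned is a lower bound on the true distance.
    """
    previous = list(range(len(a) + 1))
    for i, cb in enumerate(b, start=1):
        current = [i]
        for j, ca in enumerate(a, start=1):
            substitute = previous[j - 1] + (ca != cb)
            insert = previous[j] + 1
            delete = current[j - 1] + 1
            current.append(min(substitute, insert, delete))
        if stop is not None and min(current) > stop:
            # The final distance can no longer be within the limit.
            return min(current)
        previous = current
    return previous[len(a)]
```

With a stop value, the caller learns only that the distance exceeds the limit, which is enough when candidates below a similarity threshold are going to be discarded anyway.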

match

Class to perform translation memory matching from a store of translation units.

class translate.search.match.matcher(store, max_candidates=10, min_similarity=75, max_length=70, comparer=None, usefuzzy=False)

A class that will do matching and store configuration for the matching process.

buildunits(candidates)

Builds a list of units conforming to base API, with the score in the comment.

extendtm(units, store=None, sort=True)

Extends the memory with extra unit(s).

Parameters:
  • units – The units to add to the TM.

  • store – Optional store from where some metadata can be retrieved and associated with each unit.

  • sort – Optional parameter that can be set to False to suppress sorting of the candidates list. This should probably only be used in matcher.inittm().

static getstartlength(min_similarity, text)

Calculates the minimum length we are interested in. The extra fat is because we don’t use plain character distance only.

getstoplength(min_similarity, text)

Calculates a length beyond which we are not interested. The extra fat is because we don’t use plain character distance only.
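The idea behind the start and stop lengths can be sketched as a length window. The sketch assumes a similarity of the form 1 - distance / max(len(a), len(b)); since the distance is at least the difference in lengths, a candidate outside the window cannot reach the minimum similarity. The names `length_window` and `prefilter` are hypothetical, and the sketch adds no extra margin of the kind the descriptions above mention.

```python
def length_window(min_similarity, text, max_length=70):
    """Return (start, stop) candidate lengths worth comparing against text.

    min_similarity is a percentage and must be greater than zero.
    """
    ratio = min_similarity / 100.0
    start = max(len(text) * ratio, 1)
    stop = min(len(text) / ratio, max_length)
    return start, stop


def prefilter(candidates, text, min_similarity=75, max_length=70):
    """Keep only the candidates whose length falls inside the window."""
    start, stop = length_window(min_similarity, text, max_length)
    return [c for c in candidates if start <= len(c) <= stop]
```

Filtering on length first is cheap, so the more expensive distance calculation only runs on candidates that could still qualify.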

inittm(stores, reverse=False)

Initialises the memory for later use. We use simple base units for speedup.

matches(text)

Returns a list of possible matches for given source text.

Parameters:

text (String) – The text that will be searched for in the translation memory.

Return type:

list

Returns:

a list of units with the source and target strings from the translation memory. If self.addpercentage is True (default) the match quality is given as a percentage in the notes.
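The overall matching flow (score every candidate, drop those below the minimum similarity, return the best few) can be sketched without the library. This example is a stand-in only: it uses `difflib.SequenceMatcher.ratio()` from the standard library rather than a Levenshtein-based comparer, and the function name `best_matches` and the tuple layout are hypothetical.

```python
from difflib import SequenceMatcher


def best_matches(text, memory, max_candidates=10, min_similarity=75):
    """Return up to max_candidates (score, source, target) tuples.

    memory is an iterable of (source, target) pairs; score is a percentage.
    """
    scored = []
    for source, target in memory:
        score = SequenceMatcher(None, text, source).ratio() * 100
        if score >= min_similarity:
            scored.append((score, source, target))
    # Best match first, then truncate to the requested number of candidates.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:max_candidates]
```

A call such as `best_matches("Open file", [("Open file", "Ouvrir le fichier")])` yields the exact match with a score of 100.0.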

setparameters(max_candidates=10, min_similarity=75, max_length=70)

Sets the parameters without reinitialising the tm. If a parameter is not specified, it is set to the default, not ignored.
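The consequence of "set to the default, not ignored" is easy to overlook, so here is a small illustration using a hypothetical stand-in class (`Settings` is not part of the library; only the method signature is taken from the text above).

```python
class Settings:
    """Hypothetical stand-in that mimics the documented behaviour."""

    def __init__(self, max_candidates=10, min_similarity=75, max_length=70):
        self.setparameters(max_candidates, min_similarity, max_length)

    def setparameters(self, max_candidates=10, min_similarity=75, max_length=70):
        # Every attribute is overwritten, including those the caller omitted.
        self.max_candidates = max_candidates
        self.min_similarity = min_similarity
        self.max_length = max_length


settings = Settings(min_similarity=90)
settings.setparameters(max_candidates=5)
print(settings.min_similarity)  # 75: the earlier value of 90 is lost
```

To change a single parameter while keeping the others, pass the current values of the others explicitly.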

usable(unit)

Returns whether this translation unit is usable for TM.

translate.search.match.sourcelen(unit)

Returns the length of the source string.

class translate.search.match.terminologymatcher(store, max_candidates=10, min_similarity=75, max_length=500, comparer=None)

A matcher with settings specifically for terminology matching.

buildunits(candidates)

Builds a list of units conforming to base API, with the score in the comment.

extendtm(units, store=None, sort=True)

Extends the memory with extra unit(s).

Parameters:
  • units – The units to add to the TM.

  • store – Optional store from where some metadata can be retrieved and associated with each unit.

  • sort – Optional parameter that can be set to False to suppress sorting of the candidates list. This should probably only be used in matcher.inittm().

getstartlength(min_similarity, text)

Calculates the minimum length we are interested in. The extra fat is because we don’t use plain character distance only.

getstoplength(min_similarity, text)

Calculates a length beyond which we are not interested. The extra fat is because we don’t use plain character distance only.

inittm(store)

Normal initialisation, but convert all source strings to lower case.

matches(text)

Performs normal matching after converting the text to lower case, then replaces each match with the original unit to retain comments, etc.
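The lower-case-then-restore idea can be sketched with a plain dictionary. This is an illustration under stated assumptions, not the class's implementation: the names `build_index` and `find_terms` are hypothetical, terms are represented as (source, target, note) tuples, and a simple substring test stands in for the real similarity matching.

```python
def build_index(terms):
    """Map each lower-cased source term to its original entry."""
    return {source.lower(): (source, target, note)
            for source, target, note in terms}


def find_terms(text, index):
    """Match case-insensitively, but return the original entries."""
    lowered = text.lower()
    return [original for key, original in index.items() if key in lowered]
```

Because the index keeps the original entry as its value, the original casing and any notes survive even though the comparison itself is done in lower case.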

setparameters(max_candidates=10, min_similarity=75, max_length=70)

Sets the parameters without reinitialising the tm. If a parameter is not specified, it is set to the default, not ignored.

usable(unit)

Returns whether this translation unit is usable for terminology.

translate.search.match.unit2dict(unit)

Converts a pounit to a simple dict structure for use over the web.

terminology

A class that does terminology matching.