search

Services for searching and matching of text.

lshtein

A class to calculate a similarity based on the Levenshtein distance.
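As a rough illustration of a similarity derived from the Levenshtein distance, the sketch below computes the classic edit distance and scales it to a 0-100 score relative to the longer string. The function names and the exact scaling are assumptions made for this example; the toolkit's own class may weight or normalise the distance differently.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical scaling: 100 means identical, 0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein_distance(a, b) / longest)
```

With this scaling, two four-character strings that differ in one character score 75, which is the default min_similarity used by the matcher below.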

match

Class to perform translation memory matching from a store of translation units.

class translate.search.match.matcher(store, max_candidates=10, min_similarity=75, max_length=70, comparer=None, usefuzzy=False)

A class that will do matching and store configuration for the matching process.

buildunits(candidates): Builds a list of units conforming to the base API, with the score in the comment.

extendtm(units, store=None, sort=True)

Extends the memory with extra unit(s).

Parameters:

units – The units to add to the TM.
store – Optional store from where some metadata can be retrieved and associated with each unit.
sort – Optional parameter that can be set to False to suppress sorting of the candidates list. This should probably only be used in matcher.inittm().
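One plausible reason for the sort flag is bulk loading: sorting after every call is wasteful when many stores are added in a row. The sketch below illustrates that idea with a hypothetical TinyMemory class that holds (source, target) pairs and orders them by source length. Both the class and the sort key are assumptions for illustration, not the toolkit's implementation.

```python
class TinyMemory:
    """Hypothetical stand-in for a translation memory; not the toolkit's class."""

    def __init__(self):
        self.candidates = []  # (source, target) pairs

    def extendtm(self, units, sort=True):
        """Add units and, unless told otherwise, keep the list ordered."""
        self.candidates.extend(units)
        if sort:
            self.candidates.sort(key=lambda pair: len(pair[0]))

    def inittm(self, stores):
        """Bulk load: skip the per-call sort and sort once at the end."""
        for store in stores:
            self.extendtm(store, sort=False)
        self.candidates.sort(key=lambda pair: len(pair[0]))


tm = TinyMemory()
tm.inittm([
    [("Open file", "Ouvrir le fichier"), ("Save", "Enregistrer")],
    [("Quit", "Quitter")],
])
tm.extendtm([("OK", "OK")])  # later additions are sorted immediately
```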

static getstartlength(min_similarity, text): Calculates the minimum candidate length worth considering. The bound is padded because the comparison does not rely on plain character distance alone.

getstoplength(min_similarity, text): Calculates the length beyond which candidates are not worth considering. The bound is padded because the comparison does not rely on plain character distance alone.
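The reasoning behind such length bounds can be sketched as follows. If similarity were defined purely as 1 - distance / max(len), a candidate of length m compared with a text of length n could never score above min(m, n) / max(m, n), so only lengths between n * s and n / s can reach a similarity s. The helper below is a hypothetical illustration of that window, with an assumed cap parameter; it omits the extra padding mentioned above and is not the toolkit's code.

```python
def length_window(min_similarity: float, text: str, cap: int = 70):
    """Return (shortest, longest) candidate lengths that could still match.

    Assumes min_similarity is a percentage greater than 0 and that `cap`
    is an upper limit on candidate length (a hypothetical parameter).
    """
    ratio = min_similarity / 100.0
    shortest = max(len(text) * ratio, 1)
    longest = min(len(text) / ratio, cap)
    return shortest, longest


# A 20-character text at 75% similarity: candidates of roughly 15 to 26
# characters are the only ones that could qualify under this model.
low, high = length_window(75, "a" * 20)
```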

inittm(stores, reverse=False): Initialises the memory for later use. Simple base units are used for speed.

matches(text)

Returns a list of possible matches for given source text.

Parameters: text (String) – The text that will be searched for in the translation memory.
Return type: list
Returns: A list of units with the source and target strings from the translation memory. If self.addpercentage is True (default), the match quality is given as a percentage in the notes.
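The general shape of such a lookup (score every candidate, drop those below min_similarity, return the best max_candidates) can be sketched without the toolkit. In the example below, find_matches is a hypothetical function, the memory is a plain list of (source, target) pairs, and difflib's SequenceMatcher.ratio() stands in for the real comparer, so the scores will differ from what the library reports.

```python
from difflib import SequenceMatcher


def find_matches(text, memory, max_candidates=10, min_similarity=75):
    """Return up to max_candidates (score, source, target) tuples, best first."""
    scored = []
    for source, target in memory:
        # SequenceMatcher.ratio() is a stand-in for the real comparer.
        score = 100.0 * SequenceMatcher(None, text, source).ratio()
        if score >= min_similarity:
            scored.append((score, source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:max_candidates]


memory = [
    ("Open the file", "Ouvrir le fichier"),
    ("Open the files", "Ouvrir les fichiers"),
    ("Quit", "Quitter"),
]
results = find_matches("Open the file", memory)
```

Here the exact source comes first, the near match second, and "Quit" is filtered out because it falls below the threshold.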

setparameters(max_candidates=10, min_similarity=75, max_length=70): Sets the parameters without reinitialising the TM. If a parameter is not specified, it is reset to its default rather than left unchanged.
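The consequence of that rule is easy to miss: a call that passes only one argument silently resets the other two. The sketch below reproduces the behaviour with a hypothetical Settings class that only stores the three values; it is an illustration of the documented semantics, not the matcher itself.

```python
class Settings:
    """Hypothetical holder for matcher-style parameters."""

    def __init__(self):
        self.setparameters()

    def setparameters(self, max_candidates=10, min_similarity=75, max_length=70):
        # Every call assigns all three values, so omitted ones revert to defaults.
        self.max_candidates = max_candidates
        self.min_similarity = min_similarity
        self.max_length = max_length


settings = Settings()
settings.setparameters(max_candidates=3, min_similarity=90)
settings.setparameters(max_length=200)  # the two earlier overrides are lost here
```

To change a single value while keeping the others, pass the current values back in explicitly.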

translate.search.match.sourcelen(unit): Returns the length of the source string.

class translate.search.match.terminologymatcher(store, max_candidates=10, min_similarity=75, max_length=500, comparer=None)

A matcher with settings specifically for terminology matching.

buildunits(candidates): Builds a list of units conforming to the base API, with the score in the comment.

extendtm(units, store=None, sort=True)

Extends the memory with extra unit(s).

Parameters:

units – The units to add to the TM.
store – Optional store from where some metadata can be retrieved and associated with each unit.
sort – Optional parameter that can be set to False to suppress sorting of the candidates list. This should probably only be used in matcher.inittm().

getstartlength(min_similarity, text): Calculates the minimum candidate length worth considering. The bound is padded because the comparison does not rely on plain character distance alone.

getstoplength(min_similarity, text): Calculates the length beyond which candidates are not worth considering. The bound is padded because the comparison does not rely on plain character distance alone.

inittm(store): Normal initialisation, but converts all source strings to lower case.

matches(text): Normal matching after converting text to lower case. Each match is then replaced with the original unit so that comments and other details are retained.
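The lower-case-then-restore pattern described for inittm and matches can be sketched in a few lines. The example below uses hypothetical helpers and exact dictionary lookup in place of fuzzy matching, with units modelled as (source, target, note) tuples; only the case handling is meant to mirror the description above.

```python
def build_index(units):
    """Map lower-cased source text to the original (source, target, note) unit."""
    return {source.lower(): (source, target, note) for source, target, note in units}


def lookup(index, text):
    """Match on lower-cased text, but hand back the untouched original unit."""
    return index.get(text.lower())


units = [("File Menu", "Menu Fichier", "UI label")]
index = build_index(units)
hit = lookup(index, "FILE MENU")
```

The returned unit keeps its original capitalisation and its note, even though the comparison itself ignored case.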

setparameters(max_candidates=10, min_similarity=75, max_length=70): Sets the parameters without reinitialising the TM. If a parameter is not specified, it is reset to its default rather than left unchanged.

usable(unit): Returns whether this translation unit is usable for terminology.

translate.search.match.unit2dict(unit): Converts a pounit to a simple dict structure for use over the web.

A class that does terminology matching.