This class is the common parent class for all language classes.

punctranslate(cls, text)
Converts the punctuation in a string according to the rules of the language.

character_iter(cls, text)
Returns an iterator over the characters in text.

characters(cls, text)
Returns a list of characters in text.

word_iter(cls, text)
Returns an iterator over the words in text.

words(cls, text)
Returns a list of words in text.

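The paired methods above follow one pattern: the `*_iter` variant yields items lazily and the plain variant collects them into a list. A minimal sketch of that pattern follows. `SketchLanguage` is a hypothetical stand-in class, and the whitespace split with punctuation stripping is an assumption, not the toolkit's actual word-breaking rule.

```python
class SketchLanguage(object):
    """Hypothetical stand-in class; not the toolkit's implementation."""

    # Assumption: a small punctuation set, just enough for the example.
    punctuation = u'.,;:!?'

    @classmethod
    def word_iter(cls, text):
        # Assumption: words are whitespace-separated tokens with any
        # surrounding punctuation stripped off.
        for token in text.split():
            word = token.strip(cls.punctuation)
            if word:
                yield word

    @classmethod
    def words(cls, text):
        # The list variant simply materialises the iterator.
        return list(cls.word_iter(text))
```

Because both are class methods, they can be called on the class itself, for example `SketchLanguage.words(u'Hello, world!')`.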
sentence_iter(cls, text, strip=True)
Returns an iterator over the sentences in text.

sentences(cls, text, strip=True)
Returns a list of sentences in text.

capsstart(cls, text)
Determines whether the text starts with a capital letter.

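One way such a check could work is sketched below. The function name `capsstart_sketch`, its default punctuation set, and the choice to skip leading whitespace and punctuation are all assumptions made for illustration.

```python
def capsstart_sketch(text, punctuation=u'.,;:!?"\'([{\u00bf\u00a1'):
    """Return True if the first significant character is upper case.

    Assumption: leading whitespace and punctuation (including the
    inverted marks U+00BF and U+00A1) are skipped before testing.
    """
    stripped = text.lstrip().lstrip(punctuation)
    return bool(stripped) and stripped[0].isupper()
```

Under these assumptions, text that starts with an inverted question mark followed by a capital letter still counts as starting with a capital.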
code = ''
The ISO 639 language code, possibly with a country specifier or other modifier.

fullname = ''
The full (English) name of this language.

nplurals = 0
The number of plural forms of this language.

pluralequation = '0'
The plural equation for selection of plural forms.

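Since this class is the common parent, a language class would presumably override these attributes. The sketch below illustrates that. `CommonSketch`, `AfrikaansSketch`, and the helper `plural_header` are hypothetical names, and the Afrikaans values are illustrative ones taken from the usual gettext plural rule for that language, not from this documentation.

```python
class CommonSketch(object):
    """Hypothetical stand-in carrying the documented default values."""
    code = ''
    fullname = ''
    nplurals = 0
    pluralequation = '0'


class AfrikaansSketch(CommonSketch):
    """Illustrative subclass; the values are assumptions for the example."""
    code = 'af'
    fullname = 'Afrikaans'
    nplurals = 2
    pluralequation = '(n != 1)'


def plural_header(lang):
    """Format the two plural attributes as a gettext Plural-Forms value."""
    return 'nplurals=%d; plural=%s;' % (lang.nplurals, lang.pluralequation)
```

Keeping the equation as a string means it can be written straight into a PO file header, where gettext evaluates it with `n` bound to the count.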
listseperator = u', '
This string is used to separate lists of textual elements.

commonpunc = u'.,;:!?-@#$%^*_()[]{}/\'`"<>'
These punctuation marks are common in English and most languages that use Latin script.

quotes = u'‘’‛“”„‟′″‴‵‶‷‹›«»'
These are different quotation marks used by various languages.

invertedpunc = u'¿¡'
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.

rtlpunc = u'،؟؛÷'
These punctuation marks are used by Arabic and Persian, for example.

CJKpunc = u'。、,;!?「」『』【】'
These punctuation marks are used in certain circumstances with CJK languages.

indicpunc = u'।॥॰'
These punctuation marks are used by several Indic languages.

ethiopicpunc = u'።፤፣'
These punctuation marks are used by several Ethiopic languages.

miscpunc = u'…±°¹²³·©®×£¥€'
The middle dot (·) is used by Greek and Georgian.

punctuation = u'.,;:!?-@#$%^*_()[]{}/\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡...
Many types of punctuation are included here, since this is only meant to determine whether something is punctuation.

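A combined string like this supports a simple membership test. The sketch below joins only two of the documented groups, because the full value is truncated above. The names `punctuation_sketch`, `is_punctuation`, and `strip_punctuation` are hypothetical.

```python
# Two of the documented groups, copied from the attribute values above.
commonpunc = u'.,;:!?-@#$%^*_()[]{}/\'`"<>'
invertedpunc = u'\u00bf\u00a1'  # inverted question and exclamation marks

# Assumption: only these two groups are combined here; the documented
# value of `punctuation` is longer and is shown truncated.
punctuation_sketch = commonpunc + invertedpunc


def is_punctuation(char):
    """Return True if a single character is in the combined set."""
    return len(char) == 1 and char in punctuation_sketch


def strip_punctuation(text):
    """Drop every character that is in the combined set."""
    return u''.join(c for c in text if c not in punctuation_sketch)
```

The length check matters because `in` on strings is a substring test, so an empty string would otherwise count as punctuation.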
sentenceend = u'.!?…։؟।。!?።'
These marks can indicate a sentence end.

sentencere = re.compile(r'(?sx).*?[\.!\?\u2026\u0589\u061f\u09...

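The documented pattern is truncated, so the following is only a rough sketch of how sentence-end marks and a regular expression could drive sentence splitting. The simplified pattern, the reduced set of marks, and the standalone function names are assumptions, not the toolkit's own code.

```python
import re

# Assumption: a small subset of the documented sentence-end marks.
sentenceend_sketch = u'.!?\u2026'

# Assumption: a sentence is the shortest run of text that ends in one or
# more end marks followed by whitespace or the end of the text. Any
# trailing text without an end mark forms a final sentence.
_sentence_re = re.compile(
    u'.+?[%(end)s]+(?=\\s|$)|.+?$' % {'end': re.escape(sentenceend_sketch)},
    re.DOTALL)


def sentence_iter(text, strip=True):
    """Yield sentences lazily; optionally strip surrounding whitespace."""
    for match in _sentence_re.finditer(text):
        sentence = match.group(0)
        if strip:
            sentence = sentence.strip()
        if sentence:
            yield sentence


def sentences(text, strip=True):
    """Return the sentences as a list."""
    return list(sentence_iter(text, strip=strip))
```

Requiring whitespace or the end of the text after the mark keeps a decimal such as `3.14` inside one sentence, although abbreviations would still be split by a pattern this simple.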
puncdict = {}
A dictionary of punctuation transformation rules that can be used by punctranslate().

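How the rules in such a dictionary might be applied can be sketched as a per-character lookup. The function `punctranslate_sketch` and the sample mapping to Arabic punctuation are hypothetical, and the real method may handle multi-character keys or context that this sketch ignores.

```python
def punctranslate_sketch(text, puncdict):
    """Replace each character that has a rule; leave the rest untouched.

    Assumption: every key in puncdict is a single character.
    """
    if not puncdict:
        # An empty rule set, like the documented default, changes nothing.
        return text
    return u''.join(puncdict.get(char, char) for char in text)


# Illustrative rules only: Latin comma, semicolon and question mark
# mapped to their Arabic counterparts.
arabic_rules_sketch = {
    u',': u'\u060c',  # ARABIC COMMA
    u';': u'\u061b',  # ARABIC SEMICOLON
    u'?': u'\u061f',  # ARABIC QUESTION MARK
}
```

With the default empty `puncdict`, the text is returned as it was given.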
ignoretests = []
List of pofilter tests for this language that must be ignored.

checker = None
A language-specific checker (see filters.checks).