Tokenizers implement the actual rules for where to break text up into words (typically at whitespace).
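A minimal sketch of the whitespace rule mentioned above, using only Python's standard library (this simple splitting rule is an illustration, not any particular tokenizer's implementation):

```python
import re

def whitespace_tokenize(text):
    # Break the text on runs of whitespace; punctuation handling is
    # deliberately omitted to keep the rule minimal.
    return re.split(r"\s+", text.strip())

print(whitespace_tokenize("The quick  brown fox"))
# → ['The', 'quick', 'brown', 'fox']
```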
This paper presents methods proposed for automatic Chinese word segmentation: mechanical matching, feature lexicon, constraint matrix, grammatical analysis, and semantic understanding.
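Of the methods listed, mechanical matching is the simplest to illustrate. A common variant is forward maximum matching: scan left to right and greedily take the longest dictionary word at each position. The sketch below is a toy illustration under that assumption; the lexicon and `max_len` are made up for the example:

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position take the
    longest lexicon entry that matches; fall back to one character."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + n]
            if n == 1 or word in lexicon:
                tokens.append(word)
                i += n
                break
    return tokens

lexicon = {"汉语", "自动", "分词", "系统"}  # toy lexicon (assumption)
print(forward_max_match("汉语自动分词系统", lexicon))
# → ['汉语', '自动', '分词', '系统']
```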
Segmentation ambiguity is an important factor affecting the accuracy of Chinese automatic word-segmentation systems.
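Segmentation ambiguity arises when a string can be split into dictionary words in more than one valid way. A small sketch that enumerates all dictionary-consistent segmentations makes this concrete; the lexicon and the classic example sentence are illustrative assumptions:

```python
def all_segmentations(text, lexicon):
    """Enumerate every way to split `text` into lexicon words.
    More than one result means the string is segmentation-ambiguous."""
    if not text:
        return [[]]
    results = []
    for n in range(1, len(text) + 1):
        head = text[:n]
        if head in lexicon:
            for rest in all_segmentations(text[n:], lexicon):
                results.append([head] + rest)
    return results

# Toy lexicon (assumption) covering both readings of a classic example:
# "ping-pong balls are sold out" vs. "ping-pong paddles sell ... finished".
lexicon = {"乒乓球", "乒乓球拍", "拍卖", "卖", "完了"}
for seg in all_segmentations("乒乓球拍卖完了", lexicon):
    print("/".join(seg))
# → 乒乓球/拍卖/完了
# → 乒乓球拍/卖/完了
```

Because both segmentations are lexically valid, a segmenter must use additional context (the grammatical or semantic methods above) to choose between them.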