分词器会考虑在哪将文本切分成词(典型的如空格)的实际规则。
The tokenizers take care of the actual rules for where to break the text up into words (typically whitespace).
本文给出了为汉语自动分词而提出的机械匹配法、特征词库法、约束矩阵法、语法分析法和理解切分法。
This paper presents the methods proposed for automatic Chinese word segmentation: mechanical matching, feature lexicon, constraint matrix, syntactic analysis, and understanding-based segmentation.
切分歧义是影响汉语自动分词系统精度的一个重要因素。
Segmentation ambiguity is an important factor affecting the accuracy of automatic Chinese word segmentation systems.
组合型歧义切分字段一直是汉语自动分词的难点,难点在于消歧依赖其上下文语境信息。
Combinational ambiguity has long been a difficult problem in automatic Chinese word segmentation, because resolving it depends on contextual information.
本文介绍了目前采用的几种汉语自动分词技术,包括:最大匹配法、改进的最大匹配法、全切分法等。
This paper introduces several automatic Chinese word segmentation techniques currently in use, including maximum matching, improved maximum matching, and full segmentation.
系统首先对待切分词使用有限状态自动机进行分析。
The system first analyzes the words to be segmented using a finite-state automaton.
使用自动分词知识可以进一步提高自动切分精度,满足高标准的需求。
Using knowledge of automatic word segmentation can further improve segmentation accuracy and meet high-precision requirements.
歧义处理是影响分词系统切分精度的重要因素,是自动分词系统设计中的一个最困难也是最核心的问题。
Ambiguity resolution is an important factor affecting the accuracy of a word segmentation system, and it is one of the most difficult and central problems in designing an automatic word segmentation system.
切分过程系统利用改进正向最大匹配算法,提高了分词切分效率。
During segmentation, the system uses an improved forward maximum matching algorithm, which increases segmentation efficiency.
基于统计的中文分词按照分类单位划分,通常可分为基于汉字标注的分词和基于全切分图的分词两种方法。
Statistical Chinese word segmentation is usually divided, by unit of classification, into two approaches: segmentation based on character tagging and segmentation based on the full-segmentation graph.
汉语不同于英语,词之间没有间隔标记。而汉语分词是文本分析的第一步,且存在歧义切分,因此分词问题成为汉语分析的首要难题。
Unlike English, Chinese has no delimiters between words. Word segmentation is the first step of text analysis and is subject to segmentation ambiguity, so it is the foremost difficulty in Chinese language analysis.
目前学术界主要采用计算机自动分词解决中文文本分词,但是这种方法不能完全解决分词问题,这是因为计算机自动分词不能彻底地解决歧义字段的切分。
At present, academia mainly relies on automatic segmentation by computer to segment Chinese text, but this approach cannot fully solve the problem, because automatic segmentation cannot thoroughly resolve the segmentation of ambiguous strings.
并针对全切分分词算法进行了研究,给出了全切分分词方法算法中的并发检索模型。
In addition, the full-segmentation algorithm is studied, and a concurrent retrieval model for the full-segmentation method is presented.
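As a rough illustration of what full segmentation means, the sketch below enumerates every way to split a string into lexicon words, falling back to single characters. It is a plain sequential recursion, not the concurrent retrieval model the sentence refers to; the function name `full_segmentations` and the tiny lexicon are assumptions made only for this example.

```python
def full_segmentations(text, lexicon):
    """Return every segmentation of `text` into lexicon words.

    Single characters are always accepted, so at least one
    segmentation exists for any input. (Illustrative helper,
    not taken from the cited work.)
    """
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        # Accept a prefix if it is one character or a known word.
        if end == 1 or piece in lexicon:
            for rest in full_segmentations(text[end:], lexicon):
                results.append([piece] + rest)
    return results


# Hypothetical toy lexicon for demonstration only.
lexicon = {"研究", "研究生", "生命"}
candidates = full_segmentations("研究生命", lexicon)
```

The number of candidates grows quickly with sentence length, which is why practical systems usually store them as a graph and score paths instead of listing them all.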
分词典型设计,有正向最大切分法MM与逆向最大切分法RMM!
Typical word segmentation designs include forward maximum matching (MM) and reverse maximum matching (RMM).
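A minimal sketch of the two methods named above, assuming a set-based lexicon and a fixed maximum word length. The function names, the `max_len` parameter, and the toy lexicon are assumptions for illustration, not part of any particular system.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Forward maximum matching (MM): scan left to right,
    taking the longest lexicon word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(text, lexicon, max_len=4):
    """Reverse maximum matching (RMM): scan right to left,
    taking the longest lexicon word ending at each position."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end
    return tokens


# Hypothetical toy lexicon for demonstration only.
lexicon = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命的起源"
mm_result = forward_max_match(sentence, lexicon)
rmm_result = reverse_max_match(sentence, lexicon)
```

On this input the two directions disagree: MM yields 研究生 / 命 / 的 / 起源, while RMM yields 研究 / 生命 / 的 / 起源. Comparing the two outputs is a common, simple way to flag ambiguous strings.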