Ambiguity handling is an important factor in the segmentation accuracy of a word segmentation system, and it is one of the most difficult and most central problems in designing an automatic word segmentation system.
Unlike English, Chinese text has no delimiters between words. Because word segmentation is the first step in analyzing Chinese text, and because some character strings can be segmented in more than one way, segmentation is the foremost difficulty in Chinese language analysis.
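To make the point about missing delimiters concrete, here is a minimal sketch that enumerates every way a string can be split into dictionary words. The toy dictionary `DICTIONARY` and the function `all_segmentations` are hypothetical and exist only for illustration; a real system would use a large lexicon. With this dictionary, the string 研究生命 has two valid segmentations: 研究 / 生命 ("study / life") and 研究生 / 命 ("graduate student / fate").

```python
from functools import lru_cache

# Hypothetical toy dictionary for illustration; a real system loads a large lexicon.
DICTIONARY = {"研究", "研究生", "生命", "命"}


def all_segmentations(text: str, dictionary: set[str]) -> list[list[str]]:
    """Return every way to split `text` into words that all appear in `dictionary`."""
    max_len = max(len(w) for w in dictionary)  # longest word worth trying

    @lru_cache(maxsize=None)
    def segment_from(start: int) -> tuple[tuple[str, ...], ...]:
        # Base case: the empty suffix has exactly one segmentation (no words).
        if start == len(text):
            return ((),)
        results = []
        # Try every dictionary word that begins at `start`.
        for end in range(start + 1, min(start + max_len, len(text)) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in segment_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(seg) for seg in segment_from(0)]


if __name__ == "__main__":
    for seg in all_segmentations("研究生命", DICTIONARY):
        print(" / ".join(seg))
```

Memoizing on the start position keeps the search from re-exploring the same suffix, but the number of segmentations itself can still grow quickly with string length, which is why practical systems pick one segmentation rather than listing all of them.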
At present, the research community mainly relies on automatic (computer-based) word segmentation for Chinese text, but this approach cannot fully solve the segmentation problem, because it cannot completely resolve the segmentation of ambiguous strings.
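One common heuristic for spotting ambiguous strings automatically is bidirectional maximum matching: segment the text greedily from the left (forward maximum matching) and from the right (backward maximum matching), and flag the string when the two results differ. The sketch below assumes that heuristic and uses a hypothetical toy dictionary; it is not presented as the method of the original work, and the function names are illustrative.

```python
# Hypothetical toy dictionary for illustration only.
DICTIONARY = {"研究", "研究生", "生命", "命"}


def forward_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Left to right: at each position take the longest dictionary word starting there."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if piece in dictionary or size == 1:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Right to left: at each position take the longest dictionary word ending there."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if piece in dictionary or size == 1:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the string
    return words


def detect_ambiguity(text: str, dictionary: set[str]) -> tuple[list[str], list[str], bool]:
    """Return both segmentations and whether they disagree (a sign of ambiguity)."""
    max_len = max(len(w) for w in dictionary)
    fmm = forward_max_match(text, dictionary, max_len)
    bmm = backward_max_match(text, dictionary, max_len)
    return fmm, bmm, fmm != bmm


if __name__ == "__main__":
    for sample in ("研究生命", "生命研究"):
        print(sample, detect_ambiguity(sample, DICTIONARY))
```

For 研究生命 the forward pass yields 研究生 / 命 and the backward pass yields 研究 / 生命, so the string is flagged; for 生命研究 both passes agree. Because both passes are greedy, they can also agree on the same wrong segmentation, so agreement does not prove a string is unambiguous. This is one concrete reason dictionary-based automatic segmentation alone cannot fully resolve ambiguous segments.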