本文提出一种词性标注规则自动学习算法。
A learning algorithm is presented that automatically acquires and optimizes rules for part-of-speech tagging.
汉语的分词及词性标注是汉语语言处理的基础。
Chinese word segmentation and part-of-speech tagging are the foundation of Chinese language processing.
而有关蒙古文自动词性标注方面的研究还欠缺。
However, research on automatic part-of-speech tagging for Mongolian is still lacking.
最后实例说明汉语兼类词词性标注规则的获取方法。
Finally, an example illustrates how POS tagging rules for ambiguous Chinese words are acquired.
本文首先探索了基于单语料库的无监督中文词性标注。
This paper first explores unsupervised part-of-speech tagging for Chinese using a monolingual corpus.
词性标注一直是汉语语文辞书编纂未能很好解决的难题。
Part-of-speech labeling has long been a problem that compilers of Chinese dictionaries have not solved satisfactorily.
文中构建了一个基于最大熵结合中文词语聚类的词性标注器。
This paper builds a POS tagger based on maximum entropy combined with Chinese word clustering.
得到模型参数之后采用VITERBI算法进行自动词性标注。
After the model parameters are obtained, the Viterbi algorithm is used for automatic part-of-speech tagging.
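As an illustration of the decoding step mentioned above, the following is a minimal Viterbi sketch for HMM tagging. The two-tag model and all probabilities are invented toy values, not parameters estimated by any of the systems described here.

```python
# Minimal Viterbi decoder for HMM part-of-speech tagging.
# Transition/emission tables are illustrative toy values.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for the observed words."""
    # V[t][s] = best probability of any path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s].get(obs[t], 0.0), p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace the best path back from the best final state.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ("N", "V")
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"dog": 0.6, "barks": 0.1}, "V": {"dog": 0.1, "barks": 0.7}}

tags = viterbi(["dog", "barks"], states, start_p, trans_p, emit_p)
print(tags)  # -> ['N', 'V']
```

The dynamic program keeps only the best-scoring path into each state per position, so decoding is linear in sentence length rather than exponential.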
文章介绍了一种基于搭配模式的汉语词性标注规则的获取方法。
This paper introduces a method for acquiring Chinese POS tagging rules based on collocation patterns.
测试评价标准分别采用了词性标注准确率和兼类词排歧准确率。
The evaluation criteria adopted were part-of-speech tagging accuracy and disambiguation accuracy for multi-category words.
概率参数的获取是基于统计的词性标注的两个主要研究方向之一。
Estimating probability parameters is one of the two main research directions in statistical part-of-speech tagging.
本文提出了将三阶隐马尔可夫模型运用到维吾尔语词性标注中的方法。
This paper describes a method for Uyghur part-of-speech tagging using a third-order Hidden Markov Model.
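A third-order HMM conditions each tag on the three preceding tags, so its transition table is estimated from tag four-grams. The sketch below shows that estimation step on an invented toy tag corpus; the tag set and sentences are purely illustrative, not data from the paper.

```python
from collections import defaultdict

# Maximum-likelihood estimation of P(tag | previous `order` tags),
# the transition table of a higher-order HMM. Toy corpus only.

def ngram_transitions(tag_sequences, order=3):
    """Estimate next-tag probabilities given the previous `order` tags."""
    counts = defaultdict(lambda: defaultdict(int))
    for tags in tag_sequences:
        padded = ["<s>"] * order + list(tags)  # pad sentence starts
        for i in range(order, len(padded)):
            history = tuple(padded[i - order:i])
            counts[history][padded[i]] += 1
    probs = {}
    for history, nexts in counts.items():
        total = sum(nexts.values())
        probs[history] = {t: c / total for t, c in nexts.items()}
    return probs

corpus = [["N", "V", "N"], ["N", "V", "ADJ", "N"]]
p = ngram_transitions(corpus, order=3)
print(p[("<s>", "N", "V")])  # -> {'N': 0.5, 'ADJ': 0.5}
```

In practice such raw counts are too sparse for a real tag set, so higher-order models are smoothed, e.g. by interpolating with lower-order estimates.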
实现了基于本文词典的文书处理的分词、词性标注和词汇功能描述等。
Word segmentation, POS tagging, and lexical function description for document processing were implemented on the basis of the dictionary presented in this paper.
将这三个任务用两个模块来实现:分词和词性标注模块、句法分析模块。
These three tasks are implemented in two modules: a word segmentation and part-of-speech tagging module, and a syntactic parsing module.
分析、设计和实现了一个基于条件随机场模型的汉语分词和词性标注模块。
We analyzed, designed, and implemented a Chinese word segmentation and part-of-speech tagging module based on the Conditional Random Fields model.
由于不涉及本体库的知识,且语料库仅需分词和词性标注,适合应用于汉语。
Since no ontology knowledge is involved and the corpus only requires word segmentation and POS tagging, the method is well suited to Chinese.
该模型是一个词汇化的句法分析模型,能结合分词、词性标注进行句法分析;
The model is a lexicalized parsing model that integrates word segmentation and POS tagging into syntactic parsing.
目前关于汉语自动词性标注方面一些人士做了许多相关研究,并取得了一定的成果。
Researchers have done considerable work on automatic part-of-speech tagging for Chinese and achieved notable results.
系统包括初切分,词性标注、歧义字段处理、模型平滑、未登录词识别等功能模块。
The system includes modules for initial segmentation, POS tagging, ambiguous-string processing, model smoothing, and unknown-word recognition.
本文对词性标注的方法进行了研究,分析了基于规则的方法和基于统计的方法的优缺点。
This paper studies methods for part-of-speech tagging and analyzes the strengths and weaknesses of rule-based and statistical approaches.
兼类词的词类排歧是汉语语料词性标注中的难点问题,它严重影响语料的词性标注质量。
The disambiguation of multi-category words is one of the difficulties in part-of-speech tagging of Chinese text, which affects the processing quality of corpora greatly.
论文引入条件随机域建立词性标注模型,易于融合新的特征,并能解决标注偏置的问题。
This paper introduces Conditional Random Fields (CRF) to build the part-of-speech tagging model, which makes it easy to incorporate new features and overcomes the label bias problem.
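One reason CRFs admit new features so easily is that each token is described by an arbitrary feature map. The sketch below shows a common feature template (word identity, affixes, neighboring words) of the kind fed to a CRF toolkit; the template and example tokens are illustrative assumptions, not the feature set used in the paper.

```python
# Token-level feature extraction of the style used with CRF taggers.
# Adding a new feature is just adding another key to the dictionary.

def token_features(words, i):
    """Build a feature map for the i-th token of a sentence."""
    w = words[i]
    return {
        "word": w,
        "prefix1": w[:1],                 # first character
        "suffix2": w[-2:],                # last two characters
        "is_first": i == 0,               # sentence-initial position
        "is_last": i == len(words) - 1,   # sentence-final position
        "prev_word": words[i - 1] if i > 0 else "<s>",
        "next_word": words[i + 1] if i < len(words) - 1 else "</s>",
    }

sent = ["the", "dog", "barks"]
feats = token_features(sent, 1)
print(feats["word"], feats["prev_word"], feats["next_word"])
```

Each sentence becomes a list of such maps, one per token, and the CRF learns a weight for every (feature, tag) pair while modeling the whole tag sequence globally, which is what avoids the label bias of locally normalized models.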
最后本文对该系统对蒙古文进行切分之前和切分之后的自动词性标注分别作了以下的实验。
Finally, experiments were conducted on the system's automatic POS tagging of Mongolian text both before and after segmentation.
首先是建立法律语料库,主要是法律书面语的语料库,并对其进行机器自动分词和词性标注。
The first step is to build a legal corpus, mainly of written legal language, and to apply automatic word segmentation and part-of-speech tagging to it.
在对大规模语料库进行深加工时,保证词性标注的一致性已成为建设高质量语料库的首要问题。
In the deep processing of large-scale corpora, ensuring the consistency of part-of-speech tagging has become the primary problem in building a high-quality corpus.
并且,针对词性标注容易引起语义缺失的问题,提出了以语义标注作为医案信息抽取规则的方案。
Furthermore, since part-of-speech tagging often causes loss of semantic information, semantic tagging is proposed as the basis for the medical-record information extraction rules.
针对新信息检测的英文浅层语言分析主要包括断句、词汇切分、词性标注以及词形还原等自然语言处理过程。
Shallow English language parsing customized for novelty detection includes sentence boundary detection, tokenization, part-of-speech tagging, and morphological analysis.