Performing textual analysis across a set of related documents can yield higher-quality categorization, because you can cross-reference a larger corpus and uncover deeper relationships between documents.
The quality of document vectorization has a large impact on the speed and accuracy of text classification.
Text classification is a key technique for organizing document sets in information retrieval (IR).
The rapid growth in the number of electronic documents brings both great opportunities and great challenges for automatic text classification.
Text categorization (TC) is a technique for automatically assigning a document to one of a set of predefined classes.
To address the shortcomings of information gain as a dimensionality-reduction method in text categorization, this paper proposes a feature-reduction method based on relative document frequency balanced information gain (RDFBIG).
The class information of the training documents is used to build the text classification model and to extract the features that contribute most to classification.
By applying feedback methods in a text categorization system, this paper greatly reduces the number of training documents the system requires during training.
Using a small number of labeled documents together with a large number of unlabeled documents for text classification has become an important research trend.
With the rapid growth in the number of online text documents, text categorization has become a key technique for processing and organizing text data.
By analyzing why common plain-text classification methods perform poorly when applied directly to rich-text documents, this thesis identifies the factors that should be considered when modeling rich-text classification and groups them into seven aspects.