实行中文分词连写确实具有必要性。
It is indeed necessary to implement word-segmented writing in Chinese.
中文分词是校园网搜索引擎项目的一个核心技术。
Word segment was a core technology of campus search engine project.
词典是许多中文分词系统的一个重要的组成部分。
The dictionary mechanism serves as an important component of many Chinese word segmentation systems.
分析现有几种中文分词方法,提出一种关键词抽取算法。
This paper analyzes several existing Chinese word segmentation methods and proposes a keyword extraction algorithm based on a weighting formula.
论文介绍了一个基于词频统计的中文分词系统的设计和实现。
The paper introduces the design and implementation of a Chinese word segmentation system based on word frequency statistics.
所以,要使计算机能够处理中文文本,就必须先进行中文分词。
Therefore, for a computer to process Chinese text, Chinese word segmentation must be performed first.
迅雷资源搜索引擎索引器的实现,主要是如何建立中文分词和倒排表。
The implementation of the XunLei resource search engine indexer mainly covers how to build the Chinese word segmentation and the inverted index.
索引模块中:首先,讨论了中文分词的设计思想,选择了分词的算法。
In the index module, the design of Chinese word segmentation is first discussed and a segmentation algorithm is selected.
预处理方面,本文分为两个步骤:科技论文文本数据预处理和中文分词处理。
For preprocessing, this paper takes two steps: preprocessing of the scientific paper text data and Chinese word segmentation.
搜索引擎的技术涉及到自然语言理解、中文分词、人工智能、机器学习等学科。
Search engine technology involves natural language understanding, Chinese word segmentation, artificial intelligence, machine learning, and other fields.
本文将阅卷过程分解为三个主要步骤来进行:中文分词、句法分析和相似度计算。
This paper decomposes the scoring process into three main steps: Chinese word segmentation, syntactic analysis, and similarity computation.
而主题提取是以中文分词作为第一步,分词质量直接影响到文献主题提取的质量。
Chinese word segmentation is the first step of subject extraction, and the quality of segmentation directly affects the quality of subject extraction from the literature.
2003年在日本札幌举行了第一届ACL -SIGHAN国际中文分词竞赛。
The first ACL-SIGHAN International Chinese Word Segmentation Bakeoff was held in Sapporo, Japan, in 2003.
词典是中文自动分词的基础,分词词典机制的优劣直接影响到中文分词的速度和效率。
As a basic component of a Chinese word segmentation system, the dictionary mechanism directly affects the speed and efficiency of segmentation.
基于前缀树和动态规划,该算法提高了中文分词速度,同时保持了相对较高的分词准确性。
Using prefix tree and dynamic programming, this algorithm boosts the speed of Chinese word segmentation and guarantees relatively high precision.
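The prefix-tree-plus-dynamic-programming idea mentioned above can be sketched as follows. This is a minimal illustration, not the cited algorithm itself; the tiny dictionary and the fewest-words objective are assumptions made for the example.

```python
# Minimal sketch: dictionary-based segmentation with a prefix tree (trie)
# and dynamic programming. The dictionary below is invented for illustration.

def build_trie(words):
    """Build a nested-dict trie; the key "$" marks a complete word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def segment(text, trie):
    """Fewest-words segmentation: dp[i] holds the best split of text[:i]."""
    n = len(text)
    dp = [None] * (n + 1)          # dp[i] = (word_count, segmentation)
    dp[0] = (0, [])
    for i in range(n):
        if dp[i] is None:
            continue
        node = trie
        for j in range(i, n):      # walk the trie along text[i:]
            node = node.get(text[j])
            if node is None:
                break              # no dictionary word extends this prefix
            if "$" in node:        # text[i:j+1] is a dictionary word
                cand = (dp[i][0] + 1, dp[i][1] + [text[i:j + 1]])
                if dp[j + 1] is None or cand[0] < dp[j + 1][0]:
                    dp[j + 1] = cand
    return dp[n][1] if dp[n] else list(text)   # fall back to characters

trie = build_trie(["中文", "分词", "中文分词", "技术"])
print(segment("中文分词技术", trie))   # ['中文分词', '技术']
```

The trie makes each dictionary lookup a single incremental walk per start position, and the DP keeps only one best split per prefix, which is where the speed-up comes from.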
针对邮件文本分词效果较差的特点,提出采用一种改进的最大匹配法来进行中文分词的方法。
To address the poor performance of Chinese word segmentation on e-mail texts, an improved maximum matching method is proposed.
因此中文信息处理的首要问题,就是要将句子中一个个词给分离出来,这就是中文分词问题。
Therefore, the primary task of Chinese information processing is to separate out the individual words in a sentence; this is the Chinese word segmentation problem.
本文分析了中文分词、文本预处理和压缩、搜索引擎的原理、工作流程、查询处理流程等技术。
This thesis analyzes Chinese word segmentation, text preprocessing and compression, the principles of search engines, their workflow, query processing, and related techniques.
文中论述了在开发中文信息检索系统中所涉及到的两项关键技术,即中文分词技术和检索技术。
Two key techniques in the development of a Chinese information retrieval system are discussed in this paper, i.e., Chinese word segmentation and retrieval technique.
然而中文分词不是为统计机器翻译而开发的技术,它的分词结果不能保证对统计机器翻译的优化。
However, CWS is not developed for SMT and hence its results are not necessarily optimal for SMT.
基于理解的分词方法研究尚未成熟,所以,绝大部分中文分词系统是应用机械统计相结合的方法。
Because research on understanding-based segmentation is not yet mature, most Chinese word segmentation systems combine mechanical (dictionary-based) and statistical methods.
计算机可以很容易地理解英文单词,而对由词组成的中文句子,必须通过中文分词技术才得以理解。
A computer can easily recognize English words, but a Chinese sentence composed of words can be understood only through Chinese word segmentation technology.
最初,它是以开源项目Luence为应用主体的,结合词典分词和文法分析算法的中文分词组件。
Initially, it was a Chinese word segmentation component built mainly for use with the open source project Lucene, combining dictionary-based segmentation with grammar analysis algorithms.
文章首先构造了自动答疑系统架构,改进了中文分词算法,并利用领域本体库和语句相似度设计了该系统。
This paper first constructs the architecture of the automatic question-answering system, improves the Chinese word segmentation algorithm, and then designs the system using a domain ontology base and sentence similarity.
本文首次使用SVM方法来完成中文分词的任务,使用上下文窗体属性和基于规则的属性对样本进行刻画。
This paper applies SVM to the Chinese word segmentation task for the first time, using context window attributes and rule-based attributes as the features of each sample.
基于统计的中文分词按照分类单位划分,通常可分为基于汉字标注的分词和基于全切分图的分词两种方法。
Divided by classification unit, statistical Chinese word segmentation methods generally fall into two kinds: segmentation based on character labeling and segmentation based on the full segmentation graph.
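The character-labeling formulation mentioned above tags every character so that segmentation becomes a sequence-labeling problem. A small helper makes the idea concrete; the common B/M/E/S tag scheme and the example words are illustrative assumptions, not taken from the cited work.

```python
# Minimal sketch of the character-labeling view of segmentation:
# each character is tagged B (begin), M (middle), E (end), or S (single-
# character word), turning segmentation into sequence labeling.

def words_to_bmes(words):
    """Convert a segmented sentence (list of words) to per-character tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

print(words_to_bmes(["中文", "分词", "是", "第一步"]))
# ['B', 'E', 'B', 'E', 'S', 'B', 'M', 'E']
```

A statistical model (HMM, CRF, SVM, etc.) is then trained to predict these tags for raw text, and the predicted tag sequence is decoded back into word boundaries.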
中文分词是搜索引擎中比较重要的部分,本文分析了正向和逆向的最大匹配分词以及基于统计的分词方法。
Chinese word segmentation is an important part of a search engine. This paper analyzes forward and backward maximum matching segmentation as well as statistics-based segmentation methods.
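Forward maximum matching, one of the methods analyzed above, greedily takes the longest dictionary word at each position. A minimal sketch follows; the dictionary and the `max_len` cutoff are assumptions made for illustration, and backward maximum matching is the same idea scanning from the end of the string.

```python
# Minimal sketch of forward maximum matching (FMM).
# The dictionary below is invented for illustration only.

def fmm_segment(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: longest dictionary match wins."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found;
        # a single character always matches, so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

dictionary = {"搜索", "搜索引擎", "引擎", "中文", "分词"}
print(fmm_segment("搜索引擎中文分词", dictionary))
# ['搜索引擎', '中文', '分词']
```

Running FMM and its backward counterpart and comparing the two outputs is a common way to detect ambiguous spans that greedy matching alone gets wrong.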
在现阶段还没有一种很好的方法来解决中文分词的问题,本文将会提出一种面向用户兴趣建模的中文分词方法。
At present there is no satisfactory method for the Chinese word segmentation problem; this paper proposes a Chinese word segmentation method oriented to user interest modeling.