Next, a Chinese word segmentation tool segments and preprocesses the results returned by the meta search engine, and an improved C-Value/NC-Value algorithm then extracts nested key noun phrases from the returned result snippets.
Source: Research on Search Result Clustering Based on Formal Concept Analysis (基于形式概念分析的搜索结果聚类研究)
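The C-Value step can be sketched as below. This is the standard C-Value score (Frantzi, Ananiadou and Mima), not the improved variant mentioned above, whose changes are not described here, and the NC-Value context-word re-weighting is omitted. For a candidate term a of |a| tokens with frequency f(a), the score is log2|a| · f(a) if a is not nested in a longer candidate, and log2|a| · (f(a) - mean frequency of the longer candidates containing a) otherwise. The function names, the n-gram length range, and the toy token lists (standing in for segmenter output) are hypothetical.

```python
import math
from collections import Counter


def candidate_ngrams(docs, min_len=2, max_len=4):
    """Count every contiguous token n-gram of length min_len..max_len (hypothetical candidate generator)."""
    freq = Counter()
    for tokens in docs:
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                freq[tuple(tokens[i:i + n])] += 1
    return freq


def is_sub(short, long):
    """True if `short` occurs as a contiguous proper sub-sequence of `long`."""
    n = len(short)
    return n < len(long) and any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(freq):
    """Standard C-Value for every candidate in `freq` (not the paper's improved version)."""
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of the longer candidates that contain `a` (the set T_a).
        nesting = [freq[b] for b in freq if is_sub(a, b)]
        if nesting:
            # Discount occurrences of `a` that are only part of longer terms.
            f_a = f_a - sum(nesting) / len(nesting)
        scores[a] = math.log2(len(a)) * f_a
    return scores


# Toy "segmented snippets"; real input would come from the word segmentation tool.
docs = [
    ["meta", "search", "engine", "result"],
    ["meta", "search", "engine"],
    ["search", "engine", "result"],
]
scores = c_value(candidate_ngrams(docs))
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

Because log2 1 = 0, the original score only ranks multi-token terms, which is why `min_len` defaults to 2; some variants use log2(|a| + 1) instead.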
Traditional Chinese text clustering methods face difficulties such as having to process high-dimensional vectors and having to perform word segmentation on Chinese text.
For processing Chinese address information, a segmentation mechanism based on feature words is proposed: a metadata database containing word segmentation rules is built, together with a model for detecting approximately duplicate records.
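A feature-word segmentation rule and a simple duplicate check might look like the sketch below. The source does not list its rules, so the feature-word list, the "cut after a feature word" rule, the Jaccard similarity over segments, and the function names are all hypothetical stand-ins for the metadata database and detection model it describes.

```python
# Hypothetical feature words that typically close a Chinese address component.
# "街道" is listed before "街" so that, at the same position, the longer word matches first.
FEATURE_WORDS = ("自治区", "省", "市", "区", "县", "镇", "街道", "路", "街", "号")


def segment_address(addr):
    """Cut `addr` right after each feature word (hypothetical rule)."""
    parts, start, i = [], 0, 0
    while i < len(addr):
        for w in FEATURE_WORDS:
            if addr.startswith(w, i):
                i += len(w)
                parts.append(addr[start:i])
                start = i
                break
        else:
            i += 1  # no feature word starts here; keep scanning
    if start < len(addr):
        parts.append(addr[start:])  # trailing text without a feature word
    return parts


def address_similarity(a, b):
    """Jaccard similarity of the two addresses' segment sets (hypothetical duplicate measure)."""
    sa, sb = set(segment_address(a)), set(segment_address(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


print(segment_address("北京市海淀区中关村大街27号"))
print(address_similarity("北京市海淀区中关村大街27号", "北京市海淀区中关村大街27号院"))
```

Two records could then be flagged as approximate duplicates when the similarity exceeds a chosen threshold; the threshold value is an assumption to be tuned on real data.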