Second, the system uses machine learning techniques, including text classification, clustering, and concept extraction, to identify the domain of a web page and to understand its text at the concept level.
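As a rough illustration of the text-classification step, the sketch below assigns a page's text to a domain with a multinomial naive Bayes classifier. The source does not say which classifier the system uses. Naive Bayes, the class name `NaiveBayesDomainClassifier`, the toy training data, and the English-only tokenizer are all assumptions made for illustration. Chinese pages would first need word segmentation.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphabetic tokens (English-only assumption)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDomainClassifier:
    """Hypothetical domain classifier: multinomial naive Bayes, add-one smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()                  # training docs per domain
        self.word_counts: defaultdict = defaultdict(Counter)  # domain -> term counts
        self.vocab: set = set()                               # all training terms

    def fit(self, docs: list[str], labels: list[str]) -> NaiveBayesDomainClassifier:
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc: str) -> str:
        if not self.doc_counts:
            raise ValueError("classifier has not been fitted")
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        # Terms never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(domain) + sum of log P(term | domain), add-one smoothed
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    train_docs = [
        "stock market shares investors trading",
        "bank interest rates inflation economy",
        "football match goal team season",
        "tennis player tournament win final",
    ]
    train_labels = ["finance", "finance", "sports", "sports"]
    clf = NaiveBayesDomainClassifier().fit(train_docs, train_labels)
    print(clf.predict("investors watch interest rates"))  # finance
```

Working in log space avoids floating-point underflow when a page contains many terms.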
Document clustering has been applied effectively to information filtering, web page classification, and related tasks. However, it faces two difficulties: large data volumes and high-dimensional feature spaces.
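A minimal clustering sketch follows, assuming TF-IDF features and spherical k-means (k-means with cosine similarity). Neither choice comes from the source. Each document is stored as a sparse term-to-weight map, so memory grows with the nonzero features rather than with the full vocabulary. This is one common way to cope with high dimensionality. The function names are hypothetical, and the tokenizer again assumes English text.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Dict, List

SparseVec = Dict[str, float]  # term -> weight; absent terms are implicitly 0


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphabetic tokens (English-only assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec: SparseVec) -> SparseVec:
    """Scale a sparse vector to unit L2 length (empty vectors stay empty)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf_vectors(docs: List[str]) -> List[SparseVec]:
    """Build unit-length TF-IDF vectors, storing only the terms each doc contains."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # document frequency: number of docs containing each term
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # smoothed idf, always > 0: log((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors: List[SparseVec], k: int, max_iter: int = 20) -> List[int]:
    """Cluster unit vectors by cosine similarity; returns a cluster id per doc."""
    # Deterministic farthest-point initialisation: start from doc 0, then
    # repeatedly add the doc least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        idx = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[idx])

    assignment: List[int] = []
    for _ in range(max_iter):
        new_assignment = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                          for v in vectors]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        for j in range(k):
            summed: Counter = Counter()
            for v, a in zip(vectors, assignment):
                if a == j:
                    summed.update(v)  # element-wise sum of member vectors
            if summed:  # keep the old centroid if the cluster is empty
                centroids[j] = normalize(dict(summed))
    return assignment


if __name__ == "__main__":
    docs = [
        "stock market shares investors",
        "investors trade stock shares",
        "football team wins match",
        "team scores goal in football match",
    ]
    print(spherical_kmeans(tfidf_vectors(docs), k=2))  # [0, 0, 1, 1]
```

The farthest-point initialisation is deterministic, which keeps this toy example reproducible. On real data, randomized seeding with several restarts is often used instead.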