该算法选取源搜索结果中排名靠前的部分网页,对这部分网页根据网页相似度进行DBSCAN聚类,最大限度剔除冗余网页,实现搜索结果的优化。
The algorithm selects the top-ranked web pages from the source search results and clusters them with DBSCAN according to page similarity, removing as many redundant pages as possible to optimize the search results.
本文依据冗余网页的特点引入模糊匹配的思想,利用网页文本的内容、结构信息,提出了基于特征串的中文网页的快速去重算法,同时对算法进行了优化处理。
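Based on the characteristics of redundant web pages, this paper introduces the idea of fuzzy matching and, using the content and structural information of web page text, proposes a fast feature-string-based deduplication algorithm for Chinese web pages; the algorithm is also optimized.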
但另一方面:使用框架会增加学习成本并且会产生多余的样式和标记代码,最终导致网页代码冗余。
On the other hand, using a framework adds a learning curve and produces superfluous style rules and markup, which ultimately bloats the web page code.
移动有上万网页、多种服务,更为复杂的服务器和冗余因素的网站当然更加复杂一些。
Moving a site with tens of thousands of pages, multiple services, and more complex server and redundancy factors is, of course, somewhat more complicated.
两次烹饪法显得冗余,并且使你的网页不标准,还没有提供一种显示替换内容的机制。
The twice-cooked method is redundant, makes your web pages invalid, and doesn't include a mechanism for inserting alternative content.
网页检索结果中,用户经常会得到内容相同的冗余页面。
In web page retrieval results, users often get redundant pages with identical content.