在实现对网页中文本信息提取的同时,如何再为用户抽取所需的图片资料,是信息检索中一个重要的方面。
Beyond extracting the text of a web page, how to also retrieve the images that users need is an important aspect of information retrieval.
针对不同类型的中文网页及网页中含有文字的图片,提出了提取文字信息的几种方法,解决了特殊中文网页中文字信息无法提取的问题。
For different types of Chinese web pages, and for images in those pages that contain text, several methods of extracting the text are proposed, solving the problem that text in certain special Chinese web pages could not be extracted.