本文使用标准的XML技术来解决网页信息抽取问题。
This paper uses standard XML technologies to solve the web information extraction problem.
将网页信息抽取知识分为若干层,由抽象到具体逐层描述信息识别模式知识。
The web information extraction knowledge used in this method (called HPIE) is divided into several layers, which describe the information-recognition pattern knowledge layer by layer, from abstract to concrete.
提出一种新的网页信息抽取方法,基于子树的广度可不加区分地对不同科技文献网站的页面信息进行自动抽取。
This paper proposes a new web information extraction method that uses sub-tree breadth to automatically extract page information from different sci-tech literature websites without treating them differently.
最后,本文给出了归纳网页模板和抽取网页信息的实验结果。
Finally, the paper presents experimental results for inducing web page templates and extracting web page information.
从企业网页中抽取与企业相关的信息是商业上的实际需求,与之相关的研究既有挑战,又有理论意义。
Extracting company-related information from corporate web pages is a practical business need, and research on it is both challenging and theoretically significant.
在分析信息抽取过程和商品网页结构的基础上,构建了基于网页DOM树的商品供应信息抽取模型。
Based on an analysis of the information extraction process and the structure of product web pages, a DOM-tree-based model for extracting product supply information is established.
实验结果证明,该方法能不依赖科技文献网页的来源而自动地抽取相关信息,并能保证较高的数据抽取回召率和查准率。
Experimental results show that this method automatically extracts the relevant information regardless of which website the sci-tech literature pages come from, and achieves high accuracy in terms …
在实现对网页中文本信息提取的同时,如何再为用户抽取所需的图片资料,是信息检索中一个重要的方面。
Beyond extracting the text of a web page, extracting the images users need is also an important aspect of information retrieval.