This paper applies standard XML technologies to the problem of web information extraction.
The knowledge used in this method (called HPIE) is divided into several layers of pattern descriptions, which describe the information recognition patterns layer by layer, from abstract to concrete.
This paper proposes a new web information extraction method that, based on the breadth of a sub-tree, automatically extracts page information from different scientific and technical literature websites without having to distinguish between them.