Forum structured data extraction is the extraction of forum metadata, such as each post's title, author, posting time, and content text block, from web forum pages. It is the foundation for processing forum data.
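A minimal sketch of the metadata extraction described above, using only Python's standard `html.parser`. It assumes hypothetical markup in which each field sits in an element with a known CSS class (`post-title`, `post-author`, `post-time`, `post-content`). Real forums use their own markup, and this fixed-class approach is exactly the kind of template dependence that more robust methods try to avoid.

```python
from html.parser import HTMLParser

# Hypothetical class names: every forum uses its own markup.
FIELDS = {"post-title": "title", "post-author": "author",
          "post-time": "time", "post-content": "content"}
VOID = {"br", "img", "hr", "input", "meta", "link", "wbr"}

class ForumPostParser(HTMLParser):
    """Collects one dict per post from elements carrying a FIELDS class."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None  # field currently being read, or None
        self._depth = 0     # open-tag depth inside the current field element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag in VOID:
                self._buf.append(" ")  # treat <br> and similar as whitespace
            else:
                self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in FIELDS:
                self._field, self._depth, self._buf = FIELDS[cls], 1, []
                if self._field == "title":
                    self.posts.append({})  # assumption: a title opens a new post
                break

    def handle_endtag(self, tag):
        if self._field is None or tag in VOID:
            return
        self._depth -= 1
        if self._depth == 0:
            if self.posts:
                # Collapse runs of whitespace in the collected text.
                self.posts[-1][self._field] = " ".join("".join(self._buf).split())
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._buf.append(data)

html = """
<div class="post">
  <h2 class="post-title">Hello</h2>
  <span class="post-author">alice</span>
  <span class="post-time">2010-01-01 12:00</span>
  <div class="post-content">First <b>post</b><br>text</div>
</div>
<div class="post">
  <h2 class="post-title">Re: Hello</h2>
  <span class="post-author">bob</span>
  <span class="post-time">2010-01-02 08:30</span>
  <div class="post-content">Reply</div>
</div>
"""
parser = ForumPostParser()
parser.feed(html)
parser.close()
for post in parser.posts:
    print(post)
```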
Using XML to extract useful information from semi-structured data sets.
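A small illustration with Python's `xml.etree.ElementTree`, assuming a hypothetical `<catalog>` of `<book>` records in which some fields are optional. `findtext` returns `None` for a missing child element, which is a convenient way to flatten irregular, semi-structured records into uniform rows.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured data: records share a tag, not a field set.
DATA = """<catalog>
  <book id="b1"><title>Data Mining</title><year>2006</year></book>
  <book id="b2"><title>Web Data Extraction</title></book>
  <book id="b3"><title>XML Basics</title><year>2001</year><price>30</price></book>
</catalog>"""

def extract(xml_text, fields):
    """Return one dict per <book>, with None for fields a record lacks."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.iter("book"):
        row = {"id": book.get("id")}
        for name in fields:
            row[name] = book.findtext(name)  # None when the child is absent
        rows.append(row)
    return rows

rows = extract(DATA, ["title", "year"])
for row in rows:
    print(row)
```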
Semi-structured data is an important form of data on the Web, and extracting data from it and discovering knowledge in it are the core problems of semi-structured data research.
Schema extraction is important in the field of semi-structured data research.
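Schema extraction methods vary. As a hypothetical, deliberately simple illustration (not a specific published algorithm), the sketch below infers a schema from JSON-like records by collecting, for every field path, the value types observed and whether every record contains that path.

```python
from collections import defaultdict

def infer_schema(records):
    """Map each leaf field path to (sorted type names seen, present in all records?)."""
    types = defaultdict(set)   # path -> type names of leaf values
    counts = defaultdict(int)  # path -> number of records containing it

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                child_path = f"{path}.{key}" if path else key
                counts[child_path] += 1
                walk(child, child_path)
        else:
            types[path].add(type(value).__name__)

    for rec in records:
        walk(rec, "")
    # Keep only paths that held a non-dict value somewhere.
    return {p: (sorted(types[p]), counts[p] == len(records))
            for p in counts if p in types}

records = [
    {"title": "A", "author": {"name": "x", "id": 1}},
    {"title": "B", "author": {"name": "y"}, "tags": ["t"]},
    {"title": 3, "author": {"name": "z", "id": "7"}},
]
schema = infer_schema(records)
for path, info in schema.items():
    print(path, info)
```

Mixed types (such as `title` being both `str` and `int`) and optional paths (such as `tags`) are exactly the irregularities that make schema extraction for semi-structured data harder than reading a fixed relational schema.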
During classification, the algorithm weights the hypertext information in the hypertext documents of the extracted data set, which lets it make better, more balanced use of the multiple kinds of structural information that hypertext carries.
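The source does not say how the hypertext information is weighted. The sketch below assumes one common idea, field weighting, in which terms from a page's title and anchor text count more than body terms, and pairs it with a nearest-centroid classifier based on cosine similarity. `FIELD_WEIGHTS` and the field names are illustrative assumptions.

```python
import math
from collections import Counter

# Illustrative weights only: the source does not specify its weighting scheme.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "body": 1.0}

def weighted_vector(doc):
    """doc maps a hypertext field name to its text; returns term -> weight."""
    vec = Counter()
    for field, text in doc.items():
        w = FIELD_WEIGHTS.get(field, 1.0)
        for term in text.lower().split():
            vec[term] += w
    return vec

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc, labeled_docs):
    """Nearest-centroid classification over weighted term vectors."""
    centroids = {}
    for label, d in labeled_docs:
        # Summed vectors suffice: cosine similarity ignores scale.
        centroids.setdefault(label, Counter()).update(weighted_vector(d))
    v = weighted_vector(doc)
    return max(centroids, key=lambda label: cosine(v, centroids[label]))

train = [
    ("sports", {"title": "football match", "body": "the team won the game"}),
    ("tech", {"title": "python release", "anchor": "download python",
              "body": "new version of the language"}),
]
query = {"title": "python tips", "body": "the language"}
print(classify(query, train))
```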
The algorithm does not depend on a specific template, so it can adapt to periodic changes in forum templates and extract structured data automatically.
These meaningful data or facts can then be extracted and fused with existing structured information into relational tables that can be searched, summarized, counted, and otherwise statistically analyzed.
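To make the fusion step concrete, the sketch below loads hypothetically extracted post records into an in-memory SQLite database next to an existing `users` table, then searches, counts, and sums over the combined data with plain SQL. The table layout and values are invented for illustration.

```python
import sqlite3

# Hypothetical layout: an existing "users" table plus newly extracted posts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT PRIMARY KEY, joined TEXT);
    CREATE TABLE posts (author TEXT REFERENCES users(name),
                        title TEXT, replies INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "2009-05-01"), ("bob", "2010-02-14")])

# Records as an extractor might emit them (invented values).
extracted = [
    {"author": "alice", "title": "Hello", "replies": 4},
    {"author": "alice", "title": "Tips", "replies": 2},
    {"author": "bob", "title": "Re: Hello", "replies": 0},
]
conn.executemany("INSERT INTO posts VALUES (:author, :title, :replies)",
                 extracted)

# Search: titles mentioning "Hello".
hits = conn.execute(
    "SELECT title FROM posts WHERE title LIKE ? ORDER BY title",
    ("%Hello%",)).fetchall()

# Count and sum per user, joining extracted posts with existing user data.
stats = conn.execute("""
    SELECT u.name, u.joined, COUNT(*), SUM(p.replies)
    FROM users AS u JOIN posts AS p ON p.author = u.name
    GROUP BY u.name, u.joined ORDER BY u.name
""").fetchall()
print(hits)
print(stats)
```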
Second, this paper uses Chinese information extraction techniques to extract entity-related information contained in unstructured data.
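The source does not name the extraction technique it uses. As one simple baseline for Chinese text, which has no spaces between words, the sketch below finds entities by dictionary-based forward maximum matching against a small hypothetical gazetteer; many practical systems use trained sequence-labeling models instead.

```python
# Hypothetical entity dictionary (gazetteer): surface form -> entity type.
GAZETTEER = {"北京": "LOC", "北京大学": "ORG", "清华大学": "ORG", "王小明": "PER"}
MAX_LEN = max(len(w) for w in GAZETTEER)

def find_entities(text):
    """Forward maximum matching: at each position take the longest dictionary hit."""
    entities, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            word = text[i:i + size]
            if word in GAZETTEER:
                entities.append((word, GAZETTEER[word], i))
                i += size
                break
        else:
            i += 1  # no entity starts here; advance one character
    return entities

print(find_entities("王小明毕业于北京大学,现在住在北京。"))
```

Preferring the longest match is what keeps 北京大学 (an organization) from being split into the location 北京 plus leftover characters.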
However, because page layouts are complex and users are free to write posts however they like, extracting structured data from web forum pages is a very challenging task that has not yet been solved well.
To use forum data effectively, most applications first extract structured data from forum pages and then build their various functions on that data.
Information extraction is the process of extracting a specified type of information from text and turning it into structured data that users can apply for various purposes.
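A toy illustration of turning free text into structured records: one hand-written regular expression fills a hypothetical `Appointment` template. Real information extraction systems combine many patterns or learned models; the pattern, the template, and its field names here are assumptions.

```python
import re
from dataclasses import dataclass

@dataclass
class Appointment:
    person: str
    role: str
    organization: str

# One illustrative pattern: "<Person> was appointed <role> of <Organization>".
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was appointed "
    r"(?P<role>[a-z ]+?) of "
    r"(?P<organization>[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)"
)

def extract_appointments(text):
    """Fill one Appointment record per pattern match in the text."""
    return [Appointment(**m.groupdict()) for m in PATTERN.finditer(text)]

text = ("Jane Smith was appointed chief executive of Acme Corp. "
        "Analysts expect little change. "
        "Li Wei was appointed director of Beta Labs.")
for record in extract_appointments(text):
    print(record)
```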
Oracle9i adds Internet search, powerful tools for extracting and indexing metadata from rich content, and the ability to search XML data and catalog structured data.