本文研究基于包装器模型的文本信息抽取算法。
This thesis studies text information extraction algorithms based on the wrapper model.
提出了一种基于隐马尔可夫模型的文本信息抽取算法。
A text information extraction algorithm based on the hidden Markov model is proposed.
针对训练数据来源的多样化,提出了基于多模板隐马尔可夫模型的文本信息抽取算法。
To handle the diversity of training data sources, a text information extraction algorithm based on a multi-template hidden Markov model is proposed.
摘要:作为自然语言处理的一个分支,文本信息抽取成为了提取大量文本信息中有用信息的重要手段。
Abstract: As a branch of natural language processing, text information extraction has become an important means of extracting useful information from large volumes of text.
文本信息抽取是处理海量文本的重要手段,将隐马尔可夫模型应用到信息抽取领域是一个比较新的研究课题。
Text information extraction is an important means of processing massive amounts of text, and applying the hidden Markov model to information extraction is a relatively new research topic.
该算法利用文本排版格式、分隔符等信息,对文本进行分块,在分块的基础上结合隐马尔可夫模型进行文本信息抽取。
The algorithm uses information such as text layout and separators to segment the text into blocks, and then applies a hidden Markov model to those blocks for text information extraction.
信息抽取是一种用于处理各种类型文本文档的非常有效的方法,然而建立一个文本信息抽取系统却是非常困难和耗费时间的。
Information extraction is a very effective way of processing text documents of all types, but building a text information extraction system is very difficult and time-consuming.
信息抽取是从自由文本语料库构建数据库,实现情报自动收集的有效途径之一。
Information extraction is one of the effective ways to build databases from free-text corpora and to collect intelligence automatically.
与信息检索不同,信息抽取直接从自然语言文本中抽取事实信息。
Unlike information retrieval, information extraction extracts factual information directly from natural language texts.
信息抽取是指从自然语言文本中识别特定类型的事件或关系的信息,并将这些信息格式化输出。
Information extraction (IE) is the process of identifying specified types of events or relations in natural language texts and recording this information in a structured form.
信息抽取的任务是从文本中抽取字符串形式的信息,并将此信息填入带标记的槽中,来表明其含义。
Information extraction is the task of extracting information from a text as text strings and placing those strings into labeled slots that indicate what the information means.
在此基础上,结合用户的个人兴趣,给出了文本特征抽取机制、文本推荐机制、文本与信息需求模型的匹配机制。
On this basis, and taking users' personal interests into account, mechanisms are presented for text feature extraction, text recommendation, and matching texts against information-need models.
借助特征聚类进行特征抽取是信息检索领域进行文本特征降维的重要手段之一。
In information retrieval, feature extraction by means of feature clustering is one of the important techniques for reducing the dimensionality of text features.
基于规则的主要思路是通过分类文本的特征、结构等信息,寻找到一些用于抽取的规则。
The main idea of the rule-based approach is to use information such as the features and structure of the classified texts to find rules for extraction.
此方法有效地解决了从非结构化文本中抽取结构化信息的难题。
This method effectively solves the problem of extracting structured information from unstructured text.
其次,采用机器学习技术,包括文本分类、聚类,文本概念抽取,从概念层次理解文本信息;
Secondly, machine learning techniques, including text classification, clustering, and text concept extraction, are used to understand text information at the concept level;
该算法在分类时,通过对抽取到的数据集超文本文档中的超文本信息进行加权处理,更好的综合协调地利用了超文本的多元结构化信息。
During classification, the algorithm weights the hypertext information extracted from the hypertext documents in the data set, and thereby makes better and more coordinated use of the multiple kinds of structured information in hypertext.
信息提取系统是从一段文本中抽取指定的一类信息并将其形成结构化的数据供用户使用的过程。
An information extraction system extracts a specified type of information from a passage of text and turns it into structured data for users.
在与网络文本相关的信息抽取任务中,观察值序列都是给定的,所以不需要考虑得到观察值的概率,而只需要关注观察值引起的状态转移的概率。
In information extraction tasks involving web text, the observation sequence is always given, so there is no need to consider the probability of the observations; only the probability of the state transitions they induce matters.
抽取电子邮件和手机短信的多种文本特征,分别在TREC07P电子邮件语料和真实中文手机短信语料上进行了垃圾信息过滤实验。
Multiple text features are extracted from emails and mobile phone short messages (SMS), and spam filtering experiments are run on the TREC07P email corpus and a real Chinese SMS corpus, respectively.
在实现对网页中文本信息提取的同时,如何再为用户抽取所需的图片资料,是信息检索中一个重要的方面。
While extracting text information from web pages, how to also extract the images that users need is an important aspect of information retrieval.