对于文档方面,我们可以从结构化文本开始接近微格式。
For the document approach we can come at microformats from the structured text side.
与自由文本或结构化文档搜索比较而言,处理、搜索和显示数据是一个简单的过程。
The processing, searching, and displaying of data is a simple process compared to the free text or structured document searches.
尽管本文主要关注文本分析,但是UIMA还可以用于分析其他类型的非结构化信息,比如音频和图像。
While this article has focused on text analysis, UIMA can also be used to analyze other kinds of unstructured information such as audio and images.
UIMA是一个用于分析非结构化内容(比如文本、视频和音频)的组件架构和软件框架实现。
UIMA is a component architecture and a software framework implementation for the analysis of unstructured content such as text, video, and audio.
在这两个场景中,非结构化数据的主要类型是文本。
In both of the above scenarios, text is the main type of unstructured data.
我所谓的强大是指,那些解决方案必须能够从结构化数据(例如数据库和网页)和非结构化数据(例如文本、音频和视频)中提取可操作的信息。
By strong, I mean they must be able to extract actionable information from both structured data, such as databases and Web pages, and unstructured data, such as text, audio, and video.
下一小节将针对此类集成给出一个逐步指导示例:文本分析被用于从包含非结构化信息的数据库表中提取结构化信息。
The next section gives you a step-by-step example for this kind of integration: text analysis is used to extract structured information from a database table containing unstructured information.
不过,一些浏览器可能以更结构化的方式显示已返回的文本。
Some browsers, though, might show the returned text in a bit more structured manner.
为了进行说明,我使用IMDB内容的子集构建了一个DB2结构化数据库,将这些传记信息作为文本字段保存在数据库中。
For illustration, I built a DB2 structured database from a subset of the IMDB content, and included the trivia as text fields in this database.
还可以包含其他证据,这可以通过包含其他结构化数据(比如用数据库表记录哪些人为同一部电影工作过),或者通过进行更深入的文本分析。
We could include other evidence for a connection by either including additional structured data, such as database tables that show which people worked together on movies, or by deeper text analysis.
在只需要从响应文档中提取单一值的场景中,“欺骗性”地把XML当作文本字符串,而不把它当作结构化的文档对待,会更方便。
In a scenario where you only need to extract a single value from a response document, it can be more convenient to "cheat," treating the XML as a string of text rather than a structured document.
我们曾经有一个项目,它需要非结构化的文本分析;UIMA似乎是理想之选。
In one of our projects, we needed unstructured text analysis; UIMA looked like exactly what was needed.
在图3所示的场景中,先给形式自由的文本中的概念加注解,然后把它们与现有的结构化信息一起写到一个数据库表中。
Figure 3 depicts a scenario where concepts in free-form text are first annotated and later written to a database table together with existing structured information.
DB2文本搜索能够对存储在DB2数据库中的结构化和非结构化数据进行全文搜索。
DB2 text search enables full-text search on structured and unstructured data stored in a DB2 database.
在这种情况下,结构化元素会作为可搜索文本的一部分,而且只支持SQL查询语法。
In this case, the structural elements are treated as a part of the searchable body of text, and only SQL query syntax is applicable.
结构化文本格式紧靠在内容一端,而YAML和JSON 则在数据一端。
Structured text formats come down firmly on the content side, while YAML and JSON come down firmly on the data side.
它还演示了如何组合结构化数据库和文本挖掘。
And it also illustrates how it can be combined with structured databases and data mining.
您了解了如何设置UIMA开发环境,如何创建自己的注释器,以及在InfoSphere Warehouse中使用定制注释器从文本输入提取结构化信息。
You have learned how to set up the UIMA development environment, how to create your own annotator, and how to use it in InfoSphere Warehouse to extract structured information from text input.
然后,Text操作器可以从文本列中提取结构化信息,把它们作为新列(其中包含找到的姓名、技能、日期等概念)添加到输出中。
Text operators can then extract structured information from text columns and add it to the output as new columns containing the concepts found, such as names, skills, and dates.
Yacc是一种语法分析器,它可以读取文本并用来将单词序列转换为便于处理的结构化的格式。
Yacc is a grammar parser; it reads text and can be used to turn a sequence of words into a structured format for processing.
开发者可以向Placemaker输入任何形式的结构化和非结构化数据,包括频道、网页,程序会分析文本并从中提取位置数据。
Developers can feed Placemaker any kind of structured and unstructured data, including feeds and web pages, and the app will analyze the text and extract location data from it.
基本文本搜索(Basic Text Search)DataBlade模块允许在存储在表列中的非结构化文档库中搜索词和短语。
The Basic Text Search DataBlade module allows you to search words and phrases in an unstructured document repository stored in a column of a table.
此信息使用具有Jazz感知性的富文本编辑器以非结构化的方式在Overview页面上进行捕获。
This information is captured on the Overview page in an unstructured way using an artifact-aware text editor.
Cognos 8 Reporting同样能够使用来自各种数据源的结构化信息,并且可用于将文本分析结果传播给广泛的受众。
Cognos 8 Reporting is able to consume structured information from many data sources, and it can be used to propagate the text analysis results to a wide audience.
互联网的搜索引擎们把主要精力都放在采集web页面的文本信息上,但是Google却在研究如何分析和组织结构化数据方面小有所成,该公司的一位科学家上周五表示。
Internet search engines have focused largely on crawling text on Web pages, but Google is knee-deep in research about how to analyze and organize structured data, a company scientist said Friday.
查询工作项(通过获得一个HTML模块,或者通过创建全文本和结构化的查询来实现)。
Query work items (by either obtaining an HTML picker module or by creating full-text and structured queries).
关键词特性描述非结构化文本内容中出现的单词和短语。
The keyword features describe the occurrences of words and phrases in the unstructured text content.
考虑收集到的大量快餐餐馆的问卷调查,其中包含了许多非结构化文本。
Consider a large collection of fast food restaurant surveys, which contains a great deal of unstructured text.
在这种情况下,结构化元素可用于确定搜索的文本位置,但是结构化元素本身不是可搜索文本的一部分。
In this case, structural elements can be used to identify the portion of text that should be searched, but structural elements themselves are not part of the searchable body of text.
公司还可能需要分析半结构化文本(比如XML内容)或其他数据类型(比如音频和视频)。
Companies may also have a need to analyze semi-structured text (such as XML content) or other data types (such as audio and video).