对于文档方面,我们可以从结构化文本开始接近微格式。
For the document approach we can come at microformats from the structured text side.
关键词特性描述非结构化文本内容中出现的单词和短语。
The keyword features describe the occurrences of words and phrases in the unstructured text content.
该文介绍了一个应用于结构化文本的检索系统的设计和实现。
This paper introduces the design and implementation of a retrieval system for structured text.
此方法有效地解决了从非结构化文本中抽取结构化信息的难题。
The method proposed here effectively solves the problem of extracting structured information from unstructured text.
结构化文本格式紧靠在内容一端,而YAML和JSON 则在数据一端。
Structured text formats come down firmly on the content side, while YAML and JSON come down firmly on the data side.
考虑收集到的大量快餐餐馆的问卷调查,其中包含了许多非结构化文本。
Consider a large collection of fast food restaurant surveys, which contain a great deal of unstructured text.
提出并实现了一种改进的面向半结构化文本信息的压缩算法:LZWX算法。
An improved compression algorithm for semi-structured text, LZWX, is proposed and implemented.
公司还可能需要分析半结构化文本(比如XML内容)或其他数据类型(比如音频和视频)。
Companies may also have a need to analyze semi-structured text (such as XML content) or other data types (such as audio and video).
如果您需要构建一个可靠搜索应用程序,用于合并基于点的位置的结构和非结构化文本,那么关注Lucene和Solr就足够了。
But if you need to build out a solid search application that combines the structure of point-based locations with unstructured text, then look no further than Lucene and Solr.
针对球磨机系统的多变量及非线性等特点,本文对球磨机系统进行对象特性分析,进行了模糊控制器的设计,并在PLC上采用结构化文本语言编程实现。
Given the multivariable and nonlinear characteristics of the ball mill system, this paper analyzes the system's plant characteristics, designs a fuzzy controller, and implements it on a PLC in the Structured Text (ST) programming language.
另外,还有一个称为“语义同义词(semantic synonym)”的新概念,它通过转换用户的关键字查询,从根本上简化对从非结构化文本中提取的概念的搜索。
Furthermore, a novel concept called "semantic synonyms" radically simplifies searching for concepts extracted from unstructured text by transforming the user's keyword query.
与自由文本或结构化文档搜索比较而言,处理、搜索和显示数据是一个简单的过程。
Processing, searching, and displaying the data is simple compared with free-text or structured-document searches.
尽管本文主要关注文本分析,但是UIMA还可以用于分析其他类型的非结构化信息,比如音频和图像。
While this article has focused on text analysis, UIMA can also be used to analyze other kinds of unstructured information such as audio and images.
UIMA是一个用于分析非结构化内容(比如文本、视频和音频)的组件架构和软件框架实现。
UIMA is a component architecture and a software framework implementation for the analysis of unstructured content such as text, video, and audio.
在这两个场景中,非结构化数据的主要类型是文本。
In both of the above scenarios, text is the main type of unstructured data.
我所谓的强大是指,那些解决方案必须能够从结构化数据(例如数据库和网页)和非结构化数据(例如文本、音频和视频)中提取可操作的信息。
By strong, I mean they must be able to extract actionable information from both structured data, such as databases and Web pages, and unstructured data, such as text, audio, and video.
下一小节将针对此类集成给出一个逐步指导示例:文本分析被用于从包含非结构化信息的数据库表中提取结构化信息。
The next section gives you a step-by-step example for this kind of integration: text analysis is used to extract structured information from a database table containing unstructured information.
不过,一些浏览器可能以更结构化的方式显示已返回的文本。
Some browsers, though, might show the returned text in a somewhat more structured manner.
为了进行说明,我使用IMDB内容的子集构建了一个DB2结构化数据库,将这些传记信息作为文本字段保存在数据库中。
For illustration, I built a DB2 structured database from a subset of the IMDB content, and included the trivia as text fields in this database.
还可以包含其他证据,这可以通过包含其他结构化数据(比如用数据库表记录哪些人为同一部电影工作过),或者通过进行更深入的文本分析。
We could include other evidence for a connection by either including additional structured data, such as database tables that show which people worked together on movies, or by deeper text analysis.
在只需要从响应文档中提取单一值的场景中,“欺骗性”地把XML当作文本字符串,而不把它当作结构化的文档对待,会更方便。
In a scenario where you only need to extract a single value from a response document, it can be more convenient to "cheat," treating the XML as a string of text rather than a structured document.
我们曾经有一个项目,它需要非结构化的文本分析;UIMA似乎是理想之选。
One of our projects needed unstructured text analysis, and UIMA looked like exactly what was needed.
在图3所示的场景中,先给形式自由的文本中的概念加注解,然后把它们与现有的结构化信息一起写到一个数据库表中。
Figure 3 depicts a scenario where concepts in free-form text are first annotated and later written to a database table together with existing structured information.
DB2文本搜索能够对存储在DB2数据库中的结构化和非结构化数据进行全文搜索。
DB2 text search enables full-text search on structured and unstructured data stored in a DB2 database.
在这种情况下,结构化元素会作为可搜索文本的一部分,而且只支持SQL查询语法。
In this case, the structural elements are treated as a part of the searchable body of text, and only SQL query syntax is applicable.
它还演示了如何组合结构化数据库和文本挖掘。
It also illustrates how it can be combined with structured databases and data mining.
您了解了如何设置UIMA开发环境,如何创建自己的注释器,以及在InfoSphere Warehouse中使用定制注释器从文本输入提取结构化信息。
You have learned how to set up the UIMA development environment, how to create your own annotator, and how to use it in InfoSphere Warehouse to extract structured information from text input.
然后,Text操作器可以从文本列中提取结构化信息,把它们作为新列(其中包含找到的姓名、技能、日期等概念)添加到输出中。
Text operators can then extract structured information from text columns and add it to the output as new columns containing the concepts found, such as names, skills, and dates.
Yacc是一种语法分析器,它可以读取文本并用来将单词序列转换为便于处理的结构化的格式。
Yacc is a grammar parser; it reads text and can be used to turn a sequence of words into a structured format for processing.
开发者可以向Placemaker输入任何形式的结构化和非结构化数据,包括频道、网页,程序会分析文本并从中提取位置数据。
Developers can feed Placemaker any kind of structured and unstructured data, including feeds and web pages, and the app will analyze the text and extract location data from it.