并对今后数据清洗的研究和应用进行展望。
Finally, future research directions and applications of data cleaning are discussed.
介绍数据清洗问题产生的背景和国内外研究现状。
First, the background of the data cleaning problem and the state of research in China and abroad are introduced.
在数据装入数据仓库之前,应该对数据进行数据清洗。
Before data is loaded into the data warehouse, it should be cleaned.
摘要:可扩展性和可交互性是数据清洗系统的主要特征。
Abstract: Extensibility and interactivity are the main features of a data cleaning system.
数据清洗是数据仓库和数据挖掘中非常重要的一个环节。
Data cleansing is a very important step in both data warehousing and data mining.
本文的重点是对可扩展可定制数据清洗框架的研究与设计。
This paper focuses on the research and design of an extensible and customizable data cleaning framework.
缺少元数据管理,用户很难分析和逐步调整数据清洗过程。
The last is a lack of metadata management, which makes it hard for users to analyze and incrementally adjust the data cleaning process.
数据清洗-如何决定哪些名字拼写错误或相当,但略有不同的?
Data cleansing - how to decide which names are misspellings or are equivalent but slightly different?
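One common way to approach the question above is to normalize each name and compare the results with a string-similarity score. The following is a minimal sketch using Python's standard `difflib`; the function names and the 0.85 threshold are illustrative assumptions, not part of any particular data cleansing tool.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] after light normalization."""
    # Lowercase and collapse runs of whitespace so formatting noise is ignored.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def likely_same_name(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as probable variants when similarity meets the threshold.

    The threshold is an assumed starting point and should be tuned on real data.
    """
    return name_similarity(a, b) >= threshold
```

For example, `likely_same_name("Jonathan Smith", "Jonathon Smith")` treats the pair as probable variants, while unrelated names fall below the threshold. Pairs near the threshold are usually best sent to a human reviewer.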
根据研究需要进行数据清洗,获得804例患者的临床信息,锁定数据库。
The data were cleaned according to the requirements of the study, yielding clinical information for 804 patients, and the database was then locked.
如果您猜测数据集包含不正确的数据,那么可以应用偏差检测进行数据清洗,从而发现数据库中不正确的条目。
If you suspect that your dataset contains incorrect data, you can apply deviation detection for data cleansing to find the incorrect entries in your database.
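As a rough illustration of deviation detection, the sketch below flags values that lie far from the median, using the modified z-score based on the median absolute deviation (MAD). This is one simple technique among many; the function name, the 3.5 cutoff, and the handling of a zero MAD are assumptions made for the example.

```python
from statistics import median


def flag_deviations(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Assumed fallback: when at least half the values equal the median,
        # flag every value that differs from it.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]
```

Applied to a column of ages such as `[34, 29, 41, 38, 36, 420, 33]`, only the entry `420` is flagged. A flagged value is a candidate for review, not proof of an error.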
AlphaMiner是个开源的数据挖掘平台,通过一个用户友好的工作流接口提供通用数据挖掘模型的构建和数据清洗功能。
AlphaMiner is an open-source data mining platform that offers versatile data mining model building and data cleansing features through a user-friendly workflow interface.
首先简要介绍数据清洗与选择的基本方法,然后详细论述数据预处理、数据表示和数据集管理等方面的问题。
It first briefly describes the basic methods of data cleaning and selection, then discusses data preprocessing, data representation, and data set management in detail.
数据预处理平台通过数据抽取、数据清洗、数据转换和数据加载等方法整合并转换现有数据资源至数据仓库。
Through data extraction, data cleaning, data transformation, and data loading, the data preprocessing platform integrates existing data resources and transforms them into the data warehouse.
以往数据清洗工具在三个方面存在不足:工具和用户之间缺少交互,用户无法控制过程,也无法处理过程中的异常;
Earlier data cleaning tools fall short in three respects: first, there is little interaction between the tool and the user, so users cannot control the cleaning process or handle exceptions that arise during it;
数据仓库中相似重复记录的识别与消除是数据清洗的热点问题,其中地址类信息对相同实体识别起着非常重要的作用。
Identifying and eliminating approximately duplicate records in a data warehouse is a hot topic in data cleansing, and address information plays a very important role in recognizing records that refer to the same entity.
本文主要介绍了人口信息系统数据挖掘的数据预处理过程,包括人口系统的属性规约,数据清洗和数据转换等具体过程。
This paper mainly introduces the data preprocessing stage of data mining for a population information system, including attribute reduction for the population system, data cleaning, and data transformation.
对ETL数据清洗转换进行了介绍,主要阐述了数据质量问题在数据仓库解决方案中的重要地位、数据质量的分类,同时对本课题中的ETL的设计和实现进行了描述。
ETL data cleaning and transformation is introduced, with a focus on the importance of data quality in data warehouse solutions and the classification of data quality problems; the design and implementation of ETL in this project are also described.
本文首先分析总结了数据清洗的有关概念,给出了数据清洗中需要解决的质量问题,并总结了解决这些问题的技术和方法。在此基础上提出了以人为中心的数据清洗过程模型。
This paper first reviews the concepts related to data cleansing, lists the data quality issues that data cleansing must resolve, and summarizes the techniques and methods for addressing them. On this basis, a human-centered data cleansing process model is proposed.
然后,重新创建的空表又可以摄取新的数据,直到清洗日的到来。
The empty table is then ready to ingest new data until its next purge date.
我们使用数据质量工具来配置和清洗错误数据。
We use data quality tools to configure and cleanse erroneous data.
DataStage中的quality stage可以清洗数据,移除错误的记录。
The quality stage in DataStage can cleanse the data and remove erroneous records.
数据获取:用于获得、清洗、转换和集成数据的ETL(提取、转换和装入)过程。
Data acquisition: ETL (Extraction, Transformation, and Loading) processes to acquire, clean, transform, and integrate data.
清洗:在某些最近历史表中,每过一些天就删除整个表,并重新创建表,数据将放在有更长历史的另一个表中。
Purge: in some recent history tables, every few days the entire table is dropped and recreated because data is now available in another table with a longer history.
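The drop-and-recreate purge described above can be sketched as follows. This is only an illustration using Python's built-in `sqlite3`; the table name `recent_events` and its columns are hypothetical, and the sketch assumes the rows have already been copied to the longer-history table before the purge runs.

```python
import sqlite3

# Hypothetical schema for the short-lived "recent history" table.
RECENT_DDL = "CREATE TABLE recent_events (id INTEGER PRIMARY KEY, payload TEXT)"


def purge_recent(conn: sqlite3.Connection) -> None:
    """Drop the recent-history table and recreate it empty."""
    # Assumption: the data already lives in a longer-history table,
    # so nothing is copied here.
    with conn:  # commits on success, rolls back on error
        conn.execute("DROP TABLE IF EXISTS recent_events")
        conn.execute(RECENT_DDL)
```

Dropping and recreating the table avoids deleting rows one by one, and the empty table is immediately ready to accept new data.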
因此,根据经验法则,您可以考虑将页面清洗器的数目设置为数据库服务器上CPU的数目。
Hence, as a rule of thumb, you may consider setting the number of page cleaners equal to the number of CPUs on the database server.
还请记住,配置太多页面清洗器可能会损害数据库服务器上的运行队列,并导致极大的性能下降。
Also keep in mind that having too many page cleaners may overwhelm the run queue on the database server and cause significant performance degradation.
通过将CHNGPGS_THRESH设置为一个较低的数,如20%,将更频繁地触发页面清洗器,但只有较少的数据写入磁盘,而用户也觉察不到这一延迟。
By setting CHNGPGS_THRESH to a lower value, such as 20%, the page cleaners will be triggered more often, but less data will be written to disk each time, and the slowdown may be unnoticeable to your users.
另一方面,数据库代理需要缓冲池中的空间之前,页面清洗器将已修改的页面从缓冲池写入磁盘。
Page cleaners, on the other hand, write changed pages from the buffer pool to disk before the space in the buffer pool is required by a database agent.
在数据转换过程中,清洗并清理了数据仓库中的数据。
The data in a data warehouse is cleansed and scrubbed during the data transformation processes.
你总是需要花费大量的时间准备和清洗数据。
You will always need to spend a lot of time preparing and cleaning data.