The challenges of data cleansing with data warehouses

Nigel McKelvey, Kevin Curran, Luke Toland

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

3 Citations (Scopus)

Abstract

Data cleansing is a long-standing problem which every organisation that incorporates a form of data processing or data mining must undertake. It is essential in improving the quality and reliability of data. This paper presents the methods needed to process data to a high quality. It also classifies common problems which organisations face when cleansing data from a source or multiple sources, while evaluating methods which aid in this process. The different challenges faced at schema level and instance level are also outlined, along with how they can be overcome. Tools which provide data cleansing currently exist, but they are limited due to the uniqueness of every data source and data warehouse. The limitations of these tools are outlined, as is the human interaction (self-programming) that may be needed to ensure vital data is not lost. We also discuss the importance of maintaining and removing data which has been stored for several years and may no longer have any value.
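As a rough illustration of the instance-level problems the abstract refers to (duplicate records, missing values, inconsistent formatting), a minimal cleansing sketch in Python with pandas might look like the following. The column names, sample records, and cleaning rules are hypothetical and are not taken from the chapter.

```python
# Minimal instance-level cleansing sketch (hypothetical data and rules,
# not drawn from the chapter): normalise text, coerce dates, drop rows
# missing mandatory fields, and remove duplicate records.
import pandas as pd

records = pd.DataFrame({
    "customer_id": [101, 101, 102, 103],
    "name": ["  Alice Smith", "alice smith", "Bob Jones", None],
    "joined": ["2015-01-10", "2015-01-10", "2015-02-10", "not a date"],
})

# Normalise free-text fields so trivially different spellings compare equal.
records["name"] = records["name"].str.strip().str.lower()

# Coerce dates; unparseable values become NaT instead of silently surviving.
records["joined"] = pd.to_datetime(records["joined"], errors="coerce")

# Drop rows missing mandatory fields, then collapse exact duplicates.
cleaned = (records
           .dropna(subset=["name", "joined"])
           .drop_duplicates(subset=["customer_id", "name"]))

print(cleaned)
```

In practice, as the abstract notes, such generic rules need per-source adaptation: each warehouse tends to require its own dictionaries, key definitions, and validation logic, which is where tool limitations and human intervention come in.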

Original language: English
Title of host publication: Effective Big Data Management and Opportunities for Implementation
Publisher: IGI Global
Pages: 77-82
Number of pages: 6
ISBN (Electronic): 9781522501831
ISBN (Print): 1522501827, 9781522501824
DOIs
Publication status: Published - 20 Jun 2016
