Abstract
Data cleansing is a long-standing problem that every organisation incorporating any form of data processing or data mining must undertake. It is essential for improving the quality and reliability of data. This paper presents the methods needed to process data to a high standard of quality. It also classifies common problems organisations face when cleansing data from one or multiple sources, and evaluates methods that aid in this process. The distinct challenges faced at the schema level and the instance level are outlined, along with how they can be overcome. Tools that provide data cleansing currently exist, but they are limited by the uniqueness of every data source and data warehouse. We outline the limitations of these tools and explain how human interaction (self-programming) may be needed to ensure vital data is not lost. We also discuss the importance of maintaining, and eventually removing, data that has been stored for several years and may no longer hold any value.
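To make the instance-level problems mentioned above concrete, here is a minimal sketch of a cleansing pass in Python with pandas. The column names (`name`, `email`, `last_updated`) and the five-year retention cutoff are illustrative assumptions, not taken from the chapter; it only shows the general pattern of normalising values, removing duplicates introduced by merging sources, and retiring stale records.

```python
# A minimal sketch of instance-level cleansing; field names and the
# retention period are hypothetical, not from the chapter itself.
import pandas as pd

def cleanse(records: pd.DataFrame) -> pd.DataFrame:
    df = records.copy()
    # Normalise free-text fields so trivially different spellings match.
    df["name"] = df["name"].str.strip().str.lower()
    df["email"] = df["email"].str.strip().str.lower()
    # Remove exact duplicates introduced when merging multiple sources.
    df = df.drop_duplicates(subset=["name", "email"])
    # Retire records past an assumed 5-year retention period that
    # likely no longer hold any value.
    cutoff = pd.Timestamp.now() - pd.DateOffset(years=5)
    df = df[pd.to_datetime(df["last_updated"]) >= cutoff]
    return df

if __name__ == "__main__":
    raw = pd.DataFrame({
        "name": ["Ada Lovelace ", "ada lovelace", "Alan Turing"],
        "email": ["ada@example.com", "ADA@example.com ", "alan@example.com"],
        "last_updated": ["2024-01-05", "2024-01-05", "2001-03-12"],
    })
    # One "ada lovelace" row survives deduplication; the stale
    # "alan turing" row is dropped by the retention check.
    print(cleanse(raw))
```

In practice, as the abstract notes, a generic pass like this is rarely sufficient on its own; each source tends to need hand-written (self-programmed) rules layered on top.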
| Original language | English |
| --- | --- |
| Title of host publication | Effective Big Data Management and Opportunities for Implementation |
| Publisher | IGI Global |
| Pages | 77-82 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781522501831 |
| ISBN (Print) | 1522501827, 9781522501824 |
| Publication status | Published - 20 Jun 2016 |