What is Data Cleansing?

Just about every organization has data today.  But that data can be worth nothing without some form of data quality management.  As part of this process, data cleansing is vital.  But what is data cleansing?  Generally it breaks down into three primary actions:

Deduplication

One of the most common issues with data is duplicate records.  These can result from spelling variations or simple data entry errors.  Each duplicate reduces how meaningful your data is, so removing duplicates lowers costs and makes the data more useful.  For example, if you have a record for “Steve Wilson” and another for “Stephen Wilson”, a direct mail campaign may waste resources by mailing the same person twice.  At the same time, care has to be taken not to remove valid records: two similar names can belong to two different people.
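As a rough illustration of the idea, here is a minimal Python sketch that flags likely duplicates instead of deleting them, so that a person can review each pair before anything is removed.  The record layout, the `find_duplicate_candidates` helper, and the 0.8 similarity threshold are all assumptions made for this example; real data would need its own matching rules.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Return a 0..1 similarity score for two names, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def find_duplicate_candidates(records, threshold=0.8):
    """Return index pairs whose addresses match exactly and whose names are similar.

    The threshold is an illustrative assumption, not a recommended value.
    """
    pairs = []
    for (i, first), (j, second) in combinations(enumerate(records), 2):
        same_address = (
            first["address"].strip().lower() == second["address"].strip().lower()
        )
        if same_address and name_similarity(first["name"], second["name"]) >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical sample data for the example.
records = [
    {"name": "Steve Wilson", "address": "12 Main Street"},
    {"name": "Stephen Wilson", "address": "12 Main Street"},
    {"name": "Stephanie Wilson", "address": "48 Oak Avenue"},
]

# Only the first two records are flagged; the third lives at a different address.
candidates = find_duplicate_candidates(records)
```

Requiring a matching address as well as a similar name is one simple way to avoid removing valid records; the flagged pairs are candidates for review, not confirmed duplicates.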

Normalization

Another common issue is inconsistent data, which normalization addresses.  Normalization is the process of correcting data so that it is consistent across records, and it can be applied in many ways.  An example is choosing between “st” and “street”.  Either can work, but ideally your data chooses one version and uses it consistently.  This makes queries and searches easier to write, and the results they return are more likely to match what you’re looking for.
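The “st” versus “street” example could be sketched in Python along these lines.  The `SUFFIXES` table and the `normalize_street` function are hypothetical names for this illustration, and the sketch only looks at the last word of the address, which is an assumption that will not hold for every address format.

```python
import re

# Hypothetical mapping from the variants we expect to the one form we keep.
SUFFIXES = {
    "st": "Street",
    "street": "Street",
    "ave": "Avenue",
    "avenue": "Avenue",
    "rd": "Road",
    "road": "Road",
}


def normalize_street(address):
    """Rewrite a trailing street suffix (with optional period) to its full form."""
    # Collapse repeated whitespace first so spacing is consistent too.
    cleaned = " ".join(address.split())
    match = re.search(r"(\w+)\.?$", cleaned)
    if not match:
        return cleaned
    full = SUFFIXES.get(match.group(1).lower())
    if full is None:
        # Unknown last word: leave the address as it is.
        return cleaned
    return cleaned[: match.start()] + full


normalized = [normalize_street(a) for a in ["12 Main st", "12  Main St.", "12 Main Street"]]
```

After this step all three spellings of the address are identical, so a single exact-match query finds every record.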

Record Completion

The final main component of data cleansing is completion of records.  Oftentimes data is missing from a given record, and sometimes that data is vital.  For example, a given record may have a name, address, and phone number but no email address.  For an email campaign, that record may as well not exist.  During the process of data cleansing, external sources can sometimes be used to fill in these gaps in the data.  Alternatively, a manual review can be carried out periodically to complete the data as well as possible.  Finally, other records in the same data set may hold the missing values for a particular record.
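The last option, filling gaps from other records in the same data set, might look something like the following Python sketch.  It assumes, purely for illustration, that records sharing a phone number describe the same person; the `complete_records` function and the field names are hypothetical.

```python
def complete_records(records, key="phone"):
    """Fill empty fields from other records that share the same key value.

    Existing values are never overwritten, and the input list is not modified.
    """
    # First pass: remember the first non-empty value seen for each field, per key.
    known = {}
    for record in records:
        if not record.get(key):
            continue  # a record without the key cannot be matched to others
        group = known.setdefault(record[key], {})
        for field, value in record.items():
            if value and field not in group:
                group[field] = value

    # Second pass: copy each record and fill only the fields that are empty.
    completed = []
    for record in records:
        filled = dict(record)
        for field, value in known.get(record.get(key), {}).items():
            if not filled.get(field):
                filled[field] = value
        completed.append(filled)
    return completed


# Hypothetical sample data for the example.
records = [
    {"name": "Steve Wilson", "phone": "555-0100", "email": None},
    {"name": "Steve Wilson", "phone": "555-0100", "email": "steve@example.com"},
    {"name": "Maria Lopez", "phone": "555-0199", "email": None},
]

completed = complete_records(records)
```

Here the first record picks up the email address from the second, while the third stays incomplete because no other record shares its phone number.  That remaining gap is where an external source or a manual review would come in.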

These three concepts together are the core of data cleansing.  To keep your data good rather than bad, all three should be undertaken periodically.  Unfortunately, doing it once and forgetting about it is not really an option.  As new data is input, the risk of bad data increases yet again, and you could end up in the same spot where you started, or a worse one.  Maintaining the value of your data requires continuous vigilance and a repeatable approach.

So now that you know what needs to be done for data cleansing, how do you do it?  Check out www.datagroomr.com for an innovative approach to accomplishing your goals and improving data quality.  We simplify the daunting task of cleaning your data by leveraging Machine Learning (ML) so that you don’t have to be a data expert.  Questions?  Email us at info@datagroomr.com!
