Data Cleansing

The Value of Data Accuracy

By Data Cleansing

An axiom has existed in data science for years: “Garbage in, garbage out.”  Despite how sound this advice is, it is too often forgotten.  According to the Harvard Business Review, organizations routinely use information that, when tested, does not meet the minimum acceptable range of data accuracy.  However, producing acceptable information costs money.  Is it worth the investment?  What are the benefits of having good data?

1. Importance of Data Quality

In the modern business landscape, data feeds directly into revenue.  When you can rely on your information because it is free of duplicates, uses standard formatting, and has correct values in the critical fields, you can make confident decisions that lead to increased revenue.

2. Cost of Bad Data

Data feeds into many business processes, and bad data makes those processes run less efficiently.  Direct mail campaigns are sent to incorrect addresses.  Email campaigns send messages to non-existent addresses.  Even the best-designed campaign will fall short of expectations without clean information to feed into it.  Good data means less money wasted on misdirected effort.  Well-managed data keeps campaigns efficient and targeted, helping to maximize return on a lower investment.

3. Customer Satisfaction

Customer satisfaction is imperative to any business.  By ensuring that records are clean and accurate, your organization can reliably connect related information about each customer and deliver what your customers expect at every touchpoint.  Your customers will be happier, feeling that your organization is meeting their needs.

4. Impact of Data Accuracy on Time

One sometimes hidden cost of poor data quality is the time spent manually fixing bad data.  Even simple steps, like standardizing “st” versus “street”, are time-consuming and error-prone when done by hand.  Often the fixes are applied to output records rather than the actual source, which means the same steps have to be repeated every time the source is used.  Departments end up codifying manual data correction and normalization as standard procedure instead of expecting clean, correct information!  Applying principles of good data management saves time.

5. Good Data vs Bad Data

Organizations spend money on systems to maximize data accuracy and the value they can get from their information.  But the underlying records don’t always allow these systems to perform as well as possible.  If you can be confident that your records are clean, you can be confident in the results of processing that data through other systems.  The investment in systems that work from your organization’s data will then deliver a greater ROI.


Data is an invaluable resource, but only if you can rely on it.  The examples above are just some of the ways that clean data provides value to organizations.  Achieving this goal sometimes requires a third-party resource to perform data hygiene tasks.  DataGroomr is a data cleansing solution that is simple to use while leveraging advanced machine learning technology to apply best practices in data cleansing.  Questions?  Reach out to us at or sign up for a free demo at

What is Data Cleansing?

By Data Cleansing

Just about every organization has data today, but that data can be worthless without some form of data quality management.  Data cleansing is a vital part of this process.  But what is data cleansing?  Generally, it breaks down into three primary actions:

Deduplication

One of the most common issues with data is duplicate records.  Duplicates can arise from spelling variations or simple errors during data entry, and each one reduces how meaningful your data is.  Removing them lowers costs and makes the data more useful.  For example, if “Steve Wilson” and “Stephen Wilson” are the same person, keeping both records means sending two mailers to one recipient in a direct mail campaign, wasting resources.  At the same time, care has to be taken to avoid removing valid records.
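
To make the “Steve Wilson” / “Stephen Wilson” example concrete, here is a minimal sketch that flags likely duplicates using fuzzy string matching from Python’s standard `difflib` module.  The record layout (`id` and `name` fields), the helper names, and the 0.8 similarity threshold are illustrative assumptions, not how any particular product works.  Pairs are only reported for review rather than deleted, in keeping with the caution about removing valid records.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize_name(name: str) -> str:
    # Lowercase and collapse repeated whitespace before comparing
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def find_duplicate_candidates(records, threshold=0.8):
    """Return (id, id, score) for every pair of records whose names look alike.

    Pairs are only flagged, not deleted, so someone can review them first
    and valid records are not removed by mistake.
    """
    candidates = []
    for first, second in combinations(records, 2):
        score = similarity(first["name"], second["name"])
        if score >= threshold:
            candidates.append((first["id"], second["id"], round(score, 2)))
    return candidates


records = [
    {"id": 1, "name": "Steve Wilson"},
    {"id": 2, "name": "Stephen Wilson"},
    {"id": 3, "name": "Maria Lopez"},
]
print(find_duplicate_candidates(records))  # → [(1, 2, 0.85)]
```

Because every pair is compared, the work grows quadratically with the number of records.  On large datasets, records are usually grouped first (for example by postal code) so that only records within the same group are compared.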

Normalization

Another common need is normalization of the data, which can be applied in many ways.  What does normalization actually mean, however?  It’s the process of correcting data so that it is consistent across records.  One example is choosing between “st” and “street”.  Either can work, but ideally your data settles on one version and uses it consistently.  This makes every type of query or search easier to run, and returns results that actually match what you’re looking for.
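
The “st” versus “street” choice can be sketched as a small lookup that rewrites street-suffix variants to one canonical form.  This is a minimal illustration, assuming the full word (“Street”) is the chosen standard; the `SUFFIXES` table and `normalize_address` helper are hypothetical, and a production suffix list would be far longer.

```python
# Hypothetical mapping from street-suffix variants to one canonical form.
# A production list would be much longer.
SUFFIXES = {
    "st": "Street", "st.": "Street", "street": "Street",
    "ave": "Avenue", "ave.": "Avenue", "avenue": "Avenue",
    "rd": "Road", "rd.": "Road", "road": "Road",
}


def normalize_address(address: str) -> str:
    """Collapse whitespace and rewrite a trailing street suffix in canonical form."""
    words = address.split()
    if words:
        canonical = SUFFIXES.get(words[-1].lower())
        if canonical is not None:
            words[-1] = canonical
    return " ".join(words)


addresses = ["123 Main st", "123 Main St.", "123  Main Street"]
print({normalize_address(a) for a in addresses})  # → {'123 Main Street'}
```

Only the last word is checked, so a leading “St” (as in “St James Pl”) is left alone, but a suffix followed by a unit number (“Main St Apt 4”) is missed.  Real address standardization needs a proper address parser; the point here is only that once every record uses one form, a search for “123 Main Street” finds all three variants.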

Record Completion

The final main component of data cleansing is completing records.  Data is often missing from a given record, and sometimes that data is vital.  For example, a record may have a name, address, and phone number but no email address.  For an email campaign, that record may as well not exist.  During data cleansing, external sources can sometimes be used to fill these gaps.  Alternatively, a manual review can be carried out periodically to complete the data as far as possible.  Finally, other records in your data may hold information that can fill in the gaps of a particular record.
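
The last idea, filling a record’s gaps from other records, can be sketched as below.  It assumes, hypothetically, that records sharing a phone number describe the same contact; the field names, the `phone` match key, and the `complete_records` helper are illustrative.  Existing values are never overwritten; only empty fields are filled.  Enrichment from external sources is not shown.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def complete_records(records: List[Record], key: str = "phone") -> List[Record]:
    """Fill empty fields in each record from other records sharing the same key value."""
    # First pass: collect the first non-empty value seen for each field, per key value
    known: Dict[str, Record] = {}
    for rec in records:
        key_value = rec.get(key)
        if not key_value:
            continue  # records without a key value cannot be matched
        merged = known.setdefault(key_value, {})
        for field, value in rec.items():
            if value and not merged.get(field):
                merged[field] = value

    # Second pass: copy known values into empty fields, leaving filled fields alone
    completed = []
    for rec in records:
        filled = dict(rec)  # copy so the input list is not modified
        key_value = rec.get(key)
        if key_value and key_value in known:
            for field, value in known[key_value].items():
                if not filled.get(field):
                    filled[field] = value
        completed.append(filled)
    return completed


records = [
    {"name": "Steve Wilson", "phone": "555-0100", "email": None},
    {"name": "Stephen Wilson", "phone": "555-0100", "email": "swilson@example.com"},
    {"name": "Maria Lopez", "phone": "555-0199", "email": None},
]
for rec in complete_records(records):
    print(rec["name"], rec["email"])
```

Here the first record picks up the email address from the second, while Maria Lopez’s record stays incomplete because no other record shares her phone number.  A shared phone number is a weak signal (households and offices share numbers), so in practice the match key would be chosen carefully and filled values reviewed.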


These three concepts together are the core of data cleansing.  To keep good data from turning into bad data, all three should be carried out periodically.  Unfortunately, doing it once and forgetting about it is not really an option.  As new data is entered, the risk of bad data rises again, and you could end up back where you started, or worse.  Maintaining the value of your data requires continuous vigilance and a repeatable approach.

So now that you know what needs to be done for data cleansing, how do you do it?  Check out for an innovative approach to accomplishing your goals and improving data quality.  We simplify the daunting task of cleaning your data by leveraging Machine Learning (ML) so that you don’t have to be a data expert.  Questions?  Email us at!