Database Cleaning Checklist: End-of-Year Routine

December 23, 2020

With the holiday season already upon us, sales cycles tend to slow down before picking up again in January. Because cleaning up your data can be a time-consuming process, the holiday lull is a good time to do it. In our previous blog posts on how to conduct a data quality assessment, we described some of the processes that go into a data quality analysis. But if a big campaign is not something you are looking to take on right now, we have created a checklist you can use to clean up your data. Let’s start by understanding why data cleansing is so important and the effects it has on your business.

The Impact of a Dirty Database on Your Business

Dirty data is a resource drain in terms of both time and money. According to some estimates, incomplete or outdated information is costing US companies as much as 27% of their annual revenue. In many cases, data is spread across multiple repositories and databases, each containing bits and pieces of information. If the data is needed for sales or marketing, then the challenge is assembling all the disparate information into a concise picture of the customer. This can be an extremely time-consuming and resource-intensive exercise. Beyond simply collecting the bits and pieces of information, all of the data needs to be current and in the right format. Otherwise, potential customers will look less favorably on your organization.

Fix Formatting Issues and Standardize Formats

Many database-backed systems rely on properly and consistently formatted data to work as intended. Even something as simple as capitalization in first and last name fields can have extensive consequences. For example, many organizations depend on marketing automation platforms for customer outreach and emailing. An improperly formatted name will result in correspondence that does not look professional. And if this is your first interaction with a customer, it probably doesn’t instill confidence in your organization.

Fixing formatting issues manually is a tall order. The good news is that there are plenty of apps on the Salesforce AppExchange specifically engineered to address formatting and standardization of data. You can also develop formulas and workflow rules inside Salesforce to prevent formatting issues, but as with other rules, it is challenging to create one for every scenario. Even when rules are in place, they only ensure that new data arrives in the correct format; they do nothing to standardize existing data. A more efficient approach is to use an app that automates the standardization process. Now that we have covered data collection and standardization, we need to know just how accurate our information is.
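As a rough illustration of what automated standardization of name fields involves, here is a minimal Python sketch. The function name `standardize_name` and the capitalization rule are assumptions made for this example; they are not taken from Salesforce or from any AppExchange app.

```python
import re

def standardize_name(raw: str) -> str:
    """Trim, collapse repeated whitespace, and capitalize each part of a name."""
    cleaned = re.sub(r"\s+", " ", raw.strip()).lower()
    # Uppercase the first letter at the start of the string and after
    # a space, hyphen, or apostrophe (e.g. "mary-ann o'neil").
    return re.sub(
        r"(^|[\s'-])([a-z])",
        lambda match: match.group(1) + match.group(2).upper(),
        cleaned,
    )

print(standardize_name("  jOHN   smith "))   # John Smith
print(standardize_name("mary-ann o'neil"))   # Mary-Ann O'Neil
```

A simple rule like this will still mishandle names such as "McDonald" or "van der Berg", which is one reason a single rule rarely covers every scenario.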

Validate Your Existing Information

When reporting on future prospects, salespeople are fond of pointing to the state of their “sales funnel.” However, a recent study that analyzed leads in the pipeline demonstrated how misleading this metric can be. The study found that 30% of processed leads have wrong phone numbers, 28% have wrong emails, and 27% have wrong names. Not only do these inaccuracies paint a false picture of future outcomes, but the bad information also adds cost to your organization in terms of time and resources. Without valid data, it is almost impossible to properly understand the effectiveness of your marketing and lead generation activities.

Data validation is particularly important to your sales team. They rely on accurate data to build and maintain lists of leads and corresponding offerings. Trying to make contact using outdated or erroneous information, or pitching goods and services that are not a fit for the prospect, is simply a waste of time and an easy path to demotivating your sales force.
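A first pass at validation can be as simple as checking that emails and phone numbers are at least well formed. The sketch below is an assumption-laden example: the helper names are hypothetical, the email pattern is deliberately loose, and the phone check assumes 10-digit US numbers. A syntactic check like this cannot confirm that an address or number is still in use.

```python
import re
from typing import Optional

# Loose pattern: something@something.something, with no spaces or extra "@".
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_email(value: str) -> bool:
    """Return True if the value has the basic shape of an email address."""
    return bool(EMAIL_PATTERN.match(value.strip()))

def normalize_us_phone(value: str) -> Optional[str]:
    """Return the 10 digits of a US phone number, or None if it is malformed."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the US country code
    return digits if len(digits) == 10 else None

print(looks_like_email("jane@example.com"))   # True
print(looks_like_email("jane@example"))       # False
print(normalize_us_phone("(555) 010-1234"))   # 5550101234
print(normalize_us_phone("12345"))            # None
```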

Finally, it is not just about having valid and accurate information. It is also very much about how long it takes to get that information. According to Spotio, a sales automation platform, 50% of buyers choose the vendor that responds first. As the old saying goes, “time is money,” and one of the biggest drains on both is duplicate records, which is why you need to deal with this issue head on.

Get Your Duplicates Under Control

As we touched on earlier, many data validation issues stem from duplicate information scattered among multiple locations. Duplicate data in particular causes many problems: it degrades the effectiveness of your sales efforts and undermines your marketing activities.

Here are some examples of these issues: 

  • Sending multiple marketing messages to the same person 
  • Relying on inaccurate or outdated information 
  • Increasing bounce rate on correspondence
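To make the idea of duplicate detection concrete, here is a minimal rule-based sketch that groups records sharing the same email address after trimming and lowercasing it. The record layout and the function name `find_duplicate_groups` are assumptions for this example. Exact matching on one field is the simplest possible rule and will miss duplicates that differ in email; it is not how any particular deduplication product works.

```python
from collections import defaultdict

def find_duplicate_groups(records):
    """Group records that share the same normalized email address."""
    groups = defaultdict(list)
    for record in records:
        key = (record.get("email") or "").strip().lower()
        if key:  # records without an email cannot be matched by this rule
            groups[key].append(record)
    # Only groups with more than one record are duplicates.
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical sample data.
leads = [
    {"id": 1, "name": "Jane Doe", "email": "jane.doe@example.com"},
    {"id": 2, "name": "jane doe", "email": " Jane.Doe@Example.com "},
    {"id": 3, "name": "John Roe", "email": "john.roe@example.com"},
]

for group in find_duplicate_groups(leads):
    print([record["id"] for record in group])  # [1, 2]
```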

There is a tendency to initially dismiss this as a general nuisance, but being complacent about duplication will eventually degrade your conversion rate and burn through your ad spend. The rule of thumb for quantifying the cost of duplicate records is the 1-10-100 model: it costs $1 to check for duplicates, $10 to fix each duplicate, and $100 per duplicate if nothing is done about the issue.
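The 1-10-100 model is easy to turn into a quick estimate. The sketch below assumes each figure applies per duplicate record, and the count of 2,000 duplicates is a made-up example rather than a benchmark.

```python
# Per-record costs from the 1-10-100 rule of thumb (US dollars).
COST_TO_CHECK = 1
COST_TO_FIX = 10
COST_IF_IGNORED = 100

def duplicate_costs(duplicate_count: int) -> dict:
    """Return the total cost of each option for a given number of duplicates."""
    return {
        "check": duplicate_count * COST_TO_CHECK,
        "fix": duplicate_count * COST_TO_FIX,
        "ignore": duplicate_count * COST_IF_IGNORED,
    }

# Hypothetical example: 2,000 duplicate records.
costs = duplicate_costs(2000)
for option, total in costs.items():
    print(f"{option}: ${total:,}")
# check: $2,000
# fix: $20,000
# ignore: $200,000
```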

Data deduplication should be a major part of your database cleansing strategy, but there is much more that can be done.

Get Rid of Unusable Data in Your Database

Many systems are filled with useless information because important fields were not made mandatory at the time of collection or were simply populated with dummy data. In an ideal situation, you could correct this data, but that is not always feasible. If there is no way to clean the data, it should just be deleted. However, the leads that can be salvaged should be recycled and placed in a lead nurturing program, which we will talk about next.
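One way to sort records into "keep", "recycle", and "delete" piles is sketched below. The placeholder list, the required fields, and the rule that a record with no usable email or phone cannot be salvaged are all assumptions chosen for illustration; adjust them to match your own data.

```python
# Hypothetical dummy values often typed into fields just to get past a form.
PLACEHOLDERS = {"", "n/a", "na", "none", "test", "asdf", "unknown", "xxx"}
# Hypothetical set of fields a usable lead should have.
REQUIRED_FIELDS = ("name", "email", "company")

def is_placeholder(value) -> bool:
    """Return True if the value is missing or looks like dummy data."""
    return value is None or str(value).strip().lower() in PLACEHOLDERS

def triage(record: dict) -> str:
    """Classify a record as 'keep', 'recycle', or 'delete'."""
    missing = [f for f in REQUIRED_FIELDS if is_placeholder(record.get(f))]
    if not missing:
        return "keep"
    # With no usable email or phone there is no way to reach the lead.
    if is_placeholder(record.get("email")) and is_placeholder(record.get("phone")):
        return "delete"
    return "recycle"

print(triage({"name": "Jane", "email": "jane@example.com", "company": "Acme"}))  # keep
print(triage({"name": "Jane", "email": "jane@example.com", "company": "N/A"}))   # recycle
print(triage({"name": "test", "email": "n/a", "company": "asdf"}))               # delete
```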

Recycle Leads

As we all know, a lead is usually not ready to move ahead with a purchase the first time you contact them. This is why many organizations put a lead nurturing program in place. However, these campaigns are not perfect. Sometimes you make contact at the wrong time, or for whatever reason contact is not made at all. The bottom line is that some of your leads may fall out of the pipeline. Take this time to check prospects that were engaged but did not convert into leads. Alternatively, sales can move the lead to a deferred status if no additional lead nurturing is warranted.

Where Do You Start?

Following this cleaning checklist is not enough for a comprehensive data quality assessment. A full assessment would require assembling a team, purchasing product licenses, and spending weeks or months addressing issues such as data integrity, accuracy, completeness, and timeliness. If you are not ready for that type of investment, there are a few automation tools that we would recommend as a quick fix for Salesforce. The first one is InsideView, which is a decent replacement for the discontinued ... It will help you with processes like data preparation, standardization, and enrichment. It does not deal with data deduplication, so we recommend a tool like DataGroomr to take care of this for you. It does a quick data assessment that will tell you just how many duplicates you have.

Trust DataGroomr with All of Your Deduping Needs

DataGroomr does not simply let you know how many duplicates you have; it merges them, with all the important bits of information, into a single record. Instead of using filters and rules, DataGroomr uses machine learning to identify duplicates. The big advantage is that machine learning does all of the work for you. There is no setup or maintenance. Just connect DataGroomr to your Salesforce organization and start deduping. The best part is that as you merge duplicates, the system automatically learns from your actions and improves its duplicate detection algorithm.

When you begin the free 14-day DataGroomr trial, the first thing our algorithms will do is assess the current state of your data. If there are duplicates, you can start cleaning them right away, which puts you well on your way to starting 2021 with clean data.

Steven Pogrebivsky

Steve Pogrebivsky has founded multiple successful startups and is an expert in data and content management systems with over 25 years of experience. Previously, he co-founded and was the CEO of MetaVis Technologies, which built tools for Microsoft Office 365, Salesforce, and other cloud-based information systems. MetaVis was acquired by Metalogix in 2015. Before MetaVis, Steve founded several other technology companies, including Stelex Corporation, which provided compliance and technical solutions to FDA-regulated organizations.