
Best practices for ensuring Salesforce data integrity

By Ben Novoselsky | February 20, 2025

Salesforce empowers organizations to effectively manage sales processes, marketing campaigns, and customer interactions. It promises to unite your company (your sales, service, marketing, commerce, and IT teams) around a single, shared view of your customers. Yet while Salesforce promotes a “360-degree customer view,” the company’s own 2021 research found that only 32% of companies have access to a “single source of truth” for customer information.

“Dirty” customer data – missing, duplicate, or outdated records – is the first (big) challenge, and it can wreak havoc on your business.
This guide moves quickly beyond making you nervous about all the bad things that can happen and offers specific, tangible solutions you can begin implementing right now. It covers:

  • Risks of poor quality data​
  • How to assess data quality​
  • Types of data quality issues​
  • Preventative and remedial action plans
  • Tools available​ 

Consequences of poor quality data​

You’ve undoubtedly experienced some of the problems with data you can’t trust. Other troubles may be lurking that you have not experienced – at least not yet. So before we outline solutions, let’s explore the primary implications of having poor data – to light a fire and motivate you to make data quality a priority!

Dirty data in your Salesforce instance is pervasive and impacts:

Effectiveness: Duplicate, incomplete, or incorrect contacts and leads result in:

  • Longer sales cycles
  • Stalled deals
  • Poor customer experiences
  • Inaccurate forecasts
  • Ineffective marketing campaigns 
  • Confusing revenue reporting
  • Sub-par decision-making 
  • Customer churn

Productivity: Dirty data wastes time (and lacks credibility). A recent study shows that (frustrated) sales professionals waste 30% of their time sorting through bad data inside their CRM. Forrester reports that nearly one-third of analysts spend more than 40% of their time vetting and validating their analytics data before it can be used for strategic decision-making.

Costs: According to Gartner research, “the average financial impact of poor data quality on organizations is $9.7 million per year.” IBM discovered that in the U.S. alone, businesses lose $3.1 trillion annually due to poor data quality. Lost sales opportunities and increased marketing costs are only the beginning. 

Compliance: Accurate data is needed to comply with regulations, assess risk and prevent costly violations.

Reputation: Poor customer satisfaction can hurt your company’s reputation – both directly and, in today’s world, far more broadly, given the ease of sharing negative experiences. When marketing campaigns don’t reach their intended recipients, it doesn’t just mean more money spent for a lower response rate; it also creates poor prospect and customer experiences of your brand.

To weigh your investment options, consider the industry standard “1-10-100” rule.

  • It costs $1 to verify the data as it’s being entered.
  • It costs $10 to clean the data once records are infected with dirty data.
  • It costs $100 if you do nothing.

How should you assess data quality?​

We group data quality metrics into four categories: 

  1. Intrinsic quality
  2. Contextual quality
  3. Representational quality
  4. Accessibility quality

Intrinsic quality measures the value and trustworthiness of the data, specifically: 

  • Accuracy: How close is the data to the real-world values it’s supposed to represent? 
  • Validity: Is the data value within a required range, and does it match a required pattern?
  • Reliability: Can the data be trusted? Did it come from an authoritative source? 

Contextual quality measures how useful the data is, given the task at hand. 

  • Relevance: Is the data applicable and useful in the context of the current task? For example, if you want to send out text messages, you’ll need mobile phone numbers.
  • Timeliness: Is data up-to-date or outdated, and is it available when we need it? 
  • Completeness: Does the data cover all required aspects, or are there gaps? For example, you need names to personalize emails and complete financial information for accurate reporting.
  • Precision/Granularity: Is the data at the appropriate level of detail? For example, daily sales data is needed for weekly performance meetings.

Representational quality measures how well data is presented and how easy it is to use.

  • Consistency: Data from different sources can cause discrepancies. Multiple copies of your data may hold variations, or different presentation formats may require different data formats, such as abbreviated vs. full state names.
  • Uniqueness: Does the data contain redundant or duplicate records, such as multiple records for the same individual? Does a single field store multiple values?
  • Integrity: If there are relationships between entities, are they maintained accurately, or are they incorrect? For example, the “state” value must actually be within that “country.”

Accessibility quality measures how easy it is for users to access and comprehend the data.

  • Accessibility: Is the data you need easy to find? Is it displayed clearly? Is the most important data in a priority location, or is it hidden?
  • Security: Is the data adequately protected against unauthorized access? If permissions are required, will you have issues accessing it?

A proactive action plan

A sound, comprehensive data quality program is built on clear roles and responsibilities, ongoing tracking of key metrics, and regular reporting.

The core team

Step one of the plan is to establish a team to address data quality. Depending on the size of your organization, this can be a few people or many, but it should include:

  • Data owners, senior stakeholders in your organization who are accountable for the quality of one or more data sets. Their responsibilities include ensuring data quality reporting is in place and actions are taken on data quality issues. 
  • Data producers who are responsible for capturing the data and making sure it complies with the quality standards of data users. These will most likely be your company’s Salesforce administrators.
  • Data consumers who use the data every day, for example sales reps, marketing managers, and anyone else who uses data to communicate with customers or run analyses. 
  • Data stewards who help data owners identify appropriate remediation. They also ensure that employees follow documented rules and procedures.  
  • Data analysts who explore, assess, and report findings to  stakeholders. 

Track key metrics

We recommend  that you measure four key attributes of your data:

1. Completeness

First, decide which data is nice to have, which is a must-have, and which is absolutely critical. The best practice is to prioritize data types according to the level of risk they carry.

In practice, one of the easiest ways to gain insight into the completeness of your org’s data is to add a formula field to Salesforce objects. The formula calculates a completeness score by accumulating points for each field that is populated (or subtracting points for blank fields). For example:

IF(ISPICKVAL(Industry, ""), 0, 20) + IF(ISPICKVAL(Rating, ""), 0, 20) + IF(LEN(BillingCity) = 0, 0, 20) + IF(LEN(Phone) = 0, 0, 20) + IF(ISPICKVAL(Type, ""), 0, 20)

You can then plot the total score on a dashboard or display it on your record page layouts. There is a free tool on the AppExchange called “Salesforce Data Quality Analysis Dashboard” which contains formula definitions and reports for many standard objects, and it can easily be extended.

While the free package is good for getting initial insights, you get what you pay for. More comprehensive and robust reports are available in the DataGroomr app, in the Brushr module called Data Quality Models. You can watch a quick video tutorial to learn how to use it.
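
If you would rather profile completeness outside Salesforce, for example on a CSV export, a minimal sketch along the following lines computes the same 0–100 score per record plus an org-wide average. The file name and column names are assumptions; adjust them to match your own export.

import csv
from statistics import mean

# Fields that contribute to the score, mirroring the formula above.
# Each populated field is worth 20 points (5 fields x 20 = 100).
SCORED_FIELDS = ["Industry", "Rating", "BillingCity", "Phone", "Type"]

def completeness_score(record):
    """Return a 0-100 score based on how many scored fields are populated."""
    return sum(20 for field in SCORED_FIELDS if (record.get(field) or "").strip())

with open("accounts.csv", newline="", encoding="utf-8") as fh:  # hypothetical export
    scores = [completeness_score(row) for row in csv.DictReader(fh)]

print(f"Records scored: {len(scores)}")
print(f"Average completeness: {mean(scores):.1f} / 100")
print(f"Records below 60 points: {sum(1 for s in scores if s < 60)}")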

2. Uniqueness

Measure the duplicate rate, which is the percentage of records that have duplicates within the database. Do you have to sort through multiple records to find the right one? And if so, how do you know which one is correct? Did you already use the wrong one – for marketing or sales outreach or billing? This is one of the costliest types of bad data. 
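
To get a rough feel for this metric yourself, you can group records by a normalized key and count how many share a key with at least one other record. The sketch below uses a handful of in-memory records for illustration; real matching has to be fuzzier than an exact email comparison, which is exactly where dedicated tools earn their keep.

import collections

# Illustrative records; in practice these would come from a Salesforce export.
contacts = [
    {"Email": "Jane.Doe@example.com", "LastName": "Doe"},
    {"Email": "jane.doe@example.com", "LastName": "Doe"},
    {"Email": "bob@example.com", "LastName": "Smith"},
]

def match_key(record):
    """Normalize the email so trivially different duplicates collide."""
    return (record.get("Email") or "").strip().lower()

counts = collections.Counter(match_key(c) for c in contacts)
duplicates = sum(n for n in counts.values() if n > 1)
print(f"Duplicate rate: {100.0 * duplicates / len(contacts):.1f}%")  # 66.7% in this toy example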

You can generate reports using Salesforce duplicate jobs, but this feature is only available in the Salesforce Unlimited edition, and even then its capabilities are limited. Learn more about the limitations of Salesforce’s native deduplication in our article, Why Salesforce Deduplication Is Not Enough.

3. Accuracy and consistency

A standardization score evaluates records based on their adherence to predefined standards, which can include the use of standard units, terminology, and date formats. Measure the inconsistency rate (variations in data formatting, terminology, or classification across records) and the mismatch rate (the percentage of records that do not match predefined criteria or standards).
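
As a concrete illustration of a mismatch rate, the sketch below checks each record against two example standards: a two-letter state code and a ten-digit phone number. Both standards and the field names are assumptions; substitute your own conventions.

import re

STATE_CODE = re.compile(r"^[A-Z]{2}$")   # e.g. "PA" rather than "Pennsylvania"

def violations(record):
    """Return the fields in a record that do not meet the predefined standards."""
    bad = []
    if not STATE_CODE.match((record.get("BillingState") or "").strip()):
        bad.append("BillingState")
    if len(re.sub(r"\D", "", record.get("Phone") or "")) != 10:
        bad.append("Phone")
    return bad

accounts = [
    {"BillingState": "PA", "Phone": "(215) 555-1234"},
    {"BillingState": "Pennsylvania", "Phone": "555-1234"},
]
flagged = [a for a in accounts if violations(a)]
print(f"Mismatch rate: {100.0 * len(flagged) / len(accounts):.0f}%")  # 50% here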

DataGroomr includes a drag and drop tool to design consistency models, or use one of the predefined scenarios.

[Screenshot: DataGroomr data quality model editor]

4. Timeliness 

First, “timely” needs to be defined for each type of data. Then you can measure latency and age. Latency measures the duration between when the information is generated or collected and when it becomes available in Salesforce. Age is the average time since the data in a record was last verified or updated.
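
The sketch below shows one way to compute both numbers from timestamps. It assumes a hypothetical custom field recording when the lead was originally captured, plus the standard CreatedDate and LastModifiedDate fields, all in ISO 8601 format.

from datetime import datetime, timezone

def parse(ts):
    """Parse an ISO 8601 timestamp such as 2025-02-20T14:03:00+00:00."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Illustrative lead; in practice these values come from a Salesforce export.
lead = {
    "Captured_Date__c": "2025-02-18T09:00:00+00:00",  # assumed custom "collected on" field
    "CreatedDate": "2025-02-19T15:30:00+00:00",
    "LastModifiedDate": "2025-02-19T15:30:00+00:00",
}

latency = parse(lead["CreatedDate"]) - parse(lead["Captured_Date__c"])
age = datetime.now(timezone.utc) - parse(lead["LastModifiedDate"])
print(f"Latency: {latency.total_seconds() / 3600:.1f} hours")  # 30.5 hours
print(f"Age since last update: {age.days} days")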

Below is an example of the DataGroomr data quality model that assesses Lead timeliness.

[Screenshot: DataGroomr data quality model for Lead timeliness]

Report

After collecting these metrics, generate a data quality assessment report. Roll up all the key metrics into one dashboard including a completeness score, duplicate rate, standardization score, latency and age. Present these insights to data owners and discuss where you currently are and where you want to be. Identify KPIs. Document your issues, challenges, decisions and next steps.
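
The roll-up itself can be as simple as collecting each metric into a single scorecard before charting or sharing it. The values below are placeholders, not benchmarks.

# Hypothetical scorecard assembling the metrics described above.
scorecard = {
    "completeness_score": 74.2,   # average 0-100 score across records
    "duplicate_rate_pct": 6.8,    # % of records sharing a match key with another record
    "standardization_pct": 91.5,  # % of records meeting predefined standards
    "avg_latency_hours": 18.0,    # capture-to-Salesforce delay
    "avg_age_days": 112,          # time since records were last updated or verified
}

for metric, value in scorecard.items():
    print(f"{metric:>22}: {value}")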


Remediation: A practical approach  

What if your data quality needs improvement? There is much that can be done!  We recommend that you take action to prevent bad data, improve consistency and completeness, deduplicate, standardize, automate – and communicate.

Prevent bad data

Begin by putting safeguards in place to prevent bad data from disrupting your business.

  1. Mandatory fields: Mark critical and must-have fields as mandatory in Salesforce.
  2. Validation rules: Add Salesforce validation rules for completeness and accuracy.
  3. Picklists: Use picklists wherever possible for data consistency and ease of use.
  4. Page layouts: Use record types with associated page layouts for various scenarios. Make sure page layouts are customized for simplified data entry. Group essential fields together at the top or in prominent sections.
  5. Record types​: Create and assign different record types to the different types of records to effectively segregate validation rules and page layouts. Learn more by reading Salesforce Custom Record Type for Improving Team Performance.
  6. Automate​: Use tools like Flows to auto-populate fields when certain conditions are met. 
  7. Permissions​: Set up permissions to make sure data can be entered only by authorized and required users.
  8. Data import protocols: Make sure the entry ways for data coming into Salesforce are limited and secured. If you’re importing data from external sources, dedupe, validate, and standardize before importing (see the sketch after this list). Watch our effective import tool in action in this brief video: DataGroomr Importr Guide.
  9. Data entry training: Train users on the importance of data completeness. Offer tips and training sessions to ensure that everyone entering data into Salesforce understands the importance of maintaining data integrity.
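
To make point 8 concrete, here is a minimal pre-import pass over a hypothetical import.csv (the column names are assumptions) that standardizes a few fields and drops rows whose email already appeared earlier in the file. It sketches the idea only and does not check against records already in Salesforce.

import csv
import re

def clean(row):
    """Standardize a few fields before the row ever reaches Salesforce."""
    row["Email"] = (row.get("Email") or "").strip().lower()
    row["Phone"] = re.sub(r"\D", "", row.get("Phone") or "")  # keep digits only
    row["Company"] = (row.get("Company") or "").strip()
    return row

seen, kept = set(), []
with open("import.csv", newline="", encoding="utf-8") as fh:  # hypothetical source file
    for row in csv.DictReader(fh):
        row = clean(row)
        if row["Email"] and row["Email"] in seen:
            continue  # skip in-file duplicates
        seen.add(row["Email"])
        kept.append(row)

if kept:
    with open("import_clean.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=kept[0].keys())
        writer.writeheader()
        writer.writerows(kept)
print(f"Kept {len(kept)} rows after cleanup")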

Improve consistency

When designing metadata, follow naming conventions. Although this is a simple rule, overlooking it is a common beginner mistake; it exposes the organization to data entry errors and becomes a headache as the organization expands. Make sure to:

  • Use descriptive names and consistent casing.
  • Exclude technical and unnecessary details from field names.
  • Use prefixes when naming fields.
  • Apply informative labels and clear descriptions.
  • Use versioning. 

Ensure completeness

How about incomplete records? What can you do to deliver complete data?

  1. Use Salesforce Optimizer to see which fields are used; drop unused fields. 
  2. Implement ETL processes to bring data into Salesforce from external systems. 
  3. Use data enrichment sources like Zoominfo, Dun and Bradstreet, and SalesIntel to enrich existing records. 
  4. Remove records that are below a pre-determined completeness threshold. 
  5. Don’t forget to back up! 

Deduplicate

Salesforce comes with a  basic set of features to fight duplicates. Go ahead and set up Salesforce duplicate rules; they are good for catching obvious duplicates and basic prevention.

  • First set up matching rules, which identify which records should be considered duplicates.
  • Duplicate rules define what action should be taken when duplicates are detected. 
  • Duplicate jobs can run periodically and generate duplicate record sets for reporting and cleanup. 

However, there are many limitations:

  • Salesforce matching rules can only detect obvious duplicates.
  • There is no mass merge.
  • Only standard objects like leads, contacts and accounts are supported.
  • You cannot deduplicate across objects. 

In most cases, you need to use a third-party deduplication tool like DataGroomr to clean your Salesforce org faster!

Prevent new duplicates

Make sure new duplicates are not coming in. Establish integration safeguards. If you import data, use third-party tools that check for duplicates before any new data enters your database. 

Standardize

Define a taxonomy and adhere to it as much as possible. It’s always a good idea to manage your taxonomy globally so that all integrations are consistent. 

Standardize fields: make sure capitalization is consistent, store telephone numbers in a universal format, and spell mailing addresses, company names, and websites in a standard way (a short sketch after the list below shows what this looks like in code). The most common fields that should be standardized include:

  • Capitalization 
  • Phone numbers​
  • Territories and regions​
  • Email addresses 
  • Mailing addresses​
  • Websites, URLs​
  • Company names
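
As an illustration of what standardization looks like in code, the sketch below normalizes phone numbers and website URLs. The target formats are assumptions; use whatever conventions your organization has agreed on.

import re

def standardize_phone(raw):
    """Format 10-digit US numbers as (555) 123-4567; leave anything else untouched."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return raw or ""

def standardize_website(raw):
    """Lowercase the address and strip the protocol and trailing slashes."""
    site = (raw or "").strip().lower()
    return re.sub(r"^https?://", "", site).rstrip("/")

print(standardize_phone("+1 215.555.1234"))            # (215) 555-1234
print(standardize_website("HTTPS://WWW.Example.com/")) # www.example.com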

The easiest and most reliable way to achieve standardization is to apply a third-party data quality tool like DataGroomr.


Validate 

Verify that your data is truthful by using trusted external databases. Validating contact information is especially important, so run checks on phone numbers, emails, and mailing addresses. Third-party tools can help; DataGroomr provides a free DataGroomr Verify app, available on the AppExchange, that seamlessly integrates with Salesforce for one-time and mass verifications.
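
Full verification checks deliverability against live mail servers and postal databases, which requires an external service, but even a lightweight syntax check catches obvious problems. The pattern below is a deliberately simple assumption, not a complete RFC 5322 validator.

import re

# Something@domain.tld with no whitespace; flag anything else for review.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_valid(email):
    return bool(EMAIL_PATTERN.match((email or "").strip()))

for email in ["ben@example.com", "no-at-sign.example.com", "ben @example.com"]:
    print(f"{email}: {'ok' if looks_valid(email) else 'flag for review'}")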

Remove invalid records and again, don’t forget to back up! 

Automate 

Automate your data cleansing actions as much as possible to ensure consistency in executing your program and to maximize data health. We recommend that you schedule automated maintenance jobs as follows:

  • Run ETL jobs and backups.
  • Standardize and deduplicate nightly or at least weekly. 
  • Verify the accuracy of your data.

Communicate 

Create awareness of the importance of data quality in your organization and encourage everyone to be part of the solution.

  • Inform with relevant, compelling emails and infographics.
  • Train users on how to enter and validate data.
  • Encourage users to participate in data quality improvement efforts. 
  • Make it easy to report data quality issues.
  • Reward users for finding and correcting errors.

Summary

You probably already knew why data quality is important and the risks of having poor quality data. Hopefully, this guide will help you to assess your data quality, identify different types of data issues, and create your own action plan. 

Don’t be overwhelmed or intimidated. Just take one step at a time!

Ben Novoselsky

Ben Novoselsky, DataGroomr CTO, is a hands-on software architect involved in the design and implementation of distributed systems, with over 19 years of experience. He is the author of multiple publications on the design of distributed databases. Ben holds a Ph.D. in Computer Science from St. Petersburg State University.