You need to have confidence in the data you collect to help your organization run efficiently. This is especially true when you are accessing the powerful data management capability of Salesforce. But there’s a caveat: The out-of-the-box version of Salesforce comes with fairly limited deduplication capabilities. For example, the Salesforce deduplication function is only able to merge three records at a time. Should you just start by reviewing your data one record at a time? No business has the resources or time for that. The truth is that duplicates introduce time-consuming tasks that prevent your teams from getting the most value out of the data they have.
Duplicate data inside a Salesforce environment is a big issue for companies across industries. Many sales professionals trust Salesforce to cleanse the duplicates from their data, but the fact of the matter is that it just can’t. This is where a third-party tool from the AppExchange can be extremely helpful for quickly identifying and eliminating duplicates in your data and, thus, restoring confidence in the data you collect.
Salesforce’s Built-In Deduplication Functionality
Salesforce deduplication is one of the core components that is available in all editions and consists of the following elements:
- Matching rules—This requires you, the user, to create a definition of duplicate records. This can be something like First Name + Last Name + Email Address, or any combination of fields of your choosing. Salesforce then uses this matching rule to match records within a single object or across multiple objects. There are separate matching rules for each standard object: Leads, Contacts, and Accounts.
- Duplicate rule—This is a rule that springs into action when a user is about to create a duplicate record. It can simply alert them that they are creating a duplicate or block them from editing the record altogether. However, what do you do with the duplicates you already have? This is where the third element comes in.
- Duplicate job—This feature is only available to users who bought the Unlimited edition, so if you bought a cheaper edition, there is no way to run a duplicate job. A duplicate job executes each matching rule you created, one at a time, and once everything is complete, you get a Duplicate Record Set.
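The matching-rule idea above can be sketched in a few lines. This is a minimal illustration, not Salesforce's implementation: it assumes an exact-match rule on First Name + Last Name + Email, with case and whitespace normalized, and the sample records are invented.

```python
# Sketch of a Salesforce-style exact matching rule: records whose
# composite keys collide are flagged as a duplicate group.
from collections import defaultdict

def match_key(record):
    """Build the composite key the matching rule compares on."""
    return (
        record["FirstName"].strip().lower(),
        record["LastName"].strip().lower(),
        record["Email"].strip().lower(),
    )

def find_duplicates(records):
    """Group records whose matching keys collide."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]

leads = [
    {"FirstName": "Ada", "LastName": "Lovelace", "Email": "ada@example.com"},
    {"FirstName": "ada", "LastName": "Lovelace", "Email": "Ada@example.com"},
    {"FirstName": "Alan", "LastName": "Turing", "Email": "alan@example.com"},
]

print(len(find_duplicates(leads)))  # 1 — the two "Ada" records collide
```

Note that even this toy rule only catches exact matches after normalization; any typo in a field produces a different key, which is the weakness the next sections discuss.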
As you can imagine, there are many issues that Salesforce users and admins will have to grapple with if they rely solely on the built-in deduplication functionality. Let’s look at the first one.
1) Issues with Duplicate Management
With rule-based deduplication, most companies find themselves in the following quandary: an employee finds a duplicate in Salesforce, notifies a Salesforce admin, and the admin creates another rule to prevent that kind of duplicate from reappearing. Now consider all the possible variations of “fuzzy” duplicates. Each time a new one is discovered, a new rule has to be created. This sends your Salesforce admins on a wild goose chase to account for every possible type of duplicate.
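To make the “fuzzy” duplicate problem concrete, here is a hedged sketch of similarity-based matching using only Python's standard library. The 0.85 threshold and the record fields are illustrative assumptions, not anything Salesforce or DataGroomr ships with.

```python
# A similarity score catches near-matches (typos, spelling variants)
# that an exact-match rule misses entirely.
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_fuzzy_duplicate(rec_a, rec_b, threshold=0.85):
    name_a = f'{rec_a["FirstName"]} {rec_a["LastName"]}'
    name_b = f'{rec_b["FirstName"]} {rec_b["LastName"]}'
    return similarity(name_a, name_b) >= threshold

a = {"FirstName": "Jonathan", "LastName": "Smith"}
b = {"FirstName": "Jonathon", "LastName": "Smith"}  # one-letter typo variant
print(is_fuzzy_duplicate(a, b))  # True — an exact-match rule would miss this pair
```

A rule-based system would need a dedicated rule (or a fuzzy-matching method per field) for each variant pattern like this, which is exactly the maintenance burden described above.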
By relying on rule creation, you are wasting your Salesforce admin’s time. In real-life situations, admins just don’t have the capacity to create and constantly maintain rules in addition to managing all the other Salesforce responsibilities. When admins leave an organization, the knowledge of why the rule exists often leaves with them.
2) New Duplicates Constantly Enter Your Data
As we mentioned earlier in this article, Salesforce can prevent users from manually creating new duplicates, but there are many other ways duplicate data can enter your system. For example, a sales rep might upload a spreadsheet of new contacts, or leads might come in automatically via an integration with the company’s marketing automation system. Neither Salesforce nor the existing rule-based deduplication apps on the AppExchange can entirely prevent new duplicates from coming in.
DataGroomr’s engineers have addressed this issue by leveraging the power of AI. During data imports (or copy), DataGroomr has the ability to compare and detect duplicates using machine learning models. Duplicates are clearly identified and prevented from entering, while clean data is allowed to pass through.
DataGroomr can even put your duplicate data to work. When this data contains more up-to-date or additional information, it can be automatically extracted and used to update Salesforce without creating duplication issues.
3) Mass Merge Limitations
If your Salesforce contains hundreds of thousands or even millions of records, you are likely to have thousands of duplicates. The built-in Salesforce deduplication functionality is limited to manually reviewing and merging no more than three records at a time. Imagine how much time it would take to clean your data three records at a time!
Some organizations have created custom Apex scripts to automate these merges, but this can have unintended consequences. For example, a common user complaint is that a lead, contact, or account has suddenly disappeared. The likely culprit is an Apex script that erroneously performed a merge. Once that happens, the recovery process is time-consuming and may not always work.
4) Limited Matching Algorithm
When creating a matching rule in Salesforce, you will be asked to provide a weight (importance) for each field. For example, if your matching rule consists of First Name + Last Name + Phone Number + Street Address, you will need to assign a weight to each field on a scale from 1-100. Although someone in the organization may be able to say that a Phone Number is more important than a Street Address, it is simply a guessing game to estimate how much more important one field is than another.
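The weighting scheme described above can be sketched as a weighted average of per-field similarities. The field names and the weights below are deliberately made-up guesses, which is precisely the problem: there is no principled way to know whether Phone deserves 90 and Street 40.

```python
# Weighted field matching: each field's similarity contributes to the
# overall score in proportion to its (guessed) weight on a 1-100 scale.
from difflib import SequenceMatcher

FIELD_WEIGHTS = {          # hypothetical weights, not Salesforce defaults
    "FirstName": 60,
    "LastName": 80,
    "Phone": 90,
    "Street": 40,
}

def field_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def weighted_score(rec_a, rec_b):
    """Weighted average of per-field similarities, in [0, 1]."""
    total = sum(FIELD_WEIGHTS.values())
    score = sum(
        weight * field_similarity(rec_a[field], rec_b[field])
        for field, weight in FIELD_WEIGHTS.items()
    )
    return score / total

a = {"FirstName": "Ada", "LastName": "Lovelace", "Phone": "555-0100", "Street": "1 Main St"}
b = {"FirstName": "Ada", "LastName": "Lovelace", "Phone": "555-0100", "Street": "1 Main Street"}
print(round(weighted_score(a, b), 2))
```

A machine-learning approach sidesteps this by learning the relative importance of fields from labeled duplicate pairs instead of asking an admin to guess the weights.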
5) No Support for Custom Objects
Salesforce deduplication tooling limits support to Leads, Contacts, and Accounts. However, most companies use other related objects, such as Opportunities, or create their own custom objects to store information that is unique to their organization. These non-standard objects cannot be deduplicated within Salesforce.
6) Not Suitable for Handling Large Data Volumes
According to the Trailblazers Community, duplicate jobs will fail in an organization with “many records.” Therefore, if your company has hundreds of thousands or potentially millions of records, you cannot rely on Salesforce’s deduping jobs. When we combine this issue with those mentioned earlier, such as the mass merge limitations, you can see just how limited Salesforce is when working with large data volumes.
Keeping Your Data Clean is Not Just About Duplicates
Even though duplicates are the most harmful form of bad data, issues with data hygiene extend far beyond them. Inconsistent, outdated, and incomplete data can also wreak havoc on business processes and decision-making. For instance, having outdated customer contact information can result in failed communication efforts, while incomplete data can lead to misinformed strategies and lost opportunities.
Inconsistent data, such as varying formats for addresses, can lead to skewed reports and inefficiencies when integrating data from multiple sources. This inconsistency not only complicates data analysis but also undermines the reliability of insights derived from the data. For example, if one system uses two-letter abbreviations for U.S. states and another uses full state names, merging these datasets without first applying proper standardization can lead to confusion, analytics challenges, and messy reports.
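The state-name example can be illustrated with a small normalization step applied before merging. This is a sketch under stated assumptions: the lookup table is truncated to three states for brevity, and real pipelines would need the full fifty plus territories.

```python
# Standardize U.S. state values to two-letter codes before merging
# datasets that mix codes ("CA") with full names ("California").
STATE_CODES = {                 # truncated lookup table for illustration
    "california": "CA",
    "new york": "NY",
    "texas": "TX",
}

def normalize_state(value):
    """Map full state names to codes; pass existing codes through."""
    cleaned = value.strip()
    if len(cleaned) == 2:
        return cleaned.upper()
    return STATE_CODES.get(cleaned.lower(), cleaned)

print(normalize_state("California"))  # "CA"
print(normalize_state("ny"))          # "NY"
```

Running every incoming record through a step like this before the merge is what keeps "CA" and "California" from counting as two different states in your reports.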
Outdated data poses another significant challenge. As businesses grow and evolve, their data needs to reflect current realities; data drift and data decay can amount to significant losses within a company. For example, if the sales team relies on an old list of suppliers or partners, it can result in missed opportunities. Regularly reviewing and updating data ensures that decision-makers have access to the most relevant information.
Incomplete data is equally problematic. Missing critical pieces of information, such as customer preferences or transaction histories, often leads to ineffective marketing campaigns and poor customer service. Ensuring that all necessary fields are complete and accurate is essential for a comprehensive understanding of your business landscape.
Therefore, a holistic approach to data hygiene is necessary. This includes not only removing duplicates but also ensuring consistency, updating regularly, and filling in any gaps. Prioritizing these practices is essential for any business to maintain a clean, reliable, and actionable dataset that supports effective decision-making, planning and execution.
How Should We Assess Data Quality?
Looking at data quality holistically, we should consider four categories:
Intrinsic quality measures:
- Accuracy: Does the data correctly represent the real world it’s supposed to model?
- Validity: Do data values follow defined formats and patterns, and fall within required ranges?
- Reliability: Can the data be trusted? Did it come from an authoritative source?
Contextual quality measures:
- Relevance: Is the data applicable and useful in the context of the current task?
- Timeliness: Is the data up-to-date or outdated, and is it available when it is needed?
- Completeness: Does the data cover all required aspects, or are there gaps?
- Precision/Granularity: Is the data at the appropriate level of detail? For example, daily sales data might be too granular for an annual trend analysis but appropriate for weekly performance checks.
Representational quality measures:
- Consistency: Are there discrepancies in data from different sources, or do we have multiple copies with variations in different datasets?
- Uniqueness: Are there any redundant or duplicate records, or duplicate data points?
- Integrity: Are the relationships between entities maintained and correct, or are they broken?
Accessibility quality measures:
- Accessibility: How easy is it for users to access and use the data?
- Security: Is the data adequately protected against unauthorized access, and can it be accessed by those with the necessary permissions?
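Two of the measures above, completeness and uniqueness, are straightforward to quantify. As a sketch, assuming a simple list-of-dicts dataset (the field names and records are invented for illustration):

```python
# Completeness: share of records with a non-empty value for a field.
# Uniqueness: share of distinct values among the filled-in ones.
def completeness(records, field):
    filled = sum(1 for r in records if r.get(field))
    return filled / len(records)

def uniqueness(records, field):
    values = [r[field].lower() for r in records if r.get(field)]
    return len(set(values)) / len(values)

contacts = [
    {"Email": "ada@example.com", "Phone": "555-0100"},
    {"Email": "ada@example.com", "Phone": ""},
    {"Email": "alan@example.com", "Phone": "555-0101"},
]
print(completeness(contacts, "Phone"))  # 2 of 3 records have a phone
print(uniqueness(contacts, "Email"))    # 2 of 3 emails are distinct
```

Tracking simple ratios like these over time gives you a baseline for whether your data hygiene is improving or decaying.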
Your Business Needs a Comprehensive Solution for Duplicate Data in Salesforce
As we have seen from all of the issues and limitations Salesforce has with duplicate management, it would be best to implement a comprehensive solution to solve all of these problems. Here are some of the features that make DataGroomr the best deduplication solution on the AppExchange:
- DataGroomr leverages machine learning for deduplication instead of relying on matching rule creation.
- No rules to create means no complicated setup prior to onboarding. Just connect DataGroomr to your Salesforce and start deduping right away.
- DataGroomr scans new uploads for duplicates prior to import. That means duplicates do not ever touch your environment.
- The algorithm used to identify duplicates is customizable, so you can adjust it to your individual workflows.
If you have been disappointed by rule-based deduplication tools, consider switching to a machine-learning approach. Remember, rule-based deduplication apps only enhance Salesforce’s limited built-in deduplication functionality; they don’t resolve the underlying issues of rule-based deduplication. One of the great things about DataGroomr is that its machine learning algorithms take all of the hassle out of the deduping process.
Try DataGroomr for yourself with our free 14-day trial.
FAQ
Since duplicate data leads to issues like missed opportunities, poor customer experience, and inflated forecasts, you are not getting the most value out of your Salesforce investment. Therefore, cleansing your data of duplicates should be an important part of running your business.
Salesforce offers built-in duplicate detection functionality where you can create matching rules and duplicate rules, run a duplicate job, and see a list of duplicate records. But the duplicate detection and deduplication functionality natively offered by Salesforce is not enough for a wide variety of reasons, which is why we recommend investing in a third-party application like DataGroomr.
There is a wide selection of deduplication tools on the AppExchange, but all of them are rule-based, which means you will need to constantly create new rules for every possible duplicate variation. DataGroomr eliminates this manual work by using machine learning to identify duplicates.
The only way to reduce duplicates in your Salesforce is by staying on top of your data hygiene, which includes running scans for duplicate data. When duplicates are identified, you should merge them into one record, but if you are using the native Salesforce deduplication functionality, you will only be able to merge three records at a time.
There is a wide selection of deduplication apps available on the AppExchange. Ultimately, it comes down to whether you prefer creating rules to filter duplicates or you would rather enjoy the ease of a machine learning-based application.
The best way to prevent duplicates in Salesforce is to prevent them from coming in to begin with. This includes scanning spreadsheets, internet forms and any other data you are uploading to identify duplicates before they get to Salesforce. In addition to this, you should run automated deduplication jobs to prevent duplicates from piling up.
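The "scan before upload" idea above can be sketched as a pre-import filter. This is an illustrative sketch, not DataGroomr's actual implementation: it assumes incoming rows are matched against existing records by email alone, and the field names and data are invented.

```python
# Compare incoming rows against existing records and split them into
# clean records (safe to import) and duplicates (held back for review).
def split_import(existing, incoming, key="Email"):
    """Separate incoming rows into clean records and duplicates."""
    seen = {rec[key].strip().lower() for rec in existing}
    clean, duplicates = [], []
    for rec in incoming:
        normalized = rec[key].strip().lower()
        if normalized in seen:
            duplicates.append(rec)
        else:
            clean.append(rec)
            seen.add(normalized)  # also catch duplicates within the upload
    return clean, duplicates

existing = [{"Email": "ada@example.com"}]
incoming = [{"Email": "Ada@example.com"}, {"Email": "alan@example.com"}]
clean, dupes = split_import(existing, incoming)
print(len(clean), len(dupes))  # 1 1
```

A production version would of course use fuzzy or learned matching across multiple fields rather than a single normalized key, but the gatekeeping pattern is the same: duplicates are caught before they ever reach the environment.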
If you are using native Salesforce deduplication, the first thing you should do is create your matching and duplicate rules. You should also have the “Potential Duplicates” component added to your Lightning record page to see the discovered duplicates. Then simply navigate to the object where you want to perform the deduplication and select “View Duplicates”. You can use the wizard to select the master record and the field values you want to keep from the different records upon merge. This can be time-consuming, since all of the deduplication has to be done manually and only three records at a time. Alternatively, you can use DataGroomr to merge duplicates en masse, which is not possible with Salesforce alone. In addition, you do not need to waste time setting up complex merge rules, since the machine learning takes care of this for you. Simply connect DataGroomr to your Salesforce and start deduping right away.