Get the Most Value Out of Your Salesforce Deduplication Tool

By Steven Pogrebivsky, February 3, 2021 (updated June 19, 2024)

Salesforce is jam-packed with features for making insightful decisions, such as reports, dashboards, and sales forecasting. However, all of this is only useful if you start with clean data. In a previous blog post, “Salesforce Deduplication Isn’t Enough,” we reviewed some of the limitations (and the impact on your data) of deduplicating Salesforce with out-of-the-box options. The deduplication tool that you choose needs to both cleanse your data and overcome certain Salesforce restrictions. Let’s take a look at these restrictions in more detail and then discuss why machine learning gives you a much better return on your investment.

Merging Records in Bulk

A well-known Salesforce restriction is that you can only merge three records at a time. That is an issue in itself, but even comparing duplicate records with the master record is fairly difficult. For example, you might want to replace some of the field values in the master record with values from the duplicate records. This whole process is awkward and error-prone in Salesforce.

DataGroomr makes this process a lot easier by giving you a side-by-side view of the master and the duplicate records. Since the differences are highlighted in red, you can easily compare field values between the master record and any number of suspected duplicate records. In addition, you can bypass Salesforce’s bulk merge restrictions by merging records en masse.

Another challenge with deduplication in Salesforce is merging duplicate accounts and contacts that are connected to each other via direct or indirect relationships. These redundant relationships need to be addressed in order for the merge to succeed. Let’s look at this in detail in the next section.

Merging Contacts and Accounts with Redundant Relationships

Salesforce has mandatory restrictions that prevent you from merging contacts and accounts with redundant relationships. For example, you may identify duplicate contacts with multiple account structures or shared indirect accounts. If so, you cannot proceed with such a merge natively in Salesforce. Instead, you will see an error message requiring you to manually remove the redundant account-contact relationships prior to merging. DataGroomr automatically removes these redundant relationships, enabling the merge to go through in one step.

Merging duplicate accounts natively in Salesforce is also an issue. For example, if you were to try to merge multiple accounts that are indirectly related to the same contact, you would again receive an error message from Salesforce prompting you to remove the redundant relationship prior to merging. DataGroomr removes these redundancies, and all the contacts from the duplicate record(s) are shifted over to the master record while maintaining all the original relationships.

It is possible to do some complex merges natively in Salesforce, but the process is tedious and multi-step, and therefore prone to error. DataGroomr does all of this automatically behind the scenes, saving you time and preventing further issues.

Cross-Object and Custom Objects Deduplication

Native Salesforce duplicate management centers on three standard objects: Leads, Contacts, and Accounts. For some customers that is sufficient, but others also need to dedupe other objects, such as Opportunities, Campaigns, and Cases, or custom objects of their own. Salesforce has several limitations in this area.

The first is that Salesforce simply does not provide functionality to dedupe custom objects. Another is that Salesforce does not have any capability to identify or dedupe data across objects. For example, if a new record is created in Leads for a person who already exists in Contacts, Salesforce would not be able to determine that it is a duplicate. To accomplish cross-object or custom-object deduplication, you will need to use a third-party tool, like DataGroomr.

All of these complex scenarios point to the need for a comprehensive approach to deduplication and overall data hygiene in your Salesforce environment. There are many popular apps that approach deduplication through a variety of ways, mostly involving the creation of rules and filters by the user. DataGroomr decided to take a different route via machine learning. 

Why Use Machine Learning for Deduplication?

By far, the most common approach to deduping Salesforce is matching rule creation. This is where Salesforce admins create matching criteria for values located in specific fields. When two or more records meet these criteria, they are considered duplicate records.

Arguably, this approach is unsustainable. A common scenario is that each time a user notices a duplicate, the issue is reported to the Salesforce admin, who must confirm that the records are indeed duplicates and then create a rule to catch this and other dupes that meet the same criteria. This process can repeat over and over again, resulting in a great deal of time (and, of course, money) spent on the issue.

We also need to consider the complexity and the many possible variations of such matching rules. All of these rules are based on string metrics that compare field values and determine to what extent they are similar. There are many types of string metrics, and if you are interested in learning more about this topic, we covered it in detail in a previous blog post, “When Salesforce Records Look Like Duplicates.”
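To make the idea of a string metric concrete, here is a minimal sketch in Python. It uses the standard library's `difflib.SequenceMatcher`, which is only one of many possible metrics and is not necessarily what Salesforce or DataGroomr uses. The function names and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """A hypothetical matching rule: two values match if they are similar enough."""
    return similarity(a, b) >= threshold


print(is_match("Jim Carey", "Jim Karey"))    # one letter differs
print(is_match("Jim Carey", "Jimmy Hoffa"))  # different people
```

Even in this tiny example, someone has to pick the metric and the threshold for every field, which is the tuning burden that rule-based matching places on the admin.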

Salesforce admins need to be well-versed in all of the matching options to select the right combination for your particular situation. The number of possible variations is large and hard to narrow down, which makes selecting the best approach difficult. This is why machine learning is a superior approach.

Machine learning does all of the above-mentioned work for you. Suppose you tell a rule-based system that the “Last Name” field is more important than the “First Name” field. Could you specify exactly how much more important it is? Is it 2 times more, or 2.5? With machine learning you do not have to: when you label two records as duplicates, the system “learns” from your choices and applies the same logic to subsequent records. This is called active learning, and it continuously recalculates these field weights based on your actions.
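The sketch below illustrates how field weights can be learned from labeled pairs instead of being set by hand. It is a plain logistic regression trained with gradient descent on per-field similarity scores. It is not DataGroomr's actual algorithm and leaves out the active-learning loop that decides which pairs to ask about. The field names, the sample records, and the labels are all made up for this example.

```python
import math
from difflib import SequenceMatcher

# Hypothetical field names, chosen only for this illustration.
FIELDS = ("first_name", "last_name")


def features(rec_a, rec_b):
    """Return one similarity score in [0, 1] per field for a pair of records."""
    return [
        SequenceMatcher(None, rec_a[f].lower(), rec_b[f].lower()).ratio()
        for f in FIELDS
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(labeled_pairs, epochs=2000, learning_rate=0.5):
    """Fit one weight per field plus a bias with stochastic gradient descent."""
    data = [(features(a, b), label) for a, b, label in labeled_pairs]
    weights = [0.0] * len(FIELDS)
    bias = 0.0
    for _ in range(epochs):
        for x, label in data:
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - label
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def duplicate_probability(rec_a, rec_b, weights, bias):
    """Score a new pair with the learned weights."""
    x = features(rec_a, rec_b)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def person(first, last):
    return {"first_name": first, "last_name": last}


# Made-up labels: 1 = the user marked the pair as duplicates, 0 = distinct.
LABELED_PAIRS = [
    (person("Jim", "Carey"), person("James", "Carey"), 1),
    (person("Jim", "Carey"), person("Jim", "Hoffa"), 0),
    (person("Ann", "Lee"), person("Ann", "Lee"), 1),
    (person("Ann", "Lee"), person("Bob", "Stone"), 0),
]

weights, bias = train(LABELED_PAIRS)
print(dict(zip(FIELDS, weights)))
```

With these labels, the learned weight for the last name ends up larger than the weight for the first name, without anyone having to decide whether the ratio should be 2 or 2.5. Each additional label would simply be appended to the training data before refitting.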

The Machine Learning Approach Is Much More Scalable

Bulk deduplication needs to be efficient and timely, but the standard process is too time-consuming. To demonstrate, suppose you already have 100,000 records in your Salesforce environment and need to import another 5,000. With the rule-based approach, the system would need to compare every imported record with every existing one (5,000 × 100,000 = 500,000,000 comparisons). Even if your system can perform 10,000 comparisons per second, that is 50,000 seconds, or almost 14 hours, to dedupe the entire dataset.

Machine learning takes a smarter approach called blocking. This is when the system only compares records that have a specific trait in common. For example, let’s say that you have a data set such as this one:
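The arithmetic behind this estimate can be checked with a few lines of Python. The variable names are just labels for the figures quoted above.

```python
existing_records = 100_000
imported_records = 5_000
comparisons_per_second = 10_000

# Every imported record is compared with every existing record.
total_comparisons = existing_records * imported_records
seconds = total_comparisons / comparisons_per_second
hours = seconds / 3600

print(f"{total_comparisons:,} comparisons, about {hours:.1f} hours")
# prints "500,000,000 comparisons, about 13.9 hours"
```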

  • Jim Carey
  • Jimmy Hoffa
  • Jim Karey

All of these individuals share the same first three letters in their first name, and this could be the trait that mandates a comparison. While these traits may identify many blocks, each one consists of only a few records, which significantly reduces the number of comparisons that need to be made and makes the entire process much more efficient and scalable.
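A minimal sketch of blocking on the first three letters of the name, in Python. The function names and the extra sample names beyond the three listed above are assumptions for illustration. A production system would typically combine several blocking keys.

```python
from collections import defaultdict
from itertools import combinations


def block_key(name: str) -> str:
    """The shared trait: the first three letters of the name, lowercased."""
    return name.strip().lower()[:3]


def candidate_pairs(names):
    """Group names into blocks and pair up records only within each block."""
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(name)].append(name)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


names = ["Jim Carey", "Jimmy Hoffa", "Jim Karey", "Ann Lee", "Anna Li", "Bob Stone"]
pairs = candidate_pairs(names)
all_pairs = len(names) * (len(names) - 1) // 2

print(f"{len(pairs)} comparisons with blocking instead of {all_pairs}")
# prints "4 comparisons with blocking instead of 15"
```

Only the candidate pairs are then passed to the more expensive similarity scoring, so the savings grow quickly as the number of records increases.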

Start Enjoying the Benefits of Machine Learning Today

DataGroomr is the only Salesforce app that allows you to realize the benefits of machine learning to dedupe your data. It applies innovative technology without the complex setup process that you have with other tools. 

Try DataGroomr for yourself today with our free 14-day trial.

Steven Pogrebivsky

Steve Pogrebivsky has founded multiple successful startups and is an expert in data and content management systems with over 25 years of experience. Previously, he co-founded and was the CEO of MetaVis Technologies, which built tools for Microsoft Office 365, Salesforce and other cloud-based information systems. MetaVis was acquired by Metalogix in 2015. Before MetaVis, Steve founded several other technology companies, including Stelex Corporation, which provided compliance and technical solutions to FDA-regulated organizations. Steve holds a BS in Computer/Electrical Engineering and an MBA in Information Systems from Drexel University.