January ’25 Release: User Dictionaries for Synonyms and Ignore Words

Welcome to the first full release of 2025. This release continues our push to improve our matching models. The headline feature is configurable Dictionaries, accompanied by a few smaller improvements.

Dictionaries

One of the challenges of traditional duplicate detection is identifying duplicates when terminology varies. This happens with many kinds of data values, such as job titles (“CEO” and “Chief Executive Officer”), people names (“Elizabeth” and “Beth” or “Charles” and “Chuck”), geographical locations (“New York, NY” and “NYC”) and so forth. Another challenge arises when extra words or characters are present. A good example is company suffixes such as “Inc,” “LLC,” “GmbH,” etc.

While a modern AI and machine learning approach handles these variations gracefully, classical matching models that rely on text similarity often struggle. To address this challenge, we are adding the ability for users to define a Dictionary: a list of words that share the same meaning or that should be ignored during matching. This new feature includes two out-of-the-box dictionaries, Company Suffixes and Western Names, with additional ones to follow.

You can, of course, create your own Dictionaries by using the Add Dictionary button or by cloning an existing dictionary and extending it.
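To make the idea concrete, here is a minimal Python sketch of how synonym and ignore-word lists can help a text-similarity comparison. It is only an illustration, not DataGroomr's implementation: the SYNONYMS and IGNORE_WORDS entries and the normalize and similarity helpers are hypothetical names invented for this example, and the similarity score comes from Python's standard difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical dictionaries for illustration only; these are not the
# contents of the shipped Western Names or Company Suffixes dictionaries.
SYNONYMS = {"beth": "elizabeth", "chuck": "charles"}
IGNORE_WORDS = {"inc", "llc", "gmbh"}


def normalize(value, synonyms=SYNONYMS, ignore_words=IGNORE_WORDS):
    """Lowercase, strip commas and periods, drop ignore words, map synonyms."""
    tokens = value.lower().replace(",", " ").replace(".", " ").split()
    kept = [synonyms.get(token, token) for token in tokens
            if token not in ignore_words]
    return " ".join(kept)


def similarity(a, b):
    """Plain character-based similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


# Raw comparison versus comparison after applying the dictionaries.
raw_score = similarity("Beth Smith", "Elizabeth Smith")
normalized_score = similarity(normalize("Beth Smith"),
                              normalize("Elizabeth Smith"))
```

In this sketch the raw comparison scores below 1.0 because “Beth” and “Elizabeth” differ as strings, while the normalized values are identical. Likewise, “Acme Inc.” and “ACME, LLC” both reduce to the same value once the suffixes are ignored.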

Using Dictionaries with Matching Models

Dictionaries can be used in Machine Learning or Classic Models and assigned to a field as either Synonyms or Ignore Words. The GIF below demonstrates how to add them.

Editing Field Sets in Matching Models

When creating a model, users can now add fields from DataGroomr standard models or from Salesforce matching rules.

This enhancement makes it easier to migrate existing rules to DataGroomr matching.

Coming Soon…

As always, we are hard at work on the next set of features, including data enrichment, new data quality reports and additional options for matching models. Please follow our release notes for more information, or reach out to us directly if you have ideas for new features we should build. Thanks for tuning in!