
January ’25 Release: User Dictionaries for Synonyms and Ignore Words

February 2, 2025

Welcome to the first full release of 2025. This release continues our push to improve the matching models. The headline feature is configurable Dictionaries, alongside a few minor improvements.

Dictionaries 

One challenge in traditional duplicate detection is identifying duplicates when terminology varies. This happens with many kinds of data values, such as job titles (“CEO” and “Chief Executive Officer”), people's names (“Elizabeth” and “Beth,” or “Charles” and “Chuck”), geographical locations (“New York, NY” and “NYC”), and so forth. Another challenge arises when records contain extra words or characters. A good example is company suffixes such as “Inc,” “LLC,” and “GmbH.”

While a modern AI and machine learning approach handles such variations gracefully, classical matching models that rely on text similarity often struggle. To address this, we are adding the ability for users to define a Dictionary: a list of common words that have the same meaning. The feature ships with two out-of-the-box dictionaries, Company Suffixes and Western Names, with more to follow.
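To make the idea concrete, here is a minimal Python sketch of how synonym and ignore-word dictionaries can help a text-similarity comparison. It is an illustration only, not DataGroomr's implementation: the `SYNONYMS` and `IGNORE_WORDS` contents, the `normalize` and `similarity` helpers, and the use of Python's `difflib` as the similarity measure are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym dictionary: each variant maps to one canonical form.
SYNONYMS = {
    "chief executive officer": "ceo",
    "beth": "elizabeth",
    "chuck": "charles",
}

# Hypothetical ignore-words dictionary: tokens dropped before comparison.
IGNORE_WORDS = {"inc", "llc", "gmbh"}


def normalize(value: str) -> str:
    """Lowercase, strip punctuation, apply synonyms, then drop ignore words."""
    # Replace anything that is not a letter, digit, or whitespace with a space.
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower()
    )
    text = " ".join(cleaned.split())
    # Replace whole words or phrases only, so "beth" does not match "bethany".
    for phrase, canonical in SYNONYMS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", canonical, text)
    tokens = [t for t in text.split() if t not in IGNORE_WORDS]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Character-level similarity of the two normalized values (0.0 to 1.0)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Values that differ only by a suffix or a nickname normalize to the same text.
print(normalize("Acme, Inc."), "|", normalize("Acme LLC"))
print(similarity("Beth Smith", "Elizabeth Smith"))
```

Without the dictionaries, “Beth Smith” and “Elizabeth Smith” share only part of their characters, so a purely text-based score falls short of an exact match. After normalization, both become the same string.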

[Screenshot: Dictionaries]

You can also create your own Dictionaries, either with the Add Dictionary button or by cloning an existing dictionary and extending it.

[Screenshot: Dictionary, second view]

Using Dictionaries with Matching Models 

Dictionaries can be used in Machine Learning or Classic models and assigned as either a Synonym or an Ignore Words field. The GIF below demonstrates how to add them.

[GIF: Using Dictionaries with Matching Models]
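For readers who think in code, the difference between the two roles can be sketched as follows. This is a hypothetical model, not DataGroomr's configuration format: the `Role` enum, the representation of a dictionary as groups of words (with the first word of each group treated as canonical), the sample dictionary contents, and the `FirstName` and `Company` field assignments are all assumptions for illustration. Tokenization here is a plain whitespace split.

```python
from enum import Enum


class Role(Enum):
    SYNONYM = "synonym"            # map each word to its group's canonical form
    IGNORE_WORDS = "ignore words"  # drop the words entirely


# Hypothetical dictionaries: each inner list is one group of equivalent words.
WESTERN_NAMES = [["elizabeth", "beth"], ["charles", "chuck"]]
COMPANY_SUFFIXES = [["inc", "llc", "gmbh"]]

# Hypothetical per-field assignment: which dictionary, used in which role.
MODEL_FIELDS = {
    "FirstName": (WESTERN_NAMES, Role.SYNONYM),
    "Company": (COMPANY_SUFFIXES, Role.IGNORE_WORDS),
}


def apply_dictionary(value: str, groups: list[list[str]], role: Role) -> str:
    """Apply one dictionary to one field value according to its role."""
    tokens = value.lower().split()
    if role is Role.IGNORE_WORDS:
        ignored = {word for group in groups for word in group}
        tokens = [t for t in tokens if t not in ignored]
    else:
        canonical = {word: group[0] for group in groups for word in group}
        tokens = [canonical.get(t, t) for t in tokens]
    return " ".join(tokens)


def normalize_record(record: dict[str, str]) -> dict[str, str]:
    """Return a copy of the record with configured fields normalized."""
    result = dict(record)
    for field, (groups, role) in MODEL_FIELDS.items():
        if field in result:
            result[field] = apply_dictionary(result[field], groups, role)
    return result


print(normalize_record({"FirstName": "Beth", "Company": "Acme LLC", "City": "Boston"}))
```

Fields without an assigned dictionary, such as `City` here, pass through unchanged.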

Editing Field sets in Matching Models 

When creating a model, users now have the option to add fields from DataGroomr standard models or from Salesforce matching rules.

[Screenshot: Editing Field sets in Matching Models]

This enhancement makes it easier to migrate existing rules to DataGroomr matching.

Coming Soon… 

As always, we are hard at work on the next set of features, including data enrichment, new data quality reports, and additional options for matching models. Please follow our release notes for more information, or reach out to us directly if you have ideas for new features we should build. Thanks for tuning in!

Ben Novoselsky

Ben Novoselsky, DataGroomr CTO, is a hands-on software architect with over 19 years of experience in the design and implementation of distributed systems. He is the author of multiple publications on the design of distributed databases. Ben holds a Ph.D. in Computer Science from St. Petersburg State University.