Assigning Roles and Responsibilities in Data Quality Assessments

November 18, 2020

Part 4: Conducting a Data Quality Assessment

This is the fourth article in our series on Data Quality Assessments. In this part, we take a closer look at the roles and responsibilities of each individual involved in the process. As you may recall, the last article (Part 3) covered a variety of tools that organizations may find useful as they measure different aspects of data quality. The next logical question is who will be using these tools, and for what purposes. We will detail the personnel and expertise likely to be needed to collect and collate the information, and expand on their areas of responsibility.

For background information, please refer to Part 2: Key Data Quality Metrics You Should Be Tracking and Part 1: Determining the Purpose and Outlining Goals of a Data Quality Assessment of this series.

The roles and responsibilities we will examine are management roles related to the governance, planning, definition, capture, usage, and access of data or information. Each role covered in this article participates in key processes surrounding data quality assessment, such as establishing and tracking data usability metrics, the timeliness of data, and other aspects we discussed in previous installments of this series. Depending on the context of the audit, some roles may cover broad areas, such as both data and information management, while others may focus on a single topic, such as quality. We have also intentionally included representatives from the business side, since the ultimate purpose is to improve the operations that these users rely on. Let’s begin our review with the role of Data Executive…

Data Executive 

We use the generic title “Data Executive,” rather than the more traditional “Chief Technology Officer (CTO)” or “Chief Information Officer (CIO),” because every organization has an individual responsible for data, regardless of the title. Whoever holds this role is responsible for setting policy, providing support, and sponsoring processes around data governance. The role is critical for data assessments because you will need executive buy-in for your data governance and quality initiatives. The best way to get it is to link organizational objectives to tangible challenges that can only be fixed by getting a data quality and governance framework up and running. For example, when you transition from a legacy system to a cloud-based one, which is very common today, the data from the legacy system should not simply be migrated to the cloud. It should be verified for accuracy and efficacy and then cleansed of duplicates and erroneous data. For systems such as CRM or ERP, this can only be done with the support of a Data Executive.

This role should therefore affirm C-level commitment to data quality, define the scope of the data quality assessment, and create a roadmap for its implementation.

Data Owner 

The data owner is a senior stakeholder inside your organization who has the authority to make decisions about data requirements, data quality, and miscellaneous technical elements such as storage constraints. In previous articles, we talked extensively about data completeness, but the exact criteria for “complete data” will vary from one organization to another. This is where the data owner will step in and provide the exact definition that fits your needs. They should also be empowered to make necessary changes and have the required budget and resources available to undertake any data cleansing initiatives. 
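To make the idea of an owner-defined completeness rule concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the data owner has declared a record complete when a chosen set of fields is non-blank. The field names, the sample contacts, and the helper functions are hypothetical and do not come from any particular system.

```python
# Hypothetical completeness rule; the field names are illustrative assumptions
# standing in for whatever the data owner actually decides.
REQUIRED_FIELDS = ("name", "email", "phone")


def is_complete(record, required=REQUIRED_FIELDS):
    """A record is complete when every required field has a non-blank value."""
    return all(str(record.get(field) or "").strip() for field in required)


def completeness_rate(records, required=REQUIRED_FIELDS):
    """Share of records (0.0 to 1.0) that satisfy the completeness rule."""
    if not records:
        return 0.0
    return sum(is_complete(r, required) for r in records) / len(records)


# Made-up sample data for the example.
contacts = [
    {"name": "Ana Ruiz", "email": "ana@example.com", "phone": "555-0100"},
    {"name": "Bo Chen", "email": "", "phone": "555-0101"},
    {"name": "Cy Dale", "email": "cy@example.com", "phone": None},
    {"name": "Di Evans", "email": "di@example.com", "phone": "555-0103"},
]

print(completeness_rate(contacts))                              # 0.5
print(completeness_rate(contacts, required=("name", "email")))  # 0.75
```

The same four records score 50% or 75% depending on which fields are required, which is why the definition has to come from someone with the authority to set it.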

An obvious question that comes up with such broad responsibilities is whether a senior manager will have the time and experience to fully understand the details of your data. Furthermore, will this individual have the capacity to carry out any of the data quality assessment processes? This is a fair point, and it is where the role of the data steward comes in.

Data Steward 

Data stewards can come from a variety of backgrounds or departments, not just from IT. They define, implement, and enforce accountability and responsibility for all data stakeholders. As part of the data quality assessment, they research and propose remedial actions to resolve any issues uncovered during the process. The next position we will look at is the database administrator, so it is useful to understand how the two roles contrast. The role of the data steward is broader and encompasses policies, procedures, and data quality. Data stewards are responsible for managing the overall value and long-term sustainability of your data, rather than the data itself.

Database Administrators 

Database administrators take functional responsibility for data repositories and silos. When you conduct the data quality assessment, they will be responsible for collecting the necessary data, cleansing it, and finally reporting on the overall health of your systems. They have a very difficult job because data cleansing can be a long and arduous process, especially if your organization has not paid much attention to data quality in the past. A common example that is near and dear to our hearts is the issue of duplicate data. Duplicates can manifest themselves in many ways beyond common scenarios such as two records having the same email address or name. Many duplicates are identified only with “fuzzy” matching criteria, where the data is similar but not exactly the same. Imagine the complexity of creating database rules that catch every possible variation of such “fuzzy” duplicates. This is a very time-consuming, expensive, and arguably impossible task to accomplish without special tools.
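To illustrate what a hand-written “fuzzy” rule looks like, here is a minimal sketch that uses the string similarity ratio from Python's standard `difflib` module. This is a generic technique chosen for illustration, not a description of how any particular deduplication product works. The normalization steps, the 0.85 threshold, and the sample company names are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_duplicates(names, threshold=0.85):
    """Return index pairs whose similarity meets the assumed threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(names), 2)
        if similarity(a, b) >= threshold
    ]


# Made-up account names for the example.
accounts = ["Acme Corp.", "ACME Corp", "Acme Corporation", "Globex Inc"]

print(fuzzy_duplicates(accounts))                 # [(0, 1)]
print(fuzzy_duplicates(accounts, threshold=0.7))  # [(0, 1), (0, 2), (1, 2)]
```

Even in this toy example, “Acme Corporation” is caught only after the threshold is lowered by hand, and every new field or spelling variation would need its own normalization step and its own tuning.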

This is why you should consider a deduplication app like DataGroomr to cleanse your data. It leverages machine learning to identify duplicates without the need to create special rules. A nice bonus is that DataGroomr provides an almost instant data quality assessment for Salesforce data (also included as part of a 14-day trial at no cost). DataGroomr will remain useful for your organization even after the data quality assessment is completed, since it continuously scans and optionally removes any duplicates it finds. It also prevents duplicates from imported data, which is a common gateway for these issues being introduced into Salesforce.

In addition to database administrators, you may want to consult or even include other data custodians. This includes anyone responsible for data modeling, architecture, backup and archiving, or the prevention of data loss and corruption.

Data Stakeholders 

A statement like “all of your employees are data stakeholders” is often considered hyperbole, but it is fair to say that, in one way or another, they are all affected by the quality of your data. Of course, not everyone can participate in the data quality assessment, so we have to identify our primary stakeholders. These are the individuals who are directly affected by the data. Typically they come from sales, marketing, customer service teams, or other areas where data is created, managed, or consumed. Data stewards are well positioned to identify these stakeholders. They are usually in direct contact with these individuals and understand who can offer perspectives and insights into problems that technology might overlook.

As a sidebar in one of our previous articles, we mentioned how important it is for sales and marketing professionals, or others working in customer-facing roles, to be confident in the data available to them. Having these individuals participate as stakeholders is a great opportunity to get them directly involved in the process and also to provide assurance that the organization realizes the importance of clean data, especially as it pertains to customer acquisition and retention. 

Implement the Right Data Governance Practices

We briefly touched on data governance in our discussion of the Data Executive role and would like to expand on this responsibility because of its importance. When performing a data quality assessment, it can be challenging for everyone to communicate their own ideas, goals, pain points, and risks effectively, since different stakeholders will have their own priorities. This is where data governance needs to guide the process and lay out the policies, standards, and elements around which Key Performance Indicators (KPIs) for the data assessment are determined. Overall, data governance should determine the scope of the effort, and it may be just as important to decide which aspects will be excluded from the assessment.
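One way to picture this is as a small, explicit policy that lists the KPIs in scope, their targets, and the aspects deliberately left out. The sketch below is a hypothetical illustration in Python. The metric names, target values, and measured numbers are assumptions invented for the example, not recommended thresholds.

```python
# Hypothetical governance policy; metric names and targets are illustrative only.
POLICY = {
    "in_scope": {
        "completeness": 0.95,  # minimum share of complete records
        "uniqueness": 0.98,    # minimum share of non-duplicate records
    },
    "out_of_scope": ["timeliness"],  # explicitly excluded from this assessment
}


def evaluate(measured, policy=POLICY):
    """Compare measured KPIs with their targets and flag excluded metrics."""
    report = {}
    for metric, target in policy["in_scope"].items():
        value = measured.get(metric)
        if value is None:
            report[metric] = "not measured"
        else:
            report[metric] = "pass" if value >= target else "fail"
    for metric in policy["out_of_scope"]:
        report[metric] = "excluded"
    return report


# Made-up measurements for the example.
measured = {"completeness": 0.97, "uniqueness": 0.91, "timeliness": 0.60}

print(evaluate(measured))
# {'completeness': 'pass', 'uniqueness': 'fail', 'timeliness': 'excluded'}
```

Writing the exclusions down alongside the targets keeps a low timeliness score from being read as a failure of an assessment that never set out to measure it.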

Trust DataGroomr with Your Data Cleansing Needs

Regardless of the duplicate rules you have in your organization, DataGroomr will find the duplicates that match your specific criteria. DataGroomr’s algorithms use machine learning to identify duplicates and are customizable to fit your needs. This is a significant upgrade over traditional rule-based tools since, with DataGroomr, there is no need to set up any rules. You can just install it and start deduping.

As we mentioned above, we offer a free data quality assessment as part of our free 14-day trial. DataGroomr will help you clean up your data, make it easier for your sales team to work with the data in Salesforce, and reduce the workload of your Salesforce admins as they will no longer need to waste time creating rules to catch every possible fuzzy duplicate!

Steven Pogrebivsky

About Steven Pogrebivsky

Steve Pogrebivsky has founded multiple successful startups and is an expert in data and content management systems with over 25 years of experience. Previously, he co-founded and was the CEO of MetaVis Technologies, which built tools for Microsoft Office 365, Salesforce, and other cloud-based information systems. MetaVis was acquired by Metalogix in 2015. Before MetaVis, Steve founded several other technology companies, including Stelex Corporation, which provided compliance and technical solutions to FDA-regulated organizations.