Part 5: Conducting a Data Quality Assessment
Part 5 of our series on Data Quality Assessment focuses on results and reporting. This article concludes our five-part series. If you missed Part 4, which focused on assigning roles and responsibilities, we recommend reading it first. For background information, please refer to Part 1: Determining the Purpose and Outlining Goals of a Data Quality Assessment, Part 2: Key Data Quality Metrics You Should Be Tracking, and Part 3: Choosing the Right Data Quality Tools.
Why is Reporting So Important?
Before we get into the specifics of what your data quality report should include, it is worth understanding why this stage matters. Very often, assessing data quality takes so long that reporting the results becomes an afterthought and does not get the attention it deserves. Remember that the report you create will give top management valuable insight into the health of your data and will serve as the basis for recommending remedial actions. If you are like most organizations, you will likely uncover many issues with your data, including (but certainly not limited to) duplicates, undefined values, missing data … and a host of other problems. Do not sweep these findings under the rug in the hope that they will go away on their own. Correcting the situation requires detailed discussion and actionable solutions.
In many cases, organizations are not satisfied with the findings of a data quality assessment, even when those findings accurately capture the prevalent issues. Sometimes assessments are redone several times to confirm the results. If this happens, it helps to have a report that documents your approach, methodology, and other important aspects of how you conducted the assessment. We will cover each of these in the sections below, but let us start with the executive summary.
Executive Summary
The executive summary should capture the key points of the assessment. Restate the objectives, highlight the major elements of how the data was gathered, and briefly touch on the results, conclusions, and recommendations. In general, the executive summary should contain enough information for the reader to understand what the full report covers without having to read it in its entirety. Although this is the first section of the report, we advise writing it last, after the other sections are complete. Writing it last gives you a clearer view of which main points to highlight for readers who will read only this section.
Introduction
You may be wondering, “Why do I need an introduction if I already have an executive summary?” The answer is that these two sections serve different purposes. The introduction merely sets the scene. Think of the introduction as the first ten minutes of a movie, in which you find out what the rest of the story will be about, whereas the executive summary is the entire script condensed into a couple of paragraphs. So, what should the introduction include? Start with the purpose. Outline the business problems you are experiencing and what you are trying to accomplish by conducting the assessment. You may also provide context such as the timeframe in which the assessment was conducted and the budget you were working with.
Purpose, Scope, and Objectives
Even though you mentioned the purpose of the assessment in the introduction, you should expand on it here and explain how the assessment should be viewed. For example, did you conduct a full-fledged evaluation or just a partial internal assessment? Then move on to the scope. Is this an end-to-end assessment, or did you focus on certain characteristics or time periods? Finally, when you write the objectives, be as specific as possible. Avoid generic language such as “To cleanse data” or “Improve the health of the data.” For example, if you are trying to eliminate duplicates, write something like “To identify and reduce the amount of duplicate data.”
Identify Your Audience
In this section, you answer the question “Who is the intended reader?” In Part 4, we reviewed the business and technical resources involved in conducting a data quality assessment. We also emphasized that anyone who relies on data and its quality to do their job is a potential stakeholder in the assessment. So, it is not a stretch to say that everyone in the organization (and perhaps even outside partners) is part of the potential audience for this report. With this in mind, write the report for wide distribution and avoid unnecessary jargon or obscure technical terminology.
Approach & Methodology
There are many different methodologies for measuring data quality, which is why you should specify the one(s) you used. For example, you might select specific attributes, such as email addresses, and verify them against reliable external sources. Alternatively, you may be trying to determine the level of duplication in your data, which would likely require cross-referencing multiple data points across multiple repositories.
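Whatever methodology you choose, the report is easier to defend when it states exactly how each metric was computed. The sketch below is purely illustrative and is not taken from any tool in this series: the sample records, the field names, and the matching rule (lowercased, trimmed name plus email) are all hypothetical assumptions. It shows one way to express a duplication rate and a missing-value rate so that readers can reproduce the numbers.

```python
from collections import Counter

# Hypothetical sample records; in practice these would be exported from your CRM.
records = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@example.com"},
    {"id": 2, "name": "ada lovelace ", "email": "ADA@example.com"},
    {"id": 3, "name": "Alan Turing", "email": "alan@example.com"},
    {"id": 4, "name": "Grace Hopper", "email": ""},
]

def match_key(record):
    """Assumed matching rule: trimmed, lowercased name and email must both match."""
    return (record["name"].strip().lower(), record["email"].strip().lower())

def duplication_rate(rows):
    """Share of records that are surplus copies of another record with the same key."""
    if not rows:
        return 0.0
    counts = Counter(match_key(r) for r in rows)
    surplus = sum(n - 1 for n in counts.values())
    return surplus / len(rows)

def missing_rate(rows, field):
    """Share of records where the given field is empty or whitespace."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if not r[field].strip()) / len(rows)

print(f"Duplication rate: {duplication_rate(records):.0%}")
print(f"Missing email rate: {missing_rate(records, 'email'):.0%}")
```

Stating the matching rule alongside the result matters: a looser rule (for example, email only) would report a different duplication rate for the same data, which is often the source of disagreement when assessments are repeated.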
Risks & Limitations
Every data quality assessment has constraints, almost certainly including budget and time limits, and these create a risk of incomplete or inaccurate results. However, there are other risks to consider as well. Ask yourself the following questions:
- Did you look at all data collection processes or just a few?
- Did you address all the concerns of stakeholders?
- Was there a systematic procedure in place for conducting the assessment?
- Were your definitions of data quality attributes consistent throughout the assessment?
Be sure to describe the potential risks to the company that may result from these limitations. In many cases, these risks can be significant. They include missed business opportunities, reputational damage, lost revenue, and many others.
Reporting the Findings
It is always a good idea to use charts or graphs to report findings, because visual evidence is easier to understand and communicate. Many of the tools used during an assessment produce graphical results. For example, we discussed DataGroomr, which provides a graphical assessment of Salesforce duplicates, including a histogram of duplicates over time, as part of its free 14-day trial. Take a look at the other tools we reviewed in Part 3 of this series. Most include reporting features or can export data that you can compile into a visual presentation with other tools, such as Tableau or Power BI. Once all of the data has been assembled and presented to a committee and other stakeholders, be sure to discuss whether the assessment accomplished the goals and objectives we discussed in Part 1.
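If a tool only gives you raw findings, a small amount of preparation turns them into something a charting tool can plot. The sketch below is a hypothetical example, not the output format of DataGroomr or any other product: it assumes you have the creation date of each duplicate record and groups them by month into CSV text, which could then be imported into Tableau, Power BI, or a spreadsheet to draw a duplicates-over-time chart.

```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical findings: the creation date of each record flagged as a duplicate.
duplicate_created = [
    date(2023, 1, 5), date(2023, 1, 19), date(2023, 2, 2),
    date(2023, 3, 14), date(2023, 3, 15), date(2023, 3, 30),
]

def monthly_counts(dates):
    """Count duplicates per calendar month, sorted chronologically (YYYY-MM sorts correctly)."""
    counts = Counter(d.strftime("%Y-%m") for d in dates)
    return sorted(counts.items())

def to_csv(rows):
    """Render (month, count) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["month", "duplicates"])
    writer.writerows(rows)
    return buffer.getvalue()

print(to_csv(monthly_counts(duplicate_created)))
```

Note that months with no duplicates simply do not appear in this output; if you want explicit zero bars in the chart, you would need to fill those months in before exporting.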
Additional Reporting Considerations
If you are like most organizations, the state of your data quality may be … less than ideal. After all, that is why you are running an assessment. When the true extent of the problems comes to light, the tendency is to start assigning blame to different parts of the organization. This is not only counterproductive but also adds another barrier to achieving the level of data quality you want. In fact, a study titled “Reporting Data Quality Assessment Results: Identifying Individual and Organizational Barriers and Solutions” examined the obstacles to data quality reporting. It concluded that individuals fear the consequences of discovering and revealing data quality issues, and this fear of repercussions is why the problem persists in many organizations. It is always more productive to assess your current situation and choose a reporting strategy that does not assign blame but still identifies the sources of the problems.
This completes our five-part series on Data Quality Assessment. If you enjoyed it, please take a look at some of the other blogs on our site.