Why Data Accuracy Matters
How much damage can inaccurate data really do? In 1999, NASA’s Mars Climate Orbiter lost contact with mission control as it approached its destination planet. The reason? It burned up in the planet’s atmosphere due to a miscalculation caused by inaccurate data. The Orbiter’s thrusters were controlled by two separate software programs: one calculated the thrust needed to reach the appropriate altitude in pound-force units (lbs), while the other interpreted those same figures as newtons (N). The mismatch destroyed an orbiter that had cost NASA $125 million.
That is an extreme example, to be sure, but it illustrates just how costly poor data accuracy can be. Given the sheer volume of data available today, especially web data, the problems posed by large amounts of inaccurate data should be of paramount concern. This is particularly vital in use cases involving Artificial Intelligence (AI) and Machine Learning (ML), since AI and ML systems must be fed structured, labeled data to function properly. To improve data accuracy, companies should begin by establishing a deeper understanding of what exactly “data accuracy” means, as well as how it affects their business.
Data accuracy can be broken down into two obvious terms – data and accuracy. Data is defined as “information, especially facts or numbers, collected to be examined and considered and used to help decision-making or information in an electronic form that can be stored and used by a computer.” This definition mostly applies to records of historical events stored on digital media that computers can access; business users can then harness this information to uncover strategic business insights. Most people, whether or not they know the exact definition of the word, have at least a general grasp of what “data” means. The meaning of the second term, “accuracy,” is less apparent in this context.
For data to be “accurate,” it must meet two criteria: form and content. Form means that the data must adhere to a standard format, which prevents confusion and ensures there is no ambiguity about the meaning of the data’s content when it is analyzed by a computer. Content refers to the information contained in the data – the message the data conveys. The many different ways a single date can be written illustrate how form affects the analysis of data. March 2, 2019, can be written as 3/2/2019 or 2/3/2019, and each form conveys an entirely different meaning depending on who is reading the data. An American would likely interpret 3/2/2019 as March 2, 2019, but a European might read the same string as February 3, 2019, if they weren’t aware of the data source’s conventions.
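The date example above can be sketched in a few lines of Python. This is a minimal illustration, not production code: it shows how the same string yields two different dates depending on the assumed format, and how converting to one standard form (ISO 8601, `YYYY-MM-DD`) removes the ambiguity.

```python
from datetime import datetime

ambiguous = "3/2/2019"  # March 2 to a US reader, February 3 to a European reader

# The same string parses to two different dates depending on the assumed format.
us_date = datetime.strptime(ambiguous, "%m/%d/%Y")  # read as month/day/year
eu_date = datetime.strptime(ambiguous, "%d/%m/%Y")  # read as day/month/year
assert us_date != eu_date

# Storing dates in one standard form (ISO 8601) eliminates the ambiguity.
iso = us_date.strftime("%Y-%m-%d")
print(iso)  # 2019-03-02
```

Agreeing on a single storage format up front is far cheaper than trying to guess each record’s intended format after the fact.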
For the data to be properly analyzed, the content must also be consistent. Inconsistent content prevents a computer from grouping and summarizing information by recognizing similarities in the available data. For example, city names can be abbreviated in many different ways: “New York City” can be written as “NYC,” “New York,” or “NY, NY.” Because of the discrepancy between these abbreviations, a computer cannot determine that each piece of data refers to the same place, and therefore cannot group and summarize the data accurately. Since grouping and summarizing underpin so many kinds of analysis, consistent form and content are essential to producing accurate, usable data.
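A common remedy is to normalize known variants to one canonical value before grouping. The sketch below uses hypothetical sales records and a hand-built variant map to show the idea; in practice the mapping would come from a maintained reference list or an entity-resolution tool.

```python
from collections import Counter

# Hypothetical sales records where the same city appears under different names.
records = [
    {"city": "New York City", "amount": 100},
    {"city": "NYC", "amount": 250},
    {"city": "NY, NY", "amount": 75},
    {"city": "Boston", "amount": 120},
]

# Map known variants to one canonical form (illustrative, not exhaustive).
CANONICAL = {
    "new york city": "New York City",
    "nyc": "New York City",
    "ny, ny": "New York City",
}

def normalize(city: str) -> str:
    return CANONICAL.get(city.strip().lower(), city.strip())

# Group and summarize on the normalized name.
totals = Counter()
for rec in records:
    totals[normalize(rec["city"])] += rec["amount"]

print(totals)  # all three New York variants collapse into one total
```

Without the normalization step, the three New York variants would appear as three unrelated groups, understating the city’s real total.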
Simply meeting these two criteria, form and content, does not mean that the data is ready for analysis to reveal key business insights and inform strategies and decisions. Since data is meant to be a representation of real-world information, it must also be factually correct in order to accurately represent reality. Making sure the data is reliable and accurate is a requirement if useful business insights are to be gleaned from the information.
Businesses can improve their processes in a number of different areas with accurate data gathered from many sources, including web data. Reliable, cleansed data supports decisions regarding sales strategy, increasing sales revenue. Accurate data also helps prevent time and money being wasted on ineffective initiatives and tactics, such as sending mailers to defunct addresses, and reduces the cost of governing inaccurate data so that analyses remain correct. Businesses will also increase customer satisfaction when the marketing team uses accurate customer data to deliver messaging that motivates potential buyers to make a purchase. Finally, high data accuracy will ultimately improve the business’ ROI in data assets by reducing the time spent managing and formatting data, in addition to increasing the revenue earned by the marketing and sales teams.
While data accuracy is a stable foundation for many different forms of business analytics, there is a price to be paid to ensure reliable, up-to-date and accurate data for these analyses. Businesses must institute a strict and persistent culture of data governance starting with the executive team and continuing all the way down the company’s hierarchy of employees. This will guarantee that all levels of the organization play a part in ensuring the availability of highly accurate data, as everyone in the company will be in lockstep on data form and content standards. This culture of data governance not only enables reliable business analyses but also increases stakeholder acceptance of the data.
If stakeholders and end-users do not trust and accept the data, they will never accept the insights gained through analyses. According to a KPMG report, only 45% of surveyed data and analytics decision-makers “consistently use rigorous quality checks to ensure the accuracy of data and analytics models and outputs.” The same study reports that 60% of organizations are not confident in their data and analytics. There is an obvious trust issue when it comes to executives’ willingness to use data as an asset to generate business strategies. Building trust in data requires “data accuracy” to be one of the cornerstones of a company’s culture of data governance. Businesses can implement an effective data governance plan by ensuring the following:
- Quality – is the data itself trustworthy to build useful analytics?
- Effectiveness – do the analytics work as intended?
- Integrity – are the analytics being used in an acceptable way?
- Resilience – are long-term operations optimized?
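The “rigorous quality checks” cited above can start small. Below is a minimal, hypothetical sketch of rule-based checks run on rows before they reach analytics; the field names and rules are illustrative assumptions, and a real pipeline would pull its rules from a governed data-quality catalog.

```python
import re

# Hypothetical per-field validation rules (form checks, per the criteria above).
ROW_RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "date": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,  # ISO 8601
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}

def check_row(row: dict) -> list:
    """Return the names of the fields in a row that fail their rule."""
    return [field for field, rule in ROW_RULES.items()
            if field in row and not rule(row[field])]

rows = [
    {"email": "ann@example.com", "date": "2019-03-02", "amount": 100},
    {"email": "not-an-email", "date": "3/2/2019", "amount": -5},
]

# Flag every row with at least one failing field instead of silently loading it.
failures = [(i, check_row(r)) for i, r in enumerate(rows) if check_row(r)]
print(failures)  # the second row fails all three rules
```

Checks like these only catch form errors; verifying content (is the amount factually right?) still requires comparison against a trusted source.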
Data accuracy becomes increasingly imperative as companies gain the ability to collect and analyze huge amounts of data from the web, and as more and more businesses adopt Artificial Intelligence (AI) and Machine Learning (ML) to pursue business strategies. An MIT/Google joint study found that 60% of companies surveyed had implemented ML initiatives, 50% were using ML to better understand customers, and 48% expected ML to make their organization more competitive. Data accuracy must be a priority if these companies hope to gain useful, real-world insights from their AI and ML initiatives. AI and ML technologies use algorithms to analyze data and make predictions based on that information, which puts a premium on accurate data, since the purpose of AI is to reduce the need for human intervention. Although reports indicate that AI programs can regularly be at least 95% accurate, they cannot determine whether the data being analyzed is itself accurate. If the data is inaccurate, the insights gained from the AI may be flawed or incomplete, and could negatively affect customer relationships, competitiveness, and revenue growth.
The industry is abuzz with fascinating successes of new-age technology like AI, customer relationship management, supply chain management, digital marketing and more. But what is often forgotten or misunderstood is the level of data governance needed to provide the accurate data that informs these technologies. Building trust in data requires an enterprise-wide commitment to valuing data as an asset – and effectively managing that asset daily at all levels of data entry and management within the organization.