Data: As Essential as Water
Much like the role water plays for living organisms, data can be an essential fuel for your business, but it must be collected, refined, and transported before it can deliver this value.
Data is powerful. And it's safe to say that data is essential to driving innovation, deriving insights, and optimizing product processes. But data and information are not powerful simply because they exist; at the end of the day, data means nothing if it is incomplete, not correlatable, or not reproducible. Without intentionality behind the quality of your data, misleading conclusions can put you at a severe disadvantage.
Before we can derive actionable insights, data must be trusted and reproducible. We must be intentional about the quality of the data we obtain and how we obtain it. Only then can we take full advantage of the data we’re gathering to deliver high value insights. So, how can we get to this point?
When we talk about using data as a driver of decision making, we lean on our understanding of the five vectors of data: ontology, format, interface, integrity, and harmonization. The first three vectors—ontology, format, and interface—are focused on data standardization and accessibility. Fortunately, we can address data standardization and accessibility through several tools and solutions, such as NI’s SystemLink™ software, which is specifically geared toward enabling access to crucial information and test and measurement data analysis.
The other two vectors—integrity and harmonization—are focused on the quality of the data and information that gets collected. These are areas where there is a significant opportunity to reexamine the way we think about the quality of the data and metadata we're collecting.
In terms of data quality, there are two distinct concepts we must consider: the contextual information that gets collected alongside the data, also referred to as metadata, and the reproducibility of the data.
Metadata
When you're collecting data, what additional information do you usually gather to provide context to that data? Since every project and every engineer is different, and gathering metadata isn't standardized, this varies across the board. Unfortunately, as a project moves from design to production, only a minimal amount of metadata is typically gathered along the way, often omitting the test execution strategy, the instrumentation used, the software versions involved, and the environmental conditions.
However, the data initially recorded and its accompanying metadata matter. Everything from the equipment you use to the order of operations can affect the results being gathered. Without context from these elements, problems can arise that are difficult to diagnose and resolve.
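To make this concrete, here is a minimal sketch of what capturing that context might look like in practice. The field names, instrument identifier, and software version shown are purely illustrative assumptions, not a prescribed schema; the point is that the context travels with the measurement rather than living in an engineer's memory.

```python
import json
import platform
from datetime import datetime, timezone

def build_metadata(instrument_id: str, sw_version: str,
                   ambient_temp_c: float, test_sequence: str) -> dict:
    """Bundle the contextual details that give a measurement meaning."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "instrument_id": instrument_id,    # which instrument produced the data
        "software_version": sw_version,    # version of the automation software
        "host_os": platform.platform(),    # environment the test ran on
        "ambient_temp_c": ambient_temp_c,  # environmental condition at capture time
        "test_sequence": test_sequence,    # execution strategy / order of operations
    }

# Attach the context to a measurement before it is stored or shared.
record = {
    "measurement": {"name": "output_voltage", "value": 3.301, "unit": "V"},
    "metadata": build_metadata("PXI-4081_SN1234", "test-seq-2.4.1",
                               23.5, "warmup->sweep->verify"),
}
print(json.dumps(record, indent=2))
```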
Reproducibility
When there is a lack of standardization in the metadata we collect, underlying factors go undocumented, and those factors can affect the ultimate results. Beyond the quality of the data that gets recorded, when metadata is captured matters as well. Thermal considerations, for example, are impactful, and this data can differ when it's captured at the beginning versus the end of testing. So, if "Engineer A" executes a measurement in validation and "Engineer B" measures the same metric in production, they might do so in different, undocumented ways, leaving no easy way to compare their results. Without metadata supporting the ability to reproduce the test in a comparable way, quality issues arise and finger-pointing ensues.
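One way to catch this early is to check, before correlating two data sets, whether the conditions known to matter actually match. The sketch below is illustrative only; the fields and tolerance are assumptions, and each team would define its own.

```python
# Sketch: refuse to correlate two runs until the factors that affect
# reproducibility have been confirmed to match (illustrative fields only).
CRITICAL_FIELDS = ["instrument_id", "software_version", "test_sequence"]
TEMP_TOLERANCE_C = 2.0

def correlation_blockers(meta_a: dict, meta_b: dict) -> list:
    """Return the reasons two runs should not be compared directly."""
    reasons = []
    for field in CRITICAL_FIELDS:
        if meta_a.get(field) != meta_b.get(field):
            reasons.append(f"{field} differs: {meta_a.get(field)!r} vs {meta_b.get(field)!r}")
    # Thermal conditions captured at different points can shift results.
    if abs(meta_a.get("ambient_temp_c", 0.0) - meta_b.get("ambient_temp_c", 0.0)) > TEMP_TOLERANCE_C:
        reasons.append("ambient temperature differs beyond tolerance")
    return reasons

validation = {"instrument_id": "DMM-A", "software_version": "1.2",
              "test_sequence": "manual", "ambient_temp_c": 22.0}
production = {"instrument_id": "PXI-4081", "software_version": "2.0",
              "test_sequence": "automated", "ambient_temp_c": 31.0}

for reason in correlation_blockers(validation, production):
    print("Do not correlate directly:", reason)
```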
More information is often helpful, but the goal is not simply to collect as much of it as possible. Any additional information collected should drive a quality, reproducible result. Determining and validating the factors that affect the reproducibility of a measurement should be an active and ongoing conversation between engineering teams. Unfortunately, this kind of collaboration is often shortsightedly deprioritized.
At the heart of our data challenges lies the issue of data integrity. This is where you should ask yourself: Are you drawing the right conclusions from your data? Are you even capable of doing so based on what's available?
The data you’re leaving out can make all the difference. Rather than making assumptions and spending excessive time trying to figure out why data sets don’t correlate with each other, we need to recognize what information and context we’re leaving out of the bigger picture to prevent the issue in the first place.
Let's look at the way data is gathered from mouse position trackers, including those used by companies like Amazon. For example, say that every time you log into Amazon from an Apple device, the recorded cursor position is off by a certain amount. The user may not notice, but the mismatch between recorded actions and user intent can skew the data feeding Amazon's algorithms. Now, if Amazon didn't log whether a user was on an Apple or Android device, it would lose that valuable insight and have no way to understand the discrepancies in its data.
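As a purely hypothetical illustration (the field names below are ours, not Amazon's), the difference comes down to what travels with each event:

```python
# The same cursor event, logged without and with context. Only the richer
# record lets an analyst explain a systematic offset later on.
event_without_context = {"user_id": "u123", "x": 412, "y": 238, "action": "click"}

event_with_context = {
    "user_id": "u123",
    "x": 412, "y": 238, "action": "click",
    "device_family": "Apple",          # the detail that explains the offset
    "browser": "Safari",
    "screen_resolution": "2560x1600",
}
```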
While there are many factors that influence the output, it's easy to assume that the results of a measurement, taken under relatively similar conditions, will always be the same. It's also a common mistake to omit meta-information we believe to be "assumed" from the data we gather. For instance, if Production Joe, our hypothetical engineer, gathers his data without being intentional about what metadata gets captured along the way, he may not record which instrumentation was used, the software that automated it, or the environmental factors. Why? Because he assumes that information is consistent or inconsequential.
Not recording metadata like instrumentation could cause issues in the future, especially when a quality engineer, who may have different instrumentation that allows for more detailed testing, references that same data set and makes assumptions of their own.
With increasing time-to-market pressures, many test engineers looking to expedite their workflows have turned to advanced automation techniques. However, correlation issues can occur when you move from traditional instrumentation to advanced computer-based PXI instruments, due to a variety of factors including reduced execution times.

An engineer in design may take 22 minutes to run a test manually, whereas an automation test engineer may run the same test in just 2 minutes; with advanced computer-automated PXI instrumentation, execution may drop further still, from 2 minutes down to 12 seconds. While this is the same measurement, these processes look very different and will have significantly different thermal profiles. The discord that comes from not being able to easily correlate the resulting data sets can be frustrating and a significant waste of time and resources.
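A regular correlation exercise can surface this gap early. The sketch below uses made-up values and an assumed tolerance simply to show the shape of such a check: compare the same metric from both processes, keep the execution context alongside each data set, and flag differences before anyone starts finger-pointing.

```python
from statistics import mean

# Illustrative values only: the same metric captured by the manual (design)
# process and the heavily automated (production) process.
manual_run = {"execution_time_s": 22 * 60, "values": [3.302, 3.305, 3.301, 3.304]}
automated_run = {"execution_time_s": 12, "values": [3.318, 3.321, 3.317, 3.320]}

TOLERANCE_V = 0.010  # an agreed-upon correlation limit for this measurement

delta = abs(mean(manual_run["values"]) - mean(automated_run["values"]))
if delta > TOLERANCE_V:
    print(f"Correlation gap of {delta:.3f} V exceeds tolerance; review the thermal "
          f"profile ({manual_run['execution_time_s']} s vs "
          f"{automated_run['execution_time_s']} s execution) before trusting either data set.")
```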
Attention to metadata and reproducibility is crucial; it’s not enough to simply record data. We must be cognizant of the information we’re including alongside our data sets to reduce friction throughout the entire product lifecycle and deliver new, impactful insights.
Equally important, we must also set ourselves up to reproduce results through regular, systematic correlation exercises. In fact, this effort should be an explicit requirement that paves the way for engineers to do things differently, incorporate automation, and use new instrumentation along the way to benefit the company and the future of innovation.
So, how do we get to a place where we’re gathering quality data and deriving actionable insights? There must be investments made across the board to identify data strategies, incorporate the right tools to improve testing processes, and deliver correlatable data.
No two engineers, and no two projects, are the same, so changing the way we think about test and data will look different for everyone. But there are strategies we can develop together to remove barriers and discord: standardizing data collection, providing the best tools and resources, and facilitating conversations between teams.
Investing in digital transformation and in better resources for quality data is the ultimate investment your organization can make. Ignoring the full benefits of data is a detriment to your bottom line.
Gathering extra metadata can make it challenging for a human to digest everything influencing a test and draw correlations. But quality metadata and access to low-level data enable us to debug smarter, and better data leads to faster debugging.
Quality metadata and access to low-level data can also be the key to enabling mediums like machine learning (ML), which can scan large amounts of data to identify trends that lead to errors and quality issues, as well as provide unrealized potential all the way back into design. Leveraging ML could also redefine the development lifecycle through insights that weren’t possible before. Without these mediums, unknown factors, poor data health, and a lack of crucial information will leave teams spending too much time and energy investigating problems and limit innovation.
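As one hedged example of what that could look like, the sketch below applies scikit-learn's IsolationForest to synthetic test records to flag outliers worth an engineer's attention; the features and contamination level are assumptions chosen for illustration, and many other techniques could serve the same purpose.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic records with illustrative features: measured value (V),
# ambient temperature (C), and execution time (s).
typical = rng.normal([3.30, 23.0, 120.0], [0.005, 1.0, 5.0], size=(500, 3))
drifted = rng.normal([3.35, 31.0, 12.0], [0.005, 1.0, 1.0], size=(10, 3))
records = np.vstack([typical, drifted])

# Fit an unsupervised outlier detector and flag records that don't fit the trend.
model = IsolationForest(contamination=0.02, random_state=0)
labels = model.fit_predict(records)   # -1 marks records flagged as anomalous

flagged = records[labels == -1]
print(f"{len(flagged)} records flagged for engineering review")
```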
Investing in your data and the people behind it comes down to a few key steps: identifying a holistic data collection strategy, standardizing the metadata you capture, equipping teams with the right tools to improve testing processes, and making regular correlation exercises part of the way you work.
It will take a forward-thinking company that is ready for digital transformation to intentionally prioritize quality in its data and invest in holistic data collection strategies and in its people. Don't just take data; lead the charge to quality results.