Data is variegated. As we have discussed before, there are several types of data, and the variety of form factors and formulations is expanding all the time. Data differs not just in its basic genus and species (so to speak); it also differs in terms of its intensity, regularity and size.
In a world where more data is also working inside data-intensive applications, as previously explained here, we need to be able to work with more data at ever-higher levels of resolution, i.e. data streams that carry a rich set of information spanning values related to images, movement, sound and other unstructured elements that we now seek to bring structure to.
Where Is High-Resolution Data?
One place that will now make specific use of high-resolution data is the universe of sensors that populates the Internet of Things. The physical world is increasingly filled with sensors, most of which collect data at an insatiable rate. In the automotive industry, for instance, cars coming off production lines contain hundreds of IoT sensors that produce and collect data to provide digital pointers that car owners (or the vehicles themselves) can act on. By using sensors to track changes over time, manufacturers in all verticals are equipping themselves with information to power advanced artificial intelligence applications.
After all, AI is only as strong as the data (high-resolution or otherwise) that powers it… so what implications does this have at the data management and database layer?
“As we now know, AI encompasses various sub-categories, such as generative AI, causal AI and something I have come to call real-world AI. While generative AI has garnered significant attention for its ability to generate new data from a given input, real-world AI focuses on practical applications tailored to real-world scenarios,” said Evan Kaplan, CEO of InfluxData, an organization known for its time series database that is purpose-built to handle metrics, events and logs all in one place.
Kaplan suggests that these “real” applications need data to address various issues, ranging from factory automation to smart thermostats. Employing diverse techniques for data analysis, models working in this space can drive predictive analytics, forecasting and anomaly detection. However, the challenge lies in the substantial volumes of high-resolution data – essentially data with a rich level of detail and clarity – required to initiate, update and train models for AI tools.
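To make the anomaly detection idea concrete, here is a minimal sketch of one common approach: a rolling z-score over a stream of sensor readings. The function name, the window size and the threshold are illustrative assumptions rather than anything described by InfluxData, and a production model would typically be far more sophisticated.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window=20, threshold=3.0):
    """Return the indexes of readings that sit far outside the recent window.

    Hypothetical helper: 'window' and 'threshold' are illustrative defaults.
    """
    history = deque(maxlen=window)  # holds only the most recent readings
    anomalies = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip flat windows to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Synthetic thermostat-style readings with one injected spike.
readings = [20.0 + 0.1 * (i % 5) for i in range(100)]
readings[60] = 35.0
anomalies = rolling_zscore_anomalies(readings)
print(anomalies)
```

Because each reading is compared only with the window that precedes it, the check can run continuously as new data arrives, which is the property that matters for never-ending sensor streams.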
Nano-Second Precision
“Platforms tailored to manage high-resolution data at scale have emerged to meet this challenge. These platforms can handle the volume and velocity of fresh data, sometimes down to nano-second precision, required to fuel AI models. Coupled with advances in AI, this is opening the door to fully autonomous systems,” said Kaplan. “As these systems become more advanced, the need to understand and derive value from real-time or time series data – data that is recorded over consistent intervals – increases exponentially. Every Internet-connected device generates a continuous flow of time-series data. AI uses this data to analyze historical patterns, model behaviors and make predictions. This is an example of real-world AI that creates intelligence at scale via automated data collection, enabling systems to forecast outcomes, respond to them and address them effectively.”
On the universal data spectrum, time-series data is argued to enrich system intelligence by offering chronological context across various streamed data sources. This data is processed using software algorithms and machine learning to interpret signals and contextualize the real world in a meaningful way.
Using time-series data at scale means automated technologies are able to continuously enhance their intelligence as sensors encounter a growing array of real-world (sticking with the InfluxData CEO’s nomenclature) scenarios. However, since the data streams necessary for this never stop, the underlying AI systems must be constructed on a platform capable of handling high-volume, high-cardinality time series data.
50 Data Points Per Millisecond
“Imagine a sensor measuring up to 50 different data points every millisecond. Now, consider that an autonomous system could contain tens or hundreds of sensors. These sensors generate high-cardinality data (cardinality in time series data denotes the abundance of unique values over time), which increases exponentially with each passing minute,” explained Kaplan. “Specialized data platforms offer a scalable and secure environment for storing, handling and analyzing sensor data on a vast scale. With their high ingestion rates and real-time querying abilities, these platforms excel at swiftly and effectively retrieving time-based data.”
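The scale Kaplan describes is easier to grasp with some back-of-the-envelope arithmetic. The sketch below assumes, purely for illustration, that every sensor emits at the maximum rate of 50 data points per millisecond without pause; the function names and the sensor counts are hypothetical.

```python
POINTS_PER_MS = 50        # from the text: up to 50 data points per millisecond
MS_PER_SECOND = 1_000
SECONDS_PER_DAY = 86_400


def points_per_second(sensor_count, points_per_ms=POINTS_PER_MS):
    """Data points ingested per second across all sensors (assumes a constant rate)."""
    return sensor_count * points_per_ms * MS_PER_SECOND


def points_per_day(sensor_count, points_per_ms=POINTS_PER_MS):
    """Data points ingested per day across all sensors (assumes a constant rate)."""
    return points_per_second(sensor_count, points_per_ms) * SECONDS_PER_DAY


for sensors in (1, 10, 100):
    print(f"{sensors:>3} sensors: "
          f"{points_per_second(sensors):,} points/s, "
          f"{points_per_day(sensors):,} points/day")
```

Under these assumptions a single sensor produces 50,000 points per second and a system with 100 sensors produces 5,000,000, which is why ingestion rate becomes a primary design constraint.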
We can make all these suppositions, suggestions and definitions, but we also need to note that devices acting on real-world data must understand its origin, its operational value and its mission criticality, and also have a grasp of its intermediary and final destination. Numerous real-world AI applications integrate edge devices and cloud-based platforms. To maximize the benefits of time-series data, these systems must account for their edge devices’ resources and constraints. By addressing these challenges, time-series data and AI are said to be able to deliver autonomous systems that are, in theory, increasingly intelligent. Data will run through increasingly sophisticated learning models and serve as a foundational component.
The proliferation of connected devices and software is generating increasingly large volumes of highly granular, often high-resolution data, creating specific management challenges. The granular nature of this data collection also increases cardinality (the number of values, or number of types of value, in any given piece of data), a challenge many databases struggle to handle efficiently.
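One way to picture cardinality in a time-series setting is to count the distinct series, where a series is a unique combination of a measurement name and its identifying tags. The sketch below follows that interpretation; the point layout, tag names and fleet sizes are made-up assumptions for illustration and do not represent any particular database's data model.

```python
from itertools import product


def series_cardinality(points):
    """Count unique (measurement, tag set) combinations in a batch of points."""
    return len({
        (point["measurement"], tuple(sorted(point["tags"].items())))
        for point in points
    })


# Hypothetical fleet: 3 cars, 4 wheels per car, 2 sensor types per wheel.
cars = ["car_1", "car_2", "car_3"]
wheels = ["front_left", "front_right", "rear_left", "rear_right"]
sensor_types = ["temperature", "pressure"]

points = [
    {"measurement": "wheel", "tags": {"car": c, "wheel": w, "sensor": s}, "value": 0.0}
    for c, w, s in product(cars, wheels, sensor_types)
]

cardinality = series_cardinality(points)
print(cardinality)  # 3 * 4 * 2 = 24 unique series
```

The count is the product of the distinct values of each tag, so adding one more tag with many possible values (a per-lap identifier, say) multiplies the number of series a database has to index.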
Addressing High Cardinality
Consider a McLaren Formula 1 car sensor (see lead image above) capturing 50 distinct data points every millisecond. This can lead to exponential growth in high-cardinality data. Columnar databases are increasingly favored for managing this challenge. They facilitate near-real-time querying while economizing disk space. Though differing from row-based databases, the underlying technology is generally familiar to developers. Understanding data workload characteristics is key to optimizing processing efficiency.
“The substantial data output from sensors can be expensive to retain, prompting organizations to devise strategies for managing older data. Initially, data transformation is essential. For instance, considering our McLaren sensor generates 50 data points per millisecond, such granularity may not be required in the future as we start to be able to classify which information mattered… and which mattered a bit (or a whole lot) less. Consequently, organizations may opt for summarizing second-by-second analysis instead of retaining data at millisecond intervals. This approach helps mitigate storage costs by evicting unnecessary data,” said Kaplan.
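The move from millisecond data to second-by-second summaries is usually called downsampling. A minimal sketch follows, assuming samples arrive as (timestamp in milliseconds, value) pairs; the function name and the choice of summary statistics are illustrative, not a description of how any specific platform does it.

```python
from collections import defaultdict
from statistics import mean


def downsample_to_seconds(samples):
    """Summarize (timestamp_ms, value) samples into one record per second."""
    buckets = defaultdict(list)
    for timestamp_ms, value in samples:
        buckets[timestamp_ms // 1000].append(value)  # group by whole second
    return {
        second: {
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for second, values in sorted(buckets.items())
    }


# Three seconds of synthetic data, one sample per millisecond.
samples = [(t, t % 1000) for t in range(3000)]
summary = downsample_to_seconds(samples)
print(len(samples), "raw samples ->", len(summary), "summary rows")
```

Keeping the minimum, maximum and count alongside the mean preserves some sense of what happened inside each second, so the raw millisecond rows can be evicted once the summaries have been written.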
The InfluxData CEO concluded his points on this subject by mentioning data compression techniques for storage efficiency. Even after transformation, organizations are left with significant volumes of time-series data. He suggests that transitioning to columnar storage can yield improved compression ratios, reducing disk space usage and enhancing query performance. Aligning the on-disk representation of data with its in-memory counterpart facilitates efficient data movement between disk and RAM, ensuring consistent query performance and cost savings.
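To see why a columnar layout tends to compress well, consider storing all timestamps together rather than interleaved with other fields. The sketch below applies delta encoding followed by run-length encoding, two widely used general techniques; it is an illustration under simplified assumptions (perfectly regular timestamps, hypothetical field names) and not a description of InfluxData's storage engine.

```python
def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, repeat_count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


def decode(runs):
    """Invert run-length and delta encoding to recover the original column."""
    deltas = [value for value, count in runs for _ in range(count)]
    column, total = [], 0
    for index, delta in enumerate(deltas):
        total = delta if index == 0 else total + delta
        column.append(total)
    return column


# Row-oriented records: one dict per reading, one reading per millisecond.
rows = [
    {"ts": 1_700_000_000_000 + i, "sensor": "brake_temp", "value": 400 + i % 3}
    for i in range(1000)
]

# Column-oriented view: pull out one field as a contiguous list.
ts_column = [row["ts"] for row in rows]
encoded = run_length_encode(delta_encode(ts_column))
print(len(ts_column), "timestamps ->", len(encoded), "runs")
```

In this idealized case 1,000 timestamps collapse to two runs and decode back exactly. Real sensor data is less regular, so actual ratios will vary, but values of the same type sitting next to each other is what gives columnar formats their compression advantage.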
Making Sense Of A Sensor-Driven Future
With evolving data and new inputs, AI models require continual updates to stay current and effective. Continuous adaptation to emerging scenarios is essential and this can be facilitated by regular monitoring. Additionally, consistent performance analysis is crucial to confirm proper functioning, especially with the introduction of new data.
Anticipating future outcomes through AI applied to high-resolution sensors may herald a new era in terms of the way we juggle and wrangle data. Whether or not you are the proud owner of a smart refrigerator that tells you when your milk and eggs have gone off, sensors are now part of our lives. We may not all be completely aware of and conversant with the world of sensors as yet, but it’s worth remembering that the average smartphone ships with around 15 sensors, so sensors and their high-resolution data are in our pockets already. It makes sensor sense, right?