Accelerated Analytics Explained
A New Frontier for Accelerated Analytics
OmniSci is the pioneer in accelerated analytics, enabling businesses and government to rapidly find insights in data beyond the limits of mainstream analytics tools.
How is Acceleration Used in Analytics?
Most people are familiar with “mainstream analytics tools.” They consist of the common Business Intelligence (BI) and Data Visualization solutions, as well as analytics tools for Geographic Information Systems (GIS). These are feature-rich tools, primarily designed to provide self-service reporting dashboards, drill-down, and visualization capabilities to large numbers of workers. Yet these mainstream analytics tools rely heavily on underlying processing technologies that require complex, expensive system architectures and data pipelines to support them.
What is Accelerated Analytics?
In contrast, accelerated analytics refers to a growing array of use cases that use GPU and CPU acceleration to handle big data more cost-effectively and to deliver a radically new interactive analytics experience. These use cases require two fundamental capabilities:
- Big Data: Handling the large volumes, high velocity, and new types of data that organizations increasingly rely on for better decision making.
- Interactive Experience: Delivering the agile, interactive (near-zero-latency) analytics experience needed by data engineers, big data analysts, and data scientists.
When it comes to making use of the explosion of data in the world - such as IoT data - accelerated analytics use cases involve some combination of three fundamental data attributes (not all three need to be present, but they often are):
A very large volume of structured data. Most often we see tables ranging from tens of millions to the tens of billions of rows (although we also work with some organizations with single tables in the hundreds of billions of records).
High-velocity data streams are generated by IoT sensors, clickstreams, server logs, transactions, and telematics from moving objects such as mobile devices, cars, trucks, aircraft, satellites, and ships. Often this data pours in at millions of records a second.
Location and Time (Spatiotemporal) Data
At least 80% of data records created today contain location-time (or spatiotemporal) data. This represents a big challenge to outdated, mainstream analytics tools, because quickly analyzing granular-level spatiotemporal data is incredibly compute intensive and lends itself poorly to traditional indexing and pre-aggregation techniques. Mainstream BI and GIS analytics systems fail to cope with spatiotemporal datasets above relatively low volumes of data.
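To make the compute cost concrete, here is a minimal sketch of an ad hoc spatiotemporal filter evaluated by brute-force scan. The `Ping` record, the field names, and the sample coordinates are hypothetical and chosen only for illustration; this is not a description of any particular product's implementation. The point is that every record is tested independently against a bounding box and a time range chosen at query time, which is the kind of work that spreads well across many parallel cores.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Ping:
    """One hypothetical location-time record (e.g. a vehicle position)."""
    object_id: str
    lon: float
    lat: float
    ts: datetime


def in_window(p, lon_min, lat_min, lon_max, lat_max, t_start, t_end):
    """True if the record falls inside the bounding box and time range."""
    return (lon_min <= p.lon <= lon_max
            and lat_min <= p.lat <= lat_max
            and t_start <= p.ts < t_end)


def scan(pings, **window):
    """Brute-force scan: test every record against the ad hoc window."""
    return [p for p in pings if in_window(p, **window)]


# Illustrative sample data (assumed, not from the article).
pings = [
    Ping("truck-1", -122.41, 37.77, datetime(2020, 1, 1, 8, 5)),
    Ping("truck-2", -122.27, 37.80, datetime(2020, 1, 1, 8, 20)),
    Ping("truck-1", -122.40, 37.78, datetime(2020, 1, 1, 9, 45)),
]

hits = scan(
    pings,
    lon_min=-122.45, lat_min=37.70, lon_max=-122.35, lat_max=37.80,
    t_start=datetime(2020, 1, 1, 8, 0), t_end=datetime(2020, 1, 1, 9, 0),
)
print([p.object_id for p in hits])
```

Because the window is only known when the analyst draws it, there is no single pre-computed aggregate that answers every such question; the scan has to touch the granular records.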
The second part of the accelerated analytics equation deals with the analytics experience. Firstly, how agile is an organization’s workflow at getting large volumes of data from sources to the analytics engine? Secondly, how effortlessly can an analyst build dashboards and interactively explore the data? Together, the two factors broadly define the “time-to-insight” of an analytics platform, or how long it takes to get from raw uningested data to being able to generate insights from that data. Again, this typically isn’t an issue with small volumes of data but becomes a huge issue in big data scenarios.
An Agile Data Pipeline
Traditionally, organizations expend huge amounts of money and time wrangling big datasets to get good data from their sources all the way through to the eyeballs of an analyst. Mainstream systems, built on outdated and siloed architectures with low parallelism, require very large hardware footprints (often up to thousands of machines). Even then, the data still has to be wrangled into a form that can be queried in a (potentially) performant way. This involves data engineers doing tasks such as downsampling, indexing, and pre-aggregating (often called “cubing”) data. This low-value work is becoming a major cost in IT departments. Furthermore, the techniques are often inappropriate for big data use cases. For example, downsampling and pre-aggregation are antithetical to the idea of finding an individual record that an analyst might be concerned about, like a rogue object within a network.
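The rogue-object example can be sketched in a few lines. The records, device names, and threshold below are hypothetical and exist only to illustrate the trade-off: an hourly “cube” shows that something unusual happened, but the individual record responsible can only be recovered from the raw rows.

```python
from collections import defaultdict

# Hypothetical raw records: (device_id, hour, bytes_sent).
records = [
    ("dev-a", 0, 120), ("dev-b", 0, 110), ("dev-c", 0, 130),
    ("dev-a", 1, 115), ("dev-b", 1, 125), ("dev-c", 1, 9000),  # rogue burst
    ("dev-a", 2, 118), ("dev-b", 2, 122), ("dev-c", 2, 127),
]


def cube_by_hour(rows):
    """Pre-aggregate ("cube") to one total per hour, dropping device detail."""
    totals = defaultdict(int)
    for _device, hour, nbytes in rows:
        totals[hour] += nbytes
    return dict(totals)


def find_rogue(rows, threshold):
    """Scan the raw rows for individual records above a threshold."""
    return [r for r in rows if r[2] > threshold]


hourly = cube_by_hour(records)
rogue = find_rogue(records, threshold=1000)

print(hourly)  # hour 1 stands out, but the device is no longer visible
print(rogue)   # the raw scan still identifies the individual record
```

Once only the hourly totals are kept, no query can say which device caused the spike in hour 1; keeping and scanning the full dataset preserves that answer.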
With accelerated analytics applications, the organization avoids this low-value human wrangling effort by ingesting the entire dataset into the system. Such an approach is viable due to the supercomputing level of parallelism provided by the system’s novel use of GPUs and CPUs, which means queries can be evaluated in real time behind analytics dashboards without relying on ingest-slowing pre-computation.
Mainstream analytics tools typically provide a “click-and-wait” experience for analysts, regardless of data volumes (the wait period can range from seconds to hours, depending on the dataset size and query complexity). While feature rich, these tools are simply not designed with high performance in mind, so Big Data analysts find them unsatisfactory for insights discovery, ultimately using them for reporting and interesting visualizations, rather than true analytical exploration.
In contrast to mainstream analytics solutions, accelerated analytics use cases require analysts to perform “speed-of-thought” exploratory analysis. Often these use cases are considered mission critical, and any discernible latency in returning query results can dramatically impair the ability to explore the data and find “needle-in-a-haystack” insights. The latency threshold is in the low hundreds of milliseconds, even on datasets in the tens of billions of records. Staying under it not only allows speed-of-thought analysis but also lets people in meetings have “conversational interactivity” with the data. This is only possible with the right accelerated analytics software or analytical database.
In their seminal work “The Effects of Interactive Latency on Exploratory Visual Analytics,” Zhicheng Liu and Jeffrey Heer concluded:
“In this research, we have found that interactive latency can play an important role in shaping user behavior and impacts the outcomes of exploratory visual analysis. Delays of 500ms incurred significant costs, decreasing user activity and data set coverage while reducing rates of observation, generalization and hypothesis.”
Read the full paper here: https://idl.cs.washington.edu/files/2014-Latency-InfoVis.pdf.
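One way to hold a dashboard query to this kind of threshold is to time it and compare the result against an explicit latency budget. The sketch below is an assumption-laden illustration: the helper names (`timed`, `within_budget`, `count_over`) are hypothetical, the stand-in query is a trivial in-memory count, and the 500 ms budget is simply the delay discussed in the quoted paper.

```python
import time

LATENCY_BUDGET_S = 0.5  # 500 ms, the delay discussed in the quoted paper


def timed(query, *args, **kwargs):
    """Run a query callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = query(*args, **kwargs)
    return result, time.perf_counter() - start


def within_budget(elapsed_s, budget_s=LATENCY_BUDGET_S):
    """True if the measured latency stays at or under the budget."""
    return elapsed_s <= budget_s


def count_over(values, threshold):
    """Stand-in for a real query: count values above a threshold."""
    return sum(1 for v in values if v > threshold)


result, elapsed = timed(count_over, range(100_000), 50_000)
print(result, f"{elapsed * 1000:.1f} ms", within_budget(elapsed))
```

The measured time depends on the machine, so the useful habit is the comparison against a stated budget rather than any particular number.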
Types of Accelerated Analytics Use Cases
There are dozens of accelerated analytics use cases within industries such as Telecommunications; Financial Services; Automotive; Logistics; Oil & Gas; Utilities; Advertising; Defense & Intelligence and more. Examples include:
- Telecommunications: Network Reliability Analysis
- Oil & Gas: Acquisition & Divestment
- Automotive: Understand Driver Behavior Data
- Investment Management: Alternative Data Analysis
- Utilities: Smart Meter Data Analysis
- Pharmaceuticals: Clinical Trial Analysis
- Geospatial Intelligence in Federal
- GPU-Accelerated Defense Analytics
- GPU-Accelerated Public Sector Analytics
Learn more about how OmniSci addresses these and other accelerated analytics use cases here.
The Medium to Long Term Outlook for Accelerated Analytics Applications
The trends that got us here are not going away. Data continues to grow at 40% year over year. Competitiveness in virtually every industry is now heavily shaped by analytics capabilities, the ultimate goal being to find more insights faster than the competition. Additionally, organizations are fighting a talent war to attract and retain analysts and data scientists, and they realize the need to equip them with technologies that deliver exceptional productivity.
Therefore, we believe that over the medium to long term, the capabilities that we define today as accelerated analytics will become mainstream, fundamentally transforming how analytics work is done in any organization.
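To put the 40% year-over-year growth figure in perspective, the compounding can be worked out directly. This is plain arithmetic on the quoted rate, assuming it holds constant; the function names are illustrative.

```python
import math

ANNUAL_GROWTH = 0.40  # the 40% year-over-year figure quoted above


def growth_factor(years, rate=ANNUAL_GROWTH):
    """How many times larger the data is after `years` of compounding."""
    return (1.0 + rate) ** years


def doubling_time_years(rate=ANNUAL_GROWTH):
    """Years needed for the data volume to double at a constant rate."""
    return math.log(2.0) / math.log(1.0 + rate)


print(round(growth_factor(5), 2))       # 5.38
print(round(doubling_time_years(), 2))  # 2.06
```

At that rate, data volume doubles roughly every two years and is more than five times larger after five years.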