OmniSci is the pioneer of a new class of analytics called Extreme Analytics, designed for use cases beyond the technical limits of mainstream analytics tools.
Most people are familiar with ‘mainstream analytics tools’. They consist of the common Business Intelligence and Data Visualization solutions, as well as analytics tools for Geographic Information Systems. These are feature-rich tools, primarily designed to provide self-service reporting dashboards, drill-down, and visualization capabilities to large numbers of workers. They typically rely on underlying processing technologies and require complex, expensive system architectures and data pipelines to support them.
In contrast, Extreme Analytics refers to a growing array of use cases that require two fundamental capabilities, around handling big data and delivering a radically new interactive analytics experience:
Big Data: the large volumes, high velocity, and new types of data that organizations are managing; and
Interactive Experience: an agile and interactive (zero-latency) analytics experience, as needed by data engineers, analysts, and data scientists.
When it comes to making use of the explosion of data in the world, Extreme Analytics use cases involve some combination of three fundamental data attributes (not all three need to be present, but they often are):
A very large volume of structured data. Most often we see tables ranging from tens of millions to tens of billions of rows, although we also work with some organizations that have single tables in the hundreds of billions of records.
High-velocity data streams generated by IoT sensors, clickstreams, server logs, transactions, and moving objects such as mobile devices, cars, trucks, aircraft, satellites, and ships. Often this data pours in at millions of records a second.
At least 80% of data records created today contain location-time (or spatiotemporal) data. This represents a big challenge to mainstream tools, because quickly analyzing granular spatiotemporal data is incredibly compute-intensive and lends itself poorly to traditional indexing and pre-aggregation techniques. All mainstream BI and GIS-analytics systems fail to cope with spatiotemporal datasets above relatively low volumes of data.
The second part of the Extreme Analytics equation deals with the analytics experience. Firstly, how quickly can an organization’s workflow get large volumes of data from its sources to the analytics engine? Secondly, how effortlessly can an analyst build dashboards and interactively explore the data? Together, the two factors broadly define the “time-to-insight” of an analytics platform, or how long it takes to get from raw, uningested data to being able to generate insights from that data. Again, this typically isn’t an issue with small volumes of data, but becomes a huge issue in Big Data settings.
Traditionally, organizations expend huge amounts of money and time wrangling big datasets to get good data from their sources all the way through to the eyeballs of an analyst. With mainstream systems, based on traditional CPU architectures, this involves very large hardware footprints (often up to thousands of machines), due to the low parallelism of this architecture. Even then, they still need to wrangle data into a form that can be queried in a (potentially) performant way. This involves data engineers doing tasks such as downsampling, indexing, and pre-aggregating (often called “cubing”) data. This low-value work is becoming a major cost in IT departments, and furthermore the techniques are often inappropriate for many of the Extreme Analytics use cases. For example, downsampling and pre-aggregation are antithetical to the idea of finding an individual record that an analyst might be concerned about, like a rogue object within a network.
With Extreme Analytics, the organization avoids this low-value human wrangling effort by ingesting the entire dataset into the system. Such an approach is viable due to the supercomputing level of parallelism provided by the system’s use of GPUs, which means queries can be evaluated in real time without relying on ingest-slowing pre-computation.
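To make the point about pre-aggregation concrete, here is a minimal Python sketch. The event records, device names, and threshold are all hypothetical and chosen only for illustration; this is not how any particular product stores or queries data. It shows that an hourly "cube" keeps totals and counts but no longer contains the individual record an analyst might be hunting for, while a scan of the full-resolution data still finds it.

```python
from collections import defaultdict

# Hypothetical raw event records: (hour, device_id, bytes_sent).
# All names and values are illustrative assumptions, not real data.
raw_events = [
    (0, "dev-a", 100),
    (0, "dev-b", 120),
    (0, "dev-c", 110),
    (1, "dev-a", 105),
    (1, "dev-b", 115),
    (1, "dev-rogue", 9000),
]


def pre_aggregate(events):
    """Build an hourly 'cube': total bytes and record count per hour."""
    cube = defaultdict(lambda: {"total_bytes": 0, "count": 0})
    for hour, _device, sent in events:
        cube[hour]["total_bytes"] += sent
        cube[hour]["count"] += 1
    return dict(cube)


def find_outliers(events, threshold):
    """Scan full-resolution records for individual devices above a threshold."""
    return [(hour, device) for hour, device, sent in events if sent > threshold]


cube = pre_aggregate(raw_events)
outliers = find_outliers(raw_events, threshold=1000)

print(cube)      # hourly totals only; device identities are gone
print(outliers)  # the individual record is still recoverable from raw data
```

The cube can tell an analyst that hour 1 looks unusual, but only the raw records can say which device was responsible.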
Mainstream analytics tools typically provide a ‘click and wait’ experience for analysts, regardless of data volumes (the wait period can range from seconds to hours, depending on the dataset size and query complexity). While feature rich, these tools are simply not designed with high performance in mind, so Big Data analysts find them unsatisfactory for insights discovery, ultimately using them for reporting and interesting visualizations, rather than true analytical exploration.
In contrast to mainstream analytics, Extreme Analytics use cases require analysts to perform ‘speed-of-thought’ exploratory analysis. Often these use cases are considered mission critical, and any discernible latency in returning query results can dramatically impair the ability to explore the data and find ‘needle in the haystack’ insights. That latency threshold is in the low hundreds of milliseconds, even on datasets in the tens of billions of records (and it needs to be even lower when every chart in a cross-filtered dashboard must update each time a filter is applied). This not only allows speed-of-thought analysis but also allows people in meetings to have ‘conversational interactivity’ with the data.
In their seminal paper “The Effects of Interactive Latency on Exploratory Visual Analytics”, Liu and Heer concluded:
“In this research, we have found that interactive latency can play an important role in shaping user behavior and impacts the outcomes of exploratory visual analysis. Delays of 500ms incurred significant costs, decreasing user activity and data set coverage while reducing rates of observation, generalization and hypothesis.”
Read the full paper here: https://idl.cs.washington.edu/files/2014-Latency-InfoVis.pdf
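As a rough illustration of why a cross-filtered dashboard tightens the latency requirement, the following Python sketch divides an overall interaction budget across charts. The function name, the 300 ms budget, and the 10-chart dashboard are assumptions made for this example, as is the simplification that chart queries either all run one after another or all run fully concurrently; real systems will fall somewhere in between.

```python
def per_chart_budget_ms(total_budget_ms: float, num_charts: int,
                        concurrent: bool = False) -> float:
    """Return the time each chart query may take within an overall budget.

    Assumes the simplest possible model: if queries run sequentially, the
    budget is split evenly; if they run fully concurrently, each query may
    use the whole budget.
    """
    if num_charts < 1:
        raise ValueError("num_charts must be at least 1")
    if total_budget_ms <= 0:
        raise ValueError("total_budget_ms must be positive")
    return total_budget_ms if concurrent else total_budget_ms / num_charts


# Hypothetical dashboard: 10 charts, 300 ms overall budget.
sequential = per_chart_budget_ms(300, 10)
parallel = per_chart_budget_ms(300, 10, concurrent=True)

print(f"sequential: {sequential} ms per chart")
print(f"concurrent: {parallel} ms per chart")
```

Under the sequential assumption each chart has only a tenth of the overall budget, which is why a single filter click on a large dashboard is a much harder target than a single query.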
There are dozens of Extreme Analytics use cases within industries such as Telecommunications; Financial Services; Automotive, Logistics; Oil and Gas; Utilities; Advertising; Defense; and Intelligence. Examples include:
Learn more about how OmniSci addresses these and other use cases here
The trends that got us here are not going away. Data continues to grow at 40% year over year. Competitiveness in virtually every industry is now dramatically affected by analytics capabilities, the ultimate goal being to find more insights faster than the competition. Additionally, organizations are fighting a talent war to attract and retain analysts and data scientists, and realize the need to equip them with technologies that deliver exceptional productivity.
Therefore, we believe that over the medium to long term the capabilities that we define today as Extreme Analytics will become mainstream, fundamentally transforming how analytics work is done in any organization.
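For a sense of scale, the 40% year-over-year growth figure quoted above can be compounded with a few lines of Python. This is plain arithmetic under the assumption that the rate stays constant, which the article does not claim; the function names are illustrative.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplier after compounding a constant annual growth rate."""
    return (1 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for a quantity to double at a constant annual rate."""
    return math.log(2) / math.log(1 + annual_rate)


five_year_factor = growth_factor(0.40, 5)
doubling = doubling_time_years(0.40)

print(f"Growth over 5 years: {five_year_factor:.2f}x")
print(f"Doubling time: {doubling:.2f} years")
```

At a steady 40% per year, data volume doubles roughly every two years and grows more than fivefold in five.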