Venkat Krishnamurthy
Jan 8, 2019

The OmniSci Platform—A Year in Review

A lot has happened at OmniSci these last few months! We changed our name from MapD to better reflect our mission, and followed that up by successfully closing our Series C funding. At the same time, we’ve continued to push forward on product capabilities on multiple fronts. Since our 4.0 release in June, we have shipped 8 (yes, eight) point releases covering a gamut of features along with performance and stability improvements, with the next scheduled release (4.4) due out shortly.

So what is it that we’ve been working on? The theme of 4.0 was spatiotemporal capabilities, and this has been our primary focus area as we see growing adoption in areas like Vehicle Telematics Analytics. As we have added several new enterprise customers and a growing community, we have also devoted a lot of work and attention to performance, stability and completeness. Last but not least, we’ve started down the really exciting path of supporting Data Science capabilities deeply within the OmniSci experience. We’ll discuss some of the highlights below. As always, we are hard at work on our roadmap for next year, including OmniSci 5.0. We have a lot of exciting capabilities planned that will make OmniSci the ideal platform when you need both interactivity and scale to underpin any sort of insight you wish to derive from data.

OmniSci Core

Spatiotemporal analytics continues to be a major focus area of the OmniSci platform, and we have added to our list of supported spatiotemporal operators and functions. From OmniSci 4.1.2 onwards, we have added support for new spatial joins (ST_Within, ST_Intersects, ST_Disjoint, ST_DWithin, and ST_DFullyWithin) and a number of scalar utility functions, including ST_Area, ST_Perimeter and ST_Length, for all the appropriate geometry types (POINT, LINESTRING, POLYGON and MULTIPOLYGON). In addition, we added a variant of ST_Distance, called ST_MaxDistance, that allows computation of the maximum distance between POINT and LINESTRING geometry types.
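
To make the scalar functions above concrete, here is a minimal pure-Python sketch of what they compute on planar coordinates. It is an illustration only and not OmniSci’s implementation: the helper names simply mirror the SQL function names, geometries are assumed to be plain lists of (x, y) tuples, and real ST_ functions may also handle SRIDs and geographic units, which this sketch ignores.

```python
import math

# Hypothetical helpers that mirror the SQL function names for illustration.
# Geometries are plain lists of (x, y) tuples in a flat (planar) coordinate system.

def _closed(ring):
    """Return the ring with its first vertex repeated at the end."""
    ring = list(ring)
    return ring if ring[0] == ring[-1] else ring + [ring[0]]

def st_length(line):
    """Sum of the segment lengths of a LINESTRING."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))

def st_perimeter(ring):
    """Length of the closed boundary of a simple POLYGON ring."""
    return st_length(_closed(ring))

def st_area(ring):
    """Area of a simple POLYGON ring via the shoelace formula."""
    pts = _closed(ring)
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return abs(twice_area) / 2

def st_dwithin(p, q, distance):
    """True when two POINTs are no farther apart than `distance`."""
    return math.dist(p, q) <= distance

def st_max_distance(point, line):
    """Maximum distance from a POINT to a LINESTRING.

    Distance to a point moving along a straight segment is convex,
    so the maximum is always reached at one of the vertices.
    """
    return max(math.dist(point, v) for v in line)

rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
route = [(1, 0), (3, 4)]
print(st_area(rectangle), st_perimeter(rectangle), st_max_distance((0, 0), route))
```

In SQL, the same ideas are expressed directly as function calls over geometry columns, with the engine evaluating them for every row.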

Finally, we have significantly improved the performance of spatial joins—with 4.4, we implemented a more performant spatial hashing approach that gets up to 5x better performance on ST_Contains, and we will continue to improve on this in upcoming releases.
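
The post does not describe the hashing scheme itself, so the following is only a generic sketch of the idea behind a spatial hash join, under our own assumptions: polygons are bucketed by the grid cells their bounding boxes overlap, and each point is tested only against polygons hashed to its cell. The function names, the uniform grid, and the `cell` parameter are all hypothetical and are not taken from OmniSci.

```python
from collections import defaultdict

def point_in_polygon(pt, ring):
    """Ray-casting test: is pt strictly inside the simple polygon `ring`?"""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def build_grid_index(polygons, cell):
    """Hash each polygon id into every grid cell its bounding box overlaps."""
    index = defaultdict(list)
    for pid, ring in polygons.items():
        xs = [x for x, _ in ring]
        ys = [y for _, y in ring]
        for cx in range(int(min(xs) // cell), int(max(xs) // cell) + 1):
            for cy in range(int(min(ys) // cell), int(max(ys) // cell) + 1):
                index[(cx, cy)].append(pid)
    return index

def contains_join(points, polygons, cell=1.0):
    """Return (point_index, polygon_id) pairs where the polygon contains the point."""
    index = build_grid_index(polygons, cell)
    matches = []
    for i, (x, y) in enumerate(points):
        # Only polygons hashed to the point's cell are candidates.
        for pid in index.get((int(x // cell), int(y // cell)), []):
            if point_in_polygon((x, y), polygons[pid]):
                matches.append((i, pid))
    return matches

zones = {"a": [(0, 0), (2, 0), (2, 2), (0, 2)],
         "b": [(5, 5), (7, 5), (7, 7), (5, 7)]}
pickups = [(1, 1), (6, 6), (3, 3)]
print(contains_join(pickups, zones))
```

The benefit is that the expensive exact test runs only on a short candidate list per point instead of on every polygon, which is where a join like ST_Contains would otherwise spend most of its time.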

We continue to add more enterprise-ready features, particularly around security. As of 4.4, we added support for SAML-based single sign-on for authentication on OmniSci Core (exclusive to OmniSci Enterprise Edition). OmniSci now supports HTTPS across all the major APIs. In addition, OmniSci Core added a GRANT ALL privilege to allow for more efficient management of database object permissions.

The performance of the OmniSci platform continues to be our bedrock differentiator. Running on GPUs is the primary reason for this advantage, and as a result, we are always looking for ways to better manage GPU memory. Beginning with 4.2, the OmniSci Core engine provides (experimental) columnar output for all projection queries, significantly improving the efficiency of memory management for query results, while also setting the stage for deeper integration with the Apache Arrow ecosystem to power our Data Science efforts.
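
As a rough illustration of why a columnar result layout helps, the sketch below converts a row-wise result set into one typed, contiguous buffer per column using only the Python standard library. This is our own toy example and says nothing about how OmniSci Core lays out results internally; the function name and the sample columns are hypothetical.

```python
from array import array

def to_columnar(rows, names, typecodes):
    """Turn a list of row tuples into one typed, contiguous array per column."""
    columns = {name: array(code) for name, code in zip(names, typecodes)}
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns

# Row-wise result of a projection query: (trip_id, fare)
rows = [(1, 2.5), (2, 3.5), (3, 4.0)]
columns = to_columnar(rows, ["trip_id", "fare"], ["q", "d"])

# Each column is now a single buffer of one type, so a consumer can scan
# or hand off "fare" without touching "trip_id" at all.
print(columns["fare"].tolist(), sum(columns["fare"]))
```

Formats such as Apache Arrow are built around the same per-column layout, which is why columnar output is a natural stepping stone toward that ecosystem.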

OmniSci and Data Science

The OmniSci mission is to make analytics instant, powerful and effortless for everyone. A related goal for us has always been to eliminate the spurious distinctions between “Analytics” and “Data Science.” We view the tools and methods (from SQL to pandas to scikit-learn) and user personas (from Citizen Data Scientists to ML researchers) as equally deserving of the power of GPUs to derive insights from data at scale.

Shortly after our 4.0 release, we announced support for OmniSci as a backend for Ibis, a great project that provides a Pythonic (pandas-like) API that layers on top of OmniSci (and other SQL backends). Integrating with Ibis means that you can fire up a JupyterLab notebook and have the GPU-accelerated power of OmniSci on tap through a familiar API, without having to write raw SQL, alongside all the other great PyData tools everyone knows and loves.

Next, we focused on packaging OmniSci for easy setup and installation via the conda package manager. That’s right: you can now install open source OmniSci simply and easily on your MacBook with “conda install.” (For now, this is limited to CPU-only on Macs, but we’re working to make all of our releases available this way, including GPU support on Linux.)

The most exciting development building on Ibis is our potential integration with the excellent open source Vega (and Vega-Lite) ecosystem, which we already build upon for our geospatial visualization capabilities. We worked with the community on Vega-Lite SQL transforms (shout out to Dominik Moritz and Saul Shanabrook), which allow OmniSci to integrate declarative charting with the power of our OmniSci Core engine.

Further leveraging the Altair project, which wraps a Pythonic API around Vega/Vega-Lite, we can now give data scientists the ability to use OmniSci inside JupyterLab without writing any SQL or needing to create a JSON spec for Vega-Lite. Here’s an example of what is possible: the ability to set up a complex dataset expression with Ibis, have it execute, and render the result using Vega-Lite (via the Altair API), at a speed where it almost seems like it’s running locally.

Our goal in 2019 is to build on this foundation and deliver a “closed-loop” data science workflow within OmniSci. Fortunately for us, our friends at Nvidia are hard at work on RAPIDS.ai, which we look forward to integrating within the OmniSci experience and delivering GPU-accelerated Data Science and Machine Learning to all of our users. Stay tuned!

OmniSci Render

Since 4.0, OmniSci Render has added support for the LINESTRING datatype (more on that later!) and now allows render buffers larger than 4GB, alongside a number of performance optimizations and bug fixes, including support for Vega transforms in our backend-rendered charts. We’re hard at work on taking OmniSci Render to the next level in 2019, and can’t wait to share this with you in the new year.

OmniSci Immerse

Whether you are a ridesharing company with a large collection of trip trajectory data, an auto/aircraft/drone manufacturer dealing with telematics data from a large installed base of connected vehicles, a fleet management company tracking vehicle movements, or an analyst tracking movements of people, there is an increasing need for analyzing and understanding patterns of movement of a set of entities through space and time. A natural representation of this data is as a ‘line’: a sequence of timestamped points in 2-D or 3-D space.
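
As a small illustration of that representation, the sketch below groups raw timestamped position pings by trip, orders them in time, and emits each trip as a LINESTRING in well-known text (WKT). The input layout (trip_id, timestamp, lon, lat) and the function name are assumptions made for this example and do not reflect any particular OmniSci schema.

```python
from collections import defaultdict

def trips_to_wkt(pings):
    """Group (trip_id, timestamp, lon, lat) pings into one WKT LINESTRING per trip."""
    by_trip = defaultdict(list)
    for trip_id, ts, lon, lat in pings:
        by_trip[trip_id].append((ts, lon, lat))

    lines = {}
    for trip_id, points in by_trip.items():
        points.sort()  # order the vertices by timestamp
        if len(points) < 2:
            continue  # a line needs at least two vertices
        coords = ", ".join(f"{lon} {lat}" for _, lon, lat in points)
        lines[trip_id] = f"LINESTRING ({coords})"
    return lines

pings = [
    ("t1", 2, -8.61, 41.15),
    ("t1", 1, -8.62, 41.14),
    ("t2", 1, -8.60, 41.10),  # single ping, so no line is produced
]
print(trips_to_wkt(pings))
```

Note that the timestamps are used only for ordering here; a 2-D LINESTRING keeps the path but not the time of each vertex, so time-based filtering needs a separate column.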

Building upon the work in OmniSci Core in 4.0 to add support for the 2-D LINESTRING datatype, and in OmniSci Render to support server-side rendering of large collections of this datatype, OmniSci Immerse now has a new geo chart type: the LineMap.

Here’s an example of the LineMap chart being used to render an interesting trajectory dataset used in a Kaggle competition. It has over 1.7 million taxi trips in the Portuguese city of Porto, with each trip represented by the equivalent of a multi-point LINESTRING (if you’re curious, that’s 83 million vertices, or roughly 49 points per trip on average). As you can see, OmniSci Render allows you to fluidly zoom from the level of the city down to the level of a city block in milliseconds.

As always, the true power of OmniSci Immerse is in combining this type of user interaction with a crossfilter on additional attributes of the same data. Here’s an alternate view, where we’re now cross-filtering on each day in the dataset. You can quickly see how this allows you to spot patterns of movement over time. This is characteristic of the kind of spatiotemporal exploration that OmniSci allows, exploiting both the visual computing power of the GPU and our ability to run SQL-based analyses on the underlying data in milliseconds.
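
For readers new to the term, the sketch below shows the core crossfilter idea in plain Python: every chart groups the data by its own dimension while applying the selections made on all the other dimensions. It is a toy model under our own assumptions (the function name and sample records are hypothetical); Immerse does this with SQL queries against OmniSci Core rather than in-memory loops.

```python
from collections import Counter

def crossfilter_counts(records, dimensions, filters):
    """Count records per value of each dimension.

    Each dimension's chart applies every active filter except its own,
    so a selection narrows the other charts without hiding its own options.
    """
    counts = {}
    for dim in dimensions:
        others = {d: allowed for d, allowed in filters.items() if d != dim}
        counts[dim] = Counter(
            rec[dim]
            for rec in records
            if all(rec[d] in allowed for d, allowed in others.items())
        )
    return counts

trips = [
    {"day": "Mon", "zone": "A"},
    {"day": "Mon", "zone": "B"},
    {"day": "Tue", "zone": "A"},
]

# Selecting Monday in the "day" chart narrows the "zone" chart only.
charts = crossfilter_counts(trips, ["day", "zone"], {"day": {"Mon"}})
print(charts["day"], charts["zone"])
```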

Besides the LineMap, we’ve also added support for percentage (relative) views in the Combination (“Combo”) chart, a feature often requested by some of our major customers. You can now toggle a measure between its absolute view and its relative percentage view in the chart editor.
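
The arithmetic behind such a toggle can be sketched as follows, assuming that “relative” means each series’ share of its bucket total. That interpretation, the function name, and the sample numbers are ours for illustration, not a description of how the Combo chart is implemented.

```python
def to_percentages(series_by_bucket):
    """Convert absolute values per bucket into each series' share of the bucket total."""
    relative = {}
    for bucket, measures in series_by_bucket.items():
        total = sum(measures.values())
        relative[bucket] = {
            name: (100 * value / total if total else 0.0)
            for name, value in measures.items()
        }
    return relative

# Hypothetical trip counts per day, split by payment type.
absolute = {
    "Mon": {"cash": 30, "card": 70},
    "Tue": {"cash": 10, "card": 30},
}
print(to_percentages(absolute))
```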

As always, please check out the latest release notes for a much more detailed list of additional fixes and improvements in the latest release.

The OmniSci Cloud platform is already up to date with the latest features described here. If you’ve not already done so, please sign up for a 14-day free trial.

Please leave us feedback on our community forums, or if you’re an OmniSci Enterprise Edition customer, you can reach out to OmniSci support for questions or help.

Finally, we’d like to thank every one of our customers, partners and community members whose enthusiasm for what we do has brought us this far, and the great team at OmniSci who live our mission daily: to make analytics instant, powerful and effortless for everyone.

Also, we’re hiring across the board—if you like what we do, and think you can make us better, we would love to talk!

Venkat Krishnamurthy

Venkat heads up Product Management at HEAVY.AI. He joined OmniSci from the CTO office at Cray, the supercomputing pioneer, where he was responsible for leading Cray’s push into Analytics and AI. Earlier, he was Senior Director at YarcData, a pioneering graph analytics startup, where he bootstrapped product and data science/engineering teams. Prior to YarcData, he was a Director of Product Management at Oracle, where he led the launch of the Oracle Financial Services Data Platform, and earlier spent several years at Goldman Sachs, where he led one of the earliest successful projects utilizing machine learning in Operational Risk incident classification. Venkat is a graduate of Carnegie Mellon University and the Indian Institute of Technology Chennai, and also a certified Financial Risk Manager.