Announcing OmniSci 4.8: Bridging the Analytics and Data Science Chasm
Back in March (seems ages ago!) at GTC 2019, we had the privilege of being featured for 11 minutes in NVIDIA CEO Jensen Huang’s keynote address. Jensen’s banter with our very own Aaron Williams was great fun (helped mildly by the fact that the demo gods smiled favorably on us that day). At one point, Jensen made a key off-the-cuff observation about the speed at which OmniSci Immerse allowed unfettered interactive exploration of a spatiotemporal telecom dataset with ‘just’ a half billion rows, something at the lower end of OmniSci’s capability. High praise indeed, but for us, this perfectly crystallized the experience that we strive to provide our customers every day: an unprecedented combination of interactivity and scale, power and simplicity.
Todd Mostak’s impatience with the status quo of analytics tools first led him to build a platform that kept up with the speed at which he wanted to ask questions of the Twitter firehose for his research into the Arab Spring in 2013. Fast forward to today, and OmniSci is reinventing analytics by allowing users to finally break free from what are considered given notions in the space—downsampling and pre-aggregation, canned answers to a fixed list of questions, and an interminable wait for even those answers. Not a day goes by without our customers finding actionable insights from their largest datasets, insights that would have been out of reach with mainstream tools.
At the same time, we have always known that the superfast SQL and charting capabilities that OmniSci offers today for analytics were part of a larger vision. Machine Learning and AI are here, are real, and promise to be transformative even if it is early days. Data scientists straddle these worlds, combining human perception and intuition (which is still the benchmark for any AI), with the ability to teach machines to make sense of data at any scale. At OmniSci, we have known for some time that Machine Learning and data science workflows represent our next frontier of capabilities. Over the past year, we set out to build the foundation for pushing much further into this area, with help from our partners at Quansight.
With OmniSci 4.8, we are now ready to share these first steps with the world. In addition to the new OmniSci data science foundation, release 4.8 contains several other key features that represent long-standing customer and community requests. Together, they visibly transform the OmniSci user experience.
To begin, let’s look at OmniSci Immerse, where we’ve added a number of eagerly awaited new features.
Flying to Jupyter
4.8 adds support for JupyterLab, the next-generation notebook interface that has redefined interactive computing, particularly for data science. You can now launch JupyterLab from Immerse, which opens a new notebook (surprise!). Further, you’re already connected to the underlying OmniSciDB, with the same OmniSci credentials you used to log in to Immerse. We built this all on JupyterHub, the multi-user incarnation of Jupyter, and connected it closely to our own authentication and authorization flow to make it truly seamless for a user.
This is useful by itself, but it is only the tip of the proverbial iceberg. Our Data Science foundation is a complete, open stack within the PyData ecosystem, built on our pymapd API. It provides a Pandas-like experience on top of OmniSciDB (building on the Ibis open source project), in-notebook interactive charting via Altair (with even more interactivity available through ipywidgets), and a whole set of notebook-specific utilities via our jupyterlab_omnisci extension. With our partners at Quansight, we went as far as integrating interactive Altair charts with Ibis underneath, so you can now build interactive data visualizations within the PyData workflow against extremely large datasets.
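For readers working outside the Immerse-launched notebook (where the connection is already set up for you), here is a minimal sketch of what connecting through pymapd might look like. The environment variable names, the default values, and the count_rows helper are all hypothetical and only for illustration; substitute the settings for your own installation.

```python
import os


def omnisci_connection_params(env=os.environ):
    """Collect connection settings from hypothetical environment variables.

    The variable names and fallback values are placeholders for this sketch,
    not settings defined by OmniSci.
    """
    return {
        "user": env.get("OMNISCI_USER", "admin"),
        "password": env.get("OMNISCI_PASSWORD", ""),
        "host": env.get("OMNISCI_HOST", "localhost"),
        "dbname": env.get("OMNISCI_DB", "omnisci"),
    }


def count_rows(table):
    """Return the row count of `table` using a short-lived pymapd connection."""
    # Reject anything that is not a plain identifier before it reaches SQL.
    if not table.isidentifier():
        raise ValueError(f"not a plain table name: {table!r}")

    import pymapd  # imported lazily so the helper above works without the driver

    con = pymapd.connect(**omnisci_connection_params())
    try:
        cursor = con.execute(f"SELECT COUNT(*) FROM {table}")
        return cursor.fetchone()[0]
    finally:
        con.close()
```

Calling count_rows("flights") would return the size of a (hypothetical) flights table, assuming a reachable OmniSciDB server and an installed pymapd.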
We’ll be working with our Developer Advocates to dive progressively deeper into all the cool things you can now do with this powerful combination of tools at your disposal. Watch out next week for a much more detailed dive into each of the above tools and how they help you converge Data Science and Analytics workflows to extract signal from any data in OmniSci.
This is all just the beginning - beyond just the tools, our goal over the next year is to enable seamless integration of AI and traditional analytics workflows, but always focused on the consumer of insights while leveraging all the available performance and capability of the underlying infrastructure.
Zooming Around in Space and Time
The defining feature of OmniSci Immerse has always been our interactive geospatial charting. The ability to fluidly zoom in from up to hundreds of billions of points at ‘world level’ down to a city block, with crossfilter, is a powerful way to understand patterns in spatial data. A very common usage pattern we have observed is zooming on a map alongside the brush filter on the Combination (‘combo’) chart to see trends through both space and time, with the range chart providing the ability to narrow down to a particular interval of time for exploration.
With 4.8, we’ve taken the next step to make this even more powerful. We now support zooming and panning on combo charts. You can get a sense of how powerful this is when you use it to explore datasets with a billion rows spanning multiple time horizons, i.e., where the lowest time grain is second or subsecond, but the aggregated range is over multiple years. When the data is both spatial and temporal, well, a (moving) picture is worth a thousand words.
Watch how the same simple zoom gesture allows a user to go from 6 years of data for all of the US, down to a small region around the US Gulf Coast, narrowing from 11 billion rows to 3 billion in less than a second. Next, watch how the same zoom gesture, now available on the combo chart, allows you to go from 6 years’ worth of data to about 60 seconds, basically at the speed at which you can move a mouse or trackpad. We also support click/drag based panning on the combo chart. Not only that: with the new support we’ve added to Immerse for millisecond-level timestamps, and to OmniSciDB for up to nanosecond precision, we’re really excited about how this feature, combined with our map-based zoom, opens up a completely new frontier for interactive spatial/temporal data exploration at any scale.
Finally, a couple of minor but very useful updates to zoom. Maps now have +/- buttons for more controlled navigation, which let you both set and unset zoom at specific levels. We’ve also added the ability to reset the zoom level on the combo chart.
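To make the idea of zooming across time grains concrete, here is a small sketch of how a chart could pick a bin width as the visible range shrinks from years to seconds. This is not Immerse’s actual logic: the grain list, the pick_grain function, and the 400-bin budget are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta

# Candidate bin widths, finest to coarsest. Month and year are approximations,
# which is good enough for choosing a display grain.
GRAINS = [
    ("second", timedelta(seconds=1)),
    ("minute", timedelta(minutes=1)),
    ("hour", timedelta(hours=1)),
    ("day", timedelta(days=1)),
    ("week", timedelta(weeks=1)),
    ("month", timedelta(days=30)),
    ("year", timedelta(days=365)),
]


def pick_grain(start, end, max_bins=400):
    """Return the finest grain that keeps the visible range within max_bins bins."""
    span = end - start
    for name, width in GRAINS:
        if span / width <= max_bins:
            return name
    # Range too wide even for yearly bins: fall back to the coarsest grain.
    return GRAINS[-1][0]


# Six years of data on screen: weekly bins fit the budget.
print(pick_grain(datetime(2013, 1, 1), datetime(2019, 1, 1)))  # week

# Zoomed in to a single minute: per-second bins fit easily.
print(pick_grain(datetime(2019, 1, 1, 12, 0, 0), datetime(2019, 1, 1, 12, 1, 0)))  # second
```

Each zoom or pan then amounts to re-running the aggregation at the chosen grain over the new range, which is why query speed matters so much for this interaction.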
Dark is the New...uh...Black
Time for a minor origin story. In the lead-up to our presence at the GTC keynote, Aaron Williams, our fearless and beloved VP of Global Community, was doing a practice run-through the day before the talk. He pointed out to us that there was an actual risk of seared corneas when OmniSci was projected on a gigantic screen in the default (non-dark) mode! We unanimously agreed this would not be a good look for a visual analytics platform, and decided to take on Dark Mode as an urgent priority. We hacked a basic stylesheet together for GTC, but our UX team followed up with a much more thorough design, and we’re thrilled to announce its availability in 4.8. The results, as you can see, are enough to tempt one over to the dark side.
Besides the headline features above, we also added a number of smaller but nevertheless useful ones requested by our customers and community. A long-standing request relates to the ability to export (and correspondingly, import) dashboards from one OmniSci installation to another. With 4.8, we’ve added the ability to quickly export dashboard metadata as JSON, and to import it back into Immerse just as easily.
Additionally, Immerse now supports duplicating specific charts inside a dashboard, which is useful where you may want to reuse the chart config with minor tweaks.
We also took the opportunity to make several performance and stability improvements in Immerse itself: fewer errors and better performance continue to be key focuses for the team.
As always, OmniSciDB sets the foundation for the amazing things that Immerse is capable of doing. In 4.8, we continued to further strengthen the foundational differentiators in OmniSciDB, as well as invest in making the platform more solid and scalable.
On the performance and scalability front, we added early support (behind a flag) for better memory management for certain classes of GROUP BY queries, especially those with a large number of groups. We also deepened support for columnar output of result sets, something that we expect to factor into our ongoing effort around in-situ Arrow-based result sets on the GPU, which will enable several key use cases targeting Machine Learning workflows via the RAPIDS toolkit. Both of these enhancements are deep and foundational, and will set the stage for even better performance at scale across the entire platform in the near future.
On the spatiotemporal side, we added support for the ST_POINT constructor and continued to improve the performance of the ST_CONTAINS operator. We’re hard at work on foundational performance improvements to spatial joins in general, again targeting ‘OmniScale’ use cases where users routinely attempt these joins on billion-row datasets.
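As an illustration of how the two spatial functions fit together, here is a sketch of a helper that builds a point-in-polygon join query: ST_Point constructs a point from longitude and latitude columns, and ST_Contains tests it against a polygon column. The helper name and every table and column name below are hypothetical, and the exact forms your OmniSciDB version accepts should be checked against the release notes and docs.

```python
def points_in_polygons_sql(points_table, lon_col, lat_col, poly_table, poly_col):
    """Build a query counting points that fall inside any polygon.

    All arguments are caller-supplied identifiers; they are validated so that
    only plain names are interpolated into the SQL text.
    """
    for name in (points_table, lon_col, lat_col, poly_table, poly_col):
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    return (
        f"SELECT COUNT(*) AS n "
        f"FROM {points_table} AS p, {poly_table} AS g "
        f"WHERE ST_Contains(g.{poly_col}, ST_Point(p.{lon_col}, p.{lat_col}))"
    )


# Hypothetical tables: tweets with lon/lat columns, counties with a polygon column.
sql = points_in_polygons_sql("tweets", "lon", "lat", "counties", "omnisci_geo")
print(sql)
```

The resulting string can be run through any SQL client connected to OmniSciDB. Note the argument order: the polygon comes first in ST_Contains, and ST_Point takes x (longitude) before y (latitude).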
Finally, we rolled out several important administrative features including encrypted connections between OmniSci components in distributed mode, and also improved our logging infrastructure.
None of these features would be possible without the efforts of our stellar engineering teams, and the speed at which they are driving innovation across the platform. A massive thank you and shout-out, in particular, to our Cloud and Security Operations team and to our Solutions Architects. They contributed immensely to making this big step a reality and went the last mile in applying their expertise to make the entire Jupyter integration work as smoothly as it does, particularly from a deployment perspective. Last, but definitely not least, thank you to our awesome partners at Quansight, whose work with us over the last year formed the basis of the data science foundation.
How to Access OmniSci
For new users, OmniSci 4.8 can be accessed in a variety of ways: download a pre-built version of our Open Source edition, sign up for our 30-day Enterprise Trial on our downloads page, or get instant access with a free 14-day trial by signing up for OmniSci Cloud. In addition, on macOS or Linux, you can now install the JupyterLab tools directly from conda-forge with conda install -c conda-forge omnisci-pytools.
For a longer list of features and fixes in the release, check out our release notes. For details on how to use them, please refer to the latest help docs. As always, we look forward to hearing your feedback.
Join Us at Converge!
We’re also looking forward to Converge, our inaugural user conference, where we’ll have a lineup of luminaries in the data science space, including Travis Oliphant and Wes McKinney. We hope you’ll join us and some of our key customers as we will be unveiling even more exciting capabilities and directions for the platform.