OmniSci for Data Scientists

Demand for your data science skills continues to grow as enterprises and government agencies seek new insights, faster and more efficiently, from the exponentially growing amount of data they collect. You are the human thinker who makes machine learning (ML) possible, and OmniSci gives you the power to explore big data at the speed of thought. The result is faster and easier feature engineering, the ability to explain models visually, and easier ongoing monitoring and maintenance of ML models.

Solutions: accelerate the human work behind AI and ML

Data Science Solutions

Unmask the black box for others by visualizing the data you used to train your models

Learn More >

The OmniSci Extreme Analytics™ Platform

Business and government find insights beyond the limits of mainstream analytics tools

Watch Now >

Accelerate feature engineering on OmniSci Cloud for free

See how quickly you can explore data and engineer features to train your models.

Try OmniSci Cloud - free for 14 days >

Key Data Scientist Challenge 1

Making Feature Engineering Faster

Challenge

Feature engineering is a time-consuming but necessary step in building ML models

Solution

Accelerate feature engineering with the fastest queries and instant cross-filtering

How

Follow the same steps as VW: a tutorial on OmniSci for feature engineering

As a data scientist, you spend a lot of time on feature engineering: using domain knowledge to extract new variables from raw data to train your algorithms. A recent survey reported in Forbes found that the data scientists surveyed spend about 80% of their time preparing data, even though it's one of the least enjoyable parts of their work. Sound familiar?

Feature engineering takes time because you need to understand the big data you might use to train your models. OmniSci provides an interactive, visual solution to that data discovery challenge. You can cross-filter on a combination of attributes, which lets you quickly explore how different features interact and understand the data much faster.
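To make the idea of cross-filtering concrete, here is a minimal sketch in plain Python. It is not OmniSci code: the records, the column names, and the `cross_filter` helper are all hypothetical. It assumes the usual cross-filter convention, where a selection made on one chart filters every other chart but not the chart it was made on.

```python
from collections import Counter

# Hypothetical customer records; the column names are illustrative only.
ROWS = [
    {"region": "north", "plan": "basic", "churned": True},
    {"region": "north", "plan": "premium", "churned": False},
    {"region": "south", "plan": "basic", "churned": True},
    {"region": "south", "plan": "basic", "churned": False},
    {"region": "south", "plan": "premium", "churned": False},
]


def cross_filter(rows, selections):
    """Return one count per dimension ("chart").

    Each chart is filtered by the selections made on all *other*
    dimensions, so the chart you clicked on keeps showing every value.
    """
    charts = {}
    for dim in rows[0].keys():
        # Ignore the selection made on this chart's own dimension.
        others = {k: v for k, v in selections.items() if k != dim}
        kept = [r for r in rows if all(r[k] in v for k, v in others.items())]
        charts[dim] = Counter(r[dim] for r in kept)
    return charts


# Select "basic" on the plan chart and see how the other charts respond.
charts = cross_filter(ROWS, {"plan": {"basic"}})
for dim, counts in charts.items():
    print(dim, dict(counts))
```

Selecting a value on one chart immediately reshapes the region and churn counts, which is the kind of quick feature-interaction check the paragraph above describes.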

Read this blog post by OmniSci data scientist Wamsi Viswanath and follow the same steps that Volkswagen took to build models that predict customer churn. Use these notebooks as guides and follow Wamsi's instructions to: extract data from OmniSci, preprocess it in PyGDF, train a model and make predictions with XGBoost, and store the results back in OmniSci.
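The shape of that workflow (extract, preprocess, train, predict, store) can be sketched with the Python standard library alone. This is not the code from the blog post or the notebooks. It rests on loud assumptions: an in-memory `sqlite3` database stands in for OmniSci, a one-feature decision stump stands in for XGBoost, and the table names, column names, and data are hypothetical.

```python
import sqlite3

# ASSUMPTION: sqlite3 stands in for an OmniSci connection.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, visits INTEGER, months INTEGER, churned INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, 2, 12, 1), (2, 30, 10, 0), (3, 1, 6, 1),
     (4, 24, 12, 0), (5, 3, 9, 1), (6, 18, 6, 0)],
)

# 1. Extract the raw rows from the database.
rows = conn.execute("SELECT id, visits, months, churned FROM customers").fetchall()

# 2. Preprocess: engineer a new feature, visits per month.
samples = [(cid, visits / months, churned) for cid, visits, months, churned in rows]


# 3. Train. ASSUMPTION: a decision stump stands in for XGBoost.
#    It predicts churn when the visit rate is below a learned threshold.
def train_stump(samples):
    best_threshold, best_errors = None, len(samples) + 1
    for threshold in sorted({rate for _, rate, _ in samples}):
        errors = sum((rate < threshold) != bool(label) for _, rate, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


threshold = train_stump(samples)

# 4. Predict churn for every customer.
predictions = [(cid, int(rate < threshold)) for cid, rate, _ in samples]

# 5. Store the results back in the database for visual exploration.
conn.execute("CREATE TABLE churn_predictions (id INTEGER, predicted_churn INTEGER)")
conn.executemany("INSERT INTO churn_predictions VALUES (?, ?)", predictions)
conn.commit()
```

Writing predictions back to a table, rather than keeping them in the notebook, is what lets you chart them next to the source data later.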

Key Data Scientist Challenge 2

Explaining Black-Box Models to Others

Challenge

Once you’ve built a powerful black box model, how do you explain it to others?

Solution

Visual transparency helps stakeholders approve ML adoption and accelerate delivery

How

Visualization and shared dashboards make explaining models easy and interactive

Once you’ve trained your ML model, the leaders who must approve its use want to understand its logic. Will autonomous vehicles drive safely? Will loan applications be declined with unbiased logic? Do disease diagnoses align with hospital procedures? Approvers must trust the algorithm’s decision, even when they’ve never built an algorithm. You may have a hard time singling out a reason for any specific action, and this often slows or blocks approvals.

OmniSci lets stakeholders visualize the data that trains your models, giving them the trust they need for approval. Even after approval, skeptical colleagues may insist on a "wait and see" approach, only partially adopting the model until it has proven itself. If those gatekeepers feel greater trust from the beginning, they are likely to support a more aggressive rollout.

As a data scientist, you need to interact with the raw data in a familiar interface, then explain your process and share the results with colleagues. OmniSci makes that fast, easy and intuitive by: executing the underlying SQL query that drives a visualization; rendering and rasterizing the query results directly on the graphics processing unit (GPU); and letting you visually share the results of your work with others. This makes it easy for stakeholders to trust your ML models.

Key Data Scientist Challenge 3

Monitoring Models in Production

Challenge

Monitor existing models more efficiently and free more time for innovation

Solution

OmniSci provides an always-on dashboard for monitoring the health of ML models

How

Multi-source dashboards save time merging tables and preparing data

A typical data scientist puts models into production and then works to replace those models with superior ones. But each existing model carries an ongoing monitoring cost. Time spent checking on an existing model is time taken away from building better models.

As OmniSci makes monitoring faster and easier, it frees more of your time to improve existing models or to create entirely new ones. When you can visualize predictions alongside actual outcomes, you can see when and how predictions diverge from real life. As OmniSci increases the monitoring efficiency of each data scientist, the team's productivity improves.
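The comparison behind such a monitoring view can be sketched in a few lines of plain Python. This is an illustration only, not an OmniSci feature. The log records, the monthly periods, the 25% threshold, and the helper names are all hypothetical.

```python
def error_rate_by_period(records):
    """Map each period to the fraction of predictions that missed.

    `records` is an iterable of (period, predicted, actual) tuples.
    """
    totals, misses = {}, {}
    for period, predicted, actual in records:
        totals[period] = totals.get(period, 0) + 1
        misses[period] = misses.get(period, 0) + (predicted != actual)
    return {period: misses[period] / totals[period] for period in totals}


def drifting_periods(records, max_error_rate=0.25):
    """Return the periods whose error rate exceeds the allowed maximum."""
    rates = error_rate_by_period(records)
    return sorted(p for p, rate in rates.items() if rate > max_error_rate)


# Hypothetical prediction log: (period, predicted label, actual label).
LOG = [
    ("2018-06", 1, 1), ("2018-06", 0, 0), ("2018-06", 0, 0), ("2018-06", 1, 1),
    ("2018-07", 1, 1), ("2018-07", 0, 1), ("2018-07", 0, 0), ("2018-07", 1, 1),
    ("2018-08", 1, 0), ("2018-08", 0, 1), ("2018-08", 0, 0), ("2018-08", 1, 1),
]

flagged = drifting_periods(LOG)
print(flagged)
```

Plotting the per-period error rate on a dashboard gives the same signal visually: a rising line marks the point where the model starts to drift from real outcomes.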

The OmniSci Immerse visualization system can display multiple distinct datasets in the same dashboard. With multi-source dashboards, each chart (or group of charts) in a dashboard can point to a different table, without having to merge the underlying tables. This saves data preparation time and uncovers surprising multi-factor relationships that can help you innovate your ML models faster.

VW

Volkswagen Uses OmniSci to Visualize and Interrogate Black Box AI Models

Forbes

Blending Man And Machine To Get The Most From AI

Jupyter Conference - August 24, 2018

Become an OmniSci Insider

Subscribe to receive the latest news, product announcements, and blog posts.