Collecting Telematics Data from the OmniSci Grand Prix
NVIDIA’s GPU Technology Conference (GTC) is right around the corner, and we couldn’t be more excited. At this year’s event, OmniSci has a special challenge for attendees: come by our booth #616 to get a demo from one of our experts and race in the first OmniSci Grand Prix TA (Telematics Analytics). We’ll have a full racing rig set up in the booth, with the Codemasters F1 2018 video game running, and a leaderboard to see who can post the fastest lap. (There will be prizes!)
I’m not gonna lie, we’re doing this partially because we want to race against the rest of the attendees at GTC. But we’re not just playing games; there is also a legitimate data analytics challenge we’ll be showing off.
The F1 game streams telematics data from the virtual car, in real time, as you play it. So, we’ll be collecting 60 packets of telematics data per second from the game into OmniSci, including: information about car position on the track; telemetry data (engine RPM, steering angle, engine/brake temperature, throttle, G-force); car setup data (downforce, tire pressure, wing angles); and current car status (damage, fuel in tank, tire wear).
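To make the stream a little more concrete, here is a minimal Python sketch of what listening for those packets could look like. It is not our actual collector. It assumes the game sends its telemetry as UDP datagrams to port 20777, and that each datagram starts with a 21-byte little-endian header laid out as in our reading of the published F1 2018 UDP spec. The names `parse_header`, `listen`, and `PacketHeader` are ours, made up for illustration, so check the layout against the spec before relying on it.

```python
import socket
import struct
from typing import NamedTuple

# ASSUMPTION: packed little-endian header layout, per our reading of the
# F1 2018 UDP spec: uint16, uint8, uint8, uint64, float32, uint32, uint8.
HEADER_FORMAT = "<HBBQfIB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 21 bytes for this layout


class PacketHeader(NamedTuple):
    """Illustrative field names; not taken from any official library."""
    packet_format: int     # game year, e.g. 2018
    packet_version: int    # version of this packet type
    packet_id: int         # which kind of packet follows the header
    session_uid: int       # unique identifier for the session
    session_time: float    # seconds since the session started
    frame_identifier: int  # frame the data was sampled on
    player_car_index: int  # index of the player's car in per-car arrays


def parse_header(data: bytes) -> PacketHeader:
    """Decode the assumed header from the start of one datagram."""
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need {HEADER_SIZE} bytes, got {len(data)}")
    return PacketHeader(*struct.unpack_from(HEADER_FORMAT, data))


def listen(host: str = "0.0.0.0", port: int = 20777, max_packets: int = 10) -> None:
    """Print a summary of the first few packets. The port is an assumption."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        for _ in range(max_packets):
            data, _addr = sock.recvfrom(2048)  # one datagram per call
            header = parse_header(data)
            print(header.packet_id, header.session_time, len(data))
```

With the game's UDP telemetry output pointed at the machine running this, calling `listen()` should print the packet type, session time, and size of the first ten datagrams. Decoding the payload that follows each header (position, telemetry, setup, status) is a separate job for each packet type.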
The spec for the data is pretty deep, and makes for a rapidly accumulating, complex dataset that simulates the real-world conditions of an F1 race. Turning this massive river of data into insights requires a modern data analytics solution that combines the fastest hardware on the market (NVIDIA GPU-accelerated computing) with the fastest data analytics software on the market (OmniSci).
As the foundation of the OmniSci platform, OmniSci Core is a SQL-based query engine, which means we can handle traditional tabular datasets, similar to your favorite SQL database. But it’s our speed that makes us unique—we’re not just fast, we’re crazy, F1 fast.
You can really see this speed at the boundaries. Large datasets (with tens of billions of rows) can slow traditional databases to a crawl, but we can do SQL queries and cross-filter charting on multi-billion row datasets in hundreds of milliseconds. Complex geospatial data (points, lines and polygons) typically requires specialized software packages, but we can do standard distance and contains queries, and visualize the data over maps, without losing any fidelity.
This category of big, complicated data is so special to us that we use an internal acronym to describe its characteristics: VAST — Volume, Agility, and Spatio-Temporal. If a dataset is big, requires interactive or real-time analysis, and has both location and time dimensions, OmniSci is the platform purpose-built to analyze it.
One particularly interesting subcategory of VAST data is vehicle telematics (like the racing game data), which is used by a number of industries, including autonomous vehicle developers, car insurers, and fleet managers. Vehicles, including planes, trains, and automobiles, outfitted with telematics collection hardware can produce terabytes of data per day. This includes not just location and speed, but specific information about the performance of the vehicle. (You can see what your own car is up to using aftermarket devices like AutoPi.)
For the past few months, we’ve been looking everywhere for a public vehicle telematics dataset that fits our VAST qualifications. It’s been surprisingly hard to find. There are some, including an interesting but relatively small dataset with 1.7M taxi rides in Portugal, but nothing as interesting as we had hoped. So we started wondering if we could create our own dataset, and quickly decided that if we’re going to create our own car data, we might as well use fast cars.
Unfortunately, my budget request for a few hundred million to buy a real F1 car and race it around the streets of San Jose during GTC was rejected, so we settled on the next best thing: collecting telematics data streaming off the F1 video game.
The game streams out the data primarily for hardcore gamers who want to hang extra displays on their gaming rigs, which does look slick when you get it set up.
We set the game up in our office first, just to see how much data it would produce, and had a blast playing against our colleagues. About a third of the company ran at least one lap, with some folks running more than 30 laps. We know it was a big sacrifice, but when work calls, we answer. Big shout out to Matt De La Housaye, one of our DevOps engineers, for having the fastest overall lap in our internal time trials. Come see us at GTC and see if you can beat him!
After GTC, we’re going to be taking the OmniSci Grand Prix on the road, inviting an even broader audience to take a lap in our racing rig and get a look at our setup for collecting the streaming telematics for analysis. If you can’t make it to GTC, join us at one of our upcoming events to learn more.
In the next entry in this series, we’ll dig much deeper into how we collect the packets off the game and insert the data into tables in OmniSci—in (almost) real time. Then in the final blog post, we’ll show you how we built a custom application to visualize and interact with the data.