An interactive GPU-powered deep dive into 11.6 billion rows of US shipping data
For some of us, summer is synonymous with salt water and waves. For many others, the sea is a year-round occupation. The US has 12,383 miles of coastline and 95,471 miles of shoreline, and it buzzes with billions of trips each year, all tracked by the US Coast Guard.
Our latest demo of MapD Core and MapD Immerse reveals the vast scope of marine activity around America’s shores–everything from the tracks of commercial freighters to the patrols of military vessels to the lazy patterns of pleasure boats out for a Sunday sail on San Francisco Bay.
The demo offers more than 11 billion rows of public ship AIS data to explore, spanning 2009 to 2014. You can filter the data by ship type (such as tugboat, cargo ship, passenger ship or tanker), by length (the largest are about 350 ft) and of course by time, showing seasonality and trends. The visualization traces the path of each vessel, allowing you to investigate the main shipping lanes around the US coasts, key ports and waterways.
While we pride ourselves at MapD on our ability to scale up to multiple GPUs per server before we have to scale out, a dataset of this size requires a distributed multi-server setup (all of our other demos run on some portion of a single server). For this particular demo we are running on four servers with 8 Nvidia 1080Ti GPUs each. If you’re an avid gamer, you’ll know that the 1080Ti has been a smash hit due to its price/performance characteristics, packing over 11 teraflops into a sub-$700 card. With an enterprise-grade card like the Nvidia Tesla P40, which has 24GB of RAM, one could build the same demo on two servers with 8 P40 cards each.
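For the curious, here is a back-of-the-envelope sketch of why two P40 servers could stand in for four 1080Ti servers. The 24GB figure for the P40 comes from the text above; the 11GB of memory and roughly 11.3 teraflops (FP32) per 1080Ti are Nvidia's published specs and are assumptions here, as is the helper name `cluster_total`.

```python
# Per-card figures. The P40's 24 GB is taken from the post; the 1080Ti's
# 11 GB and ~11.3 FP32 teraflops are assumed from Nvidia's published specs.
GTX_1080TI = {"memory_gb": 11, "fp32_tflops": 11.3}
TESLA_P40 = {"memory_gb": 24}


def cluster_total(per_card, servers, cards_per_server=8):
    """Multiply a per-card figure out across every card in the cluster."""
    return per_card * servers * cards_per_server


# Four servers of eight 1080Ti cards (the setup used for the demo).
demo_memory_gb = cluster_total(GTX_1080TI["memory_gb"], servers=4)
demo_tflops = cluster_total(GTX_1080TI["fp32_tflops"], servers=4)

# Two servers of eight P40 cards (the alternative mentioned above).
p40_memory_gb = cluster_total(TESLA_P40["memory_gb"], servers=2)

print(f"1080Ti cluster: {demo_memory_gb} GB GPU memory, ~{demo_tflops:.0f} teraflops")
print(f"P40 cluster:    {p40_memory_gb} GB GPU memory")
```

Under these assumptions the two-server P40 build has slightly more total GPU memory (384GB against 352GB) with half the number of cards, which is what makes the swap plausible.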
We call our 4-node cluster the Beatles, with the respective servers appropriately named John, Paul, George, and Ringo! Here is one of the servers in all its parallel glory being installed in our racks in Santa Clara. That's 28,672 GPU cores you are looking at (8 cards with 3,584 cores each), or 114,688 cores across the cluster!
With an aggregate 362 teraflops of compute at our disposal, even a dataset of this size can be explored interactively and without lag in MapD. One of the advantages of having this much power on tap is that it eliminates the need to index, cube or otherwise pre-aggregate the data. When you filter on an attribute in the dashboard, each chart issues a SQL query against the backend, scanning the full 11.6-billion-record dataset in milliseconds for unprecedented interactivity. Below you can see the log of SQL queries in the browser console.
With this kind of speed, interesting factoids quickly present themselves. For instance, the lowly tugboat is the little-noticed hero of America’s waterways, accounting for 5 billion voyages, with cargo and passenger vessels the next largest of the specified categories.
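To give a feel for the query-per-chart pattern described above, here is a minimal sketch of how a dashboard chart might turn the active filters into a plain GROUP BY query with no pre-aggregation. Everything in it is an assumption for illustration: the table `ais`, the columns `ship_type`, `length` and `ts`, and the helper names are hypothetical, and SQLite stands in for the real backend so the example runs anywhere.

```python
import sqlite3

# Hypothetical columns; the real demo's schema is not shown in this post.
ALLOWED_COLUMNS = {"ship_type", "length", "ts"}

# A handful of made-up rows so the sketch is runnable.
SAMPLE_ROWS = [
    ("tug", 30, "2010-08-01"),
    ("tug", 28, "2010-08-02"),
    ("cargo", 300, "2010-08-02"),
    ("passenger", 120, "2011-01-15"),
    ("tug", 31, "2011-01-20"),
]


def make_demo_db():
    """Create an in-memory SQLite table standing in for the AIS dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ais (ship_type TEXT, length INTEGER, ts TEXT)")
    conn.executemany("INSERT INTO ais VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def build_chart_query(group_col, filters):
    """Build (sql, params) for one chart.

    `filters` maps a column name to an inclusive (low, high) range, one entry
    per filter currently applied in the dashboard.
    """
    if group_col not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {group_col}")
    clauses, params = [], []
    for col, (low, high) in filters.items():
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {col}")
        clauses.append(f"{col} BETWEEN ? AND ?")
        params.extend([low, high])
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        f"SELECT {group_col}, COUNT(*) AS n FROM ais{where} "
        f"GROUP BY {group_col} ORDER BY n DESC, {group_col}"
    )
    return sql, params


def run_chart(conn, group_col, filters):
    """Execute the chart's query against the raw rows and return its result."""
    sql, params = build_chart_query(group_col, filters)
    return conn.execute(sql, params).fetchall()


conn = make_demo_db()
all_time = run_chart(conn, "ship_type", {})
only_2010 = run_chart(conn, "ship_type", {"ts": ("2010-01-01", "2010-12-31")})
print(all_time)
print(only_2010)
```

Each change to a filter simply re-runs queries like these against the raw rows, which is only practical when the backend can scan the whole table fast enough.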
Immediately obvious will be the main shipping lanes around busy ports such as Los Angeles:
These can be filtered by vessel type, so let’s take a look at the naval base in San Diego and see what they’ve been up to:
You can see the huge amount of military traffic around San Diego, and the areas of peak activity in September 2010 and 2011. Diving into a six-month period, we can see more easily where the military patrols were:
Now let’s move to the Gulf of Mexico and take a look at the data around the Deepwater Horizon oil spill, which took place in 2010. The clean-up operation can clearly be seen in this chart, with filters for that period and anti-pollution vessels selected. In fact, during 2010 alone, there were almost 2.5 million voyages by anti-pollution vessels in the Gulf.
Moving east we can take a look at New York. You might be surprised to see how many tankers carrying hazardous loads come in and out of the city port, though the decline from mid-2013 suggests they’ve been routed elsewhere.
The data doesn’t just show our coastal traffic, but also the waterways and lakes. For instance, many people enjoy sailing and using pleasure craft on the Great Lakes, as can be seen here:
Notice the clear seasonality (understandably) of this activity, with November through April being the low season and a steep rise from May to a peak every year in August. So don’t expect to get the Lakes to yourself in August.
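If you wanted to check a seasonal pattern like this outside the dashboard, one simple approach is to bucket trip dates by calendar month across all years and look for the peak. The sketch below assumes you have a list of trip dates; the sample dates and the helper names `monthly_counts` and `peak_month` are made up for illustration and are not taken from the dataset.

```python
from collections import Counter
from datetime import date


def monthly_counts(dates):
    """Return a 12-item list of trip counts, January first, pooled over all years."""
    counts = Counter(d.month for d in dates)
    return [counts.get(month, 0) for month in range(1, 13)]


def peak_month(dates):
    """Return the calendar month (1-12) with the most trips."""
    counts = monthly_counts(dates)
    return counts.index(max(counts)) + 1


# Made-up sample: mostly summer trips, a single winter outing.
sample_trips = [
    date(2012, 8, 3),
    date(2012, 8, 17),
    date(2013, 8, 9),
    date(2013, 7, 4),
    date(2012, 1, 20),
    date(2013, 5, 30),
]

print(monthly_counts(sample_trips))
print(peak_month(sample_trips))
```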
Speaking of seasonality, another key industry is fishing in the Gulf of Alaska, so let’s take a trip over there to see where the most popular spots are:
Once again August looks to be a popular time, but notice also the growth in the number of fishing vessels over the six years of this dataset. Unlike in many other areas, fishing trips are by far the most numerous in the Gulf of Alaska.
If you’re a commercial fisherman, this data might be useful for working out where your competition is casting their nets. Zooming into the coast of Vancouver and again filtering on fishing boats, we can see where the action is, likely following the schools of fish. Note that here we are using accumulation rendering (available as a setting in the Pointmap chart) to make the locations of the densest traffic easy to see.
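As a rough illustration of the idea behind a density view like this, the sketch below counts how many points land in each cell of a grid and scales the counts to an intensity between 0 and 1. This is a generic technique and an assumption on our part for the example; it is not a description of how the Pointmap chart's accumulation setting is implemented, and the function names and sample points are hypothetical.

```python
from collections import Counter


def accumulate(points, bounds, width, height):
    """Count points per grid cell.

    `points` is an iterable of (lon, lat) pairs; `bounds` is
    (min_lon, min_lat, max_lon, max_lat). Points outside the bounds are skipped.
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    grid = Counter()
    for lon, lat in points:
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue
        # Clamp so points exactly on the upper edge fall in the last cell.
        x = min(int((lon - min_lon) / (max_lon - min_lon) * width), width - 1)
        y = min(int((lat - min_lat) / (max_lat - min_lat) * height), height - 1)
        grid[(x, y)] += 1
    return grid


def to_intensity(grid):
    """Scale counts so the busiest cell has intensity 1.0."""
    peak = max(grid.values(), default=0)
    if peak == 0:
        return {}
    return {cell: count / peak for cell, count in grid.items()}


# Made-up points in a unit square; the last one lies outside the view.
sample_points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (2.0, 2.0)]
density = accumulate(sample_points, bounds=(0.0, 0.0, 1.0, 1.0), width=2, height=2)
print(to_intensity(density))
```

Cells where many tracks overlap end up brighter than cells crossed only once, which is what lets the busiest spots stand out.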
However, a great fishing spot this year might not be a good one the next. By brushing the time chart we can see how the popular fishing spots change over time, likely driven by the movements of the schools of fish the boats chase!
Take a few minutes to explore the data yourself. Go see how many tankers come out of the port of Houston, look at law enforcement craft around Miami, see how far the sailing yachts go around Honolulu, find out when the Mississippi River gets dredged, or spot the popular dive sites in the Gulf of Mexico.
We hope you enjoy this demo, and be sure to check out the other ones here. Or if you want to try out MapD on your own data, download our community edition or check out the open source code on GitHub. Bon voyage!