Working with big data can be frustrating, as a high-level API often comes at the cost of poor I/O performance.
Blog posts about Demo Overview
The taxi dataset is one of the most popular on our site, and for good reason: it is not often that you can get behind the wheel of a supercomputer for free.
What is obscured in the vitriol, the accusations and the gaffes, however, is that money still fuels the American political process. Despite the emergence of a billionaire candidate, this cycle is no different - the money is as prevalent as ever.
With the latest addition to our public demos, we have the absolutely spectacular 1.2 billion row taxi/limo/Uber/Lyft dataset from NYC. The dataset contains staggering detail (full GPS, transaction type, passenger counts, timestamps) from January 2009 through June 2015 (essentially the birth of rideshare).
In the data world, there is a particular dataset, referred to as “the taxi dataset,” that has been getting a disproportionate amount of attention lately.
A few years back, the American Statistical Association put out a dataset of hundreds of millions of US airline flights from 1987 to 2008, as part of a supercomputing competition. The dataset includes every single flight record known to the Bureau of Transportation Statistics for that two-decade period; every prop plane, every jet plane, balloon or blimp.
While we love datasets of all shapes and sizes at MapD, Twitter holds a special place in our hearts. This is perhaps because we find Twitter data to be almost peerless among public datasets in its ability to provide a glimpse into the human experience - revealing what people are saying when and where.