MapD Open Sources GPU-Powered Database
Since starting work on MapD more than five years ago while taking a database course at MIT, I had always dreamed of making the project open source. It is thus with great pleasure to announce that today our company is open sourcing the MapD Core database and associated visualization libraries, effective immediately.
The code is available on Github under an Apache 2.0 license. It has everything you need to build a fully functional installation of the MapD Core database, enabling sub-second querying across many billions of records on a multi-GPU server. All of our core tech, including our tiered caching system and our LLVM query compilation engine, is contained in today’s open source release.
We are doing this first and foremost out of our belief in the transformative power of open source software. Whether in the Hadoop or deep learning ecosystems, open source is driving tremendous innovation that simply has not been possible with proprietary software.
For me personally, this day has been long in the coming. My goal has always been to open MapD to the world, but I hesitated to do so at the beginning out of a desire for the codebase to mature. And then building the product and scaling the company imposed a different set of priorities.
However it was clear from my first meetings with Greg Papadopoulos and Forest Baskett at NEA (who recently led our $25M Series B funding round) that we shared a belief in the disruptive potential of open source, particularly in the analytics space. We noted that while GPU-accelerated machine learning was eating the world, there was a gaping hole in the analytics stack running on GPUs. Almost the entire GPU ML and deep learning stack was open source, but there was no open source data processing engine to complement it. This is the hole we set ourselves on filling.
Being open source allows us to integrate with the rest of this ecosystem in a way that we simply could not do as a closed system. Hence in conjunction with the open source announcement we are also excited today to announce the foundation of the GPU Open Analytics Initiative (GOAI) with Continuum Analytics and H2O.ai. Together we are unveiling our first project, the GPU Data Frame (GDF). The data frame allows for efficient interchange of data between GPU processes without the overhead of copying data or moving it to CPU. Our hope is that this project will be a step towards enabling an open end-to-end pipeline on GPUs.
We are open sourcing the following today:
MapD Core open source database - source code of the MapD Core database provided under the Apache 2 license. The code provides everything needed for multi-GPU acceleration of SQL queries.
We are simultaneously unveiling the MapD Analytics Platform Enterprise Edition which contains the MapD Core database, the MapD Core GPU rendering engine, and the MapD Immerse visual analytics client. It also contains distributed scale-out, high availability (HA), LDAP, and ODBC capabilities that are not in the open source version. Our roadmap contains additional features, particularly around security, which will be added to the Enterprise Edition.
We are also offering a downloadable community edition binary that contains the MapD Core database, our GPU rendering engine, and the MapD Immerse visual analytics client under a non-commercial academic license.
We’re tremendously excited about the path ahead and the hard work of building a community. As a first step we have set up a community forum—we’d love to hear your thoughts, comments and questions, technical or otherwise.
Looking forward to building something amazing together.