End-to-end Computation on the GPU with the GPU Data Frame (GDF)
One of the things we are most excited about as a newly open source company is the potential to help kickstart a larger ecosystem of GPU computing. This is why we are particularly excited about our work with Continuum Analytics and H2O.ai to found the GPU Open Analytics Initiative (GOAI) and its first project, the GPU Data Frame (GDF), as our first step toward an open ecosystem of end-to-end GPU computing.
A revolution is occurring across the GPU software stack, driven by the disruptive performance gains GPUs have seen generation after generation. The modern field of deep learning would not have been possible without GPUs (with special credit due to Nvidia for innovating on both the hardware and software sides), and as a database we often see performance gains of two or more orders of magnitude compared to CPU systems.
But for all of the innovation occurring in the GPU software ecosystem, the systems and platforms themselves remain isolated from each other. Even though the individual components are seeing significant acceleration from running on the GPU, they must communicate with each other over the relatively thin straw of the PCIe bus (or via faster NVLink on certain IBM Power systems) and then through CPU memory.
For example, until the advent of the GDF project, to take the results of a SQL query in MapD and feed them into a regression algorithm in H2O.ai, the following would need to occur:
1) An external process (client) requests that a query be executed in the MapD Core database.
2) The query is executed in MapD Core on the GPU(s). If the data has previously been queried, it should already be in GPU RAM.
3) MapD Core then copies the results across the PCIe bus to CPU memory.
4) MapD Core then serializes the results and sends them to the client over a network socket via Thrift, JDBC or ODBC.
5) The client then takes the query results and puts them in a format usable by H2O.ai.
6) The H2O.ai framework then copies the input to the GPU.
7) The H2O.ai framework then trains a model on the GPU.
8) The trained model parameters are then copied back to the CPU and returned to the client.
As one can imagine, this state of affairs is extremely inefficient: it requires not only repeated hops from GPU to CPU and back, but also the transmission of data across the network (even on a single server). While the aggregate memory bandwidth across a server full of Nvidia Pascal GPUs can approach 6 terabytes per second, the real-world aggregate bandwidth between GPUs and CPUs on a two-socket server is unlikely to exceed 40 gigabytes per second, roughly 150 times lower than the memory bandwidth available on the GPUs themselves.
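To make the gap concrete, the short Python sketch below redoes the arithmetic with the two bandwidth figures quoted above. The 10 GB result size is a hypothetical value chosen only for illustration, and the calculation ignores latency, serialization and network overhead, so it is a best-case estimate rather than a measurement.

```python
# Approximate figures quoted in the text; real systems will vary.
GPU_AGGREGATE_BW_GB_S = 6000.0   # ~6 TB/s aggregate GPU memory bandwidth
PCIE_AGGREGATE_BW_GB_S = 40.0    # ~40 GB/s aggregate GPU<->CPU bandwidth

# Hypothetical query result size, chosen only for illustration.
RESULT_SIZE_GB = 10.0


def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Best-case time to move size_gb at the given bandwidth."""
    return size_gb / bandwidth_gb_per_s


# How many times lower the GPU<->CPU bandwidth is than the on-GPU bandwidth.
ratio = GPU_AGGREGATE_BW_GB_S / PCIE_AGGREGATE_BW_GB_S

# One hop across PCIe (step 3), and the round trip (steps 3 and 6).
over_pcie = transfer_seconds(RESULT_SIZE_GB, PCIE_AGGREGATE_BW_GB_S)
round_trip = 2 * over_pcie

# The same amount of data read at on-GPU memory bandwidth.
on_gpu = transfer_seconds(RESULT_SIZE_GB, GPU_AGGREGATE_BW_GB_S)

print(f"bandwidth ratio:       {ratio:.0f}x")
print(f"one PCIe hop:          {over_pcie:.3f} s")
print(f"GPU->CPU->GPU:         {round_trip:.3f} s")
print(f"at GPU memory speed:   {on_gpu:.4f} s")
```

Even in this idealized model, the two PCIe hops dominate; the serialization and socket transfer in steps 4 and 5 would only add to the total.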
This is the situation that the first initiative of GOAI, the GPU Data Frame (GDF), is designed to address. The principal goal of the GDF is to enable efficient intra-GPU communication between different processes running on the GPUs. For example, the GDF allows a process running on the GPU to seamlessly transfer data to another process without copying the data to the CPU. Better still, since the GDF leverages the IPC functionality in the Nvidia CUDA programming API, the processes can simply pass a handle to the data instead of copying the data itself, meaning that transfers carry virtually no overhead. The net result is that the GPU becomes a first-class compute citizen, and processes on it can communicate with each other just as easily as processes running on the CPU.
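The idea of passing a handle rather than the data can be illustrated with a CPU-side analogy. The sketch below is not the GDF API and does not touch the GPU: it uses Python's standard `multiprocessing.shared_memory` module, where the "handle" is just the name of a shared buffer. On the GPU, the corresponding CUDA runtime calls are `cudaIpcGetMemHandle` and `cudaIpcOpenMemHandle`. For brevity, the producer and consumer roles are played within a single process here; in practice they would be separate processes that exchange only the handle.

```python
from multiprocessing import shared_memory

# "Producer": allocate a shared buffer once and write a result into it.
producer = shared_memory.SharedMemory(create=True, size=16)
producer.buf[:5] = b"hello"

# The only thing handed to the consumer is a small handle
# (here, the name of the shared block), not the payload itself.
handle = producer.name

# "Consumer": attach to the same buffer via the handle.
consumer = shared_memory.SharedMemory(name=handle)
received = bytes(consumer.buf[:5])

# A write through one mapping is visible through the other,
# which shows that both refer to the same underlying memory.
consumer.buf[0:1] = b"J"
seen_by_producer = bytes(producer.buf[:5])

print(received)          # what the consumer read before modifying the buffer
print(seen_by_producer)  # what the producer sees after the consumer's write

# Clean up: detach both mappings, then free the shared block.
consumer.close()
producer.close()
producer.unlink()
```

The cost of the exchange is the size of the handle, not the size of the data, which is why this style of transfer stays cheap no matter how large the result set grows.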
Although the GDF is still a project in beta, the members of the initiative have an aggressive pipeline to make it production-ready by the time of Strata NY in September this year. On our end, we are working to enable the GDF to handle the transfer of string and categorical data, to work across multiple GPUs, and to support bidirectional data flow from the GDF in and out of MapD (currently we only support output to the data format).
I’d also be remiss not to mention that the specification for the data format is based on Apache Arrow, which until this point has been focused on CPU systems. We look forward to collaborating with the Arrow committers, and potentially to seeing some of the functionality of the GDF become part of the core Arrow spec.
It is easy to imagine some of the exciting applications that the GDF will enable. Seamless workflows that combine data processing, machine learning (ML), and visualization will be possible without ever needing to leave the GPU. Users will be able to build a function with Continuum’s Numba to cluster or perform Principal Components Analysis (PCA) on a query result from MapD, or, with a bit of glue code, push the results of a query directly into a deep learning framework like TensorFlow, Theano or Torch.
We look forward to working with the developers to continue building out the GDF and to embark on other projects under the umbrella of GOAI, with the goal of enabling a larger and more cohesive GPU ecosystem. Please join us on the newly created Google Groups for the initiative. We’d love to hear your feedback and thoughts!