In our “(2018) Year in Review” blog post, VP of Product Management Venkat Krishnamurthy highlights all of the amazing capabilities built into the OmniSci platform over the past year. As part of the company mission to make analytics fast and accessible to all, we’ve also spent a lot of time working towards seamless integrations with tools in the open-source data science stack, focusing on Python to start.
The newest release, "pymapd 0.7", builds on the work from 2018. Here’s what changed since the last pymapd release, along with our plans for pymapd and related tools in the first half of 2019.
Updated support for RAPIDS via cudf
From the beginning, OmniSci has supported open-source, GPU-based analytics as part of the GPU Open Analytics Initiative. This initiative defined an interface for a GPU-based dataframe that can be accessed via shared GPU memory, which allows zero-copy data sharing between tools that support the interface.
The original library for GPU dataframes, pygdf, was rolled into the cudf library as part of NVIDIA’s RAPIDS.ai project. Prior versions of pymapd supported pygdf, and pymapd 0.7 now supports cudf instead. While this change is primarily in name only, if you were a pygdf user, you’ll need to install cudf to keep using GPU dataframes.
For those of you who weren’t using GPU dataframes, here’s the conda command to get started:
conda install -c conda-forge -c nvidia -c rapidsai -c numba -c defaults pymapd cudf python=3.6
Note that to use GPU data frames, you need to have an NVIDIA GPU and the ability to run Python code on the server where the GPU is located. For more information about the exact hardware and software requirements, please see the "RAPIDS.ai technical documentation."
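Once the packages are installed, pulling a query result into a GPU dataframe might look like the sketch below. It relies on pymapd's connect, select_ipc_gpu, and select_ipc methods; the credentials, host, and the flights_2008_10k table are assumptions based on a default local install with the sample data loaded, and the fetch_frame and demo helpers are hypothetical names used only for illustration.

```python
def fetch_frame(con, query, use_gpu=True):
    """Run `query` on an open pymapd connection and return a dataframe.

    With use_gpu=True the result stays in GPU memory as a cudf dataframe
    (select_ipc_gpu); otherwise it comes back as a pandas dataframe via
    CPU shared memory (select_ipc).
    """
    if use_gpu:
        return con.select_ipc_gpu(query)
    return con.select_ipc(query)


def demo():
    """Connect to a local OmniSci server and print a few rows."""
    import pymapd  # imported here so fetch_frame has no hard dependency

    # Assumed values: default credentials of a local install and the
    # flights_2008_10k sample table. Replace them with your own.
    con = pymapd.connect(user="mapd", password="HyperInteractive",
                         host="localhost", dbname="mapd")
    try:
        gdf = fetch_frame(
            con, "SELECT depdelay, arrdelay FROM flights_2008_10k LIMIT 100")
        print(gdf.head())
    finally:
        con.close()
```

Calling demo() only makes sense on the machine that hosts both the OmniSci server and the GPU, since the result is handed over through shared memory rather than sent over the network.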
Support for Python 3.7 and pyarrow 0.11, deprecated Python 3.4
With pymapd 0.7, we’ve relaxed our package dependencies to allow for Python 3.7 and/or "pyarrow 0.11" with one caveat: if you want to use cudf for GPU dataframes, cudf requires pyarrow 0.10, which precludes the use of Python 3.7.
That said, no pymapd-specific functionality requires Python 3.7 or pyarrow 0.11. The relaxed dependencies simply let you use either one, whether by choice or because other packages in your environment require specific Python or Arrow versions.
We’ve also chosen to deprecate support for Python 3.4, as our peers and some of the pymapd dependencies in the larger PyData stack are moving towards Python 3.5+ only.
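To make the version constraints above concrete, here is a minimal sketch of a check you could run before setting up an environment. It encodes only the two rules described in this post (Python 3.4 is deprecated, and cudf cannot be combined with Python 3.7); the check_environment function is a hypothetical helper, not part of pymapd.

```python
import sys


def check_environment(python_version, want_cudf):
    """Return a list of warnings for a Python version and cudf choice."""
    major_minor = tuple(python_version[:2])
    problems = []
    if major_minor == (3, 4):
        problems.append("Python 3.4 support is deprecated in pymapd 0.7")
    if want_cudf and major_minor == (3, 7):
        problems.append("cudf requires pyarrow 0.10, which rules out Python 3.7")
    return problems


# Check the interpreter running this script, assuming cudf is wanted.
print(check_environment(sys.version_info, want_cudf=True))
```

An empty list means neither caveat applies to the combination you asked about.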
While we’ve technically supported installation via pip since the beginning, installation with pip is a challenge for many packages in the scientific Python stack. pymapd 0.7 cleans up the pymapd package install process considerably, removing the need for Cython compilation.
So if you are a pip user and have had trouble in the past with pymapd, those issues should be resolved!
Near-term work: Better integration with Ibis and JupyterLab, deprecating Python 2.7
With these foundational changes in place for pymapd 0.7, our near-term focus is on improving compatibility with OmniSci Core around geospatial data and on improving our integration with "Ibis", which provides a pandas-like API when working with OmniSci.
We also have a lot of great work happening with our collaborators at "Quansight" towards "JupyterLab integration", which not only provides a tailored "Jupyter Notebook" experience but also incorporates "Altair" for declarative data visualization.
With an eye on better integration with the aforementioned tools, as well as general maintainability, we will be dropping Python 2.7 from our supported Python versions in the very near future. Python 2.7 is scheduled for end-of-life at the end of 2019, and "many of our package dependencies have already dropped Python 2.7 support."
Plenty of Work to Do...Come Join Us!
With all of the planned updates to our Python libraries and the ever-increasing capabilities of the OmniSci platform, we need help! If you are looking for a new challenge, check out the "OmniSci Careers" page. We’ve got open roles in engineering, visualization, sales, marketing, finance, and even the Community team.