Today’s launch of MapD Cloud includes two methods for importing data: transferring from an AWS S3 bucket, or uploading a local text-delimited file. In this post, I’ll walk through each method and share a few publicly available data sets that you can import from public S3 repositories and explore.
The first step is obviously to create a MapD Cloud account. Just head over to https://mapd.com/cloud/ and enter your email address to create your instance. Your email address is all we ask for, so it takes less than 60 seconds to get your 14-day free trial up and running.
Once you get into your instance, you’ll see three dashboards that come preloaded with every trial. Feel free to explore those data sets as well (we’ve created videos to walk you through them). To upload your own data, click on the Data Manager tab at the top of the page.
Inside the Data Manager, you can see the three data sets that we’ve preloaded and their details. If you click on the Import Data button in the upper right corner, you’ll get the two options for importing data.
TIP: no matter which method you choose for uploading the data (local file or S3), consider zipping your text-delimited file before you upload it. MapD Cloud extracts zipped files automatically, so compressing can save you upload or transfer time. This only works with zip files that contain a single text-delimited file.
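If you'd rather script the zipping step, here is a minimal sketch using Python's standard zipfile module. The function name zip_single_file and the example file name are hypothetical, chosen only for illustration. The one thing the sketch takes care of is that the archive ends up with exactly one file inside, as the tip requires.

```python
import zipfile
from pathlib import Path


def zip_single_file(src_path, zip_path=None):
    """Compress one delimited text file into a zip archive holding only that file."""
    src = Path(src_path)
    # Default to the same name with a .zip extension, next to the source file.
    dest = Path(zip_path) if zip_path else src.with_suffix(".zip")
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname drops the directory part so the archive holds a single top-level file.
        archive.write(src, arcname=src.name)
    return dest
```

For example, calling zip_single_file("trips.csv") on a hypothetical trips.csv would write trips.zip alongside it, ready for either import method.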
Uploading Data from a Local Text-Delimited File
When you click on the “Import data from a local file” option, you’ll see a page where you can either drag and drop a file into MapD, or you can click on the page and a file dialog will pop up for you to select a local file. Any well-formed text-delimited file will work here.
After your file is uploaded, MapD will ask you to confirm the import settings. When you’ve set these correctly for your data file, click Import Files.
Now that your data is uploaded, you can jump to the “Importing Data into MapD” section below.
Transferring Data from an S3 Bucket
When you click on the “Import data from Amazon S3” option, you’ll see a form where you can point to the file either by region, bucket, and path, or by an S3 link. Either method will work with private or public data.
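The two ways of pointing at a file describe the same object. As a rough illustration, this sketch builds a link from a region, bucket, and path using AWS's virtual-hosted-style URL pattern. The helper name s3_link and the example values are hypothetical, and which link formats the import form accepts is an assumption here, so check the result against the link AWS shows for your object.

```python
from urllib.parse import quote


def s3_link(region, bucket, path):
    """Build a virtual-hosted-style S3 URL from a region, bucket, and object path."""
    # Percent-encode spaces and other special characters, but keep "/" separators.
    key = quote(path.lstrip("/"), safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
```

With the made-up values s3_link("us-east-1", "example-bucket", "data/trips.csv"), the result is https://example-bucket.s3.us-east-1.amazonaws.com/data/trips.csv.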
Once you set the import settings and click Import Files, MapD will transfer the file and begin the import.
Importing Data into MapD
Whether you uploaded a local file or transferred one from S3, MapD will import your data into the database and notify you if any column headers needed to be changed (because they had spaces in them, for instance).
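If you'd like to control the column names yourself, you can tidy the headers before uploading. The sketch below is one possible cleanup, not MapD's own renaming logic, which isn't described here. It assumes that replacing spaces and punctuation with underscores is acceptable for your data, and the function names are hypothetical.

```python
import re


def clean_header(name):
    """Return a header with runs of spaces and punctuation collapsed to underscores."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip())
    # Drop any underscores left dangling at the start or end.
    return cleaned.strip("_")


def clean_headers(names):
    """Apply clean_header to every column name in a header row."""
    return [clean_header(name) for name in names]
```

For instance, clean_header("Start Station Name") gives "Start_Station_Name".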
TIP: take a few moments to scroll through the Table Preview to check that the column data types are what you’d like them to be. For example, Latitude and Longitude data should be imported as either float or double values. If a column like this has been auto-detected as another type, such as ‘string’, you may need to adjust its type manually.
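When a numeric column such as Latitude shows up as a string, a few stray values are a common culprit. Here is a small sketch for spotting them before you upload. The function name find_non_numeric is hypothetical, it assumes the file has a header row, and the reported positions assume one record per line.

```python
import csv
import io


def find_non_numeric(csv_text, column, delimiter=","):
    """Return (line_number, value) pairs in `column` that float() cannot parse."""
    bad = []
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        value = row[column]
        try:
            float(value)
        except (TypeError, ValueError):
            # Empty cells, missing cells, and text like "N/A" all land here.
            bad.append((line_number, value))
    return bad
```

Any rows it reports are worth fixing or blanking out in the source file, or you can simply set the column type by hand in the Table Preview.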
Once you see the Table Preview, you can enter a name for your table at the bottom and click Save Table.
That’s it! The final screen just shows you a summary of the table you’ve successfully imported. You can click on the Back link in the upper left corner to go back to the Data Manager and see all of your loaded data sets.
Public Data Sets
Now that you know how to upload data, we have a few public data sets that you might want to try.
Use the S3 method and the public S3 links below.
- Chicago Crime Data
Includes 6.5M rows of crime data from the city of Chicago going back to 2001.
- Ford GoBike Data
Includes 519k bike rentals in San Francisco from October 2017 to January 2018.
- US Chronic Disease Indicators
Includes data on 523k patients from the CDC's Division of Population Health.
- NCAA Basketball Games
Includes details on 76k NCAA basketball games going back to 2003.
- Housing Affordability
Includes 64k housing units categorized by affordability and households categorized by income.
File Downloads / Uploads
Download the files from these sites, and use the file upload method to import these data sets.
- NYC Citi Bike Data
- Instacart Order Data
- Election Data by County for 2016
- Lending Club Lending Data
- Kaggle (a data set repository with all kinds of great data)
If you run into any issues or have another public data set to share, please come visit the Community Forum.