Aaron Williams

Getting Your Data Into MapD Cloud

Today’s launch of MapD Cloud includes two methods for importing data: transferring from an AWS S3 bucket, or uploading a local text-delimited file. In this post I’ll walk through both methods and share a few publicly available datasets that you can import from public S3 repositories and explore.

The first step is obviously to create a MapD Cloud account. Just head over to https://mapd.com/cloud/ and enter your email address to create your instance. Your email address is all we ask for, so it takes less than 60 seconds to get your 14-day free trial up and running.

Once you get into your instance, you’ll see three dashboards that come preloaded with every trial. You can feel free to play around with those data sets as well (and we’ve created videos to walk you through them). But to upload your own data, you’ll want to click on the Data Manager tab at the top of the page.

Inside the Data Manager, you can see the three data sets that we’ve preloaded and their details. If you click on the Import Data button in the upper right corner, you’ll get the two options for importing data.

Cloud - Data Manager

Cloud - Table Importer

TIP: no matter which method you choose for uploading the data (local file or S3), consider zipping your text-delimited file first. MapD Cloud extracts the file from the zip archive automatically, so the smaller archive can save you upload or transfer time. This only works for zip archives that contain a single text-delimited file.
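As a rough illustration of the tip above, here is a minimal Python sketch that packs one text-delimited file into its own zip archive using only the standard library. The function name `zip_single_file` and the file name in the usage note are made up for this example; nothing here is part of MapD Cloud itself.

```python
import zipfile
from pathlib import Path


def zip_single_file(source_path, archive_path=None):
    """Create a zip archive that contains only the given text-delimited file."""
    source = Path(source_path)
    # Default to the same name with a .zip suffix, e.g. trips.csv -> trips.zip
    archive = Path(archive_path) if archive_path else source.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname drops the directory part, so the archive holds one flat file
        zf.write(source, arcname=source.name)
    return archive
```

Calling `zip_single_file("trips.csv")` would write `trips.zip` next to the original file, ready to upload or to copy into an S3 bucket.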

Uploading Data from a Local Text-Delimited File

Cloud - Local Text-Delimited File

When you click on the “Import data from a local file” option, you’ll see a page where you can either drag and drop a file into MapD, or you can click on the page and a file dialog will pop up for you to select a local file. Any well-formed text-delimited file will work here.
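The post doesn’t spell out what “well-formed” means, so the sketch below assumes one common reading: every row has the same number of fields as the header row. It is a quick local check you might run before uploading, not a description of MapD’s own validation. The function name `find_ragged_rows` is hypothetical.

```python
import csv


def find_ragged_rows(path, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return problems  # empty file: nothing to compare against
        expected = len(header)
        for row in reader:
            if len(row) != expected:
                # reader.line_num is the physical line on which this row ended
                problems.append((reader.line_num, len(row)))
    return problems
```

An empty result means every row matched the header width. For a tab-delimited file, pass `delimiter="\t"`.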

After your file is uploaded, MapD will ask you to confirm the import settings. When you’ve set these correctly for your data file, click Import Files.

Cloud - Import Settings

Now that your data is uploaded, you can jump to the “Importing Data into MapD” section below.

Transferring Data from an S3 Bucket

When you click on the “Import data from Amazon S3” option, you’ll see a form where you can choose to upload from a region, bucket, and path, or from an S3 Link. Either method will work with private or public data.
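To show how the region, bucket, and path fields relate to a single link, here is a small sketch that assembles the standard AWS virtual-hosted-style URL from those three pieces. The post doesn’t say which link format the S3 Link field expects, so treat the output format as an assumption. The function name `s3_link` and the bucket in the usage note are placeholders.

```python
from urllib.parse import quote


def s3_link(region, bucket, path):
    """Build a virtual-hosted-style S3 URL from a region, bucket, and object path."""
    # Percent-encode characters such as spaces, but keep "/" as the path separator
    key = quote(path.lstrip("/"), safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
```

For example, `s3_link("us-east-1", "example-bucket", "data/flights.csv.zip")` gives `https://example-bucket.s3.us-east-1.amazonaws.com/data/flights.csv.zip`.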

Cloud - Transferring Data from an S3 Bucket

Once you set the import settings and click Import Files, MapD will transfer the file and begin the import.

Importing Data into MapD

After your file is uploaded, whether you uploaded a local file or transferred from S3, MapD will import your data into the database and notify you if any column headers had to be changed (because they contained spaces, for instance).
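If you’d rather pick the column names yourself, you can clean the headers before uploading. The sketch below uses one conservative rule of my own (anything that isn’t a letter, digit, or underscore becomes an underscore). It is not MapD’s actual renaming logic, which this post doesn’t describe, and `sanitize_header` is a hypothetical helper.

```python
import re


def sanitize_header(name):
    """Turn a raw column header into a conservative SQL-style identifier."""
    # Collapse each run of non-word characters (spaces, punctuation) into "_"
    cleaned = re.sub(r"\W+", "_", name.strip())
    cleaned = cleaned.strip("_")
    if not cleaned or cleaned[0].isdigit():
        # Avoid empty names and names that start with a digit
        cleaned = "col_" + cleaned
    return cleaned
```

With this rule, a header like `Pickup Date Time` becomes `Pickup_Date_Time`, and a header that is already a plain identifier, such as `lat`, is left unchanged.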


TIP: take a few moments to scroll through the Table Preview and check that the column data types are what you’d like them to be. For example, latitude and longitude data should be imported as either float or double values. If a column like that has been auto-detected as another type, such as string, you may need to adjust its type manually.
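When a numeric column keeps showing up as a string, it can help to look at the source file for values that don’t parse as numbers. The sketch below is a local check under that assumption. It says nothing about how MapD’s type detection works internally, and the function name `non_numeric_values` and the column names are illustrative only.

```python
import csv


def non_numeric_values(path, column, delimiter=",", limit=5):
    """Return up to `limit` (line_number, value) pairs in `column` that float() rejects."""
    bad = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        for row in reader:
            value = row[column]
            try:
                float(value)
            except (TypeError, ValueError):
                # Empty strings and missing fields (None) are reported too
                bad.append((reader.line_num, value))
                if len(bad) >= limit:
                    break
    return bad
```

Running it once per coordinate column, for example `non_numeric_values("trips.csv", "Latitude")`, lists the first few offending lines so you can fix or drop them before importing.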

Once you see the Table Preview, you can enter a name for your table at the bottom and click Save Table.

Table Preview

That’s it! The final screen just shows you a summary of the table you’ve successfully imported. You can click on the Back link in the upper left corner to go back to the Data Manager and see all of your loaded data sets.

successfully imported

Public Data Sets

Now that you know how to upload data, we have a few public data sets that you might want to try.

S3 Transfers

Use the S3 method and the public S3 links below.

File Downloads / Uploads

Download the files from these sites, and use the file upload method to import these data sets.

Happy importing!

If you run into any issues or have another public dataset to share, please come visit the Community Forum.


About the Author

Aaron is responsible for OmniSci’s developer, user and open source communities. He comes to OmniSci with more than two decades of previous success building ecosystems around some of software’s most familiar platforms. Most recently he ran the global community for Mesosphere, including leading the launch and growth of DC/OS as an open source project. Prior to that he led the Java Community Process at Sun Microsystems, and ecosystem programs at SAP. Aaron has also served as the founding CEO of two startups in the entertainment space. Aaron has an MS in Computer Science and BS in Computer Engineering from Case Western Reserve University.