Seamless Big Data Integration

Today’s data managers face a growing ecosystem of data sources and warehouses, which makes big data integration more complex than ever. Your data lives in many data warehouses and data lakes; it flows in continually through streams or rests as point-in-time files. Regardless of the source, OmniSci ingests millions of records per second into the OmniSci Core open source SQL engine.

Streaming Data Integration

Today’s big data ingestion tools must integrate with a wide variety of data sources and networks. Streaming data originates from sensors, network logs, social media, and web clickstreams from all over the globe. This can produce billions of records per week for large organizations. Streaming ingest engines, such as Apache Kafka, organize and distribute this information before finally funneling it into storage.

Although many platforms offer automated streaming data analytics tools, only OmniSci can ingest this volume of data and make it available for interactive exploration by business analysts. OmniSci provides an easy-to-use utility for Kafka data integration that connects to a Kafka topic, consumes messages in real time, and rapidly loads them into an OmniSci target table.
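To illustrate the consume-and-load pattern described above, here is a minimal Python sketch. It is not the OmniSci Kafka utility and does not call any Kafka or OmniSci API; it only shows how delimited messages might be parsed and grouped into fixed-size batches before a bulk load. The function name batch_rows, the sample messages, and the batch size are all hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_rows(messages: Iterable[str], batch_size: int = 3,
               delimiter: str = ",") -> Iterator[List[List[str]]]:
    """Parse delimited messages into rows and yield them in fixed-size batches.

    Hypothetical helper for illustration only; a real pipeline would read
    `messages` from a Kafka consumer and hand each batch to a bulk loader.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Skip blank messages and split the rest into column values.
    rows = (m.rstrip("\n").split(delimiter) for m in messages if m.strip())
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for messages arriving on a topic (assumed sample data).
sample_messages = ["1,click,US", "2,view,DE", "", "3,click,FR", "4,view,JP"]
batches = list(batch_rows(sample_messages, batch_size=3))
```

Loading in batches rather than row by row is the usual way to keep ingest throughput high; the right batch size depends on message size and how quickly new rows need to become visible.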

Data at Rest

Most of the world’s data is at rest, stored in data warehouses, enterprise databases, or Hadoop data lakes. The vast majority of this data has never been explored or analyzed, and it represents an incredible amount of untapped insight. OmniSci supports batch import of data at rest through the following methods:

For Delimited Files:

  • Load delimited files such as CSV or TSV into OmniSci Core using omnisql.
  • OmniSci Core can import compressed files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ formats.
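As a rough sketch of what a delimited-file import can look like, the Python helper below assembles a COPY ... FROM statement that could then be run from a SQL client. The helper name build_copy_statement, the table name, and the file path are hypothetical, and the option names shown (delimiter, header) should be checked against the OmniSci COPY FROM documentation for your version.

```python
def build_copy_statement(table: str, path: str, **options: str) -> str:
    """Build a COPY ... FROM statement for a delimited file.

    Hypothetical helper for illustration; it only formats a string and
    does not connect to a database.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")

    def quote(value: str) -> str:
        # Reject embedded quotes rather than guess at escaping rules.
        if "'" in value:
            raise ValueError(f"value must not contain a single quote: {value!r}")
        return f"'{value}'"

    statement = f"COPY {table} FROM {quote(path)}"
    if options:
        # Sort the options so the generated statement is deterministic.
        rendered = ", ".join(
            f"{name} = {quote(value)}" for name, value in sorted(options.items())
        )
        statement += f" WITH ({rendered})"
    return statement + ";"


# Assumed example: a gzip-compressed CSV file with a header row.
stmt = build_copy_statement(
    "flights", "/data/flights.csv.gz", delimiter=",", header="true"
)
print(stmt)
```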

From Data Lakes or Data Warehouses:

  • Pull data from Apache Hadoop Distributed File System (HDFS) or from structured data warehouses with Apache Sqoop.
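For the Sqoop path, one possible approach is a Sqoop export from an HDFS directory into a target table over JDBC. The Python sketch below only assembles the command line; it does not run Sqoop. The helper name build_sqoop_export is hypothetical, and the JDBC URL, driver class name, table, and paths in the example are assumptions to verify against the OmniSci JDBC and Apache Sqoop documentation.

```python
import shlex
from typing import List


def build_sqoop_export(table: str, export_dir: str, jdbc_url: str,
                       driver_class: str, username: str,
                       password_file: str) -> List[str]:
    """Assemble a `sqoop export` argument list (hypothetical helper).

    The connection details are supplied by the caller because they
    depend on the target installation.
    """
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--driver", driver_class,
        "--table", table,
        "--export-dir", export_dir,
        "--username", username,
        "--password-file", password_file,  # avoids a password on the command line
        "--batch",                         # send rows in batched statements
    ]


# Assumed example values; replace them with those for your environment.
args = build_sqoop_export(
    table="flights",
    export_dir="/user/etl/flights",
    jdbc_url="jdbc:omnisci:localhost:6274:omnisci",   # assumed URL format
    driver_class="com.omnisci.jdbc.OmniSciDriver",    # assumed driver class
    username="etl_user",
    password_file="/user/etl/.omnisci_password",
)
command = " ".join(shlex.quote(a) for a in args)
print(command)
```

Returning a list rather than a single string lets the command be passed directly to subprocess.run without shell quoting concerns.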

Get the OmniSci Whitepaper

Learn more about putting an end to indexing, down-sampling, and pre-aggregating data.