Data Integration

Data Integration Definition

Data integration refers to the technical and business processes used to combine data from multiple sources into a single, unified view.

Diagram depicts the ETL process of data integration from disparate sources.

FAQs

What is Data Integration?

Data integration is the practice of consolidating data from disparate sources into a single dataset. Its ultimate goal is to give users consistent access to, and delivery of, data across the full range of subjects and structure types, and to meet the information needs of all applications and business processes. Data integration is one of the main components of the overall data management process, and it is employed with increasing frequency as big data integration and the need to share existing data continue to grow.

Data integration architects develop software programs and platforms that automate the process of connecting and routing data from source systems to target systems. This can be achieved through a variety of data integration techniques, including:

  • Extract, Transform and Load (ETL): copies of datasets from disparate sources are gathered together, harmonized, and loaded into a data warehouse or database
  • Extract, Load and Transform (ELT): data is loaded as is into a big data system and transformed later for particular analytics uses
  • Change Data Capture (CDC): identifies data changes in databases in real time and applies them to a data warehouse or other repositories
  • Data Replication: data in one database is replicated to other databases to keep the information synchronized for operational use and for backup
  • Data Virtualization: data from different systems is virtually combined to create a unified view, rather than being loaded into a new repository
  • Streaming Data Integration: a real-time data integration method in which different streams of data are continuously integrated and fed into analytics systems and data stores
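The ETL pattern in the first bullet can be illustrated with a minimal Python sketch. It is not a production pipeline: the two source datasets, the `COUNTRY_CODES` mapping, and the `customer_revenue` table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source 1: a CRM export with revenue stored as text
crm_rows = [
    {"customer": "Acme Corp", "country": "us", "revenue": "1200.50"},
    {"customer": "Globex", "country": "DE", "revenue": "880"},
]

# Hypothetical source 2: a billing system with a different record layout
billing_rows = [
    ("INITECH", "United States", 430.0),
]

# Hypothetical lookup used to harmonize full country names into codes
COUNTRY_CODES = {"united states": "US", "germany": "DE"}


def extract():
    """Extract: gather copies of the records from each source."""
    for row in crm_rows:
        yield ("crm", row["customer"], row["country"], row["revenue"])
    for name, country, amount in billing_rows:
        yield ("billing", name, country, amount)


def transform(records):
    """Transform: harmonize names, country codes, and numeric types."""
    for source, name, country, revenue in records:
        code = country.strip()
        # Map full country names to codes; otherwise just uppercase the code
        code = COUNTRY_CODES.get(code.lower(), code.upper())
        yield (source, name.strip().title(), code, float(revenue))


def load(conn, rows):
    """Load: write the harmonized rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_revenue "
        "(source TEXT, customer TEXT, country TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO customer_revenue VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(warehouse, transform(extract()))
```

In an ELT variant, the raw tuples from `extract()` would be loaded first and the harmonization would run later, typically as SQL inside the target system.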

Application Integration vs Data Integration

Data integration technologies were introduced as a response to the adoption of relational databases and the growing need to efficiently move information between them, typically involving data at rest. In contrast, application integration manages the integration of live, operational data in real time between two or more applications.

The ultimate goal of application integration is to enable independently designed applications to operate together, which requires data consistency among separate copies of data, management of the integrated flow of multiple tasks executed by disparate applications, and, similar to data integration requirements, a single user interface or service from which to access data and functionality from independently designed applications.

A common tool for achieving application integration is cloud data integration, which refers to a system of tools and technologies that connects various applications for the real-time exchange of data and processes, and provides access by multiple devices over a network or the internet.

Data Integration Tools and Techniques

Data integration techniques span a broad range, from manual methods to fully automated ones. Typical tools and techniques for data integration include:

  • Manual Integration or Common User Interface: there is no unified view of the data; users access each relevant source system directly to gather the information they need
  • Application Based Integration: requires each application to implement all the integration efforts; manageable with a small number of applications
  • Middleware Data Integration: transfers integration logic from an application to a new middleware layer
  • Uniform Data Access: leaves data in the source systems and defines a set of views to provide a unified view to users across the enterprise
  • Common Data Storage or Physical Data Integration: creates a new system in which a copy of the data from the source system is stored and managed independently of the original system
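The difference between uniform data access and common data storage can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not a reference design: two attached in-memory databases named `crm` and `sales` stand in for hypothetical source systems, and the table and view names are invented for the example.

```python
import sqlite3

# Two attached in-memory databases stand in for hypothetical source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS sales")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO sales.orders VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# Uniform data access: a view joins the sources in place; no data is copied.
# SQLite requires a TEMP view here because the view spans attached databases.
conn.execute("""
    CREATE TEMP VIEW customer_totals AS
    SELECT c.name AS name, SUM(o.total) AS total
    FROM crm.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
""")

# Common data storage: materialize an independent copy in the main database.
conn.execute("CREATE TABLE main.customer_totals_copy AS SELECT * FROM customer_totals")

# A new order arrives in a source system after the copy was made.
conn.execute("INSERT INTO sales.orders VALUES (2, 25.0)")

live = conn.execute("SELECT name, total FROM customer_totals ORDER BY name").fetchall()
copied = conn.execute("SELECT name, total FROM customer_totals_copy ORDER BY name").fetchall()
print(live)    # [('Acme', 150.0), ('Globex', 100.0)]  view sees the new order
print(copied)  # [('Acme', 150.0), ('Globex', 75.0)]   copy keeps the old snapshot
```

The view always reflects the current state of the sources but pays the cost of querying them each time; the physical copy is fast and independent but must be refreshed to stay current.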

Developers may use Structured Query Language (SQL) to code a data integration system by hand. There are also data integration toolkits available from various IT vendors that streamline, automate, and document the development process.
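A typical hand-coded SQL step merges a staging extract into a target table, inserting new rows and updating changed ones. The sketch below runs that step through Python's `sqlite3` module using SQLite's `INSERT ... ON CONFLICT DO UPDATE` (upsert) syntax, which assumes SQLite 3.24 or later; the `staging_customers` and `dim_customer` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source extract (staging) and target table
    CREATE TABLE staging_customers (id INTEGER, email TEXT, city TEXT);
    CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, email TEXT, city TEXT);

    INSERT INTO dim_customer VALUES (1, 'a@example.com', 'Boston');
    INSERT INTO staging_customers VALUES
        (1, 'a@example.com', 'Chicago'),  -- existing customer, changed city
        (2, 'b@example.com', 'Denver');   -- new customer
""")

# Hand-coded merge: insert new ids, overwrite columns for existing ids.
# "WHERE true" resolves SQLite's parsing ambiguity between a SELECT
# source and the ON CONFLICT clause.
conn.execute("""
    INSERT INTO dim_customer (id, email, city)
    SELECT id, email, city FROM staging_customers WHERE true
    ON CONFLICT(id) DO UPDATE SET
        email = excluded.email,
        city = excluded.city
""")
conn.commit()

result = conn.execute("SELECT * FROM dim_customer ORDER BY id").fetchall()
```

Vendor toolkits typically generate and schedule steps like this one; writing them by hand gives full control at the cost of maintaining the SQL yourself.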

Why is Data Integration Important?

Enterprises that wish to remain competitive and relevant are embracing big data, with all its benefits and challenges. Data integration supports queries across these enormous datasets, benefiting everything from business intelligence and customer data analytics to data enrichment and real-time information delivery.

One of the foremost use cases for data integration services and solutions is the management of business and customer data. Enterprise data integration feeds integrated data into data warehouses or a virtual data integration architecture to support enterprise reporting, business intelligence (BI), and advanced analytics.

Customer data integration provides business managers and data analysts with a complete picture of key performance indicators (KPIs), financial risks, customers, manufacturing and supply chain operations, regulatory compliance efforts, and other aspects of business processes.

Data integration also plays an important role in the healthcare industry. Integrating data from different patient records and clinics helps doctors diagnose medical conditions and diseases by organizing data from different systems into a unified view from which useful insights can be drawn. Effective data acquisition and integration also improve claims-processing accuracy for medical insurers and ensure a consistent and accurate record of patient names and contact information. This exchange of information between different systems is often referred to as interoperability.
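Keeping patient names and contact information consistent across systems starts with normalizing fields and grouping records that refer to the same person. The Python sketch below uses a deliberately naive, deterministic match key (normalized name plus birth date); the `PatientRecord` type, the clinic names, and the sample data are hypothetical, and real patient-matching systems generally use more robust matching than this.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    name: str
    birth_date: str  # assumed ISO 8601, e.g. "1980-04-02"
    phone: str
    source: str      # hypothetical identifier of the originating clinic


def normalize_name(name):
    """Turn 'Last, First' into 'First Last', collapse spaces, lowercase."""
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    return " ".join(name.split()).lower()


def normalize_phone(phone):
    """Keep digits only (assumes numbers are stored without country codes)."""
    return "".join(ch for ch in phone if ch.isdigit())


def consolidate(records):
    """Group records that share a normalized name and birth date."""
    unified = {}
    for rec in records:
        key = (normalize_name(rec.name), rec.birth_date)
        entry = unified.setdefault(key, {"phones": set(), "sources": []})
        entry["phones"].add(normalize_phone(rec.phone))
        entry["sources"].append(rec.source)
    return unified


records = [
    PatientRecord("Jane  Smith", "1980-04-02", "(555) 010-2000", "clinic_a"),
    PatientRecord("SMITH, jane", "1980-04-02", "555.010.2000", "clinic_b"),
    PatientRecord("John Doe", "1975-11-30", "555-010-3000", "clinic_b"),
]
patients = consolidate(records)
```

Here the two differently formatted "Jane Smith" records collapse into one entry with a single normalized phone number, while "John Doe" remains separate.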

What is Big Data Integration?

Big data integration refers to the advanced data integration processes developed to manage the enormous volume, variety, and velocity of big data. These processes combine data from sources such as web data, social media, machine-generated data, and Internet of Things (IoT) data into a single framework.

Big data analytics platforms require scalability and high performance, which emphasizes the need for a common data integration platform that supports data profiling and data quality and drives insights by giving users the most complete and up-to-date view of their enterprise.
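Data profiling, mentioned above, usually begins with simple per-column statistics that expose quality problems before data is integrated. The following Python sketch computes null counts, distinct counts, and the most common value for a list of dictionaries; the `profile` function and the sample rows are hypothetical, and treating empty strings as nulls is an assumption of this example.

```python
from collections import Counter


def profile(rows):
    """Return per-column null count, distinct count, and most common value."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        # Assumption: both missing keys (None) and empty strings count as nulls
        present = [v for v in values if v is not None and v != ""]
        counts = Counter(present)
        report[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report


rows = [
    {"id": 1, "country": "US", "email": "a@example.com"},
    {"id": 2, "country": "us", "email": ""},
    {"id": 3, "country": "US"},
]
report = profile(rows)
```

In this sample, the `country` column reports two distinct values ("US" and "us"), flagging an inconsistency that a transform step would need to harmonize, and `email` reports two nulls.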

Big data integration services employ real-time integration techniques, which complement traditional ETL technologies and add dynamic context to continuously streaming data. Best practices for real-time data integration address its dirty, moving, and temporal nature: perform more simulation and testing upfront, adopt real-time systems and applications, implement parallel and coordinated ingestion engines, build resiliency into each phase of the pipeline in anticipation of component failures, and standardize data sources with APIs for better insights.
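The resiliency practice above can be sketched as micro-batched ingestion with retries and exponential backoff. This single-threaded Python example covers resiliency only, not parallel ingestion; the `TransientError` exception, the `flaky_sink` target, and the batch size and delay values are hypothetical stand-ins for a real stream and data store.

```python
import time


class TransientError(Exception):
    """Hypothetical error a sink raises for recoverable failures."""


def write_with_retry(sink, batch, max_attempts=3, base_delay=0.01):
    """Write one batch, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            sink(batch)
            return
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))


def ingest(stream, sink, batch_size=100):
    """Consume an event stream and write it to the sink in micro-batches."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) >= batch_size:
            write_with_retry(sink, batch)
            batch = []
    if batch:  # flush the final partial batch
        write_with_retry(sink, batch)


# Demo: a sink that fails once, simulating a brief outage of the target store.
stored = []
calls = {"n": 0}


def flaky_sink(batch):
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientError("simulated outage")
    stored.extend(batch)


ingest(({"seq": i} for i in range(250)), flaky_sink, batch_size=100)
```

Retrying a batch gives at-least-once delivery: if a real sink could partially write a batch before failing, its writes would need to be idempotent (for example, keyed upserts) to avoid duplicates.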

Does OmniSci Offer a Data Integration Solution?

OmniSci provides data analytics integration services designed to make big data integration easy and seamless. OmniSci can ingest millions of records per second into the open source OmniSciDB SQL engine and make them available for interactive exploration by business analysts. OmniSci also provides an easy-to-use utility for Kafka data integration, which connects to a Kafka topic for real-time consumption of messages and rapid loading into an OmniSci target table.