Data propagation is the distribution of data from one or more source data warehouses to one or more local access databases, according to propagation rules. Data warehouses need to manage large volumes of data every day. A data warehouse may start with only a small amount of data and then grow day by day as it constantly shares data with, and receives data from, various data sources.
As data sharing continues, data warehouse management becomes a major issue. Database administrators need to manage the corporate data more efficiently and in different subsets, groupings and time frames. As a company grows further, it may add more and more data sources, especially if the company expands beyond its current geographical location.
Data warehouses, data marts and operational data stores are becoming indispensable tools in today’s businesses. These data resources need to be constantly updated, and the process of updating involves moving large volumes of data from one system to another, and back and forth to a business intelligence system. High-volume data movement is commonly performed in batches within a brief period, without sacrificing the performance or availability of operational applications or of data from the warehouse.
The higher the volume of data to be moved, the more challenging and complex the process becomes. It therefore becomes the responsibility of the data warehouse administrator to find ways of moving bulk data more quickly and of identifying and moving only the data that has changed since the last data warehouse update.
In response to these challenges, several new data propagation methods have been developed in business enterprises, and data warehouses and operational data stores have evolved into mission-critical, real-time decision support systems as a result. Below are some of the most common technological methods developed to address the problems related to data sharing through data propagation.
Bulk Extract – In this method of data propagation, copy management tools or unload utilities are used to extract all or a subset of the operational relational database. Typically, the extracted data is then transported to the target database using File Transfer Protocol (FTP) or a similar method. The extracted data may be transformed into the format used by the target, either on the host or on the target server.
The load products of the database management system are then used to refresh the target database. This process is most efficient for small source files or for files that have a high percentage of changes, because this approach does not distinguish changed records from unchanged ones. Consequently, it is least efficient for large files where only a few records have changed.
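The bulk extract cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the operational source and the target, a CSV string stands in for the transferred extract file, and the names `bulk_extract`, `bulk_load`, `make_db` and the `customers` table are hypothetical. Note that a plain CSV extract does not distinguish NULL from an empty string, which a real unload utility would have to handle.

```python
import csv
import io
import sqlite3


def bulk_extract(source, table):
    """Unload every row of `table` into CSV text (stands in for the extract file)."""
    # `table` is interpolated directly, so it must be a trusted name.
    cur = source.execute(f"SELECT * FROM {table}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)
    return buf.getvalue()


def bulk_load(target, table, extract_text):
    """Refresh the target by replacing all of its rows with the extract."""
    reader = csv.reader(io.StringIO(extract_text))
    header = next(reader)
    placeholders = ", ".join("?" for _ in header)
    with target:  # one transaction: delete and reload together
        target.execute(f"DELETE FROM {table}")
        target.executemany(
            f"INSERT INTO {table} VALUES ({placeholders})", reader
        )


def make_db(rows):
    """Hypothetical helper: an in-memory database with a small customers table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    db.commit()
    return db


source = make_db([(1, "Ann"), (2, "Bob"), (3, "Cy")])
target = make_db([(1, "Ann"), (9, "Stale")])

# Every row is copied, whether or not it changed since the last refresh.
bulk_load(target, "customers", bulk_extract(source, "customers"))
```

Because the whole table is rewritten on every run, the cost depends on the size of the source rather than on the number of changes, which is why the method suits small files or files with a high change rate.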
File Compare – This method is a variation of the bulk extract approach. It compares the newly extracted operational data with the previous version and creates a set of incremental change records. These incremental changes are processed in much the same way as in bulk extract, except that they are applied as updates to the target server within the scheduled process. This approach is recommended for smaller files in which only a few records change.
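The comparison step can be sketched as a keyed diff between two extracts. In this minimal example each extract is assumed to be a dictionary that maps a primary key to a row tuple, and the target is modelled as a dictionary as well; the function names and the `insert`/`update`/`delete` record layout are illustrative choices, not part of any particular product.

```python
def compare_extracts(previous, current):
    """Return incremental change records that turn `previous` into `current`."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes


def apply_changes(target, changes):
    """Apply the change records to the target as updates, not as a full reload."""
    for op, key, row in changes:
        if op == "delete":
            target.pop(key, None)
        else:  # insert or update
            target[key] = row


previous = {1: ("Ann", 10), 2: ("Bob", 20), 3: ("Cy", 30)}
current = {1: ("Ann", 10), 2: ("Bob", 25), 4: ("Dee", 40)}

changes = compare_extracts(previous, current)

target = dict(previous)  # the target still holds the previous version
apply_changes(target, changes)
```

Only three change records reach the target here, although both extracts still have to be read in full to find them. That extraction and comparison cost is one reason the approach is suggested for smaller files.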
Change Data Propagation – This method captures and records the changes to the file as part of the application change process. Many techniques can be used to implement Change Data Propagation, such as triggers, log exits, log post-processing or DBMS extensions. A file of incremental changes is created to contain the captured changes. Once the source transaction completes, the change records can be transformed and moved to the target database right away. This type of data propagation is sometimes called near-real-time or continuous propagation, and it is used to keep the target database synchronized with the source system within a very brief period.
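Of the techniques listed, triggers are the easiest to demonstrate. The sketch below uses SQLite triggers to record each change in a `change_log` table, and a hypothetical `propagate` function to replay the log against a target database. The table names, the single-letter operation codes and the assumption that the `id` key is never updated are all illustrative simplifications.

```python
import sqlite3

SOURCE_SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- preserves the order of changes
    op TEXT, id INTEGER, name TEXT
);
CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO change_log (op, id, name) VALUES ('I', NEW.id, NEW.name);
END;
CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO change_log (op, id, name) VALUES ('U', NEW.id, NEW.name);
END;
CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
    INSERT INTO change_log (op, id, name) VALUES ('D', OLD.id, NULL);
END;
"""


def propagate(source, target):
    """Replay captured changes on the target, then clear the applied log entries."""
    rows = source.execute(
        "SELECT seq, op, id, name FROM change_log ORDER BY seq"
    ).fetchall()
    with target:  # apply the whole batch in one transaction
        for _seq, op, row_id, name in rows:
            if op == "D":
                target.execute("DELETE FROM customers WHERE id = ?", (row_id,))
            else:  # 'I' and 'U' are both applied as an upsert
                target.execute(
                    "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
                    (row_id, name),
                )
    if rows:
        with source:
            source.execute(
                "DELETE FROM change_log WHERE seq <= ?", (rows[-1][0],)
            )
    return len(rows)


source = sqlite3.connect(":memory:")
source.executescript(SOURCE_SCHEMA)
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# Ordinary application changes; the triggers capture them as a side effect.
with source:
    source.execute("INSERT INTO customers VALUES (1, 'Ann')")
    source.execute("INSERT INTO customers VALUES (2, 'Bob')")
    source.execute("UPDATE customers SET name = 'Bobby' WHERE id = 2")
    source.execute("DELETE FROM customers WHERE id = 1")

applied = propagate(source, target)
```

Calling `propagate` shortly after each source transaction commits approximates the near-real-time behaviour described above. A production system would more likely read the database log than add triggers to every table, since triggers add work to each source transaction.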