A Data Replicate is a set of data copied from one data site and placed at another data site during Data Replication. It can also be a set of data characteristics from a single data subject or data occurrence group that is copied from the official data source and placed at another data site. Data Replicates are not the same as redundant data.
Data Replication also refers to the formal process of creating exact copies of a set of data from the data site containing the official data source and placing those copies at other data sites. It also covers copying a portion of a database from one environment to another and keeping the copies in sync with the original source: changes made to the original source are propagated to the copies in the other environments.
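The propagation step described above can be sketched as follows, assuming a simple in-memory model. The official source records every change in an ordered log, and each copy applies the log entries it has not yet seen. The class names `OfficialSource` and `Replica` and the log-shipping approach are illustrative assumptions, not a prescribed mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A copy of the data kept at another (hypothetical) data site."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries this copy has applied so far

    def catch_up(self, log):
        # Apply, in order, every change this copy has not yet seen.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


@dataclass
class OfficialSource:
    """The data site holding the official data source."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered record of changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def propagate(self, replicas):
        # Send pending changes to each copy so the copies stay in sync.
        for replica in replicas:
            replica.catch_up(self.log)


source = OfficialSource()
copies = [Replica(), Replica()]
source.write("customer:42", "Alice")
source.write("customer:42", "Alice Smith")
source.propagate(copies)
print(all(c.data == source.data for c in copies))  # True
```

Because each copy remembers how far into the log it has read, a copy that misses one round of propagation can still catch up later without receiving the whole data set again.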
Data Replication is common in large data warehouses, where it helps the system run efficiently and guards against failure of the entire system. Many data warehouse systems use Data Replication to share information so as to ensure consistency among redundant resources such as hardware and software components.
Replication can apply to data or to computation. It is data replication when the same data is stored on multiple storage devices, and computation replication when the same computing task is executed many times. A computational task can be replicated in space, by executing it on separate devices, or in time, by executing it repeatedly on one device.
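The difference between replicating a task in space and in time can be illustrated with a minimal sketch. Here threads in a `ThreadPoolExecutor` stand in for separate devices, which is an assumption made for illustration only. The majority vote over the replicated results is also an addition: it shows one common reason to replicate a computation, but the text above does not prescribe it.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def task(n):
    """A deterministic computation: the sum of squares below n."""
    return sum(i * i for i in range(n))


def replicate_in_space(fn, arg, copies=3):
    # Run the same task concurrently on separate workers
    # (threads stand in for separate devices in this sketch).
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(fn, [arg] * copies))


def replicate_in_time(fn, arg, copies=3):
    # Run the same task repeatedly, one run after another, on one worker.
    return [fn(arg) for _ in range(copies)]


def majority(results):
    # Accept the value that the largest number of replicas agree on.
    value, _count = Counter(results).most_common(1)[0]
    return value


print(majority(replicate_in_space(task, 10)))  # 285
print(majority(replicate_in_time(task, 10)))   # 285
```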
Data Replication is transparent to the end user. A data consumer does not know which of the data sources the data he or she is using comes from, because the consumer sees what appears to be one monolithic data warehouse. Access to replicated data is usually uniform with access to a single, non-replicated entity.
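Transparency can be pictured as a single access interface placed in front of several copies. The sketch below assumes each replica is a plain dictionary holding the same data. `ReplicatedStore` is a hypothetical class, and the random choice of replica is only one possible routing policy.

```python
import random


class ReplicatedStore:
    """Hypothetical facade that hides several replicas behind one interface."""

    def __init__(self, replicas):
        self._replicas = replicas  # each replica is a dict holding the same data

    def get(self, key):
        # The caller never chooses a replica; any copy can serve the read.
        replica = random.choice(self._replicas)
        return replica[key]


site_a = {"order:7": "shipped"}
site_b = {"order:7": "shipped"}
store = ReplicatedStore([site_a, site_b])
print(store.get("order:7"))  # shipped
```

From the caller's side, `store.get("order:7")` looks exactly like reading a single, non-replicated dictionary.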
There are two general types of Data Replication: active and passive. In active Data Replication, the same request is processed at every replica. In passive Data Replication, each request is processed on a single replica, and the resulting state is then transferred to the other replicas. If at any given time one master replica is designated to process all requests, the arrangement is called the primary-backup scheme (master-slave scheme). This scheme predominates in high-availability clusters.
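The contrast between active and passive replication can be shown with a minimal sketch. It assumes the replicated state is a single account balance and that request handling is deterministic. The class `Account` and its `get_state`/`set_state` methods are hypothetical names chosen for illustration.

```python
class Account:
    """Replicated state: one account balance (illustrative)."""

    def __init__(self):
        self.balance = 0

    def handle(self, request):
        # A request is an (operation, amount) pair; handling is deterministic.
        op, amount = request
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw":
            self.balance -= amount

    def get_state(self):
        return self.balance

    def set_state(self, state):
        self.balance = state


def active_replication(replicas, request):
    # Active: every replica processes the same request itself.
    for replica in replicas:
        replica.handle(request)


def passive_replication(primary, backups, request):
    # Passive (primary-backup): only the primary processes the request,
    # then its resulting state is copied to each backup.
    primary.handle(request)
    state = primary.get_state()
    for backup in backups:
        backup.set_state(state)


requests = [("deposit", 100), ("withdraw", 30)]

active = [Account() for _ in range(3)]
for req in requests:
    active_replication(active, req)

primary, backups = Account(), [Account(), Account()]
for req in requests:
    passive_replication(primary, backups, req)

print([a.balance for a in active])                    # [70, 70, 70]
print(primary.balance, [b.balance for b in backups])  # 70 [70, 70]
```

Active replication only keeps replicas identical if every replica handles requests deterministically. Passive replication avoids that requirement because backups never execute requests, but it must transfer state after each one.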
If, instead, any replica may process a request and then distribute the new state, the arrangement is called the multi-primary scheme (multi-master in the database field). This scheme requires some form of distributed concurrency control, such as a distributed lock manager.
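The need for concurrency control in a multi-primary scheme can be shown with a small sketch. It uses per-item `threading.Lock` objects as a stand-in for a distributed lock manager. This is a loud simplification: a real lock manager coordinates separate machines, while these locks only coordinate threads in one process. `LockManager` and `Primary` are hypothetical names.

```python
import threading


class LockManager:
    """Stand-in for a distributed lock manager: one lock per data item.
    (Threading locks only coordinate threads in a single process.)"""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


class Primary:
    """One of several primaries; any of them may accept a write."""

    def __init__(self, name, peers, locks):
        self.name = name
        self.data = {}
        self.peers = peers  # shared list of all primaries, including self
        self.locks = locks

    def increment(self, key):
        # Hold the item's lock so writes arriving at other primaries cannot
        # interleave with this read-modify-write.
        with self.locks.lock_for(key):
            new_value = self.data.get(key, 0) + 1
            # Distribute the new state to every primary, including this one.
            for peer in self.peers:
                peer.data[key] = new_value


def worker(primary, n):
    for _ in range(n):
        primary.increment("stock")


locks = LockManager()
primaries = []
for name in ("east", "west"):
    primaries.append(Primary(name, primaries, locks))

threads = [threading.Thread(target=worker, args=(p, 1000)) for p in primaries]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([p.data["stock"] for p in primaries])  # [2000, 2000]
```

Without the lock, two primaries could read the same old value and each distribute the same incremented value, losing an update.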
Data Replication is one of the oldest and most important topics in distributed systems. Some of the best-known replication models in distributed systems include:
Transactional Replication – This model is used to replicate transactional data held in relational databases or other transactional storage structures. It typically employs the one-copy serializability model, which defines the legal outcomes of a transaction involving replicated data.
State Machine Replication – This model assumes that the replicated process is a deterministic finite state machine and that atomic broadcast of every event is possible. In many ways it resembles transactional replication, but it is based on the distributed computing problem known as distributed consensus. It is sometimes mistaken for active replication.
Virtual Synchrony Replication – This is a computation model used when a group of processes cooperate to replicate in-memory data or to coordinate actions.
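The State Machine Replication model can be illustrated with a toy sketch. Each replica is a deterministic state machine, and every replica receives the same commands in the same order. The sketch assumes a single in-process sequencer (`AtomicBroadcast`, a hypothetical name) in place of a real atomic broadcast. That stand-in involves no consensus protocol, so it does not tolerate failures the way a real implementation must.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueMachine:
    """A deterministic state machine: same commands, same order -> same state."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        op, key, *rest = command
        if op == "set":
            self.state[key] = rest[0]
        elif op == "delete":
            self.state.pop(key, None)


class AtomicBroadcast:
    """Toy stand-in for atomic broadcast: one sequencer fixes a total order
    and delivers every command to every replica in that order."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # the agreed total order of commands

    def broadcast(self, command):
        self.log.append(command)
        for replica in self.replicas:
            replica.apply(command)


replicas = [KeyValueMachine() for _ in range(3)]
bus = AtomicBroadcast(replicas)
for cmd in [("set", "x", 1), ("set", "y", 2), ("delete", "x")]:
    bus.broadcast(cmd)

print([r.state for r in replicas])  # [{'y': 2}, {'y': 2}, {'y': 2}]
```

Because the machine is deterministic, a new replica can also reach the same state by replaying `bus.log` from the beginning.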