Data concurrency ensures that the official data source and its replicated data values remain consistent: whenever a value in the official data source is updated, the corresponding replicated values must also be updated through synchronization in order to maintain consistency.
In a single-user database, each transaction is processed serially, so there is no contention with, or interference from, other transactions. But in a large data warehouse environment, there could be hundreds or thousands of users and data consumers across many different locations trying to access the warehouse simultaneously. A single-user database simply will not do.
In a multi-user database powering a data warehouse, transactions are executed simultaneously from a wide array of sources. Each of these transactions has the potential to interfere with other transactions running in the database, so it is good practice to isolate transactions from one another in the multi-user environment. At the same time, there must be a way of collating the transaction data so that the data warehouse can produce aggregated reports.
The essence of data concurrency is allowing more than one application or data consumer to access the same data simultaneously while still maintaining data integrity and database consistency.
Because transactions are isolated from each other, data will inevitably be replicated. For example, suppose two friends and I are simultaneously buying the same item from the same e-commerce site, along with a thousand other shoppers in different parts of the globe. Technically, we are all performing the same transaction. Unbeknownst to us, however, our transactions are processed in isolation in the backend data warehouse, even though the database treats all of us as using the same data simultaneously.
When multiple users attempt to modify the same data at the same time, some level of control must be established so that one user's modifications do not adversely affect another's. The process of managing this is called concurrency control.
There are three common ways that databases manage data concurrency, and they are as follows:
1. Pessimistic concurrency control – in this method, a row is unavailable to other users from the time the record is fetched until it is updated in the database.
2. Optimistic concurrency control – with this method, a row is unavailable to other users only while the data is actually being updated. During the update, the database examines the row to determine whether any change has been made since it was read. An attempt to update a record that has already been changed is flagged as a concurrency violation.
3. Last in wins – with this method, a row is unavailable to other users only while the data is actually being updated, but no effort is made to compare the update with the original record. The record is simply written out, potentially overwriting any changes made by other concurrent users since the record was last refreshed.
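The optimistic approach above is often implemented with a version column: each writer records the version it read and the update only succeeds if that version is still current. The following is a minimal sketch using SQLite; the `products` table, its columns, and the helper names are illustrative assumptions, not taken from any particular system.

```python
import sqlite3

# In-memory database with a hypothetical "products" table that carries a
# version number alongside the business data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)")
conn.execute("INSERT INTO products VALUES (1, 10, 1)")
conn.commit()

def read_row(conn, pid):
    """Fetch the current stock and version; the row is never locked."""
    return conn.execute(
        "SELECT stock, version FROM products WHERE id = ?", (pid,)).fetchone()

def try_update(conn, pid, new_stock, expected_version):
    """Commit the change only if the version is unchanged since it was read."""
    cur = conn.execute(
        "UPDATE products SET stock = ?, version = ? WHERE id = ? AND version = ?",
        (new_stock, expected_version + 1, pid, expected_version))
    conn.commit()
    return cur.rowcount == 1   # 0 rows updated means a concurrency violation

# Users A and B both fetch the same row (stock 10, version 1).
stock_a, ver_a = read_row(conn, 1)
stock_b, ver_b = read_row(conn, 1)

# A commits first: the version check passes and the version becomes 2.
assert try_update(conn, 1, stock_a - 1, ver_a)

# B still holds the stale version 1, so B's update is flagged as a conflict.
assert not try_update(conn, 1, stock_b - 1, ver_b)
```

If the `WHERE ... AND version = ?` clause were dropped, the same code would instead exhibit "last in wins": B's write would silently overwrite A's.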
Some relational database management systems provide multi-version concurrency control (MVCC). This works by automatically providing read consistency to a query, so that all data seen by the query comes from a single point in time, a guarantee known as statement-level read consistency. Transaction-level read consistency extends the same guarantee to every query within a transaction. To construct this consistent view, the RDBMS uses the information stored in rollback segments, which contain the old values of data altered by recently committed or uncommitted transactions. Consistency and concurrency are thus closely related in database systems.
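The mechanism can be pictured as a store that keeps every committed version of a value, where a reader sees only the versions committed at or before the moment its statement (or transaction) began. The toy class below is a sketch of that idea under simplifying assumptions: a single global commit counter stands in for the rollback-segment machinery, and all names are illustrative.

```python
import itertools

class MVCCStore:
    """Toy multi-version store: writes append, reads see a point in time."""

    def __init__(self):
        self._clock = itertools.count(1)
        self._versions = {}          # key -> list of (commit_no, value)

    def write(self, key, value):
        """Commit a new version stamped with the next commit number."""
        commit_no = next(self._clock)
        self._versions.setdefault(key, []).append((commit_no, value))
        return commit_no

    def snapshot(self):
        """Capture the point in time at which a statement or transaction starts."""
        return next(self._clock)

    def read(self, key, as_of):
        """Return the newest value committed at or before `as_of`."""
        candidates = [(c, v) for c, v in self._versions.get(key, []) if c <= as_of]
        return max(candidates)[1] if candidates else None

store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()        # a long-running query starts here
store.write("balance", 250)    # a later transaction commits a new value

# The old query keeps seeing the value as of its snapshot; a new one sees 250.
assert store.read("balance", snap) == 100
assert store.read("balance", store.snapshot()) == 250
```

Statement-level and transaction-level read consistency differ only in which timestamp is reused: a fresh `snapshot()` per statement versus one snapshot held for the whole transaction.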