Denormalized Data

In any database implementation, the usual advice is to normalize the database for optimal performance by reducing redundancy and maintaining data integrity. Normalization involves putting each fact in its one appropriate place, which optimizes updates at the expense of data retrieval. As opposed to having one fact in different places, a normalized database is faster to update, but it may need joins to bring related facts back together.

A normalized database has one fact in one place, and all related facts about a single entity are placed together so that each non-key column refers non-transitively to only the unique identifier of that entity. Generally, a normalized design should satisfy at least the first three normal forms.
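
As a minimal sketch of what this means in practice, consider the following hypothetical schema built with Python's sqlite3 module. The customers and orders tables and their column names are assumptions for illustration, not from the original article; each fact, such as a customer's city, lives in exactly one place, so retrieving it alongside an order requires a join.

import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized design: each fact lives in exactly one place.
# A customer's city is stored once, in the customers table;
# every order refers back to it through customer_id.
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
""")

# Retrieval must join the two tables to bring related facts together.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.amount
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
""").fetchall()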

However, there are cases in which performance can be significantly enhanced by a denormalized physical implementation. Denormalization involves putting one fact in many places in order to speed up data retrieval at the expense of data modification.
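
Continuing the hypothetical schema from the sketch above, a denormalized variant copies the customer's name and city into every order row. Reads no longer need a join, but a change to one fact must now be propagated to many rows; the table and column names are again illustrative assumptions.

import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized design: the same fact (the customer's city) is stored
# in many places, once per order row.
conn.executescript("""
    CREATE TABLE orders_denorm (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL,
        customer_name TEXT NOT NULL,  -- duplicated fact
        customer_city TEXT NOT NULL,  -- duplicated fact
        amount        REAL NOT NULL
    );
""")

# Fast retrieval: a single-table query, no join required.
rows = conn.execute(
    "SELECT order_id, customer_name, customer_city, amount FROM orders_denorm"
).fetchall()

# The cost: updating one fact must touch every row that copies it.
conn.execute(
    "UPDATE orders_denorm SET customer_city = ? WHERE customer_id = ?",
    ("Berlin", 1),
)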

Although denormalization is not always recommended, in real-world implementations it is sometimes necessary. But before jumping into denormalization, certain questions need to be answered:

Can the database perform well without depending on denormalized data?
Will the database still perform poorly even with denormalized data?
Will the system become less reliable due to the presence of denormalized data?

The answers to these questions decide whether to denormalize or not. If the answer to any of them is "yes," then you should avoid denormalization, because any benefit that accrues will not exceed the cost. If, after considering these issues, you decide to denormalize, be sure to adhere to the general guidelines that follow.

If you have enough time and resources, you may start with actual testing, comparing a set of normalized tables against an equivalent set of denormalized ones. Load the denormalized set by querying the data in the normalized set and inserting it into the denormalized tables. Keep the denormalized tables read-only to lock in the noted performance gain, and strictly control and schedule the population process so that the denormalized and normalized tables stay synchronized.
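
A minimal sketch of such a controlled population step, reusing the hypothetical tables from the earlier examples (all table and column names remain assumptions): the denormalized table is rebuilt from the normalized source inside one transaction, so it can be run on a fixed schedule.

import sqlite3

def refresh_denormalized(conn: sqlite3.Connection) -> None:
    # Rebuild orders_denorm from the normalized tables in one
    # transaction; run on a controlled schedule (e.g. nightly) so the
    # read-only denormalized copy stays synchronized with its source.
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM orders_denorm")
        conn.execute("""
            INSERT INTO orders_denorm
                (order_id, customer_id, customer_name, customer_city, amount)
            SELECT o.order_id, o.customer_id, c.name, c.city, o.amount
            FROM orders o
            JOIN customers c ON c.customer_id = o.customer_id
        """)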

Based on the exercise above, it is obvious that the main reason for denormalizing is when speedy retrieval is more important than fast data updates. So if the database you are implementing caters to an organization with more frequent data updates, say, a very large data warehouse where data constantly moves from one data source to another, so that updates clearly outnumber retrievals by data consumers, then a normalized design should be implemented without any denormalized data. But when retrieval by data consumers is more frequent, say, a database that maintains fairly static accounts but is accessed heavily because of a service exposed on the internet, then populating the database with denormalized data is the way to go.

Other reasons for denormalization include situations where many repeating groups exist and need to be processed as a group rather than individually; where many calculations on one or several columns must be performed before a query can be answered; where tables need to be accessed in many different ways during the same period; and where some columns are queried a large percentage of the time. Although many have the impression that denormalized data causes a slow system, it is actually useful under the right conditions.
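
As an illustration of the pre-calculation case only, the following hypothetical sketch stores a derived total on a parent row so that frequent queries can skip recomputing an aggregate; the invoices and line_items tables are assumptions made up for the example.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        invoice_id INTEGER PRIMARY KEY,
        -- Denormalized, pre-calculated column: the sum of the line
        -- items is stored here so frequent reads skip the aggregate.
        total      REAL NOT NULL DEFAULT 0
    );
    CREATE TABLE line_items (
        item_id    INTEGER PRIMARY KEY,
        invoice_id INTEGER NOT NULL REFERENCES invoices(invoice_id),
        price      REAL NOT NULL
    );
""")

# Without the stored total, every query recomputes the aggregate:
#   SELECT invoice_id, SUM(price) FROM line_items GROUP BY invoice_id;
# With it, a cheap single-row read suffices, at the cost of keeping
# the column in sync whenever line items change:
conn.execute("""
    UPDATE invoices
    SET total = (SELECT COALESCE(SUM(price), 0)
                 FROM line_items
                 WHERE line_items.invoice_id = invoices.invoice_id)
""")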
