What is Data Normalization

Data Normalization is a process to develop the conceptual schema from the external schema.  In its very essence, data normalization is the process of organizing data inside the database in order to remove data redundancy. The presence of many redundant data can have very undesirable results which include significant slowing of the entire computer processing system as well as negative effect on data integrity and data quality.

The process of normalization includes the creation of tables and establishing various relationships between the tables. The relationships should be based on certain rules designed to protect data and ensure that the database is flexible by having no redundancy and inconsistent dependency.

Data normalization follows few rules and each rule is called a normal form.

The first normal form, denoted as 1NF, is geared towards eliminating repeating groups in individual tables. This can be done by creating separate tables for each set of related data and attributes and giving each set of related data in the table with a primary key. During this normalization form, multiple fields should not be used in a single table to store similar data.

For instance, in tracking an inventory item that may possible come from two different sources, an inventory record may contain separate fields for Vendor code 1 and Vendor code 2. This is not a good practice because when there is another vendor, it is not good to add a Vendor Code 3. Instead, a table should be created separately for all vendors and a link the table to inventory using an item number key.

The second normal form, indicated by 2NF, is geared towards elimination of redundant data. If an attribute depends only on part of a multi-valued key, then it has to be removed to a separate table. Second normal form can be achieved by creating separate tables for sets of values which apply to many records and then relating these tables using a foreign key.

As an example, let us take a customer’s address in an accounting system. Many other tables use this customer address such s orders, shipping, invoice, collections and account receivable tables. Instead of having to store the customer’s address as a separate data in each of those tables, a normalized form would be to store it in one place and have the other table link to it.

The third normal form, indicated by 3NF is geared towards the elimination of columns which are not dependent on a key. If an attribute does not contribute to a description of the key, then it has to be removed to a separate table. A value in a record which is not part of the key of the record should not belong to the table. Generally, whenever the contents of a group of fields may apply to more than one record in a table, it is good to place these fields in a separate table.

Let us take the case in an employee recruitment table wherein a candidate’s university name and address is indicated. But there is also a need for a complete list of universities to be used for group mailings. It is good to create a separate table for universities and link the table to candidates table with a code key for university because if the university information is stored in the candidates table only, there would be no way to list universities with current candidates.

Other data normalization forms include Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF – Isolating Independent Multiple Relationships), Fifth Normal Form (5NF – Isolating Semantically Related Multiple Relationships), Optimal Normal Form (ONF) and Domain-Key Normal Form (DKNF).

Editorial Team at Geekinterview is a team of HR and Career Advice members led by Chandra Vennapoosa.

Editorial Team – who has written posts on Online Learning.


Pin It