What is Data Cardinality

In the implementation of a structure query language (SQL), the term data cardinality is used to mean the uniqueness of the data values which are contained in a particular column, known as attribute, of a database table.

There are actually three types of data cardinality each dealing with columnar value sets. These types are high-cardinality, normal-cardinality, and low-cardinality.

High data cardinality refers to the instance where the values of a data column are very uncommon. For example, a data column referring to values for social security numbers should always be unique for each person. This is an example of very high cardinality. Same goes with email address and user names. Automatically generated numbers are of very high data cardinality. For instance, in a data table column, a column named USER-ID would contain values starting with an automatically increments every time a new user is added.

Normal data cardinality refers to the instance where values of a data column are somewhat uncommon but never unique. For example, a CLIENT table having a data column containing LAST_NAME values can be said to be of normal data cardinality as there may be several entries of the same last name like Jones and may other varied names in one column. At close inspection of the LAST_NAME column, one can see that there could be clumps of last names side by side with unique last names.

Low data cardinality refers to the instance where values of a data column are not very unusual. Some table columns take very limited values. For instance, Boolean values can only take 0 or 1, yes or no, true or false. Another table columns with low cardinality are status flags. Yet another example of low data cardinality is the gender attribute which can take only two values – male or female.

Determining data cardinality is a substantial aspect used in data modeling. This is used to determine the relationships

Several types of cardinality defining relationships between occurrences of entities on two sides of the line of relationships exist.

The Link Cardinality is a 0:0 relationship and defined as one side does not need the other to exists

The Sub-type Cardinality is a 1:0 relationship and defined as having one optional side only.

The Physical Segment Cardinality is 1:1 relationship and it is demonstrated that both sides of the relationship are mandatory.

The Possession Cardinality is a 0:M relation (zero to many) relationship on both sides.

The Child Cardinality is a 1:M mandatory relationship and is one of the most common relationships used most databases

The Characteristic Cardinality is a 0:M relationship which is mandatory on both sides.

The Paradox Cardinality is 1:M relationship which is mandatory to one side. An example would be a person table and citizenship table relationship.

The Association Cardinaltiy is a M:M (many to many) relationship which may be optional on both sides.

A data table’s cardinality with respect to another data table is one of the most critical aspects in database design. For instance, a database hospital may have separate data tables used to keep track patients and doctors so a many to one relationship should be considered by the database designer. If the data cardinality and relationships are not designed well, the performance of a database will greatly suffer.

Editorial Team at Geekinterview is a team of HR and Career Advice members led by Chandra Vennapoosa.

Editorial Team – who has written posts on Online Learning.

Pin It