Existing Data Quality Criteria

In the most general sense, data quality is an indication of how good the data is, in terms of integrity and accuracy, as it is stored in the data resource to meet the demand for business information. Other indicators of good quality data pertain to completeness, timeliness and format (though format may matter less today, given advanced data cleansing and transformation tools).

In a large data implementation, various components work together as a system to smoothly and efficiently handle the processes of extracting, transforming and loading data from various data sources into the data warehouse. The transformed data is then shared with the business intelligence system, so that data consumers, high-ranking company officials and other decision makers can draw decision support from the information.

But no matter how sophisticated the data warehousing system or data resources are, high quality data can never be achieved when the data input is not accurate in the first place. As it is aptly put in the information technology world: "Garbage in, garbage out". To avoid this, there must be a means of ensuring that the input data is clean and of high quality, and so there should be an existing set of data quality criteria.
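As a rough illustration of checking input against a set of criteria before it reaches the data resource, the sketch below separates clean records from rejected ones. The field names (`customer_name`, `quantity`) and the two criteria are hypothetical and chosen only for the example; a real organization would substitute its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """One data quality rule: a readable name plus a check on a record."""
    name: str
    check: Callable[[dict], bool]


def _has_name(record: dict) -> bool:
    # Completeness: the name must be present and not just whitespace.
    return bool(str(record.get("customer_name") or "").strip())


def _valid_quantity(record: dict) -> bool:
    # Accuracy: the quantity must be a non-negative integer (bool is excluded).
    value = record.get("quantity")
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


# Hypothetical criteria, for illustration only.
CRITERIA = [
    Criterion("customer_name is present", _has_name),
    Criterion("quantity is a non-negative integer", _valid_quantity),
]


def failed_criteria(record: dict) -> list[str]:
    """Return the names of every criterion the record does not meet."""
    return [c.name for c in CRITERIA if not c.check(record)]


def split_records(records: list[dict]):
    """Separate records that meet all criteria from those that do not."""
    clean, rejected = [], []
    for record in records:
        failures = failed_criteria(record)
        if failures:
            rejected.append((record, failures))
        else:
            clean.append(record)
    return clean, rejected
```

Keeping the reasons for each rejection alongside the record makes it possible to correct the data at its source rather than silently dropping it.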

Different organizations have different needs. In fact, it is common these days to find a single company that operates in several industries. Moreover, business organizations operate not just in one location but in many locations spread around the world. So the criteria for data quality can vary considerably depending on the business needs.

For instance, a business organization operating solely in the pharmaceutical industry needs data pertaining to chemicals, raw materials for medicines, and the labeling of pharmaceutical products, whose names are often long and scientific. These data needs are in addition to general data needs such as the names of staff, customers and business partners.

The data documentation would define all of the data associated with the pharmaceutical organization’s business rules, entities and their corresponding attributes. It should also define very specifically the conventions for naming and labeling the products.
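One way such a labeling convention might be made checkable is to express it as a pattern. The sketch below assumes a purely hypothetical convention (capitalized product name, then strength, unit and dosage form); it is not drawn from any real pharmaceutical standard.

```python
import re

# Hypothetical convention: "<Name> <strength> <unit> <form>",
# for example "Paracetamol 500 mg Tablet".
LABEL_PATTERN = re.compile(
    r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"   # one or more capitalized name words
    r" \d+(?:\.\d+)? (?:mg|g|ml)"     # strength and unit
    r" (?:Tablet|Capsule|Syrup)"      # dosage form
)


def follows_label_convention(label: str) -> bool:
    """Return True when the whole label matches the assumed convention."""
    return LABEL_PATTERN.fullmatch(label) is not None
```

Under this assumed convention, "Paracetamol 500 mg Tablet" would be accepted while "paracetamol 500mg" would be flagged for correction.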

In comparison, a company in the mining industry that is implementing a data warehouse or any other kind of data resource would have a different set of existing data quality criteria. Its criteria might cover the names of the different compositions of mineral deposits, the geographic locations where specific minerals and other deposits can be found and how rich those deposits are, the legislation governing mining contracts in each country, and many other data needs relevant to the mining industry in general and the mining company in particular.

Having a set of existing data quality criteria is similar to keeping a data dictionary. But while a data dictionary is very broad, as it tries to define all data and all of its general aspects, the existing data quality criteria are very specific. They may define how the data will be structured and how it will be dealt with in terms of physical storage and network sharing.

Like the data dictionary, however, the existing data quality criteria can partially overcome the problem of data disparity that arises when data sources on different platforms share data in different formats. This complements the process of extracting, transforming and loading, and its final effect is clean, uniform and high quality data output in the data resource.
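To make the idea of format disparity concrete, the sketch below converts dates from several sources into one uniform format during transformation. The source names (`crm`, `billing`, `legacy`) and their date formats are assumptions made up for the example.

```python
from datetime import datetime

# Hypothetical sources and the date format each one is assumed to use.
SOURCE_DATE_FORMATS = {
    "crm": "%d/%m/%Y",      # e.g. 31/01/2024
    "billing": "%Y-%m-%d",  # e.g. 2024-01-31
    "legacy": "%m-%d-%y",   # e.g. 01-31-24
}


def to_iso_date(raw: str, source: str) -> str:
    """Parse a date using its source's format and return it as YYYY-MM-DD.

    Raises KeyError for an unknown source and ValueError for a value that
    does not match the source's format, so bad input is not loaded silently.
    """
    fmt = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(raw.strip(), fmt).date().isoformat()
```

Recording the expected format per source as part of the criteria means a mismatch surfaces as an error at load time instead of as inconsistent data in the warehouse.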

Editorial Team at Geekinterview is a team of HR and Career Advice members led by Chandra Vennapoosa.

