Disparate Databases

Disparate databases are heterogeneous databases: database systems that are compatible with one another neither electronically nor operationally.

Technically speaking, a database is a repository of data that can provide a centralized and homogeneous view of that data for use by multiple applications. The data in a database are not placed there at random; they follow a structure set out in the definition of the database. This structure is called a schema. It is specified in a data definition language, and the data it describes can be manipulated using operations specified in a data manipulation language.
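As a minimal sketch of the distinction between data definition and data manipulation, the following Python example uses the standard-library sqlite3 module with an in-memory database. The customer table, its columns, and the helper function names are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory database, define a schema, and load two rows."""
    conn = sqlite3.connect(":memory:")

    # Data definition language (DDL): declares the structure, i.e. the schema.
    # The table and column names here are illustrative assumptions.
    conn.execute(
        """
        CREATE TABLE customer (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            country TEXT NOT NULL
        )
        """
    )

    # Data manipulation language (DML): operates on data inside that structure.
    conn.executemany(
        "INSERT INTO customer (id, name, country) VALUES (?, ?, ?)",
        [(1, "Ada", "UK"), (2, "Grace", "US")],
    )
    conn.execute("UPDATE customer SET country = 'GB' WHERE country = 'UK'")
    conn.commit()
    return conn


def customers_in(conn: sqlite3.Connection, country: str) -> list:
    """Return the names of customers in the given country, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM customer WHERE country = ? ORDER BY name",
        (country,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    demo = build_demo_database()
    print(customers_in(demo, "GB"))
```

The CREATE TABLE statement fixes the schema once, while the INSERT, UPDATE, and SELECT statements work only within the structure it defines.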

Both the definition and manipulation languages employed in a database are based on a data model, which defines the semantics of the constructs and operations supported by these languages.

Comprehensive studies of molecular biology data often involve exploring multiple molecular biology databases, which entails coping with the distribution of data among these databases, the heterogeneity of the systems underlying these databases, and the semantic (schema representation) heterogeneity of these databases.

Early attempts to manage heterogeneous databases tried to resolve heterogeneity by consolidating the databases, either physically, through integration into a single homogeneous database, or virtually, by imposing a common data definition language, data model, or even DBMS upon them. These attempts failed because they required a degree of cooperation that was very difficult to attain, as well as a costly replacement of the applications already built on the existing databases.

A data warehouse is a prime example of disparate databases working together to form an information system. By its very nature, a data warehouse is a repository of a company's current and historical data, which business intelligence systems then use. A central database alone may not be able to handle the requirements posed by high volumes of data, so the data may come from many different sources. These sources may be managed by different kinds of database implementations, including relational database management systems. Some data can even come from flat files in legacy systems.

These different kinds of data sources bring differences in structure, data semantics, supported constraints, and query language. This can happen because different database implementations are based on different data models, which provide different primitives: object-oriented (OO) models support specialization and inheritance, for example, while relational models do not. Likewise, a database may use the set type in a CODASYL schema, which supports insertion and retention constraints that are not captured by referential integrity alone.

Disparate databases result in disagreement about the meaning, interpretation, or intended use of data. Several kinds of conflict can arise: naming conflicts, where databases follow different naming conventions; data representation conflicts, where databases use different values to represent the same concept; precision conflicts, where databases draw the values for the same data from domains with different cardinalities; metadata conflicts, where the same concept is represented at the schema level in one database and at the instance level in another; missing attributes; and many more.
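The conflicts described above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not a general solution: the two source record layouts, the field names (cust_name, customerName, bal_cents, and so on), and the gender code tables are all hypothetical. It shows a naming conflict, a representation conflict, a precision conflict, and a missing attribute being mapped onto one common record shape.

```python
from decimal import Decimal, ROUND_HALF_UP

# Representation conflict: each (hypothetical) source encodes gender differently.
GENDER_CODES_A = {"F": "female", "M": "male"}
GENDER_CODES_B = {1: "male", 2: "female"}

TWO_PLACES = Decimal("0.01")


def from_source_a(record: dict) -> dict:
    """Map a record from hypothetical source A onto the common shape."""
    # Precision conflict: source A stores balances with three decimal places,
    # so round to the two places used by the common shape.
    balance = Decimal(str(record["balance"])).quantize(
        TWO_PLACES, rounding=ROUND_HALF_UP
    )
    return {
        "name": record["cust_name"],  # naming conflict: cust_name vs customerName
        "gender": GENDER_CODES_A.get(record.get("gender"), "unknown"),
        "balance": balance,
        "email": record.get("email"),  # missing attribute becomes None
    }


def from_source_b(record: dict) -> dict:
    """Map a record from hypothetical source B onto the common shape."""
    # Source B stores balances as an integer number of cents.
    balance = (Decimal(record["bal_cents"]) / 100).quantize(TWO_PLACES)
    return {
        "name": record["customerName"],
        "gender": GENDER_CODES_B.get(record.get("sex"), "unknown"),
        "balance": balance,
        "email": record.get("email"),
    }


if __name__ == "__main__":
    print(from_source_a({"cust_name": "Ada", "gender": "F", "balance": 1200.456}))
    print(from_source_b({"customerName": "Grace", "sex": 2, "bal_cents": 98765}))
```

Keeping one small mapping function per source makes each source's conventions explicit, which is usually easier to audit than a single function full of special cases.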

Problems arising from the data produced by disparate databases can often be remedied today with software tools that manage data transformation. A process known as ETL (extract, transform, load) has become standard in data warehouse applications: it takes disparate data from various sources and transforms them into a uniform format that the data warehouse can understand and work with.
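A toy version of the ETL process can be sketched with Python's standard library alone. Everything specific here is an assumption made for illustration: the legacy flat file contents, the clients source table, the dim_customer warehouse table, and the cleanup rules (title-casing names, upper-casing country codes). Real ETL tools do far more, but the three stages are the same.

```python
import csv
import io
import sqlite3

# Hypothetical flat file exported from a legacy system.
LEGACY_CSV = "CUST_NAME,COUNTRY\nada lovelace,uk\ngrace hopper,us\n"


def extract_flat_file(text: str) -> list:
    """Extract: read rows from a CSV flat file held in memory."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_relational(conn: sqlite3.Connection) -> list:
    """Extract: read rows from a (hypothetical) relational source table."""
    cursor = conn.execute("SELECT full_name, country_code FROM clients")
    return [{"full_name": n, "country_code": c} for n, c in cursor]


def transform(flat_rows: list, relational_rows: list) -> list:
    """Transform: bring both sources into one (name, country) format."""
    unified = []
    for row in flat_rows:
        unified.append((row["CUST_NAME"].title(), row["COUNTRY"].upper()))
    for row in relational_rows:
        unified.append((row["full_name"].strip(), row["country_code"].upper()))
    return unified


def load(warehouse: sqlite3.Connection, rows: list) -> None:
    """Load: write the unified rows into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer (name TEXT, country TEXT)"
    )
    warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?)", rows)
    warehouse.commit()


def run_etl() -> list:
    """Run all three stages against in-memory demo databases."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE clients (full_name TEXT, country_code TEXT)")
    source.execute("INSERT INTO clients VALUES (' Alan Turing ', 'gb')")

    warehouse = sqlite3.connect(":memory:")
    rows = transform(extract_flat_file(LEGACY_CSV), extract_relational(source))
    load(warehouse, rows)
    return warehouse.execute(
        "SELECT name, country FROM dim_customer ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    for name, country in run_etl():
        print(name, country)
```

The warehouse only ever sees the uniform (name, country) rows; knowledge of each source's layout stays in the extract and transform steps.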

Aside from enterprise data warehouses, the internet is the biggest example of disparate databases being handled together to deliver relevant information. On dynamic websites, a viewer looking at a webpage in a browser may have no idea that the data on the page comes from hundreds of disparate databases.

Editorial Team at Geekinterview is a team of HR and Career Advice members led by Chandra Vennapoosa.
