What Is a Data Repository

A Data Repository is a logical (and sometimes physical) partitioning of data in which multiple databases that serve specific applications or sets of applications reside. For example, several databases (revenues, expenses) that support financial applications (A/R, A/P) could reside in a single financial Data Repository.
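
To make the definition concrete, the sketch below models a Data Repository as nothing more than a named grouping of databases plus the applications they serve. It is a hypothetical illustration, not a product API: the `DataRepository` class, the `repository_for` helper and the field names are assumptions chosen for this example, and the instance simply mirrors the financial example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataRepository:
    """Hypothetical model: a named, logical grouping of databases serving related applications."""
    name: str
    databases: list[str] = field(default_factory=list)
    applications: list[str] = field(default_factory=list)


def repository_for(application: str, repositories: list[DataRepository]) -> Optional[DataRepository]:
    """Return the first repository that serves the given application, or None if none does."""
    for repo in repositories:
        if application in repo.applications:
            return repo
    return None


# Hypothetical instance mirroring the financial example in the text.
financial = DataRepository(
    name="financial",
    databases=["revenues", "expenses"],
    applications=["A/R", "A/P"],
)
```

Calling `repository_for("A/P", [financial])` returns the financial repository, while an application that no repository serves returns `None`. A real warehouse would keep this mapping in a metadata catalog rather than in code.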

A data warehouse is one large Data Repository of all business-related information, including all historical data of the organization implementing it. Data warehousing is the complex process of building such a repository in the form of a relational database, so that the company can support web or text mining, leverage its data, and transform or aggregate that data into useful information.

In general, organizations use data warehousing to gain a competitive advantage and to support decision-making processes through comprehensive data analysis.

Some of the key components of data warehousing are Decision Support Systems (DSS) and Data Mining (DM).

Data volumes in a data warehouse can grow at an exponential rate, so there must be a way to handle this growth. With respect to storage, the critical requirements to consider in a data warehouse are high availability, high data volume, high performance and scalability, simplicity and usability, and easy management.

Partitioning data into logical, or in some cases physical, Data Repositories can greatly help a data warehouse cope with exponential growth in data volume. If the data in the warehouse were not partitioned into several Data Repositories, there would be a serious disadvantage in terms of performance and efficiency.

For one, if the central server fails, the whole system comes to a halt. Because all the data is located in one monolithic system, a hardware failure leaves no backup of any sort. Depending on the nature of the problem, it may take some time to bring the server back up, and in a business, even a few minutes of stoppage can translate into thousands of dollars in potential losses.

When Data Repositories are employed in the data warehouse, the load can be distributed across many databases or even many servers. For instance, instead of having one computer handle the customer database, several databases could each handle a different aspect of customer data.

In a very large company with several branches around the country, instead of keeping all customers in one database, a customer Data Repository may hold a separate database for each branch's customers. Similarly, departmental databases may be grouped into separate Data Repositories; for example, as mentioned earlier, the revenues and expenses databases that support the A/R and A/P applications could reside in a single financial Data Repository.
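
The per-branch arrangement described above can be sketched with one database per branch and a small routing step that sends each customer record to its branch's database. This is a minimal illustration under stated assumptions: the branch names, the `customers` table schema and the helper functions are hypothetical, and in-memory SQLite databases stand in for what would, in practice, be separate databases or servers.

```python
import sqlite3

# Hypothetical branch names; each branch gets its own database in the customer repository.
BRANCHES = ["north", "south", "east"]


def make_customer_repository(branches):
    """Create one in-memory SQLite database per branch, keyed by branch name."""
    repo = {}
    for branch in branches:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        repo[branch] = conn
    return repo


def add_customer(repo, branch, customer_id, name):
    """Route the insert to the database that owns the customer's branch."""
    conn = repo[branch]  # raises KeyError if the branch has no database
    conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (customer_id, name))
    conn.commit()


def count_customers(repo, branch):
    """Query a single branch's database instead of every customer record."""
    (count,) = repo[branch].execute("SELECT COUNT(*) FROM customers").fetchone()
    return count


if __name__ == "__main__":
    repo = make_customer_repository(BRANCHES)
    add_customer(repo, "north", 1, "Acme Ltd")
    add_customer(repo, "north", 2, "Birch & Co")
    add_customer(repo, "south", 3, "Cedar Inc")
    print(count_customers(repo, "north"))  # prints 2
```

Because each branch's data lives in its own database, the work of inserting and querying is spread across several smaller stores, which is the load-distribution benefit the text describes. Relational databases such as PostgreSQL offer built-in list partitioning that achieves a similar split inside a single server.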

A Data Repository offers easier and faster access because related information is, to some degree, clustered together. In the financial Data Repository example, anyone from the finance department, or any other data user wanting financial information, does not have to dig through the entire volume of data in the data warehouse.
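
The access benefit can be illustrated by comparing a scan scoped to one repository with a scan of the whole warehouse. The nested dictionary below is a hypothetical stand-in for a warehouse (repository, then database, then records); the repository names, record fields and values are invented purely for illustration, and the functions `scan_repository` and `scan_warehouse` are assumptions, not part of any product.

```python
# Hypothetical toy warehouse: repository -> database -> list of records.
WAREHOUSE = {
    "financial": {
        "revenues": [{"month": "Jan", "amount": 1200}, {"month": "Feb", "amount": 900}],
        "expenses": [{"month": "Jan", "amount": 700}],
    },
    "customers": {
        "north": [{"id": 1}, {"id": 2}],
        "south": [{"id": 3}],
    },
}


def scan_repository(warehouse, repository):
    """Yield (database name, record) pairs from a single repository only."""
    for db_name, records in warehouse[repository].items():
        for record in records:
            yield db_name, record


def scan_warehouse(warehouse):
    """Yield every record in every repository: the unpartitioned alternative."""
    for repository in warehouse:
        yield from scan_repository(warehouse, repository)


if __name__ == "__main__":
    financial_rows = sum(1 for _ in scan_repository(WAREHOUSE, "financial"))
    all_rows = sum(1 for _ in scan_warehouse(WAREHOUSE))
    total_revenue = sum(
        record["amount"]
        for db_name, record in scan_repository(WAREHOUSE, "financial")
        if db_name == "revenues"
    )
    print(financial_rows, all_rows, total_revenue)  # prints 3 6 2100
```

A financial user's query touches only the three financial records rather than all six in the warehouse; with realistic volumes, that difference in the amount of data examined is what makes access faster.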

For database administrators, using Data Repositories makes the data warehouse much easier to maintain because of its compartmentalized nature. When there is a problem, the cause may be traced without taking a top-down approach to the whole data warehouse. In most companies, one database manager or administrator is usually assigned to each Data Repository to ensure data reliability for the whole system.

Editorial Team at Geekinterview is a team of HR and Career Advice members led by Chandra Vennapoosa.


