A data source, as the name implies, provides data through a data site. A data site in turn stores an organization's databases and data files, including non-automated data. Companies implement a data warehouse because they want a single repository for all enterprise-related data, as well as a main repository for the organization's historical data.
But such data warehouses must process high volumes of data with complex queries and analysis, so a mechanism has to be applied to the data warehouse system to keep it from slowing down the operational systems.
A data warehouse is designed to periodically get all sorts of data from various data sources. One of the main objectives of the data warehouse is to bring together data from many different data sources and databases in order to support reporting needs and management decisions.
Take, for example, the United States Environmental Protection Agency (EPA), which is implementing the Envirofacts Data Warehouse. Because this is a large agency dealing with large volumes of data, the Envirofacts database is designed as a system composed of many individual EPA databases, which are administered by program system offices.
Sometimes industry is required to report information to the state where it operates, and sometimes the information is collected at the federal level.
So the data sources of the Envirofacts Data Warehouse provide information that makes it easy to trace the origin of the information. Some of these data sources are:
Superfund Data Source – This data source covers Superfund sites, which are uncontrolled hazardous waste sites designated by the federal government to be cleaned up.
Information about these sites is stored in the Comprehensive Environmental Response, Compensation, and Liability Information System (CERCLIS), which has been integrated into Envirofacts.
Safe Drinking Water Information Data Source – This database stores information related to drinking water programs.
Master Chemical Integrator Data Source – This database integrates various chemical identifications used in four program system components.
Other data sources of the Envirofacts Data Warehouse are Hazardous Waste Data, Toxics Release Inventory, Facility Registry System, Water Discharge Permits, Drinking Water Microbial and Disinfection Byproduct Information, and the National Drinking Water Contaminant Occurrence Database.
Now, all these data sources contribute seemingly unrelated data, which may come in disparate file formats. The data may also come from different geographical locations and from different government offices within the United States.
The data that they share finally converges in a central data warehouse, which manages it so that it becomes more meaningful and relevant and can be redistributed or shared with anybody who needs it.
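To make the idea of disparate formats converging in one place more concrete, here is a minimal Python sketch. It assumes two hypothetical sources, one delivering CSV and one delivering JSON, and maps both onto a common record layout that also keeps the name of the originating source. The sample data, field names and source names are invented for illustration and do not reflect the actual Envirofacts schema.

```python
import csv
import io
import json

# Hypothetical sample extracts; real sources would be files or database exports.
CSV_EXTRACT = "facility_id,name\nF001,Acme Plant\nF002,River Works\n"
JSON_EXTRACT = '[{"id": "F003", "facility": "Hilltop Mill"}]'


def from_csv(text, source):
    """Parse CSV text and map each row onto the common record layout."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"facility_id": r["facility_id"], "name": r["name"], "source": source}
        for r in rows
    ]


def from_json(text, source):
    """Parse JSON text and map each item onto the common record layout."""
    items = json.loads(text)
    return [
        {"facility_id": i["id"], "name": i["facility"], "source": source}
        for i in items
    ]


def build_warehouse():
    """Collect records from every source into one list (the 'warehouse')."""
    warehouse = []
    warehouse.extend(from_csv(CSV_EXTRACT, "state_office_csv"))
    warehouse.extend(from_json(JSON_EXTRACT, "federal_office_json"))
    return warehouse


if __name__ == "__main__":
    for record in build_warehouse():
        print(record)
```

Because every record carries a `source` field, a report built on the combined list can still trace each row back to where it came from.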
In a similar manner, business organizations implementing a huge data warehouse may have data sources coming from different departments. There may be a data source coming from the human resources department, another from finance and accounting, yet another from manufacturing, inventory, sales and many other departments.
For really big companies which operate in various geographical locations around the country or around the world, the data may come from even more sources. Data sources may be divided in a hierarchical fashion.
For instance, a data source in one geographical branch may be broken down into various data sources coming from the different departments within the branch. In the overall global data warehouse system, the data sources from the atomic departments become like twigs in the global data warehouse tree structure.
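The tree structure described here can be sketched in a few lines of Python. This is only an illustration under assumed names: the `SourceNode` class and the branch and department names are hypothetical. The leaves of the tree stand for the atomic department-level data sources.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceNode:
    """One node in the hierarchy: the warehouse, a branch, or a department."""
    name: str
    children: List["SourceNode"] = field(default_factory=list)

    def add(self, name: str) -> "SourceNode":
        """Attach a child node and return it so it can be extended further."""
        child = SourceNode(name)
        self.children.append(child)
        return child

    def leaves(self) -> List[str]:
        """Return the names of the atomic (leaf) data sources under this node."""
        if not self.children:
            return [self.name]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def build_example() -> SourceNode:
    """Build a small, made-up hierarchy: warehouse -> branches -> departments."""
    root = SourceNode("global_warehouse")
    east = root.add("east_branch")
    west = root.add("west_branch")
    east.add("east_branch/sales")
    east.add("east_branch/inventory")
    west.add("west_branch/human_resources")
    return root


if __name__ == "__main__":
    print(build_example().leaves())
```

Walking the tree from the root lists every department-level source, while calling `leaves()` on a single branch node lists only the sources of that branch.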
A data warehouse system built on separate data sources is easier to manage because when one of the data sources breaks down, the whole system will not halt its operations.
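One way to picture this isolation is a load routine that handles each source separately, so that a failure in one is recorded and the rest are still loaded. The sketch below is an assumption about how such a routine might look, not a description of any particular product. The loader functions and their data are hypothetical, and the sales loader fails on purpose to simulate a broken source.

```python
def load_human_resources():
    """Hypothetical loader that returns a few rows from the HR source."""
    return [{"department": "human_resources", "employee_count": 120}]


def load_sales():
    """Hypothetical loader that simulates a source that is currently down."""
    raise ConnectionError("sales source unreachable")


def load_inventory():
    """Hypothetical loader that returns a few rows from the inventory source."""
    return [{"department": "inventory", "item_count": 4500}]


def refresh(sources):
    """Load every source independently; collect failures instead of stopping."""
    loaded = []
    failed = {}
    for name, loader in sources.items():
        try:
            loaded.extend(loader())
        except Exception as exc:  # one broken source must not stop the others
            failed[name] = str(exc)
    return loaded, failed


SOURCES = {
    "human_resources": load_human_resources,
    "sales": load_sales,
    "inventory": load_inventory,
}

if __name__ == "__main__":
    rows, errors = refresh(SOURCES)
    print("loaded rows:", rows)
    print("failed sources:", errors)
```

The failed source can then be retried on the next periodic load, while reports based on the other sources remain available.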