What is Data Distribution?

Data warehouses are often managed by more than one server because the volume of data is too great for a single computer to handle. In the past, mainframes were used for processing large bulks of data. These were giant computers, housed in dedicated rooms, used for critical, data-intensive applications such as census processing, consumer statistics, enterprise resource planning, and financial transaction processing.

IT professionals have since found greater advantages in distributed computing, in which different parts of a program run simultaneously on several computers connected over a network. Many computers working together can exceed the processing speed of a single mainframe.

In today’s business environment, data management has become a critical part of running a company. Customers and staff access data from a combination of laptops, desktops, and terminal services, which poses a serious challenge for data administrators: making data available to its consumers in a consistent and timely fashion.

Data distribution partially answers these challenges. At first glance it may seem to contradict data normalization, the relational database practice of eliminating redundant data to speed up processing and reduce spending on expensive storage. Because data distribution is basically a way of replicating data, it can give the impression of defeating the purpose of normalization.

In data distribution, however, replicated data are stored in another database that is itself normalized; the replica may simply be a mirror of a database on another computer system. In a distributed system, each computer processes a program using data from its own database, and the computers continuously update one another to keep their data synchronized. This is useful in several ways. On the one hand, sharing the processing load can mean faster and more efficient computing.
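
To make the synchronization idea concrete, here is a minimal sketch of one-way replication between two nodes, using SQLite so it runs anywhere. The table layout, the updated_at column, and the sync_replica helper are all invented for this illustration; production systems rely on the replication machinery built into their database or middleware.

```python
import sqlite3

def setup(conn):
    # Each node keeps the same normalized table plus a change timestamp.
    conn.execute("""CREATE TABLE IF NOT EXISTS customers (
                        id INTEGER PRIMARY KEY,
                        name TEXT NOT NULL,
                        updated_at INTEGER NOT NULL)""")

def sync_replica(primary, replica, since):
    # Copy every row changed on the primary after the last sync point.
    rows = primary.execute(
        "SELECT id, name, updated_at FROM customers WHERE updated_at > ?",
        (since,)).fetchall()
    replica.executemany(
        "INSERT OR REPLACE INTO customers (id, name, updated_at) "
        "VALUES (?, ?, ?)", rows)
    replica.commit()
    return max((r[2] for r in rows), default=since)  # new sync point

primary = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
setup(primary)
setup(replica)

primary.execute("INSERT INTO customers VALUES (1, 'Acme Corp', 100)")
primary.commit()

last_sync = sync_replica(primary, replica, since=0)
print(replica.execute("SELECT * FROM customers").fetchall())
# [(1, 'Acme Corp', 100)] -- the replica now mirrors the primary
```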

On the other hand, a distributed computing system is less prone to failure. If one of the computers within the distributed system fails, another can carry the same load. In contrast, with centralized computing, as in the case of mainframes, if the mainframe computer breaks down, the whole system goes down as well.
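
A minimal sketch of that failover idea, continuing with SQLite: the query_with_failover helper and the node list are hypothetical, and a real cluster would detect failures and reroute traffic far more carefully.

```python
import sqlite3

def query_with_failover(nodes, sql):
    # Try each node in turn; fall back to the next replica on failure.
    for node in nodes:
        try:
            return node.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # this node is down or unusable; try the next one
    raise RuntimeError("all nodes failed")

primary = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
for db in (primary, replica):
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")
    db.commit()

primary.close()  # simulate the primary node going down
print(query_with_failover([primary, replica], "SELECT * FROM customers"))
# [(1, 'Acme Corp')] -- served by the replica instead
```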

As a result, business operations could come to a halt until the mainframe is fixed, or the company might temporarily switch to a non-automated system while waiting for the mainframe to come back up. The toll of that workaround falls on the system administrator, whose work piles up because of the bulk of data accumulated during the non-automated period.

Data can also be transferred once it has reached a particular volume. This is sometimes referred to as batch transferring but is more commonly known as data deployment: a bulk of data is distributed by batch from one site to another.
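
The sketch below shows that batching pattern in miniature. The BatchSender class, the threshold value, and the print-based transfer callable are all invented for the example; in practice the transfer step might be a bulk load at the remote site or a file shipment between sites.

```python
BATCH_SIZE = 1000  # illustrative threshold; tune to your network and data

class BatchSender:
    """Accumulates records and ships them in bulk once a threshold is hit."""

    def __init__(self, transfer, batch_size=BATCH_SIZE):
        self.transfer = transfer      # callable that moves a batch between sites
        self.batch_size = batch_size
        self.buffer = []

    def add(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.transfer(self.buffer)  # ship the accumulated batch
            self.buffer = []            # start accumulating the next one

# Usage: here "transfer" just prints; a real deployment might upload a file
# or run a bulk INSERT at the destination site.
sender = BatchSender(lambda batch: print(f"shipping {len(batch)} records"),
                     batch_size=3)
for i in range(7):
    sender.add({"id": i})
sender.flush()  # ship any remainder below the threshold
```
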
The Data Distribution Service (DDS) is a publish/subscribe middleware specification for distributed systems. Specifically, a DDS implementation handles the processes related to data transfer, including data addressing and data marshalling and de-marshalling. Any node within the system can be a publisher, a subscriber, or both at the same time.
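
DDS itself is an OMG specification with full implementations (RTI Connext and Eclipse Cyclone DDS, for example), and the sketch below does not use any DDS API. It is only a toy, in-process illustration of the publish/subscribe pattern DDS standardizes; the Bus class and the topic name are invented for the example, and JSON stands in for DDS's binary marshalling.

```python
import json
from collections import defaultdict

class Bus:
    """Toy publish/subscribe bus; real DDS adds discovery, QoS, transport."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, data):
        payload = json.dumps(data)            # "marshal" the sample
        for callback in self.subscribers[topic]:
            callback(json.loads(payload))     # "de-marshal" on delivery

bus = Bus()
# A node can subscribe to a topic...
bus.subscribe("sensor/temperature", lambda d: print("received:", d))
# ...and the same node or another can publish to it.
bus.publish("sensor/temperature", {"celsius": 21.5})
```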

Data distribution can be a costly undertaking, but when properly managed it can be very beneficial to a company implementing a data warehouse.
