What is Data Collection

A database is a shared, often vast, collection of logically related data. Businesses rely heavily on data because databases are used to manage their day-to-day tasks, so Data Collection happens every single day.

Collecting data may seem a simple, even trivial, task. But databases have come a long way from simply being able to define, create, maintain and control access to data. Today most complex applications cannot function without data and database managers, and Data Collection is one of the most critical tasks handled by companies and their IT staff.

Two approaches to constructing database management systems had become popular by the 1970s. The first, exemplified by IBM, used a data model that requires all data records to be assembled into tree structures.

As a consequence, some records were roots while every other record had exactly one parent record. An application programmer could query the data only by navigating from the root to the record of interest, one record at a time. This process was rather slow, but at the time records were stored on serial storage devices, particularly magnetic tape.
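The root-to-record navigation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the interface of any historical system: the Record class, the find function and the sample keys are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical record in a tree: one parent, any number of children."""
    key: str
    children: list = field(default_factory=list)


def find(root, key):
    """Walk the tree from the root, visiting one record at a time.

    Returns the matching record (or None) and the number of records visited.
    """
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if node.key == key:
            return node, visited
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return None, visited


# Sample data (invented for illustration): a company with two departments.
root = Record("company", [
    Record("dept-a", [Record("emp-1"), Record("emp-2")]),
    Record("dept-b", [Record("emp-3")]),
])

record, visited = find(root, "emp-3")
print(record.key, visited)
```

Reaching "emp-3" means passing through every record that comes before it in the walk, which is why this style of access was slow on large collections.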

The other approach at the time was the Integrated Data Store (IDS), developed at General Electric. It led to a new kind of database management system (DBMS) called the network DBMS.

The network model was designed to represent more complex data relationships than hierarchical database systems such as IBM's could. Even so, a query still involved navigating from a specific entry point to the record of interest.

Most, if not all, of today's dominant databases are based on the relational database model proposed by E. F. Codd. This design set out to overcome the shortcomings of the earlier databases, such as their inefficient data retrieval scheme.

In a relational database, data is represented in table structures called relations, and the data is accessed declaratively through a high-level, non-procedural query language.

Earlier databases relied on algorithms that obtained the desired records one at a time. Relational databases overcome this problem: a query only has to specify a predicate that identifies the desired records or combinations of records.
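As a small illustration of querying by predicate, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The employee table, its columns and the sample rows are assumptions made up for this example; the point is only that the query states which records are wanted, not how to walk to them.

```python
import sqlite3

# In-memory database with a hypothetical "employee" relation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 62000),
        ("Bob", "Sales", 41000),
        ("Cy", "Sales", 55000),
        ("Dee", "IT", 70000),
    ],
)

# The WHERE clause is the predicate: it describes the desired records.
# No navigation code is written; the database decides how to find them.
rows = conn.execute(
    "SELECT name FROM employee WHERE dept = ? AND salary > ? ORDER BY name",
    ("Sales", 50000),
).fetchall()

print(rows)
```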

A Relational Database Management System (RDBMS) has a query optimizer that translates the predicate specification into a process for accessing the database to answer the query. Relational databases aim to maximize data independence and minimize redundancy.
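One way to watch an optimizer at work is SQLite's EXPLAIN QUERY PLAN statement, shown here through Python's sqlite3 module. This is a sketch under assumptions: the employee table, the index name idx_employee_dept and the query_plan helper are invented for the example, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)

QUERY = "SELECT name FROM employee WHERE dept = ?"


def query_plan(connection, sql, params=()):
    """Return the optimizer's plan as text (hypothetical helper).

    The human-readable detail is the last column of each plan row.
    """
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index the only option is to scan the whole table.
plan_before = query_plan(conn, QUERY, ("Sales",))

# After an index is added, the same query text gets a different access path.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
plan_after = query_plan(conn, QUERY, ("Sales",))

print(plan_before)
print(plan_after)
```

The query itself never changes. Only the access process chosen by the optimizer does, which is one practical meaning of data independence.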

Today's business data warehouses are sometimes regarded as islands of databases: they are geographically separated and have incompatible hardware architectures and communication protocols. In Data Collection, however, they can be held together seamlessly with Distributed Database Management Systems.

A Distributed Database Management System (DDBMS) manages a single logical database that is split into several fragments, each stored on one of several computers under the control of a separate database management system. These computers are connected by a network. Each computer has local autonomy, but it can also process data stored on other computers in the network.

Software applications are written specifically to tie these autonomous DBMSs together. Local applications work only with data stored at their own site, while global applications also work with data from other sites. The application can make Data Collection seamless despite the geographic distance between two or more systems.
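The difference between local and global applications can be sketched with two independent in-memory SQLite databases standing in for two sites. Everything here is a simplifying assumption: the site names, the customer table, and the make_site, local_query and global_query helpers are hypothetical, and a real DDBMS would also handle networking, transactions and failures.

```python
import sqlite3


def make_site(rows):
    """Create one autonomous site holding a fragment of the customer data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    return conn


# Each site stores only its own fragment of the logical customer table.
sites = {
    "london": make_site([(1, "Ann", "London"), (2, "Bob", "London")]),
    "tokyo": make_site([(3, "Chie", "Tokyo")]),
}


def local_query(site, sql, params=()):
    """Local application: touches only the data stored at one site."""
    return sites[site].execute(sql, params).fetchall()


def global_query(sql, params=()):
    """Global application: runs the query at every site and merges results."""
    results = []
    for conn in sites.values():
        results.extend(conn.execute(sql, params).fetchall())
    return results


print(local_query("tokyo", "SELECT name FROM customer ORDER BY id"))
print(sorted(global_query("SELECT name FROM customer")))
```

The caller of global_query sees one customer table even though the rows live in separate databases, which is the seamlessness the paragraph above describes.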

Big, competitive companies invest in Data Collection systems that can incorporate advanced numeric and text string searches, table handling methods, relational navigation through pages, and user-defined rules that help spot relationships between data elements.

Editorial Team at Geekinterview is a team of HR and Career Advice members led by Chandra Vennapoosa.


