The vertical fragment approach to database distribution is similar in concept to the horizontal fragment approach, but it does not lend itself as easily to scalability. Vertical fragments occur when columns, instead of rows, are distributed across multiple nodes.
A situation that calls for vertical fragments might arise if a table contains information that is pertinent, separately, to multiple applications. Using the previous example of a database that stores customer information, we might imagine an airline’s frequent flyer program.
These programs typically track, among other things, customers’ personal information, like addresses and phone numbers, along with a list of all the trips they have flown and the miles they have accrued along the way.
These sets of data have different applications: the customer information is used when mailing tickets and other correspondence, and the mileage information is used when deciding how many complimentary flights a customer may claim or whether the customer has flown enough miles to obtain “elite” status in the program. Since the two sets of data are generally not accessed at the same time, they can easily be separated and stored on different nodes, provided each fragment keeps the customer’s identifying key so that the pieces can be matched up again when needed.
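To make the frequent flyer split concrete, here is a minimal sketch in Python. It uses plain dictionaries in place of real database nodes, and every name in it (`CUSTOMERS`, `vertical_fragment`, `contact_node`, `mileage_node`, `reconstruct`, and the column names) is invented for illustration rather than taken from any particular database product.

```python
# Hypothetical frequent flyer table; all field names are illustrative.
CUSTOMERS = [
    {"customer_id": 1, "name": "Ada", "address": "1 Main St",
     "phone": "555-0100", "miles": 48000},
    {"customer_id": 2, "name": "Grace", "address": "9 Oak Ave",
     "phone": "555-0101", "miles": 102500},
]


def vertical_fragment(rows, key, columns):
    """Build a fragment that holds only `columns`, indexed by the key column."""
    return {row[key]: {c: row[c] for c in columns} for row in rows}


# One "node" serves the mailing application, the other the mileage application.
contact_node = vertical_fragment(CUSTOMERS, "customer_id",
                                 ["name", "address", "phone"])
mileage_node = vertical_fragment(CUSTOMERS, "customer_id", ["miles"])


def reconstruct(customer_id):
    """Rejoin the two fragments on the shared key to rebuild the full row."""
    return {"customer_id": customer_id,
            **contact_node[customer_id],
            **mileage_node[customer_id]}
```

The mailing application would only ever touch `contact_node`, and the mileage application only `mileage_node`. The shared `customer_id` is what allows `reconstruct` to put a full record back together on the rare occasion that both sets of columns are needed.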
Since airlines typically have a large number of customers, this distribution could be made even more efficient by incorporating both horizontal fragmentation and vertical fragmentation. This combined fragmentation is often called the mixed fragment approach.
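A rough Python sketch of the mixed approach follows. It assumes, purely for illustration, that rows are assigned to horizontal shards by taking the customer key modulo the number of shards, and that columns are then split into named groups; the function `mixed_fragment`, the group names, and the sample rows are all hypothetical.

```python
def mixed_fragment(rows, key, column_groups, shard_count):
    """Split rows horizontally by key, then vertically by column group.

    Returns a dict mapping (shard, group) to that node's fragment.
    The modulo rule for picking a shard is an assumption of this sketch.
    """
    nodes = {}
    for row in rows:
        shard = row[key] % shard_count              # horizontal split
        for group, columns in column_groups.items():
            fragment = nodes.setdefault((shard, group), {})
            fragment[row[key]] = {c: row[c] for c in columns}  # vertical split
    return nodes


# Illustrative data only.
rows = [
    {"customer_id": 1, "address": "1 Main St", "miles": 48000},
    {"customer_id": 2, "address": "9 Oak Ave", "miles": 102500},
    {"customer_id": 3, "address": "4 Elm Rd", "miles": 7300},
]
groups = {"contact": ["address"], "mileage": ["miles"]}
nodes = mixed_fragment(rows, "customer_id", groups, shard_count=2)
```

With two shards and two column groups, the table ends up spread over four fragments, each of which holds only some of the rows and only some of the columns.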
A database can be broken up into many smaller pieces, and many methods for doing so have been developed. A web search for “distributed databases” is a good starting point for exploring other, more complex methods of implementing a distributed database. However, there are two more terms related to database fragmentation with which the reader should be familiar.
The first is homogeneous distribution, which simply means that each node in a distributed database is running the same software with the same extensions and so forth. In this case, the only logical differences among the nodes are the sets of data stored at each one. This is normally the condition under which distributed databases run.
However, one could imagine a case in which multiple database systems might be appropriate for managing different subsets of a database. This is called heterogeneous distribution and allows the incorporation of different database software programs into one big database. Systems like this are useful when multiple databases provide different feature sets, each of which could be used to improve the performance, reliability, and/or scalability of the database system.
In addition to the distribution situations above, full-database replication is also available for many database platforms. This is really what we mean when we say a database is hosted by multiple servers, but in general, the idea of distributing pieces of a database should be considered before putting much thought into wholesale replication of a database. This is for one simple reason: replication is expensive. It’s expensive in terms of finance, time, and data, but for many applications, it truly is the best solution.
Here, we briefly discuss “master-master” replication, which is perhaps the most complicated of all the replication solutions. This is also the most comprehensive replication solution, since every master accepts writes and holds a full copy of the database that is kept current with the others. Because of this, the entire database will still be available if one node fails.
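The idea can be illustrated with a toy, in-memory sketch in Python. It is not how any real database implements master-master replication: the `Master` class is hypothetical, writes are pushed to peers synchronously, and conflicts are settled by an assumed "higher version number wins" rule, whereas real systems need far more careful conflict handling.

```python
class Master:
    """A toy in-memory master node; not a real database API."""

    def __init__(self, name):
        self.name = name
        self.data = {}    # key -> (version, value)
        self.peers = []   # other masters that receive our writes

    def write(self, key, value):
        """Accept a write locally, then push it to every peer."""
        version = self.data.get(key, (0, None))[0] + 1
        self._apply(key, version, value)
        for peer in self.peers:
            peer._apply(key, version, value)

    def _apply(self, key, version, value):
        # Assumed conflict rule: keep whichever write has the higher version.
        if version > self.data.get(key, (0, None))[0]:
            self.data[key] = (version, value)

    def read(self, key):
        return self.data[key][1]


# Two masters, each replicating to the other.
a, b = Master("a"), Master("b")
a.peers, b.peers = [b], [a]

a.write("miles:1", 48000)   # write accepted by master a
b.write("miles:1", 50500)   # later write accepted by master b
```

After these two writes, both masters hold the same data, so a read of `miles:1` can be served by either one. If master `a` were to fail at this point, master `b` would still have the complete data set.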