Data summarization condenses evaluational data, both primitive and derived, into derived evaluational data that is general in nature. Because a data warehouse holds a very high volume of data, there needs to be a mechanism for extracting only the relevant and meaningful information in a less cluttered format. Data summarization gives data consumers a generalized view of disparate bulks of data.
Summarizing very large multi-dimensional datasets, such as those in data warehouses, is challenging work. It typically requires intensive investigation by IT experts, database administrators and programmers so that overall trends and important exceptions can be identified and dealt with technically. A computer, or several computers working together, can perform exhaustive searches using sophisticated algorithms to do the data summarization.
Data summarization is common, but analyzing ultra-large datasets may require a powerful and time-consuming approach. For instance, when somebody wants to investigate census data to understand the relationship between salary and educational level in the United States, this can involve querying high-volume databases and intensive data aggregation.
The result can be presented as a compact summary that plots the average salary against educational level. But other data consumers may have further requirements, such as the inclusion of standard deviation information along with the averages.
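As a rough illustration of this kind of summary, the sketch below groups salary records by educational level and reports the mean and sample standard deviation for each group. The records, the level names and the function name `summarize_by_level` are all invented for the example; real census data would come from a database query rather than a hard-coded list.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (education level, annual salary) records.
# All values are made up for illustration only.
records = [
    ("high school", 30000),
    ("high school", 34000),
    ("high school", 38000),
    ("bachelor", 50000),
    ("bachelor", 54000),
    ("bachelor", 58000),
    ("master", 70000),
    ("master", 80000),
]

def summarize_by_level(rows):
    """Return count, mean and sample standard deviation per education level."""
    groups = defaultdict(list)
    for level, salary in rows:
        groups[level].append(salary)

    summary = {}
    for level, salaries in groups.items():
        summary[level] = {
            "count": len(salaries),
            "mean": mean(salaries),
            # stdev needs at least two values; report 0.0 for a single record
            "stdev": stdev(salaries) if len(salaries) > 1 else 0.0,
        }
    return summary

summary = summarize_by_level(records)
for level, stats in summary.items():
    print(f"{level:12s} n={stats['count']} "
          f"mean={stats['mean']:.0f} stdev={stats['stdev']:.0f}")
```

The per-level means are the values one would plot against educational level, and the standard deviations address the consumers who want a measure of spread alongside the averages.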
Still other data consumers may require the average salaries to be broken down by age group, or outlying salaries to be excluded. In addition, some may want the relationship between salary and education level broken down for men and women, or by race or geography. Effective data summarization involves identifying overall trends and substantial exceptions to them.
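A minimal sketch of these two refinements follows: excluding outlying salaries and breaking the averages down by an extra dimension. The field names, the sample values and the helpers `drop_outliers` and `mean_by` are assumptions made for the example, and the 1.5 × IQR fence is just one common convention for flagging outliers, not a rule stated here.

```python
from collections import defaultdict
from statistics import mean, quantiles

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"education": "bachelor", "age_group": "25-34", "salary": 40000},
    {"education": "bachelor", "age_group": "25-34", "salary": 42000},
    {"education": "bachelor", "age_group": "35-44", "salary": 45000},
    {"education": "bachelor", "age_group": "35-44", "salary": 47000},
    {"education": "bachelor", "age_group": "35-44", "salary": 50000},
    {"education": "bachelor", "age_group": "35-44", "salary": 400000},  # outlying salary
]

def drop_outliers(rows, field="salary", k=1.5):
    """Keep rows whose value lies within k * IQR of the quartiles (assumed rule)."""
    q1, _, q3 = quantiles([r[field] for r in rows], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[field] <= high]

def mean_by(rows, keys, field="salary"):
    """Average `field` for each combination of the grouping keys."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r[field])
    return {group: mean(values) for group, values in groups.items()}

trimmed = drop_outliers(records)
by_age = mean_by(trimmed, ["education", "age_group"])
for group, avg in by_age.items():
    print(group, round(avg))
```

Passing a different key list, for example a hypothetical `"sex"` or `"region"` field, would give the other breakdowns mentioned above without changing the helpers.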
Data summarization can also be done with a simple spreadsheet application such as Microsoft Excel. For example, a random sample can be collected by giving each person three containers with different kinds of beverages, say, Coke, Pepsi, and Dr. Pepper. The beverage each person prefers is marked X.
Presented manually, the results could be listed as P, P, C, P, P, P, P, P, P, C, P, P, D, and so on. But that would be too confusing. A computer program can summarize them easily. And since most programs have a visual interface, one can even get a graphical view such as a bar graph, line graph or other graphical presentation format.
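The tally a program would produce can be sketched in a few lines. The example below counts only the thirteen responses listed above (the original list continues beyond them) and assumes that P, C and D stand for Pepsi, Coke and Dr. Pepper; the text bar chart is a stand-in for the graphical view a spreadsheet would draw.

```python
from collections import Counter

# The thirteen responses listed in the text; the full sample is longer.
responses = ["P", "P", "C", "P", "P", "P", "P", "P", "P", "C", "P", "P", "D"]

# Assumed meaning of the one-letter codes.
labels = {"C": "Coke", "P": "Pepsi", "D": "Dr. Pepper"}

counts = Counter(responses)
total = sum(counts.values())

# One line per beverage: name, count, share of responses, and a simple bar.
for code, n in counts.most_common():
    print(f"{labels[code]:<10} {n:>2} {n / total:6.1%} {'#' * n}")
```

Even this small summary makes the preference visible at a glance, which the raw sequence of letters does not.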
There are many tools on the market that make data summarization a lot easier by providing a visual environment. These tools may help a data consumer produce data summaries one at a time, and they can also allow the end user to explore the dataset manually. While the end user only clicks and drags, the computer performs the exhaustive search in the background and informs the end user about where further investigation is warranted.
Data summarization makes it easy for business decision makers to spot trends and patterns in the industry where the business operates, as well as trends and patterns in the internal operations of the business organization. This way, the decision makers can get an accurate picture of the strong and weak points in the operation.
With the details of these areas, the decision makers can decide how to build on the strong points and how to innovate and improve products in order to overcome the problems associated with the weak points. This can give a company a competitive advantage over other companies.