Data Transformation

Data Transformation is also known as data scrubbing or data cleansing.  It is the formal process of transforming data in the data resource within a common data architecture.   In this process the production data is decoded and records are merged from multiple DBMS format in order to create meaning information.

It includes transforming operational, historical, disparate and evaluational data within a common data architecture to an integrated data resource.  It also includes transforming data within the integrated data resource, and transforming disparate data.

The IT environment in business organizations today is so diverse. Many organizations try to store their enterprise data in multiple relational database management systems (RDBMS) and among such popular RDBMS are Microsoft SQL Server, Oracle, Sybase, and DB2.

Aside from the fact that business organizations store data on relational database, there are also many other that store some of their data in non-relational formats such as mainframes, spreadsheets, and email systems. Still, there are organizations that have individual smaller databases such as Microsoft Access for each staff.

When all these scenarios are taken together in a single company (which is still highly possible today), the organizations will need to find an efficient way of still having to operate as a single entity and where all disparate data and systems can related and interchange data among various data stores.

Data transformation is one of the collective process known as ETL (extract, transform, load) which is one of the most important processes in data warehouse implementation this is the way that data actually gets loaded into the warehouse from different data sources.

There are many tools to help data warehouses with data transformations by setting objects or certain utilities to automate the processes of extract, transform and load operations to or from a database. With these tools, data can be transformed and loaded from heterogeneous sources into any supported database. These tools also allow the automation of data import or transformation on a scheduled basis with such features as file transfer protocol (FTP).

One such notable and widely used tool for data transformation is the Data Transformation Services (DTS) which contains DTS objects packages and many components. This tool is packaged with the Microsoft SQL Server but is also commonly used with any other independent databases. When used with Microsoft products, the DTS can allow data to be transformed and loaded from heterogeneous sources using OLE DB, ODBC, or text-only files, into any supported database.

The DTS packages are created with the use of DTS tools such as DTS wizards, DTS Designer, and DTS Programming Interfaces.

The DTS Wizards, like any other program wizards, automates things by offering simple clicks to accomplish complex tasks so that even non-programmers can do such complex jobs. But they mostly deal with common and simple DTS tasks including the Import/Export Wizard and the Copy Database Wizard.

DTS offers a graphical tool for building very sophisticated and complex DTS packages. The DTS Designer offers easy way to build the DTS packages with workflows and event-driven logic. It can be used in customizing and editing packages which have been created using the DTS wizard.

Other functionalities include are the DTS Package execution utilities, DTS Query Designer and DTS Run Utility.

Other than unifying and transforming data into a desired format for the data warehouse loading, data transformation is also responsible for correcting error by using a background task in order to periodically inspect the memory for errors and by doing such, it reduces the by using a background task in order to periodically inspect the memory for errors. It also minimizes or totally eliminate data redundancy.

Editorial Team at Geekinterview is a team of HR and Career Advice members led by Chandra Vennapoosa.

Editorial Team – who has written posts on Online Learning.


Pin It