What is Data Compression

Data compression is the process of reducing the storage space that data requires by applying mathematical techniques. It is also referred to as source coding: the data is encoded using fewer bits than its unencoded representation.

As a real-life, non-digital analogy, the word "development" could be shortened to "dev’t" or "dev". Despite using fewer letters, all three forms convey the same meaning to a reader, while saving space on the computer and saving paper and ink in printing.

In the more technical and mathematical sense, data compression means applying an algorithm to reduce the number of bits in a data file. Many general-purpose compression programs use a variation of the LZ adaptive dictionary-based algorithm to reduce file sizes without changing the meaning of the data. "LZ" refers to the creators of the algorithm, Abraham Lempel and Jacob Ziv.
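To make the idea of reducing bits without changing meaning concrete, here is a minimal sketch using Python's standard zlib module, whose DEFLATE format combines an LZ77-style dictionary stage with Huffman coding. The sample text is an assumption chosen to be repetitive; real files will compress more or less well depending on their content.

```python
import zlib

# Hypothetical sample input: repetitive text compresses well.
original = b"the quick brown fox jumps over the lazy dog. " * 200

# Compress at the highest standard level (9), then restore.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

print(len(original), "bytes before compression")
print(len(compressed), "bytes after compression")
print("round trip exact:", restored == original)
```

The round trip returns exactly the bytes that went in, which is what "without changing the meaning of the data" amounts to in practice.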

Data compression is very useful in two main areas: resource management and data transmission. With data compression, consumption of expensive resources such as hard disk space can be greatly reduced. The downside is that compressed data must be decompressed before use, which costs extra processing time and may, in demanding cases, call for extra hardware.

In terms of transmission, compressed data saves bandwidth, so a company may not need to pay for extra capacity. But as with any communication, a protocol needs to exist between the sender and the receiver: both sides must agree on the compression scheme for the message to get across.

There are two main types of data compression: lossless compression and lossy compression. As the names imply, lossy compression discards some of the original information, while lossless compression discards none: it removes redundancy by re-encoding the data in a form that needs fewer bits.

Lossless compression lets one recreate the original file exactly. Lossy compression does not: it keeps only an approximation of the original, dropping detail that is judged less important, in exchange for a smaller file that is easier to store and transmit.

In lossless data compression, for instance, a picture may have a nice blue sky but a big file size, and the user may want to reduce the file size without compromising the quality of the blue color. Because the picture has lots of blue, large areas of sky pixels share the same color value. An algorithm can rewrite the file so that the color is stored once and every matching sky pixel simply refers to it, which removes the redundancy of repeating the same value. Note that this is lossless only when the pixels are truly identical; replacing many slightly different shades of blue with one picked shade changes the picture and is therefore a lossy step.
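One simple way to exploit a large area of identical pixels without losing anything is run-length encoding. The following is a minimal sketch, assuming the sky pixels really do share one value; the pixel colors and the function names rle_encode and rle_decode are illustrative, not taken from any particular image format.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

# Hypothetical row of pixels: a run of identical sky blue, then a cloud.
SKY = (135, 206, 235)
CLOUD = (255, 255, 255)
row = [SKY] * 12 + [CLOUD] * 3 + [SKY] * 5

encoded = rle_encode(row)
print(encoded)
# → [((135, 206, 235), 12), ((255, 255, 255), 3), ((135, 206, 235), 5)]
print("round trip exact:", rle_decode(encoded) == row)
```

Twenty pixels become three pairs, and decoding gives back the identical row. If neighbouring pixels differed even slightly, the runs would be short and this scheme would save little.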

On the other hand, lossy compression is very useful in internet applications such as streaming audio, video and images, where smaller files mean faster transfers. The problem with lossy compression is that the receiver gets only the decoder's approximation of the source, and the discarded detail cannot be recovered. Data that needs to be reproduced exactly, such as databases, cannot use lossy compression. The benefit of lossy compression is the big reduction in file size.
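The trade-off of lossy compression can be sketched with plain quantization: each sample is rounded to the nearest multiple of a step size, so fewer distinct values need to be stored, but the original values cannot be recovered exactly. The sample values, the step size and the function names below are assumptions made for illustration.

```python
def quantize(samples, step):
    """Lossy step: keep only the nearest multiple of `step` for each sample."""
    return [round(s / step) for s in samples]

def dequantize(codes, step):
    """Rebuild approximate samples from the stored codes."""
    return [c * step for c in codes]

# Hypothetical 8-bit sample values (for example, pixel brightness).
samples = [13, 14, 15, 200, 203, 198, 90, 91]
step = 8

codes = quantize(samples, step)
approx = dequantize(codes, step)
errors = [abs(a - s) for a, s in zip(approx, samples)]

print("codes: ", codes)
print("approx:", approx)
print("largest error:", max(errors))
```

The codes contain long runs of repeated small numbers, which a lossless stage can then shrink further, but the reconstruction differs from the input. That is acceptable for a photograph and unacceptable for a database record.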

Some examples of lossless data compression include:

  • Entropy encoding,
  • Burrows-Wheeler Transform,
  • Prediction by Partial Matching (also known as PPM),
  • Dictionary Coders (LZ77, LZ78 and LZW),
  • Dynamic Markov Compression (DMC),
  • Run-length encoding,
  • Context mixing.
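As an illustration of the dictionary coders in this list, here is a minimal sketch of LZW over bytes. It assumes an unbounded table and returns a plain list of integer codes; real formats also pack the codes into bits and limit the table size, which is left out here. The function names are illustrative.

```python
def lzw_encode(data):
    """Encode bytes as a list of integer codes, growing a phrase table as it goes."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending the known phrase
        else:
            codes.append(table[current])     # emit the longest known phrase
            table[candidate] = len(table)    # learn the new phrase
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes):
    """Rebuild the bytes by reconstructing the same table the encoder built."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Phrase the encoder defined and used immediately.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

message = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(message)
print(len(message), "input bytes ->", len(codes), "codes")
print("round trip exact:", lzw_decode(codes) == message)
```

The decoder never receives the table; it rebuilds it from the codes alone, which is why the method is called adaptive.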

Examples of lossy data compression include:

  • Vector quantization,
  • A-law Compander, Mu-law Compander,
  • Distributed Source Coding Using Syndromes (for correlated data),
  • Discrete Cosine Transform,
  • Fractal compression,
  • Wavelet compression,
  • Modulo-N code for correlated data,
  • Linear predictive coding.
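To give a feel for one entry in this list, the sketch below applies the continuous mu-law companding formula with mu = 255 and then rounds to 256 levels. It assumes samples normalized to the range -1 to 1 and is not the exact bit layout used by telephony codecs, which rely on a piecewise-linear approximation; the function names are illustrative.

```python
import math

MU = 255  # value commonly used for 8-bit mu-law

def mu_compress(x):
    """Map x in [-1, 1] to [-1, 1], stretching small amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode(x, levels=256):
    """Compand, then round to one of `levels` integer codes (the lossy step)."""
    y = mu_compress(x)
    return round((y + 1) / 2 * (levels - 1))

def decode(code, levels=256):
    """Map an integer code back to an approximate sample."""
    y = code / (levels - 1) * 2 - 1
    return mu_expand(y)

# Hypothetical samples: quiet and loud values, positive and negative.
for x in (0.01, -0.01, 0.5, -0.9):
    restored = decode(encode(x))
    print(f"{x:+.4f} -> code {encode(x):3d} -> {restored:+.4f}")
```

Quiet samples come back with a much smaller absolute error than loud ones, which is the point of companding: the limited number of codes is spent where the ear is most sensitive.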

Editorial Team at Geekinterview is a team of HR and Career Advice members led by Chandra Vennapoosa.

