Data Compression: What It Is and Why It's Important
Updated July 31, 2023
Data storage and transmission are often important concerns for businesses, governments and other organizations. Compressing data allows these organizations to maximize the data they can handle while minimizing the associated space and cost. If you store or transmit data as a part of your occupation, it may be helpful to understand how compression works and what advantages it can provide to you.
In this article, we define data compression, discuss its importance, describe various compression methods and provide tips on how to implement compression.
What is data compression?
Data compression is the act or process of reducing the size of a computer file. Through an algorithm or a set of rules for carrying out an operation, computers can determine ways to shorten long strings of data and later reassemble them in a recognizable form upon retrieval. The result is a file that uses fewer bits, or units of information, than the original file. There are two types of data compression:
Lossless: With lossless compression, all the original data remains intact. The algorithm reduces the file size in such a way that it retains the information needed to restore the file exactly when decompressed. Lossless compression is necessary for files that can't function or are noticeably compromised without all the original data. Such files include software applications, documents and certain media formats used by professionals such as photographers, filmmakers and musicians.
Lossy: Lossy compression can reduce file sizes even further but with some compromise in detail. This approach is suitable for file types in which lost details are hardly perceptible. Such files include media on the user end, such as downloads of music, movies and images. For these, there's some reduction in playback quality, but the consumer is unlikely to notice.
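The difference between the two types can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the lossless half uses the standard zlib module, while lossy_quantize is a hypothetical helper invented here that simply rounds sample values down, standing in for the detail that real lossy formats discard.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and then decompress with zlib; the result equals the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(samples: list[int], step: int = 16) -> list[int]:
    """Hypothetical lossy step: round each sample down to a multiple of step."""
    return [(s // step) * step for s in samples]


original = bytes(range(256)) * 4
restored = lossless_round_trip(original)
print(restored == original)  # lossless: every byte comes back

samples = [3, 18, 40, 41, 200]
approx = lossy_quantize(samples)
print(approx)  # lossy: close to the input, but the exact values are gone
```

After quantizing, 40 and 41 both become 32, so there is no way to tell them apart again. That is the trade lossy compression makes in exchange for smaller files.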
How data compression works
There are generally four types of data compression based on the type of data you want to compress. These are:
Text: Text data compression involves identifying patterns and redundancies in the text and encoding them using shorter codes or symbols. The effectiveness of a compression algorithm depends on the type of text and the required compression ratio.
Image: Like text compression, image compression searches for patterns and redundancies in the file and creates shorter codes and symbols for them. For example, it might recognize a run of pixels of the same color and replace it with a single code that records the color and its number of occurrences, taking up about as much space as one unit of the original.
Audio: Audio files hold sound data and rarely contain other types of data. The primary way to shrink audio files is through lossy methods that remove background noise and white noise from the files.
Video: Video files contain both images and audio, so they need a specific process to compress. A combination of lossy methods is used to eliminate background noise and keep the most important parts of the video.
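To see why patterns and redundancies matter, the sketch below compares how well the same general-purpose compressor handles repetitive text and patternless bytes. It assumes Python's standard zlib module; the sample data and the compression_ratio helper are made up for this illustration.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means better)."""
    return len(zlib.compress(data)) / len(data)


# Highly redundant input: one phrase repeated many times.
repetitive_text = b"the same phrase appears again and again. " * 100

# Patternless input of the same length; the fixed seed keeps runs repeatable.
rng = random.Random(0)
random_bytes = bytes(rng.getrandbits(8) for _ in range(len(repetitive_text)))

print(f"repetitive: {compression_ratio(repetitive_text):.2f}")
print(f"random:     {compression_ratio(random_bytes):.2f}")
```

The repetitive input shrinks to a small fraction of its size, while the random input barely changes, because there are no repeated patterns for the algorithm to encode more briefly.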
Why is data compression important?
Data compression minimizes the space that files occupy on a hard drive and reduces the time needed to transfer or download them. This reduction of space and time can result in significant cost savings. For example, organizations that store large amounts of data, such as corporations and healthcare providers, can save on data storage expenses, as compression allows them to store more files with less capacity. Also, since compressed files take less time to transmit across the internet, such organizations have less need for investing in costly bandwidth upgrades.
For certain other organizations, compression allows them to provide optimal service at the highest convenience. For instance, telecommunications providers handle enormous amounts of audio and video data. Compression allows them to provide service to a large number of customers with minimal compromise in auditory or visual quality.
Data compression methods
The following are some common data compression methods:
1. Lempel–Ziv compression
Lempel–Ziv compression is a lossless algorithm that finds repeated sequences of characters in a data set and replaces them with tokens or shortened sequences. For example, in a message that reads "AAABABAAABAA," the algorithm scans the message, stops at every unfamiliar letter sequence and assigns it a token. The first unfamiliar sequence is the single "A," which might receive a token of "1." The next is "AA," which is "2." "B" is the third sequence, receiving a "3." The sequences thereafter are familiar to the algorithm. The algorithm can convert the original message to "123132132," which shortens it from 12 characters to 9, a reduction of 25%.
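For readers who want to experiment, here is a small sketch of LZ78, one member of the Lempel–Ziv family. It is not the exact token scheme described above: LZ78 emits pairs of (index of a known phrase, next character) and keeps growing its phrase list as it reads. The function names are chosen for this illustration only.

```python
def lz78_compress(text: str) -> list[tuple[int, str]]:
    """Encode text as (index of known phrase, next character) pairs."""
    dictionary: dict[str, int] = {}  # phrase -> index, starting at 1
    tokens: list[tuple[int, str]] = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # still a familiar sequence, keep extending it
        else:
            # Index 0 stands for the empty phrase.
            tokens.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:  # input ended in the middle of a familiar phrase
        tokens.append((dictionary[phrase], ""))
    return tokens


def lz78_decompress(tokens: list[tuple[int, str]]) -> str:
    """Rebuild the text by replaying the same phrase list."""
    phrases = [""]  # index 0 is the empty phrase
    pieces = []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)


message = "AAABABAAABAA"
tokens = lz78_compress(message)
print(tokens)
print(lz78_decompress(tokens) == message)
```

On a message this short the token list is not smaller than the input. The savings appear on longer data, where the phrase list fills up with long sequences that each cost only one token.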
2. Run-length encoding
Run-length encoding is a lossless method that takes advantage of runs, or strings of consecutive repeated values. For example, if an image file includes a string of 10 consecutive pixels of the same color, the algorithm can insert a datum that reports there are 10 such pixels and then remove the redundant data. Though the algorithm adds some data, it removes much more, reducing the overall size of the file. This type of compression can also increase the size of your data when the values change frequently, because every short run still needs its own count.
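A minimal sketch of run-length encoding in Python follows. It represents pixels as single characters and stores each run as a (symbol, count) pair; that representation is an assumption made for readability, not the layout of any particular file format.

```python
from itertools import groupby


def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


row = "W" * 10 + "B" * 3 + "W" * 5  # 18 pixels in three runs
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)

# With no runs to exploit, every symbol needs its own pair.
noisy = "WBWBWB"
print(len(rle_encode(noisy)), "pairs for", len(noisy), "symbols")
```

The first row shrinks from 18 symbols to three pairs. The alternating row produces six pairs for six symbols, so the encoded form ends up larger than the input.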
3. Dictionary coding
Dictionary coding is another lossless method that converts the original data into an abbreviated numeric code using bits of 0s and 1s and then uses a "dictionary" as a reference to convert the code back to a recognizable form. This is comparable to a restaurant using numbers to represent various food combinations on its menu. For example, number one might represent "fried chicken with potatoes and peas." The description of the menu item takes up 36 characters, but the numeric code is only one. Here, the dictionary is the knowledge that a certain number stands for a specific dish.
With regard to computer files, imagine a 100-byte image file made up of two colors. The algorithm might divide the bytes into groups of 10 and assign a three-digit binary code to each distinct group. Each group of 10 bytes is like a menu item, and the dictionary is a legend that links each of them to a code. Thus, by replacing every 10 bytes with a three-digit string, the algorithm can produce a final compressed image size of only 30 bits, not counting the dictionary itself. Upon retrieval of the file, it can then convert the bits back to their original form.
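The 100-byte example can be sketched as follows. The image contents, the group size of 10 and the three-bit codes mirror the description above, while the helper names and the choice to represent bits as a string of "0" and "1" characters are assumptions made for this illustration.

```python
def build_dictionary(groups: list[bytes], code_bits: int = 3) -> dict[bytes, str]:
    """Assign a fixed-width binary code to each distinct group."""
    codes: dict[bytes, str] = {}
    for group in groups:
        if group not in codes:
            if len(codes) >= 2 ** code_bits:
                raise ValueError("too many distinct groups for this code width")
            codes[group] = format(len(codes), f"0{code_bits}b")
    return codes


def decode(bits: str, codes: dict[bytes, str], code_bits: int = 3) -> bytes:
    """Look each code up in the dictionary to restore the original bytes."""
    lookup = {code: group for group, code in codes.items()}
    return b"".join(
        lookup[bits[i:i + code_bits]] for i in range(0, len(bits), code_bits)
    )


# A made-up two-color image: bands of 10 dark bytes and 10 light bytes.
image = (bytes([0]) * 10 + bytes([255]) * 10) * 5  # 100 bytes
groups = [image[i:i + 10] for i in range(0, len(image), 10)]

codes = build_dictionary(groups)
encoded = "".join(codes[group] for group in groups)

print(encoded, len(encoded), "bits")
print(decode(encoded, codes) == image)
```

The encoded string is 30 bits long, matching the figure in the example. In practice the dictionary also has to be stored or transmitted alongside the codes, so the real saving is somewhat smaller.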
4. Perceptual coding
Perceptual coding is a lossy compression method that discards the parts of a file that most humans are incapable of perceiving. Depending on the file type, the algorithm can determine which elements of the file fit this description and subsequently reduce or remove them. For example, a raw music file might contain sound waves in the ultrasonic range, which people can't hear. Thus, the algorithm can completely remove any data that pertains to ultrasound, reducing the total file size significantly with no noticeable reduction in sound quality.
The same can apply to images and video. For the former, the algorithm can retain elements that human eyes normally perceive well, such as the contrast between objects, but reduce imperceptible components within the objects, such as pixels of similar colors. For the latter, the algorithm might reduce the transmission of pixels that are static between frames, such as stationary objects.
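The video case, where static pixels are not transmitted again, can be sketched with simple frame differencing. This is a simplified stand-in: it only skips pixels that are exactly unchanged, so it loses nothing, whereas real perceptual video coding also discards detail viewers are unlikely to notice. Frames are modeled here as flat lists of pixel values of equal length, which is an assumption of this sketch.

```python
def frame_delta(previous: list[int], current: list[int]) -> list[tuple[int, int]]:
    """List only the pixels that changed, as (position, new value) pairs."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


def apply_delta(previous: list[int], delta: list[tuple[int, int]]) -> list[int]:
    """Rebuild the next frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame


# Two made-up 12-pixel frames: a bright object moves one pixel to the right.
frame1 = [0, 0, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0]
frame2 = [0, 0, 0, 9, 9, 0, 0, 0, 0, 0, 0, 0]

delta = frame_delta(frame1, frame2)
print(delta)
print(apply_delta(frame1, delta) == frame2)
```

Only two of the 12 pixels need to be sent for the second frame, because the stationary background is reused from the first.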
Best practices for data compression
Consider these tips for implementing data compression:
Determine the compression level
Depending on your needs, you may compress your data to a certain level. Before you start other processes, it's important to determine how compressed the data can be before you convert it. This helps you determine what other steps you may need for successfully compressing the data and sending it to its destination.
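Many tools expose the compression level as a setting. As one hedged example, Python's standard zlib module accepts a level from 0 (no compression) to 9 (smallest output, usually slowest). The sample data below is made up, and the sizes you see depend on your own data.

```python
import zlib

# Made-up sample data with plenty of repetition.
data = b"Compression level trades speed for size. " * 500

# Compress the same data at several levels and record the output sizes.
sizes = {level: len(zlib.compress(data, level)) for level in (0, 1, 6, 9)}

print(f"original: {len(data)} bytes")
for level, size in sizes.items():
    print(f"level {level}: {size} bytes")
```

Level 0 stores the data without shrinking it, so its output is slightly larger than the input. Trying a few levels on a representative sample is a quick way to see how far your data can be compressed before you commit to a setting.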
Choose the appropriate compression type
For every file you compress, first determine whether to use lossless or lossy compression. To decide, ask yourself whether any compromise in data quality is acceptable. As discussed, some detail loss in audio, video and image files is unlikely to be perceptible, so lossy compression is appropriate for these. In files such as text documents, loss in detail is noticeable, so lossless compression is advisable.
Use a coprocessor
A coprocessor allows your computer to redirect processing power to a secondary central processing unit (CPU), freeing up your primary computer resources to perform everyday activities. This allows you to remain productive while you compress files, which can be a resource-intensive function. Consider adding a field-programmable gate array, or FPGA, a microchip that you can configure to operate as an additional processor. This is especially useful for the compression of large data types.
Consider data deduplication
Data deduplication is a process that removes duplicates within a data set. It works by comparing patterns of data, identifying which patterns already exist within a stored set and replacing the redundant instances with a reference that directs to the already-stored pattern. Because such patterns may occur repeatedly in a given instance of data transmission or storage, deduplication can greatly reduce the amount of data handled. It's thus a useful complement to compression.
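The idea can be sketched with fixed-size chunks and content hashes. This is a simplified illustration rather than how any particular storage product works: the chunk size, the use of SHA-256 digests as references and the function names are all assumptions.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 8) -> tuple[dict[str, bytes], list[str]]:
    """Store each distinct chunk once and record a reference per chunk."""
    store: dict[str, bytes] = {}  # digest -> chunk contents
    refs: list[str] = []          # one digest per chunk, in original order
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy
        refs.append(digest)
    return store, refs


def rebuild(store: dict[str, bytes], refs: list[str]) -> bytes:
    """Follow the references to reassemble the original data."""
    return b"".join(store[digest] for digest in refs)


data = b"HEADER__" * 3 + b"PAYLOAD1" + b"HEADER__" * 2
store, refs = deduplicate(data)

print(len(refs), "chunks,", len(store), "stored")
print(rebuild(store, refs) == data)
```

Six chunks are referenced but only two are stored. The unique chunks that remain can still be compressed, which is why the two techniques work well together.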
Determine if you need multi-stage compression
Sometimes, you may need to compress the data in several stages so that every file type is handled correctly. This is multi-stage compression. Start by determining whether your data includes multiple file types, such as video, audio and text. Once you know this, you can decide which types of data you need and then compress each one with a suitable method.