Here’s What You Need to Know About Data Compression
What’s the difference between lossy and lossless compression? What about uncompressed? Learn how the three differ in this filmmaking guide.
Data compression is an unseen force of tremendous power in the modern world. Without the myriad innovations in the field of compression, there is a good chance that our modern computer age would have never taken off.
Nearly all types of data are compressible: audio, video, text files, pictures, you name it. Compression essentially takes an input data set and encodes it using fewer bits than the original file would have taken up on its own. This information must then be decoded before you can interact with it again. The encode/decode process costs some extra computation, but it vastly reduces the storage and bandwidth required at each stage after the encode.
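As a rough illustration of that encode/decode cycle, here is a minimal Python sketch. It uses zlib from the standard library, a general-purpose lossless compressor, purely as a stand-in; the sample text is made up for the example.

```python
import zlib

# Hypothetical sample input: repetitive text compresses well.
original = b"the quick brown fox jumps over the lazy dog. " * 200

encoded = zlib.compress(original)    # encode: the data now takes fewer bytes
decoded = zlib.decompress(encoded)   # decode: required before the data is usable again

print(len(original), "bytes before encoding")
print(len(encoded), "bytes after encoding")
print("round trip exact:", decoded == original)
```

How much the size drops depends heavily on the input. Repetitive data like this shrinks dramatically, while data that is already compressed barely shrinks at all.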
So, with that out of the way, let’s take a look at the alternative to data compression.
Uncompressed data is stored exactly as it was recorded or input. Because of this bit-for-bit duplication, uncompressed data offers maximum data fidelity at the cost of maximum storage and bandwidth requirements.
Uncompressed data is a remnant of the early days of computing, when technological limitations kept even the most complex data sets orders of magnitude smaller than today’s.
How Does It Work?
Uncompressed data doesn’t usually implement special encoding techniques. Rather, each bit gets stored exactly as it comes from the input source. This spares the input/capture device the work of encoding, but every device downstream must then store and move the full-size data.
Uncompressed data is good for archival purposes because it doesn’t throw any information away, but the costs of storage will add up quickly.
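To get a feel for how quickly those costs add up, here is a back-of-the-envelope sketch in Python. The frame size, bit depth, frame rate, and clip length are illustrative assumptions rather than figures from any particular camera, and the function name is made up for this example.

```python
def uncompressed_video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Size of raw picture data: every pixel of every frame is stored in full."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps * seconds

# Assumed example: one minute of 1080p footage at 24 bits per pixel and 24 fps.
size = uncompressed_video_bytes(1920, 1080, 24, 24, 60)
print(f"{size / 1e9:.2f} GB for one minute of picture data")
```

Under these assumptions a single minute comes to roughly 9 GB before any audio is added.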
So what can compression offer instead?
The early ’90s saw compressed media formats reach the mainstream, although general-purpose compression techniques such as Huffman coding date back to the 1950s. Perhaps the most well-known of these early formats was MPEG-1 Audio, which was standardized in 1993. Its Layer III became the famous MP3 audio codec.
MP3s, along with their early peers, were among the first lossy formats to reach a wide audience. As these formats gained traction, so did the term for the software behind them: “Codec,” short for Compressor/De-Compressor or Coder/Decoder.
How Does It Work?
Lossy codecs throw away a considerable amount of the information contained in the source file or input data stream.
Lossy codecs vastly reduce file size compared to uncompressed data sets. This is typically done through quantization: each input value is rounded to the closest value the codec can represent. The bit depth determines how many distinct values are available for the interpretation of raw input data, so fewer bits means coarser approximations and smaller files.
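The rounding step can be sketched in a few lines of Python. This is a simplified illustration, not how any specific codec works (real lossy codecs usually transform the data first and apply far more elaborate rules), and the sample values and function names are assumptions made up for the example.

```python
def quantize(sample, bits):
    """Snap a sample in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)           # spacing between neighbouring levels
    return round((sample + 1.0) / step) * step - 1.0

def max_error(samples, bits):
    """Largest difference between a sample and its quantized approximation."""
    return max(abs(quantize(s, bits) - s) for s in samples)

# Hypothetical input samples.
samples = [0.1234, -0.5678, 0.9]

for bits in (3, 8):
    print(f"{bits}-bit: worst-case error {max_error(samples, bits):.4f}")
```

With only 3 bits there are 8 levels to choose from and the approximations are visibly rough. At 8 bits there are 256 levels and the error shrinks accordingly.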
The best lossy codecs are designed around the perceptual limitations of humans. Most uncompressed data contains a large amount of information that our eyes and ears are incapable of perceiving. This means that a well-designed lossy codec can eliminate a large amount of the total information contained before any human would ever be able to notice.
Lossy codecs’ lightweight nature makes them ideal for streaming and other live-broadcast applications.
Lossy codecs are a poor choice for archival purposes because a significant amount of information in the source data set disappears. While humans might not be able to discern a difference between the raw data and the first lossy encode of that data, as more lossy versions are made from the original lossy encode, the perceptual differences become more and more apparent. This effect is known as generation loss.
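The way repeated lossy encodes drift away from the source can be shown with a toy simulation in Python. The "encode" here is not a real codec: it is a stand-in that blurs each sample with its neighbours to mimic fine detail being thrown away on every pass. The signal and all function names are assumptions made up for the example.

```python
import math

def lossy_pass(samples):
    """Toy stand-in for one lossy encode/decode: blur each sample with its neighbours."""
    n = len(samples)
    return [
        0.25 * samples[(i - 1) % n] + 0.5 * samples[i] + 0.25 * samples[(i + 1) % n]
        for i in range(n)
    ]

def rms_error(a, b):
    """Root-mean-square difference between two equally long signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def generation_errors(source, generations):
    """Re-encode the previous lossy copy each time and measure drift from the source."""
    errors = []
    current = source
    for _ in range(generations):
        current = lossy_pass(current)
        errors.append(rms_error(source, current))
    return errors

# Hypothetical source signal: a slow wave plus some finer detail.
original = [
    math.sin(2 * math.pi * 3 * i / 64) + 0.5 * math.sin(2 * math.pi * 17 * i / 64)
    for i in range(64)
]

for gen, err in enumerate(generation_errors(original, 5), start=1):
    print(f"generation {gen}: RMS error {err:.4f}")
```

Each generation is made from the previous lossy copy rather than from the source, so the error keeps growing from one pass to the next.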
Now, let’s enter the 21st century.
One of the best-known lossless codecs is the “Free Lossless Audio Codec,” or FLAC. Its development began in 2000, and version 1.0 followed in 2001. Lossless compression itself is older (ZIP archives and GIF images both predate FLAC), but FLAC brought it to mainstream audio.
The development of lossless codecs marked a sea change in the world of data compression. Lossless codecs reproduce the original data bit for bit, with file sizes well below uncompressed data, though usually larger than lossy compression. This is a result of compressing the input data in a specific way so that the decoder can later reconstruct the original data set.
Understanding the tech behind lossless compression can make your head spin if you go deep enough. YouTube tech darling Linus does a fantastic job of breaking down the basics in this video by Techquickie.
In the example Linus gives, the string “XXXOOXXX” is encoded to “3 O2 3”. When the string needs to be accessed again, “3 O2 3” is decoded back into “XXXOOXXX”.
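This technique is known as run-length encoding, and a minimal version fits in a few lines of Python. The sketch below writes each run as a count followed by its symbol ("3X2O3X"), which is a slightly different notation from the one quoted from the video. The function names are made up for this example, and it assumes the input text contains no digits.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of repeated symbols with its length and the symbol."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))

def rle_decode(encoded):
    """Expand every count/symbol pair back into the original run."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(symbol * int(count) for count, symbol in pairs)

encoded = rle_encode("XXXOOXXX")
print(encoded)               # 3X2O3X
print(rle_decode(encoded))   # XXXOOXXX
```

The decoded string matches the input exactly, which is what makes the scheme lossless. Real lossless codecs use far more sophisticated methods, but the principle of describing the data more compactly without discarding any of it is the same.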
Due to its ability to reconstruct an uncompressed data set from a compressed one, lossless encoding has eclipsed uncompressed data in most situations. Many master-quality video and audio workflows use some form of lossless compression.
The main trade-off for lossless compression is a slightly longer encode and decode time — much better than the drawbacks of lossy or uncompressed.
Losslessly compressed data is the gold standard for all data archiving. The compressed files can be archived and then decoded once they need to be accessed again, thereby preserving all data while also reducing costs of storage.
While all types of data compression offer some advantage over other types, lossless compression provides the most benefits with the fewest concessions to file size, mobility, or quality.
The next few decades will likely take lossless codec technology to new heights as older lossy formats end their useful lifespans.
Cover image via Dilok Kiatlertnapha.