Everything You Ever Wanted to Know About Compression Ratios
We’re here to answer some questions — what are compression ratios, how do they affect digital filmmaking, and what do they have to do with codecs?
In this article, we will de-mystify the cryptic compression ratio, break down how you can derive useful meaning from it, and then show you a few tricks for evaluating codecs to determine the best option for your production.
Basics of Data Compression
We’ve covered the basics of compression before, so we’ll blow through them quickly here.
All compression breaks down into one of two types: lossy compression (which discards information for the sake of file size or data rate), or lossless compression (which shrinks the data during encoding in a way that allows the exact recreation of the original data set on decode). Footage recorded without the use of any compression algorithm is considered uncompressed.
Now, we need to cover some computer science 101 before delving into compression ratios. (It’ll be quick, I promise.)
The fundamental particle of the information world is called the “bit,” represented by the lowercase “b.” (Yes, the case is important). At this level, information is in its most basic, binary form — a 1 or 0.
8 bits comprise a “Byte” (pronounced “bite”), represented by an uppercase “B.” At this and every further level, the data represented becomes more complex.
1,000 Bytes make a KiloByte (“KB”). This is not to be confused with the “Kilobit” (“Kb”), which is 1,000 bits. Because Bytes are 8-bit units, a KiloByte is actually 8,000 bits.
One thousand KiloBytes make a MegaByte, or MB. (Again, not to be confused with the “Megabit” — “Mb.”)
This trend continues — a thousand MegaBytes makes a Gigabyte, and so on, but this is about as far as we need to go for this article. If you want to know more, WhatsAByte.com is a fantastic resource.
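If you'd rather let a script handle the unit juggling, here is a minimal Python sketch of the conversions above. It assumes the decimal (1,000-based) units used in this article, and the function names are just illustrative choices, not part of any library.

```python
# Decimal (1,000-based) units, as used throughout this article.
BITS_PER_BYTE = 8
UNIT_POWERS = {"B": 0, "KB": 1, "MB": 2, "GB": 3}


def bits_to_bytes(bits: float) -> float:
    """Convert a number of bits (lowercase b) to Bytes (uppercase B)."""
    return bits / BITS_PER_BYTE


def bytes_to_unit(num_bytes: float, unit: str) -> float:
    """Convert Bytes to KB, MB, or GB by dividing by 1,000 per step."""
    return num_bytes / (1_000 ** UNIT_POWERS[unit])


print(bits_to_bytes(8_000))            # 1000.0 (8,000 bits is one KiloByte)
print(bytes_to_unit(2_500_000, "MB"))  # 2.5
```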
Now, let’s dive into compression ratios.
Compression ratios are a simple numerical representation of the “compression power” of specific codecs or compression techniques. They are an invaluable shorthand because they tell you at a glance how much smaller the resulting data, footage, or audio will be and, for lossy codecs, give a rough hint of how much quality may have been traded away to get there.
So What Are They?
The two numbers in the compression ratio compare the uncompressed size of the data to its compressed size. The first number represents compression power (how many times larger the uncompressed data is), while the second (usually just “1”) stands for the size of the compressed data.
If you ever want to find the compression ratio for any data you are compressing, here is the formula: Compression Ratio = Uncompressed size/compressed size
If you need to know the storage savings granted by a given codec, two simple adjustments to the formula and you’re set: Space savings = 1 – (compressed size/uncompressed size)
So a 10MB file compresses down to 2MB using codec X, giving us the compression ratio 5:1. To find the savings, we simply input our values into the formula.
Space savings = 1 – (2/10) -> = 1 – (.2) -> = .8 -> .8*100 = 80
So, codec X offers us a storage savings of 80 percent over the uncompressed data. Pretty nifty.
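Here is the same arithmetic as a short Python sketch, using the 10MB and 2MB figures from the codec X example. The function names are made up for illustration; the only requirement is that both sizes use the same unit.

```python
def compression_ratio(uncompressed_size: float, compressed_size: float) -> float:
    """Return N for a ratio of N:1 (uncompressed size / compressed size)."""
    return uncompressed_size / compressed_size


def space_savings(uncompressed_size: float, compressed_size: float) -> float:
    """Return the fraction of storage saved, between 0 and 1."""
    return 1 - (compressed_size / uncompressed_size)


ratio = compression_ratio(10, 2)
savings = space_savings(10, 2)

print(f"{ratio:.0f}:1")   # 5:1
print(f"{savings:.0%}")   # 80%
```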
So now what?
Deciding on a Codec
Now that we have the basics covered, how do you decide which codec is best for your project? Let’s take a look at the parameters engineers use when developing compression algorithms, but let’s approach it as shooters and editors.
Questions to ask yourself about the project:
- Speed: What is the project’s timeline?
- Compression ratio: Do you need higher-quality or smaller files?
- Complexity: Will additional codecs create unnecessary complexity?
- Space: Can you effectively capture, back up, and archive what you need?
- Latency: Are you going to be playing back in real time?
- Interoperability: Will the codec require transcoding for your editing system?
Now that we’ve got an idea of the specific needs of our production, what else do we need to do before choosing a codec?
Beyond evaluating the compression power of a codec, we can use everything we’ve learned so far to make storage predictions for the data we’ll be compressing for the entire shoot. There are a host of benefits to doing this — from choosing between two similarly classed codecs to knowing how many hard drives you’ll need for backup and archiving.
Let’s say we’ve evaluated our production’s needs, and we’re leaning toward recording video using either ProRes 422 HQ or DNxHD 145 for our 1920×1080, 29.97 frames per second project. At this resolution and frame rate, ProRes 422 HQ has a data rate of 220Mbps (megabits per second), while Avid’s DNxHD 145 runs at 145Mbps.
So, using some simple math we can predict how big our 1-hour interview clip will be before we ever start rolling.
220Mbps = 220,000,000 bits per (/) second
220,000,000 bits/second * 60 = 13,200,000,000 bits/minute
13,200,000,000 bits/minute * 60 = 792,000,000,000 bits/hour
792,000,000,000 bits/hour / 8 = 99,000,000,000 Bytes/hour
99,000,000,000 Bytes / 1,000,000 = 99,000 MegaBytes/hour
99,000 MegaBytes / 1,000 = 99 GigaBytes/hour
145Mbps = 145,000,000 bits per (/) second
145,000,000 bits/second * 60 = 8,700,000,000 bits/minute
8,700,000,000 bits/minute * 60 = 522,000,000,000 bits/hour
522,000,000,000 bits/hour / 8 = 65,250,000,000 Bytes/hour
65,250,000,000 Bytes / 1,000,000 = 65,250 MegaBytes/hour
65,250 MegaBytes / 1,000 = 65.25 GigaBytes/hour
So, our one-hour interview will result in a file that is roughly 99 GB with ProRes 422 HQ, and about 65 GB with DNxHD 145.
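The step-by-step math above boils down to one small function. This is a rough Python sketch that assumes a constant data rate and decimal units (1 GB = 1,000,000,000 Bytes); the function name is hypothetical, and real files will vary a little because of audio tracks and container overhead.

```python
def gigabytes_per_hour(megabits_per_second: float) -> float:
    """Estimate storage per hour of footage from a codec's data rate in Mbps."""
    bits_per_hour = megabits_per_second * 1_000_000 * 3_600  # 3,600 seconds per hour
    bytes_per_hour = bits_per_hour / 8                       # 8 bits per Byte
    return bytes_per_hour / 1_000_000_000                    # Bytes to GigaBytes


print(gigabytes_per_hour(220))  # 99.0  (ProRes 422 HQ at 220Mbps)
print(gigabytes_per_hour(145))  # 65.25 (DNxHD 145 at 145Mbps)
```

Swap in any other data rate to size a different codec, or multiply the result by your expected hours of footage to estimate storage for the whole shoot.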
Now our choice is simple. We simply go back to the questions we asked ourselves a moment ago about our specific production to decide if the ~34 GB/hour savings of DNxHD 145 is more or less important than the approximately 50 percent higher data rate ProRes 422 HQ gives us.
Is our one-hour interview for a 30-second web commercial? If so, DNxHD 145 should offer near-equal image quality to ProRes 422 HQ, but it will take up roughly 34 percent less storage once completed, making it the clear winner in this case.
What if the interview is just one of several dozen for a feature-length documentary that you plan to shop around the festival circuit? In this case, you must place the premium on maximizing image quality over storage (within given parameters), and the 50 percent higher data rate of ProRes 422 HQ fits the need perfectly.
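If you want to double-check the trade-off between the two codecs, a few lines of Python will do it. This is only a sketch: the 220Mbps and 145Mbps data rates come from the example above, while the function and variable names are illustrative assumptions.

```python
def gb_per_hour(mbps: float) -> float:
    """Storage per hour in decimal GigaBytes for a constant data rate in Mbps."""
    return mbps * 1_000_000 * 3_600 / 8 / 1_000_000_000


def storage_saved_fraction(higher_mbps: float, lower_mbps: float) -> float:
    """Fraction of storage saved by picking the lower-rate codec."""
    return 1 - lower_mbps / higher_mbps


def rate_increase_fraction(higher_mbps: float, lower_mbps: float) -> float:
    """How much higher the higher data rate is, relative to the lower one."""
    return higher_mbps / lower_mbps - 1


PRORES_422_HQ_MBPS = 220
DNXHD_145_MBPS = 145

saved_gb = gb_per_hour(PRORES_422_HQ_MBPS) - gb_per_hour(DNXHD_145_MBPS)
print(f"{saved_gb:.2f} GB/hour saved")                                   # 33.75 GB/hour saved
print(f"{storage_saved_fraction(220, 145):.0%} less storage")            # 34% less storage
print(f"{rate_increase_fraction(220, 145):.0%} higher data rate")        # 52% higher data rate
```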
With just a little basic knowledge of the underlying science behind compression techniques used in modern codecs, we can assess the needs of our production, vet codecs for the production’s needs, and then make an educated decision based on the scope of the project. Pretty handy stuff if you ask me.
Cover image via kayan_photo.