Per-Scene Encoding in Test: Video Quality and Streaming Costs

Introduction

Per-scene encoding is a technique used to optimize the size of the encoded output by applying different encoder settings to different parts of the video (scenes). For this, Axinom Encoding uses Optimizer from our partner VisualOn. This AI-enhanced solution continuously analyzes content in real-time to determine the best transcoder settings for outstanding video quality and lower bitrates.

It's also known as CAE - Content-Aware Encoding or Content-Adaptive Encoding. We prefer the term per-scene encoding, because some CAE solutions analyze a to-be-encoded video as a whole and adjust the parameters once before starting the encoding process, while Axinom's per-scene encoding powered by VisualOn Optimizer does more - it actually applies different encoding settings to different scenes.

Per-scene encoding promises to reduce the output size by an average of 40% (and up to 70%) without compromising visual video quality. But how well does it really work?

In this article, we test per-scene encoding by applying the optimization to a reference video with various options and comparing the results. You can review the visual quality in the embedded side-by-side player, which shows the video with and without per-scene encoding.

Reference video - Tears of Steel

As a reference video for this benchmark, we've taken Tears of Steel from the Blender Foundation.

This 12-minute video has a good mixture of scenes with different complexity and motion. It's also free, so you can download it yourself and repeat all the experiments.

You can see the original Tears of Steel video also on YouTube.

Description of the Test

We've encoded the video first without optimization, and then with per-scene encoding using different target VMAF scores.

tip

Video Multimethod Assessment Fusion (VMAF) is an objective full-reference video quality metric developed by Netflix in cooperation with several universities. It predicts subjective video quality based on a reference and distorted video sequence. The metric can be used to evaluate the quality of different video codecs, encoders, encoding settings, or transmission variants.

Encoding with various parameters

Variant	Target VMAF	File size, MB	% saving
Original	-	570,0	-
Per-Scene	99	455,0	20%
Per-Scene	98	296,5	48%
Per-Scene	95 (default)	187,6	67%
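The saving percentages in the table follow directly from the file sizes. A minimal Python sketch of that arithmetic, using the sizes from the table (the function name `saving_percent` is ours and purely illustrative):

```python
def saving_percent(original_mb: float, optimized_mb: float) -> float:
    """Return the size reduction of the optimized file, in percent."""
    return (1 - optimized_mb / original_mb) * 100


ORIGINAL_MB = 570.0
# Target VMAF -> output size in MB, taken from the table above.
optimized_sizes_mb = {99: 455.0, 98: 296.5, 95: 187.6}

for target_vmaf, size_mb in optimized_sizes_mb.items():
    saving = saving_percent(ORIGINAL_MB, size_mb)
    print(f"target VMAF {target_vmaf}: {saving:.0f}% smaller")
```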

See per-scene encoding for instructions on how to enable per-scene encoding in Axinom Encoding.

Side-by-Side Playback

Here you can see and compare the encoded results.

On the left: non-optimized video, on the right: video optimized with VisualOn Optimizer with a VMAF target of 98 (size saving of 48%).


Measuring the quality

With the side-by-side player above, you can visually compare the quality of the video with and without per-scene encoding.

To measure the quality more objectively, we can use the VMAF metric.

Here we used FFMetrics to measure the VMAF scores.

FFMetrics is an open-source tool that computes and visualizes different visual quality metrics, including PSNR, SSIM, and VMAF.
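For scripted runs, a comparable per-frame VMAF log can typically be produced with FFmpeg's `libvmaf` filter, provided your FFmpeg build includes it. The sketch below only assembles such a command line and does not run it. The file names and the helper `build_vmaf_command` are hypothetical, and the sketch assumes both videos have the same resolution and frame rate (otherwise a scaling step would be needed first).

```python
import shlex


def build_vmaf_command(distorted: str, reference: str,
                       log_path: str = "vmaf.json") -> list[str]:
    """Assemble an FFmpeg command that writes per-frame VMAF scores as JSON.

    The libvmaf filter takes the distorted (encoded) video as the first
    input and the reference video as the second input.
    """
    return [
        "ffmpeg",
        "-i", distorted,
        "-i", reference,
        "-lavfi", f"libvmaf=log_path={log_path}:log_fmt=json",
        "-f", "null", "-",  # discard the video output, keep only the log
    ]


# Hypothetical file names, for illustration only.
cmd = build_vmaf_command("tos_per_scene_98.mp4", "tos_source.mov")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where FFmpeg with `libvmaf` is installed.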

The following graphs show the per-frame VMAF score computed by FFMetrics for the original video (green) and the optimized one (the other color).

You can type a frame number into the video player above to see the exact location in question.

You can see that the optimized video has a higher VMAF score almost all the time, and that the exceptions become fewer as the target VMAF grows.

95 target VMAF

VMAF by frame, 95

98 target VMAF

VMAF by frame, 98

99 target VMAF

VMAF by frame, 99

Costs

Encoding with per-scene optimization costs a bit more due to additional computation needs. With Axinom Encoding, it's 40% more expensive than regular encoding.

However, this investment can be quickly earned back through lower traffic costs (primarily CDN costs).

The images below illustrate this: as the number of views grows, the savings on traffic costs exceed the higher encoding costs at a break-even point.

Costs overview

Beyond this point, customers save money while providing a better user experience to end users!

Where exactly the break-even point lies depends on a number of factors, such as:

video representations included in the ABR (Adaptive BitRate) set
selected target VMAF score (and, as a result, the percentage saved on the output size)
CDN price
average bitrate played by the users

We will show two calculations: for a single video as above and for a full-length ABR movie.

For simplicity, we assume a CDN price of 0,01€ per GB. Your CDN price may differ.

Single video

Let's assume we encode the Tears of Steel video exactly as above, with a single bitrate.

It's a single HD representation, 12 minutes long.

Standard encoding costs 12 x 0,022€ = 0,264€; the surcharge for per-scene encoding is 40% of that, or 0,1056€.

To watch the video, a user needs to transfer the whole file. We know the file size from the table above. Multiplying it by the CDN price gives the traffic cost per view. The saving on traffic costs is the difference between the original and the optimized traffic cost.

Dividing the surcharge for per-scene encoding by the saving on traffic costs per view (and rounding up) gives the number of views needed to reach break even.

Target VMAF	File size, MB	% saving	Traffic cost, €	Saving, €	Views to break even
Original	570,0	-	0,005700€	-	-
99	455,0	20%	0,004550€	0,001150€	92
98	296,5	48%	0,002965€	0,002735€	39
95	187,6	67%	0,001876€	0,003824€	28
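The table can be reproduced with a few lines of Python. This is a sketch under the assumptions stated in this section (0,022€ per encoded minute, a 40% surcharge, 0,01€ per GB of CDN traffic, and 1 GB = 1000 MB). The constant and function names are ours, not part of any Axinom API.

```python
import math

ENCODING_EUR_PER_MIN = 0.022   # standard encoding price assumed in this section
PER_SCENE_SURCHARGE = 0.40     # per-scene encoding costs 40% more
CDN_EUR_PER_GB = 0.01          # assumed CDN price; yours may differ


def views_to_break_even(original_mb: float, optimized_mb: float,
                        minutes: float) -> int:
    """Number of full views after which traffic savings cover the surcharge."""
    surcharge_eur = minutes * ENCODING_EUR_PER_MIN * PER_SCENE_SURCHARGE
    saving_per_view_eur = (original_mb - optimized_mb) / 1000 * CDN_EUR_PER_GB
    # Round up: a partial view does not pay back the surcharge.
    return math.ceil(surcharge_eur / saving_per_view_eur)


for target_vmaf, size_mb in {99: 455.0, 98: 296.5, 95: 187.6}.items():
    views = views_to_break_even(570.0, size_mb, minutes=12)
    print(f"target VMAF {target_vmaf}: break even after {views} views")
```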

You can see, by the way, that the calculation does not depend on the video length. If a video is longer, the encoding costs grow proportionally, but so does the output file size. If your video is L minutes long and you use the bitrate of B = 6000 kbps (unoptimized), your output file size will be roughly S = 6000 * 60 / 8 / 1000000 * L GB.

(In our case, the formula gives 0,54GB, in reality we've got 0,57GB, which is close enough).

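If p is the share of the output size saved, expressed as a fraction (for example, 0,48 for a 48% saving), and B is the bitrate in kbps, the formula for the break even is:

N = L * 0,022 * 0,4 / (S * p * 0,01) = L * 0,022 * 0,4 / (L * B * 60/8/1000000 * p * 0,01) = 0,0088 / (0,000000075 * B * p) ≈ 117333 / (B * p)

A rule of thumb for 6000 kbps: break even is at about 20/p views.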
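The closed-form expression can also be written as a small function, which makes it easy to plug in your own prices. This is a sketch: the parameter names are ours, the defaults are the prices assumed in this section, and the saving is passed as a fraction (0.48 for 48%).

```python
def break_even_views(bitrate_kbps: float, saving_fraction: float,
                     encoding_eur_per_min: float = 0.022,
                     surcharge: float = 0.40,
                     cdn_eur_per_gb: float = 0.01) -> float:
    """Views needed until traffic savings equal the per-scene surcharge.

    The video length cancels out, so everything is computed per minute.
    """
    gb_per_min = bitrate_kbps * 60 / 8 / 1_000_000      # kbps -> GB per minute
    surcharge_per_min = encoding_eur_per_min * surcharge
    saving_per_view_per_min = gb_per_min * saving_fraction * cdn_eur_per_gb
    return surcharge_per_min / saving_per_view_per_min


# The constant in the formula is the value for B = 1 kbps and p = 1.
print(round(break_even_views(1, 1.0)))
# The rule of thumb for 6000 kbps is the value for p = 1.
print(round(break_even_views(6000, 1.0), 1))
```

Note that the formula uses the nominal bitrate, so its results differ slightly from the table, which is based on the measured file size of 0,57 GB rather than the computed 0,54 GB.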

Full-length ABR video

Now let's take a more realistic scenario: a full-length movie (90 minutes) encoded for adaptive bitrate streaming (ABR) with 6 video representations: 2 HD streams and 4 SD streams.

The standard Axinom Encoding cost for this video would be 7,92€, and the 40% surcharge for per-scene encoding 3,17€.

On the streaming side, we now don't know exactly how many bytes the client will download, as it depends on the player and the network conditions.

At any point in time, the client is streaming at a certain bitrate, where the maximum is 6000 kbps (our original quality). We assume that the average bitrate is B = 2500 kbps (unoptimized). Per-scene optimization reduces the size of all representations proportionally, so we get a new, optimized bitrate. Its percentage saving depends on the target VMAF score and is the same as in our experiment above.

Knowing the bitrate, we can calculate the total traffic and its cost using the formulas above. (In the table below, we've also added the saving percentages of 30%, 40%, and 70%.)

Target VMAF	% saving	Avg bitrate, kbps	Traffic, GB	Traffic cost, €	Saving, €	Views to break even
Original	-	2500	1,688	0,016875€	-	-
99	20%	2000	1,350	0,013500€	0,0033750€	939
	30%	1750	1,181	0,011813€	0,0050625€	626
	40%	1500	1,013	0,010125€	0,0067500€	470
98	48%	1300	0,878	0,008775€	0,0081000€	392
95	67%	825	0,557	0,005569€	0,0113063€	281
	70%	750	0,506	0,005063€	0,0118125€	269
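The rows of this table can be recomputed as follows. This is a sketch under the assumptions of this section (90 minutes, 7,92€ standard encoding cost, 40% surcharge, 0,01€ per GB, 2500 kbps average bitrate). The names are ours, and the exact surcharge of 3,168€ is used rather than the rounded 3,17€.

```python
import math

MINUTES = 90
SURCHARGE_EUR = 7.92 * 0.40      # per-scene surcharge for the whole ABR set
CDN_EUR_PER_GB = 0.01            # assumed CDN price
AVG_BITRATE_KBPS = 2500          # assumed average unoptimized bitrate


def traffic_gb(bitrate_kbps: float, minutes: float) -> float:
    """Data transferred for one full view at the given average bitrate."""
    return bitrate_kbps * 60 / 8 / 1_000_000 * minutes


def abr_row(saving_fraction: float) -> tuple[float, float, float, float, int]:
    """Return (avg bitrate, traffic GB, traffic cost, saving, views to break even)."""
    optimized_kbps = AVG_BITRATE_KBPS * (1 - saving_fraction)
    gb = traffic_gb(optimized_kbps, MINUTES)
    cost_eur = gb * CDN_EUR_PER_GB
    original_cost_eur = traffic_gb(AVG_BITRATE_KBPS, MINUTES) * CDN_EUR_PER_GB
    saving_eur = original_cost_eur - cost_eur
    return optimized_kbps, gb, cost_eur, saving_eur, math.ceil(SURCHARGE_EUR / saving_eur)


for p in (0.20, 0.30, 0.40, 0.48, 0.67, 0.70):
    kbps, gb, cost, saving, views = abr_row(p)
    print(f"{p:.0%}: {kbps:.0f} kbps, {gb:.3f} GB, {cost:.6f} EUR, "
          f"saves {saving:.7f} EUR per view, break even after {views} views")
```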

Again, the results do not depend on the video length, only on the average bitrate and the percentage of the output size saving.

Repeating similar calculations as above, with p again expressed as a fraction, we get: N ≈ 469333 / (B * p)

A rule of thumb for 2500kbps: break even is at 188/p.

tip

If you would like to get an underlying Excel spreadsheet for these calculations, please contact support.

Conclusion

The tests confirm that applying per-scene encoding can significantly reduce the output size without compromising the visual quality of the video.

This also translates into cost savings once your video views reach break even. The break-even number of views is inversely proportional to the unoptimized video bitrate (or average bitrate), the percentage of file size saved through per-scene optimization, and the CDN price.

At the same time, both the video size and the video quality significantly depend on the target VMAF score. Depending on your video material and your requirements, you can optimize more towards small video size or towards higher video quality.

If you don't want to invest time into research, we would recommend using the default target VMAF score of 98.

Activating per-scene encoding is as easy as adding a single property to your encoding job, or selecting per-scene encoding from a drop-down in the processing profiles if you are using the UI.
