# A caveat in using ‘ffmpeg’ to produce consumer-ready streams from individual frame-files.

It recently happened to me that I had used ‘Blender’ to create a very short animation of 144 frames, but where I had instructed Blender to output this animation as a series of numbered .PNG-Files, which I would next compile, using an ‘ffmpeg’ command, into a compressed stream: an .MP4-File, using H.264 video compression. ( :1 )

But unexpectedly, I obtained an .MP4-File which would play on some of my player-applications, but not on others. And when I investigated this problem, I found that player-applications which used a feature under Linux named ‘VDPAU’ were not able to play the stream, while player-applications which used software to decompress the stream were able to play it.

The very first assumption that some people might make in such a situation would be that they do not have their graphics drivers set up correctly, and that VDPAU may not be working correctly on their Linux-based computers. But when I looked at my NVidia settings panel, it indicated that VDPAU support included support for H.264-encoded streams specifically.

BTW, it’s not necessary for the computer to have an NVidia graphics card, with the associated NVidia GUI, in order to possess graphics acceleration. It’s just that NVidia makes it particularly easy, for users who are used to Windows, to obtain information about their graphics card.

Rather than believing next that VDPAU was broken due to the graphics driver, I began to look for my problem elsewhere. And I was able to find the true cause of the playback problem. Ultimately, if we want to compress a stream into an .MP4-File, and if we want the recipient of that stream to be able to play it back using hardware acceleration (which is the norm for high-definition streams), then an ‘ffmpeg’ command similar to the one below would be the correct command:


ffmpeg -framerate 24 -i infile_%4d.png -an -vcodec libx264 -pix_fmt yuv420p -crf 20 outfile.mp4

But I feel that I should explain how my first attempt to compress this stream failed. It did not contain the parameter shown above, namely ‘-pix_fmt yuv420p’. There are two Wikipedia articles which explain the subject of what ‘YUV’ means, which may explain the subject better than I can, and which I recommend my reader read:

https://en.wikipedia.org/wiki/YUV

https://en.wikipedia.org/wiki/Chroma_subsampling

I am now going to paraphrase what the above articles explain in detail.

The way video and static images are normally stored on a computer, they consist of the additive primary color-channels: Red, Green and Blue. But a combination of R, G, B values can be transformed, through Linear Algebra (using a matrix), into a YUV representation, where Y refers to the brightness information (aka Luminance), and where U and V together state the saturation and hue of the color (aka Chroma). This YUV representation can easily be transformed back into RGB representation.
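As a sketch of what such a matrix transform looks like, here is a small Python example. The coefficients below are the analog BT.601 ones, which are one common convention among several; my choice of coefficients is an assumption, since different standards (such as BT.709) define slightly different matrices.

```python
# A minimal sketch of one RGB <-> YUV transform (analog BT.601
# coefficients; other standards, such as BT.709, use different numbers).
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b   # Luminance (Luma)
    u = 0.492 * (b - y)                      # Chroma, blue-difference
    v = 0.877 * (r - y)                      # Chroma, red-difference
    return y, u, v

def yuv_to_rgb(y, u, v):
    # Invert the transform above.
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b
```

The round trip is lossless up to floating-point precision, which is the sense in which YUV can easily be transformed back into RGB.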

But, because human vision is slightly less able to distinguish fine details in Chroma than it is to distinguish fine details in Luma, in many lossy, compressed formats, the Chroma is only stored at half the resolution (or even less) of the resolution at which Luma is stored.
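To make ‘half the resolution’ concrete, here is a sketch of one common 4:2:0-style scheme, in which a single Chroma sample stands in for each 2×2 block of pixels. The averaging and nearest-neighbour reconstruction below are simplifications of my own; real codecs use more refined filters.

```python
# Down-sample a Chroma plane by averaging each 2x2 block of samples,
# and reconstruct it by repeating each stored sample over 2x2 pixels.
# (Width and height are assumed to be even, for simplicity.)
def downsample_chroma(plane):
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def upsample_chroma(sub, h, w):
    # Nearest-neighbour: each stored sample covers a 2x2 group of pixels.
    return [[sub[y // 2][x // 2] for x in range(w)] for y in range(h)]
```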

What ‘ffmpeg’ wanted to do, in my case, was to store its pixels in YUV 4:4:4 format. What this means is that the Chroma information will not be down-sampled, and will be stored at the same resolution as the Luma information. Because it does not compress anything, this form of YUV encoding could also be viewed as pointless, since the pixels could just as easily have been stored in RGB format, and taken up just as much data to store. ( :2 )
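To put rough numbers on that, the following sketch (my own arithmetic, for uncompressed 8-bit planar frames, before the codec has done anything else) shows how much raw data each subsampling ratio implies per frame:

```python
# Raw, uncompressed bytes per frame for 8-bit planar YUV,
# at the common chroma-subsampling ratios.
def frame_bytes(width, height, subsampling):
    luma = width * height                          # Y plane, full resolution
    if subsampling == "4:4:4":
        chroma = 2 * width * height                # U and V, full resolution
    elif subsampling == "4:2:2":
        chroma = 2 * (width // 2) * height         # half horizontal resolution
    elif subsampling == "4:2:0":
        chroma = 2 * (width // 2) * (height // 2)  # half in both directions
    else:
        raise ValueError(subsampling)
    return luma + chroma
```

For a 1920×1080 frame this works out to 6,220,800 raw bytes at 4:4:4, but only 3,110,400 at 4:2:0, i.e. half the data, which matches the claim that 4:4:4 stores just as much as RGB would.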

When a video stream has been compressed in the YUV 4:4:4 pixel-format, certain software can still play it, because header information in the video file declares what the format is, and because software coded to run on the CPU tends to be very versatile, even under Linux. But if we want the stream to be hardware-decompressed, then the firmware which runs on the graphics chip is less versatile, and requires that one specific sub-format be used. Hardware decompression essentially requires that the pixels of the stream be encoded in YUV 4:2:0 format. And the additional parameter in my ‘ffmpeg’ command-line above specifies that ‘ffmpeg’ should use this pixel-format.

Now, further guesswork could take place as to why ‘ffmpeg’ would choose YUV 4:4:4 format in the first place, and one reason could be that ‘ffmpeg’ keeps certain formatting details from the image source. Because my image source was a series of .PNG-Files (lossless files stored in RGB format), 4:4:4 may simply have been the equivalent with which the output could be encoded. But it needed to be encoded as YUV 4:2:0.

So, now that I have re-encoded my .MP4-File, it plays fine, even with hardware acceleration. My VDPAU install was never defective.

I suppose that a further question which the reader might ask could be, ‘Why not encode using YUV 4:2:2?’ And the short answer would be, ‘Because in general, YUV 4:2:2 isn’t compatible with interlaced scan.’ What the above parameter actually tells ‘ffmpeg’ to do is, vertically, to use the Chroma information from only every second scan-line, but to produce full-height output anyway, which in my case was 1080p, because my input-images had a size of 1920×1080. But the same pixel-format could also have been used to create a 1080i stream, if the parameters were edited slightly. And this way, the graphics hardware can do more or less the same thing, regardless of whether it is being made to play back 1080p or 1080i video. ( :3 )

1: )

When first examining the final result, I saw that the output .MP4-File only had 140 frames, even though it was given 144 .PNG-Files as input. I’ve observed numerous times that various software cuts animations short in this way. I do know that missing 4 frames did not adversely affect the quality of the result, in this one particular example.

But the final answer, as to how to avoid this discrepancy, had to do with the fact that an earlier version of this command-line used the ‘-r’ parameter to set the frame-rate of the output-file, since that was truly a video file. Apparently, when using numbered frames as input with the ‘-i’ parameter, and up-to-date ‘ffmpeg’ versions, the correct thing to do is to use the ‘-framerate’ parameter, to define the inputs as having a frame-rate. That way, I was able to produce an output .MP4-File that was missing no frames.
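As a quick sanity check (my own arithmetic, not an ffmpeg feature), the duration a player or ‘mediainfo’ reports can be used to confirm whether all the frames made it into the output:

```python
# At a constant frame-rate, duration and frame-count determine
# each other, so a reported duration reveals a missing-frame problem.
def expected_frames(duration_seconds, fps):
    return round(duration_seconds * fps)

# 144 frames at 24 FPS should play for exactly 6 seconds, while a
# reported duration near 5.834 s betrays a file holding only 140 frames.
```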

2: )

Also, this posting is not meant to suggest that Chroma sub-sampling is the only way in which H.264 compresses video. If that were the case, the resulting .MP4-Files would still be much larger than they are. But in fact, the H.264 Codec has many more ‘tricks’ to compress the stream, even after the Chroma has been down-sampled.

3: )

Actually, the documentation for ‘ffmpeg’ (current version) states that the command-line parameter ‘-ilme’ needs to be given, in order to force output to Interlaced format. However:

• Interlaced format is disfavored by ‘ffmpeg’, and
• This option only leads to pure Interlaced Scan for MPEG-2 streams, not for H.264, when encoded with the ‘x264’ library.
• The ‘x264’ library allows for ‘MBAFF’ to be encoded, as an alternative to Interlaced Scan, and this external link explains how to do that.

In order to be slightly more correct, speaking off the top of my head, I’d say that ‘Progressive Scan’ is a process in which a group of pixels is encoded in two passes: the first pass stating a set of output-values at half the vertical resolution, and the second pass refining what the first pass began, in a way that doubles the vertical resolution. ( :4 ) But, because the YUV 4:2:0 format is being used, the second pass will ignore Chroma completely, and will only refine the Luminance information for the entire group of pixels. Since the working assumption to begin with was that the Human ability to make out resolution is half as good for Chroma as for Luma, for the decoder to be able simply to skip the Chroma belonging to the second pass, in this way, may seem like a simplification over what YUV 4:2:2 would require the decoder to do.

But by default, to be able to decode YUV back into RGB requires that a full set of Luma and Chroma values be supplied. Hence, even though the same Chroma values are to be used twice, while two sets of Luma values are to be used, the entire group of pixels needs to have been received before the entire group can be converted back into RGB representation. This is what I would expect guides the decoding of YUV 4:2:2.

When YUV 4:2:0 is being used with ‘Progressive Scan’, an underlying assumption is made: that the Chroma which was correct for the first pass has no promise of being accurate for the second pass. This differs from YUV 4:2:2, where the two sets of Chroma were averaged, implying legitimacy for decoding the second pass as well. And so what some decoders may do is to receive a second pass, which contains signed modifications to be made to the Luma, but then to apply this second pass just by adding or subtracting a certain amount of ‘whiteness’ to or from a temporary set of output RGB values, meaning, to add or subtract the same amount from the R, G and B already formed from the first pass.

This is ultimately also different from how ‘Interlaced Scan’ works, where either the odd or the even set of scan-lines can be decoded by itself.

What I suspect ‘MBAFF’ implies is that a ‘top’ set of scan-lines is encoded, and a ‘bottom’ set of scan-lines, separated in time, just as ‘Interlaced Scan’ had them. The difference with ‘MBAFF’ may be that the two sets of scan-lines share Chroma information in one specific ordering, which is updated fully with each field / half-frame. That way, just as with ‘Interlaced Scan’, a reduction in the bit-rate follows from having to encode only ½ as many scan-lines in a given period of time, but the full resolution is assumed to be valid for Luma, while the Chroma is still at ½ resolution. Each half-frame then encodes both Luma and Chroma.

Neither ‘MBAFF’ nor ‘Interlaced Scan’ should ever be used at 24 / 25 FPS, as that frame-rate is already slow enough. But when shooting for the speed which 50 / 60 FPS can bring, ‘MBAFF’ or ‘Interlaced Scan’ will avert the need for a doubled bit-rate. Whether ‘Progressive Scan’ should even be used at 50 / 60 FPS, depends on how important conserving the bit-rate is, weighed against the loss in quality that resulted, whenever ‘Progressive Scan’ was not available.
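The bit-rate argument above can be reduced to raw line counts. These are my own back-of-the-envelope numbers, counted before any actual compression takes place:

```python
# Raw scan-lines delivered per second, comparing progressive frames
# with interlaced fields at the same picture rate.
def lines_per_second(height, pictures_per_second, interlaced):
    # An interlaced "picture" is a single field: half the scan-lines.
    return height * pictures_per_second * (0.5 if interlaced else 1.0)

# 1080p50 must deliver 54,000 raw lines per second, while 1080i50
# (50 fields per second) delivers 27,000: that halving of the raw
# data is where the bit-rate saving comes from.
```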

dirk@Plato:~$ cd ~/Videos
dirk@Plato:~/Videos$ mediainfo Hello_Karen_2.mp4
General
Complete name                            : Hello_Karen_2.mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 2.21 MiB
Duration                                 : 5 s 834 ms
Overall bit rate                         : 3 180 kb/s
Encoded date                             : UTC 1904-01-01 00:00:00
Tagged date                              : UTC 1904-01-01 00:00:00
Writing application                      : Lavf57.71.100

Video
ID                                       : 1
Format                                   : AVC
Format profile                           : High@L4
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 4 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 5 s 834 ms
Bit rate                                 : 3 177 kb/s
Width                                    : 1 920 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 24.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.064
Stream size                              : 2.21 MiB (100%)
Writing library                          : x264 core 150
Encoding settings                        : cabac=1 / ref=3 / deblock=1:0:0 / analyse=0x3:0x113 / me=hex / subme=7 / psy=1 / psy_rd=1.00:0.00 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=1 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-2 / threads=12 / lookahead_threads=2 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=3 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=250 / keyint_min=24 / scenecut=40 / intra_refresh=0 / rc_lookahead=40 / rc=crf / mbtree=1 / crf=20.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / ip_ratio=1.40 / aq=1:1.00
Encoded date                             : UTC 1904-01-01 00:00:00
Tagged date                              : UTC 1904-01-01 00:00:00

dirk@Plato:~/Videos$

4: )

The 2D Discrete Cosine Transforms used tend to be most efficient when they consist of 8×8 samples. But, because the Chroma is to be down-sampled 2x, what results in practice are Macro-Blocks of 16×16 pixels. Each Macro-Block consists of 2×2 Luminance-DCTs, plus 2×1 Chroma-DCTs (because the Chroma channel is a 2-component vector). It’s a desirable goal for any one DCT to have, as its coefficients, entirely zeroes. If this happens, a corresponding single bit belonging to the Macro-Block changes value, and one whole DCT can be omitted from the encoded stream, so that considerable compression is realized. If ‘Progressive Scan’ is, after all, implemented as I described above, then the probability with which this can happen improves slightly (because pixel-wise differences between odd and even scan-lines will often be slighter than the pixel-values in any one scan-line).
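The all-zero case can be demonstrated with a toy 8×8 DCT. The direct, orthonormal DCT-II below is my own illustrative version; real encoders use fast integer approximations rather than anything this slow.

```python
import math

N = 8  # DCT block size

def dct2d(block):
    """Direct 2D orthonormal DCT-II of an NxN block (slow, illustrative)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            cv = math.sqrt(1.0 / N) if v == 0 else math.sqrt(2.0 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

# A perfectly flat block of pixels: only the DC coefficient survives,
# and every AC coefficient comes out (numerically) zero, so the whole
# DCT could be signalled with a single bit and omitted from the stream.
flat = [[128.0] * N for _ in range(N)]
coeffs = dct2d(flat)
ac_all_zero = all(abs(coeffs[u][v]) < 1e-6
                  for u in range(N) for v in range(N) if (u, v) != (0, 0))
```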

Further, because the DCT coefficients are CABAC-encoded, even short of the best-case scenario, simply achieving lower amplitudes will also reduce the resulting bit-rate. I must admit, however, that presently I do not understand CABAC encoding.

Alternatively, each Macro-Block can still be organized in such a way that each Luminance-DCT maps its input-samples directly to contiguous image-pixels. Further, if I were to command my ‘ffmpeg’ program to use ‘YUV 4:1:1’ Chroma sub-sampling, then something silly might happen, such as 32×32-pixel Macro-Blocks. Something silly already happened to me, when this program applied ‘YUV 4:4:4’ sub-sampling, which should also have resulted in poorly-defined Macro-Blocks, and therefore in less compression.

Dirk