Deriving a workable entropy-encoding scheme, based on the official explanation of CABAC.

One of the subjects which I recently blogged about is that, when encoding video streams, some Codecs use 8×8-sample Discrete Cosine Transforms. But as with many DCTs, the coefficients produced tend to be values which would take up too much space to store in a fixed-length format. And so a family of techniques gets applied, which is loosely referred to as ‘Entropy Encoding’, with the key idea being that the Entropy Encoding used for compressed video is different again from the Entropy Encoding used for compressed audio. The scheme used for video has the advantage that the encoding itself is lossless. Apparently, there are two variants actually used with H.264-encoded videos, which some people group together with MPEG-4 (H.264 is also standardized as MPEG-4 Part 10):

  1. A form of variable-length encoding (CAVLC, in the H.264 standard),
  2. CABAC,

The latter promises better compression, at the cost of more CPU power, both to encode and to decode. I’m going to focus on ‘CABAC’ in this posting. There is an official explanation for how CABAC works, which I will refer to. In order to understand my posting here, the reader will need to have read the documentation I just linked to.

From first impressions (yesterday evening was the first time I examined CABAC), I’d say that the official explanation contains an error. And I’ll explain why, by offering a version of Entropy Encoding which I know can work, based on the link above, but different from it:

  • The integers which are meant to be encoded are first “Binarized”.
  • The probability with which the first “Bin” has become (1) instead of (0) can be analyzed as described, resulting in a Context Model of one out of (0, 1, 2), as described.
  • The next four Bins may not have individual probabilities computed. They only result in Context Models (3, 4, 5, 6) when they are (1) instead of (0), and these override the Context Model that the first Bin would generate.
  • The resulting, single Context Model could be averaged over the previous values.
  • Using, as a pair of values, the Context Model which was just computed (from the previous values) and the present integer value, a look-up can take place in a 2-dimensional table, to find the sequence of bits which encodes (both).
  • Because the decoder has chosen the integer value out of a known row in the same look-up table, it can also update the Context Model being used, so that future look-ups when decoding remain unambiguous.
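The steps listed above can be sketched roughly as follows, in Python. This is only an illustration of the table-driven variant proposed in this posting, not of real CABAC, which uses binary arithmetic coding. Several details are assumptions made for the sketch: the unary binarization, the first Bin's Context Model being fixed at (0), the averaging window (`WINDOW`), the value range (`MAX_VALUE`), and above all the use of Rice codes to fill the rows of the table.

```python
MAX_VALUE = 31      # hypothetical largest integer the table covers
NUM_CONTEXTS = 7    # Context Models 0..6, as in the list above
WINDOW = 4          # hypothetical number of previous values averaged


def rice_code(value, k):
    """Prefix code: (value >> k) ones, a zero, then the k low bits."""
    tail = format(value & ((1 << k) - 1), "b").zfill(k) if k else ""
    return "1" * (value >> k) + "0" + tail


# 2-D table: TABLE[context][value] -> bit string. Higher contexts get
# rows whose code lengths are more uniform (an assumption of this sketch).
TABLE = [[rice_code(v, ctx // 2) for v in range(MAX_VALUE + 1)]
         for ctx in range(NUM_CONTEXTS)]


def context_of(value, first_bin_ctx=0):
    """Context Model produced by one unary-binarized value."""
    if value < 2:                 # none of Bins 2..5 is (1)
        return first_bin_ctx      # 0, 1 or 2; fixed at 0 in this sketch
    return min(value, 5) + 1      # Bins 2..5 being (1) give 3..6


class ContextTracker:
    """Averages the Context Models of the last WINDOW values."""

    def __init__(self):
        self.history = []

    def current(self):
        if not self.history:
            return 0
        return sum(self.history) // len(self.history)

    def update(self, value):
        self.history = (self.history + [context_of(value)])[-WINDOW:]


def encode(values):
    tracker, out = ContextTracker(), []
    for v in values:
        out.append(TABLE[tracker.current()][v])   # look-up by (context, value)
        tracker.update(v)
    return "".join(out)


def decode(bits, count):
    tracker, pos, values = ContextTracker(), 0, []
    for _ in range(count):
        row = TABLE[tracker.current()]            # the row is known in advance
        # Exactly one code in a prefix-free row matches at this position.
        v = next(i for i, code in enumerate(row) if bits.startswith(code, pos))
        pos += len(row[v])
        values.append(v)
        tracker.update(v)                         # same update as the encoder
    return values
```

Because the encoder and the decoder update the tracker from the same decoded values, `decode(encode(values), len(values))` returns the original list, as long as every value lies in the range 0 to `MAX_VALUE`.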

The main problem I see with the official explanation is that, because up to 6 Context Models can be computed, each of which supposedly has its own probability, the look-up table in which the binary values (entropy encodings) are to be found would effectively need to be a 6-dimensional table! Officially, all the Context Models found have equal meaning. Software which uses a 2D table is much more probable than software which uses a 6-dimensional table, although according to theoretical Math, 6-dimensional tables are also possible.

But then, a property of Variable-Length Coding which has been observed for some time is that small integers, such as (0), (1) and (2), are assigned very short bit-sequences, while larger integers, such as (16) or (17), are assigned bit-sequences which are sometimes impractically long. This results in poor compression whenever the probability of the integer actually being (0), (1) or (2) decreases.

So, because we know that we can have at least one Context Model based on the actual, local probabilities, when the probabilities of very small integers become smaller, a row of entropy encodings can be selected in the table, the bit-lengths of which are more uniform. This results in smaller encodings overall than what straight Variable-Length Encoding would have generated, because CABAC adapts to larger integers being probable.

The fact will remain, that the smaller integers will require fewer bits to encode, in general, than the larger integers. But when the smallest integers become very improbable, the bit-lengths for all the integers can be evened out. This will still result in longer streams overall, as larger integers become more-probable, but in shorter streams than the streams that would result, if the encodings for the smallest integers remained the shortest they could be.
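A small worked example may make this trade-off concrete. The sketch below counts bits for two hypothetical streams, using Rice codes with parameter `k` as a stand-in for two rows of the table: `k = 0` is a plain unary code, the shortest possible for the smallest integers, and `k = 3` has much more uniform lengths. Neither the sample values nor the choice of Rice codes comes from the official explanation. They are assumptions made only to show the arithmetic.

```python
def rice_length(value, k):
    """Bits in a Rice code: unary quotient, one stop bit, k remainder bits."""
    return (value >> k) + 1 + k


def total_bits(values, k):
    return sum(rice_length(v, k) for v in values)


small = [0, 1, 0, 2, 0, 1, 0, 0]          # small integers are probable
large = [16, 17, 12, 20, 15, 18, 16, 17]  # small integers are improbable

for name, values in (("mostly small", small), ("mostly large", large)):
    print(name, {k: total_bits(values, k) for k in (0, 3)})
```

For the stream of small integers, the unary row needs 12 bits and the more uniform row needs 32. For the stream of larger integers, the unary row needs 139 bits and the more uniform row only 46, which is still more than the 12 bits of the first case, but far fewer than if the shortest codes had stayed reserved for the smallest integers.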


Identifying the container-file-format, separately from the Codec.

One of the facts which the public is well aware of is that Sound and Video are usually distributed in compressed form, through the use of a ‘Codec’, which stands for ‘Compressor / Decompressor’. What may still have some people confused, though, is that there is a separate distinction in file formats, which is the ‘Container File Format’. The latter distinction is observed when giving the file its filename-suffix, such as .MP3, .MPEG, .MP4, .OGG, .M4A, etc.

  • An .MP3-File will contain sound, compressed with the Codec: MPEG-1 or MPEG-2 Audio, Layer III
  • An .MPEG-File will contain video and sound, compressed with the Codecs: MPEG-2 or MPEG-1, And AC3 or MPEG, Layer III Audio (Hence, ‘MP3 Audio’ is allowed.)
  • An .MP4-File will contain video and sound, compressed with the Codecs: H.264 or MPEG-4, And AAC
  • An .OGG-File will mostly contain video and / or sound, compressed with the Codecs: Theora (video) And Vorbis (sound)
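Since the filename-suffix is only a naming convention, a program can also look at the first few bytes of a file to guess the container, independently of the Codecs inside it. The following Python sketch checks a handful of well-known signatures: Ogg pages begin with `OggS`, ISO base media files (.MP4, .M4A, .3GP) normally carry an `ftyp` box at byte offset 4 followed by a 4-character brand, ID3v2 tags begin with `ID3`, and an MPEG program stream begins with the pack start code `00 00 01 BA`. The function names and the returned strings are made up for this illustration, and the check is only a heuristic. In particular, it says nothing about which Codecs the container actually holds.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header[:4] == b"OggS":
        return "Ogg"
    if header[4:8] == b"ftyp":
        # The major brand (e.g. 'mp42', 'M4A ', '3gp4') follows the box type.
        brand = header[8:12].decode("ascii", "replace")
        return f"ISO base media, brand {brand!r}"
    if header[:3] == b"ID3":
        return "ID3v2 tag (usually MP3)"
    if header[:4] == b"\x00\x00\x01\xba":
        return "MPEG program stream"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        # 11 set bits: an MPEG audio frame sync, without any container.
        return "MPEG audio frames"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of a file to run sniff_container on it."""
    with open(path, "rb") as f:
        return sniff_container(f.read(12))
```

For example, `sniff_container(b"\x00\x00\x00\x18ftypM4A ")` reports an ISO base media file with the brand `'M4A '`, regardless of whether the file on disk was named .MP4 or .M4A.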

Finally, the ‘AAC’ Sound Codec, which stands for ‘Advanced Audio Coding’, has qualities which have been found desirable outside its initial usage-scenario of movie-making, just for Audio. Because of that, there has been some possible confusion as to how users should name a container file which contains only AAC-compressed audio, but no video. On my Linux-computers, I’m used to giving those files the filename-suffix ‘.M4A’ . Other people may at one time have been doing the same thing. But because the suffix was not widely recognized, Apple specifically may have just started the trend of naming the container files ‘.MP4-Files’ again, even though they contain no video. This may simply have helped their customers understand the file-formats better.

The AC3 and AAC sound Codecs both offer directionality in the sound, which was useful for movies, and which exceeds the degree of directionality that ‘MP3 Audio’ offers. And so, even though AAC offers small file-sizes, it has become popular for Music as well, because the way in which Advanced Audio Coding compresses its sound is ‘so smart’ that listeners tend to hear very high-quality sound anyway.

Dirk

 

About Constant Bitrate Encoding

What most of us are used to when we encode an MP3 File, is that we can set a bitrate – such as 192kbps – and, the codec will produce an MP3 File with that bitrate. If that was all there is to it, we’d have Constant Bitrate encoding, aka ‘CBR’.

But in many cases, the actual encoding scheme is Variable Bitrate (‘VBR’), which has been modified to be Adaptive Variable Bitrate (‘AVBR’).

The way AVBR works is that it nests the actual encoding algorithm inside a loop, on the premise that the user has nevertheless set a target bitrate. The loop feeds the actual algorithm several quality-factors, encoding the same granule of sound in multiple attempts, to find the maximum quality-factor which does not cause the encoding to exceed the number of bits allocated for 1 granule of sound.

This quality-factor is then also used to produce output. And, in case the actual number of bits output is less than the allocated number of bits, the difference is added to the target number of bits with which the next granule of sound is to be encoded.
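The loop just described can be sketched in Python as follows. The ‘encoder’ here is a toy: `toy_bits_needed` merely quantizes the samples more or less coarsely and counts the bits that would be needed, and the quality range of 1 to 10 is likewise invented. Only the structure of the outer loop, and the carrying-over of unused bits to the next granule, is meant to reflect the description above.

```python
MIN_QUALITY, MAX_QUALITY = 1, 10   # hypothetical quality-factor range


def toy_bits_needed(granule, quality):
    """Toy stand-in for a real encoder: coarser steps at lower quality."""
    step = 1 << (MAX_QUALITY - quality)
    # One sign bit plus the magnitude bits of each quantized sample.
    return sum(1 + (abs(s) // step).bit_length() for s in granule)


def encode_avbr(granules, bits_per_granule):
    """Return (quality, bits_used) per granule, carrying unused bits over."""
    carry = 0
    chosen = []
    for granule in granules:
        budget = bits_per_granule + carry
        best_q = MIN_QUALITY               # fallback if nothing fits
        for q in range(MIN_QUALITY, MAX_QUALITY + 1):
            if toy_bits_needed(granule, q) <= budget:
                best_q = q                 # keep the highest quality that fits
        used = toy_bits_needed(granule, best_q)
        carry = max(0, budget - used)      # unused bits go to the next granule
        chosen.append((best_q, used))
    return chosen
```

With a budget of 24 bits per granule, the granule `[100, -200, 300, -400]` on its own can only be encoded at quality 6, using 21 bits. Preceded by a silent granule `[0, 0, 0, 0]`, which uses just 4 bits, the same granule inherits the 20 unused bits and is encoded at quality 10, using 37 bits.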

Encoding schemes that are truly CBR are often ones which are not compressed, plus also perhaps ‘DPCM’… Most of the other schemes, such as ‘MP3’ and ‘OGG’, are really AVBR or VBR.

(Updated 03/11/2018 : )


3GP Video File Support

I have an Android phone app, which records videos to .3GP File format. The most important fact to remember about video file formats, is that the file-name extension identifies a container file, for which there may or may not be more than one video and audio CODEC supported.

Some users have reported problems in getting 3GP Video Files to play, or to import, on Linux computers. This is probably because this container format may in some cases store H.263 video and AMR audio. The problem here is that AMR audio is a patent-encumbered CODEC, which cannot really be shipped for free, for licensing reasons.

In my case I am in luck, because my app stores video compressed with the H.264 Video CODEC, and with the AAC Audio CODEC. That means that my video files will play and be imported into projects without effort.

I am sorry if this provides no relief to other users.

Dirk