Video Compression

A friend of mine once suggested that ‘a good way’ to compress a (2D) video stream would be to compute the per-pixel difference of each frame with respect to the previous frame, and then to JPEG-compress the result. This is close in spirit to MJPEG, although MJPEG itself JPEG-compresses every frame independently, without taking differences between frames. However, the up-to-date, state-of-the-art compression schemes go further than that, in order to achieve smaller file sizes, and are often based on Macroblocks.

Also, my friend failed to notice that at some point within 2D video compression, ‘reference frames’ are needed, which are sometimes also referred to as key-frames. These key-frames should not be confused, however, with the key-frames that are used in video editing software, 2D and 3D, to control animations which the power-user wants to create. Reference frames are needed within 2D video compression if for no other reason than this: each ‘comparison frame’ is decompressed with a small amount of error, and because every such frame builds on the one before it, the decoded content would deviate further and further from the original, beyond what is acceptable even for a lossy stream.
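The drift described above can be sketched for a single pixel. This is only an illustration under loud assumptions: the encoder takes differences against the original previous frame (an open-loop arrangement chosen for this sketch), the lossy step is a plain rounding quantizer, and reference frames are treated as exact. The names `simulate`, `quantize`, `key_interval` and `step` are hypothetical.

```python
import random

def quantize(value, step):
    """Lossy step: round a difference to the nearest multiple of `step`."""
    return step * round(value / step)

def simulate(num_frames, key_interval=None, step=4, seed=1):
    """Follow one pixel through a difference coder; return the error per frame.

    key_interval=None means that only frame 0 is a reference frame.
    """
    rng = random.Random(seed)
    original = 128.0
    decoded = original            # frame 0 is sent as a reference frame
    errors = [0.0]
    for n in range(1, num_frames):
        previous = original
        original = previous + rng.uniform(-3, 3)   # the pixel changes a little
        if key_interval and n % key_interval == 0:
            decoded = original    # reference frame: decoder is resynchronized
        else:
            # Assumption: the difference is taken against the original previous
            # frame, while the decoder adds it to its own, slightly wrong frame.
            decoded += quantize(original - previous, step)
        errors.append(abs(decoded - original))
    return errors

drifting = simulate(200)                    # no further reference frames
resynced = simulate(200, key_interval=10)   # a reference frame every 10 frames
print(max(drifting), max(resynced))
```

Without further reference frames the rounding errors add up like a random walk, whereas with `key_interval=10` the error can never exceed nine rounding errors in a row.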

The concept behind Macroblocks can be stated quite easily. Any frame of a video stream can be subdivided into so-called “Transform Blocks”, which are typically 8×8 pixel-groups, and of which the Discrete Cosine Transform can be computed, in what would amount to the simple compression of each frame. The DCT coefficients are then quantized, as is familiar. Because the video is also encoded in a Y’UV colour scheme, there are two possible resolutions at which the DCT could be computed: one for the Luminance Values, and a lower resolution, spanning twice as many pixels, for the Chroma Values. However, it is in the comparison of each frame with the previous frames that ‘good’ 2D video compression has an added aspect of complexity, which my friend did not foresee.
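The transform-and-quantize step for one 8×8 Transform Block could look roughly like the following. It is a minimal sketch in plain Python, assuming an orthonormal type-II DCT and a single uniform quantization step of 16 for every coefficient; real codecs use per-coefficient quantization tables, and the names `dct_2d`, `quantize`, `flat` and `ramp` are made up for the illustration.

```python
import math

N = 8  # Transform Block size

def dct_2d(block):
    """Type-II 2D DCT of an N x N block, with orthonormal scaling."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step=16):
    """Assumed uniform quantizer: one step size for all coefficients."""
    return [[round(value / step) for value in row] for row in coeffs]

flat = [[100] * N for _ in range(N)]                       # uniform grey block
ramp = [[100 + 2 * x for x in range(N)] for _ in range(N)] # horizontal gradient

for name, block in (("flat", flat), ("ramp", ramp)):
    q = quantize(dct_2d(block))
    nonzero = sum(1 for row in q for value in row if value != 0)
    print(name, "non-zero coefficients:", nonzero)
```

A uniform block leaves only the DC coefficient, and a smooth gradient leaves only a few low-frequency coefficients in the first row, which is what makes the quantized block cheap to store.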

The preceding frame is first translated in 2D, by a vector that is encoded with each Macroblock, as an estimate of motion on the screen. Only after this translation of the subdivided image, by an integer number of pixels in X and in Y, does a sub-result form, from which the per-pixel difference with the present frame is computed. The resulting per-pixel values may or may not be non-zero, which opens the possibility that an entire Transform Block has DCT coefficients which are all zeroes.
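The translation-then-difference step can be sketched as an exhaustive block search. This is an illustration only: it assumes an 8×8 block, a search window of ±4 pixels, and the sum of absolute differences as the matching cost, none of which comes from any particular codec. The helper names and the synthetic test frames are hypothetical.

```python
import random

def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def best_vector(prev, cur, top, left, size=8, search=4):
    """Try every integer (dy, dx) in the window; return (cost, dy, dx)."""
    target = get_block(cur, top, left, size)
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(target, get_block(prev, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

def residual(prev, cur, top, left, dy, dx, size=8):
    """Per-pixel difference between the present block and the translated one."""
    predicted = get_block(prev, top + dy, left + dx, size)
    target = get_block(cur, top, left, size)
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(target, predicted)]

# Synthetic frames: random texture, then the same texture moved
# 3 pixels to the right and 1 pixel down.
rng = random.Random(0)
H = W = 24
prev = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[prev[y - 1][x - 3] if y >= 1 and x >= 3 else 0 for x in range(W)]
       for y in range(H)]

cost, dy, dx = best_vector(prev, cur, 8, 8)
print("vector:", (dy, dx), "cost:", cost)
```

With the right vector the residual block is entirely zero, so all of its DCT coefficients are zero as well, while the plain frame difference at the same position is not.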


Caveats when using ‘avidemux’ (under Linux).

One of the applications available under Linux that can help edit video / audio streams, and that has been popular for many years, if not decades, is called ‘avidemux’. But in recent years, the question has arisen of whether this GUI- and command-line-based application is still useful.

One of the rather opaque questions surrounding its main use is highlighted in the official usage wiki for Avidemux. A Linux user may either want to split off the audio track from a video / audio stream, or want to give the stream a new audio track. What the user is expected to do is to navigate to the menu entries ‘Audio -> Save Audio’ or ‘Audio -> Select Track’, respectively.

[Screenshot_20190314_131933]

[Screenshot_20190314_132014]

What makes the usage of the GUI not straightforward is what the manual entries next state, and what my personal experiments confirm:

  • External Audio can only be added from ‘AC3’, ‘MP3’, or ‘WAV’ streams by default,
  • The audio track that gets Saved cannot be played back if it is in the format of an ‘OGG Vorbis’, an ‘OGG Opus’, or an ‘AAC’ track, as such exported audio tracks lack any header information, which playback apps would need in order to play them. In those cases specifically, only the raw bit-stream is saved.

The first problem with this sort of application is that the user needs to perform a memorization exercise, about which non-matching formats he may or may not Export To or Import From. I don’t like to have to memorize meaningless details about every GUI application I have, and in this case the details can only be found through detailed research on the Web. They are not hinted at anywhere within the application.

(Updated 3/23/2019, 15h35 … )


Deriving a workable entropy-encoding scheme, based on the official explanation of CABAC.

One of the subjects which I recently blogged about is that when encoding video streams, some Codecs use 8×8-sample Discrete Cosine Transforms, but as with many DCTs, the coefficients produced tend to be values which would take up too much space to store in a fixed-length format. And so a family of techniques gets applied which is loosely referred to as ‘Entropy Encoding’, with the key idea being that the Entropy Encoding used for compressed video is different again from the Entropy Encoding used for compressed audio. And the scheme used for video has the advantage that the encoding itself is lossless. Apparently, there are two variants actually used with H.264-encoded videos, which some people group together under MPEG-4:

  1. An unspecified form of variable-length encoding,
  2. CABAC,

The latter of which promises better compression, at the cost of greater CPU-power required, both to encode and to decode. I’m going to focus on ‘CABAC’ in this posting. There is an official explanation for how CABAC works, which I will refer to. In order to understand my posting here, the reader will need to have read the documentation I just linked to.

From first impressions – yesterday evening was the first day on which I examined CABAC – I’d say that the official explanation contains an error. And I’ll explain why, by offering a version of Entropy-Encoding, which I know can work, based on the link above, but different from it:

  • Integers are meant to be encoded, that are “Binarized”.
  • The probability with which the first “Bin” has become (1) instead of (0) can be analyzed as described, resulting in a Context Model of one out of (0, 1, 2), as described.
  • The next four Bins may not have individual probabilities computed, only resulting in Context Models (3, 4, 5, 6) when they are (1) instead of (0), which override the Context Model that the first Bin would generate.
  • The resulting, one Context Model could be averaged over the previous Values.
  • Using, as a pair of values, the Context Model (from the previous values) which was just computed and the (present) Integer Value, a look-up can take place in a 2-dimensional table, of which sequence of bits to use to encode (both).
  • Because the decoder has chosen the integer value out of a known row in the same look-up table, it can also update the Context Model being used, so that future look-ups when decoding remain unambiguous.
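The scheme in the list above can be sketched with a small 2-dimensional table, indexed by Context Model and by integer value. This is not CABAC itself, which uses arithmetic coding; it is only a toy version of the table-driven variant proposed here. The table contents, the two contexts, the four-value alphabet and the averaging rule in `context` are all assumptions made for the illustration.

```python
# Rows are Context Models, columns are the integer values 0..3.
# Each row is a prefix-free set of bit strings (hypothetical contents).
TABLE = [
    ["0", "10", "110", "111"],   # context 0: small values expected
    ["00", "01", "10", "11"],    # context 1: no value favoured
]

def context(history):
    """Pick a Context Model by averaging the last few values (assumed rule)."""
    recent = history[-4:]
    if not recent:
        return 0
    return 1 if sum(recent) / len(recent) >= 1.5 else 0

def encode(values):
    history, bits = [], []
    for value in values:
        bits.append(TABLE[context(history)][value])
        history.append(value)
    return "".join(bits)

def decode(bits, count):
    history, pos = [], 0
    for _ in range(count):
        # The decoder knows the row, because it has already decoded
        # every value that the encoder used to choose the context.
        row = TABLE[context(history)]
        for value, code in enumerate(row):
            if bits.startswith(code, pos):
                history.append(value)
                pos += len(code)
                break
        else:
            raise ValueError("bit stream does not match the table")
    return history

values = [0, 0, 1, 3, 3, 2, 3, 0, 0, 0, 1]
stream = encode(values)
print(stream, decode(stream, len(values)) == values)
```

Because both sides derive the context from already-decoded values only, no side information about the context needs to be transmitted.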

The main problem I see with the official explanation is that, because up to 6 Context Models can be computed, each of which supposedly has its own probability, the lookup-table in which binary values (entropy encodings) are to be found would effectively need to be a 6-dimensional table! Officially, all the Context Models found have equal meaning. Software which uses a 2D table is much more probable than software which uses a 6-dimensional table, although mathematically, 6-dimensional tables are also possible.

But then, a property of Variable Length Coding which has been observed for some time, was that small integers, such as (0), (1) and (2), were assigned very short bit-sequences to be recognized, while larger integers, such as (16) or (17), were assigned recognizable bit-sequences, which would sometimes have been impractically long, and which resulted in poor compression, when the probability of the integer actually being (0), (1) or (2) decreased.

So, because we know that we can have at least one Context Model, based on the actual, local probabilities, a different series of entropy encodings can be selected in the table when very small integers become less probable. The bit-lengths of those encodings can be made more uniform, resulting in smaller encodings overall than what straight Variable-Length Encoding would have generated, with CABAC instead being adapted to probable, larger integers.

The fact will remain, that the smaller integers will require fewer bits to encode, in general, than the larger integers. But when the smallest integers become very improbable, the bit-lengths for all the integers can be evened out. This will still result in longer streams overall, as larger integers become more-probable, but in shorter streams than the streams that would result, if the encodings for the smallest integers remained the shortest they could be.
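The trade-off in the last paragraph can be checked with a little arithmetic on expected code lengths. The numbers below are invented for the illustration: eight integer values, a truncated-unary style code whose lengths grow with the value, an evened-out 3-bit code, and two made-up probability distributions.

```python
def expected_bits(lengths, probabilities):
    """Average number of bits per integer for a given code and distribution."""
    return sum(n * p for n, p in zip(lengths, probabilities))

# Code lengths for the integers 0..7 (hypothetical codes).
skewed_lengths = [1, 2, 3, 4, 5, 6, 7, 7]   # shortest codes for the smallest values
even_lengths = [3] * 8                      # bit-lengths evened out

# Two assumed distributions over the integers 0..7.
small_likely = [1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/128]
all_equal = [1/8] * 8

for name, probs in (("small values likely", small_likely),
                    ("all values equally likely", all_equal)):
    print(name,
          "| skewed code:", expected_bits(skewed_lengths, probs),
          "| even code:", expected_bits(even_lengths, probs))
```

Under these assumptions, the skewed code averages just under 2 bits while small values dominate, but over 4 bits once all values are equally likely. The evened-out code costs 3 bits in both cases, so the stream does get longer as large integers become probable, just not as long as it would with the skewed code.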


3GP Video File Support

I have an Android phone app which records videos to the .3GP file format. The most important fact to remember about video file formats is that the file-name extension identifies a container file, within which more than one video and audio CODEC may or may not be supported.
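As a small illustration of the container idea: 3GP files follow the ISO base media file layout, in which the file normally opens with an ‘ftyp’ box naming the container brand. The sketch below reads only that box. The function name `read_ftyp` and the hand-made sample bytes are hypothetical, and the brand says nothing about which CODECs are inside.

```python
import struct

def read_ftyp(data):
    """Return (major_brand, compatible_brands) from a leading 'ftyp' box, or None."""
    if len(data) < 16:
        return None
    # A box starts with a 32-bit big-endian size and a 4-character type.
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp" or size < 16 or size > len(data):
        return None
    major = data[8:12].decode("ascii", "replace")
    # Bytes 12..16 hold the minor version; compatible brands follow, 4 bytes each.
    brands = [data[i:i + 4].decode("ascii", "replace") for i in range(16, size, 4)]
    return major, brands

# Hand-made header for the illustration, not taken from a real recording.
sample = struct.pack(">I4s4sI", 24, b"ftyp", b"3gp4", 0) + b"3gp4isom"
print(read_ftyp(sample))
```

To find out which video and audio CODECs a given file actually uses, the sample descriptions deeper inside the container would have to be read, which this sketch does not attempt.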

Some users have reported problems in getting 3GP Video Files to play, or to import, on Linux computers. This is probably because this container format may in some cases store H.263 video and AMR audio. The problem here is that AMR audio is a proprietary CODEC, which cannot really be shipped for free, for licensing reasons.

In my case I am in luck, because my app stores video compressed with the H.264 Video CODEC, and with the AAC Audio CODEC. That means that my video files will play and be imported into projects without effort.

I am sorry if this provides no relief to other users.

Dirk