Identifying the container-file-format, separately from the Codec.

One of the facts which the public is well-aware of, is that Sound and Video are usually distributed in compressed form, through the use of a ‘Codec’, which is short for ‘Coder / Decoder’, but which is often glossed as ‘Compressor / Decompressor’. What may still have some people confused, though, is that there is a separate distinction in file-formats, which is the ‘Container File Format’. The latter distinction is observed when giving the file its filename-suffix, such as .MP3, .MPEG, .MP4, .OGG, .M4A, etc.

  • An .MP3-File will contain sound, compressed with the Codec: MPEG-1 or MPEG-2, Layer III
  • An .MPEG-File will contain video and sound, compressed with the Codecs: MPEG-1 or MPEG-2 video, and AC-3 or MPEG Layer III Audio (Hence, ‘MP3 Audio’ is allowed.)
  • An .MP4-File will contain video and sound, compressed with the Codecs: H.264 or MPEG-4, and AAC
  • An .OGG-File will mostly contain video and / or sound, compressed with the Codecs: Theora (video) and Vorbis (sound)

Finally, because the ‘AAC’ Sound Codec, which stands for ‘Advanced Audio Coding’, has qualities which have been found desirable outside its initial movie-making usage-scenario, just for Audio, there has been some possible confusion as to how users should name a container file which contains only AAC-compressed audio, but no video. On my Linux-computers, I’m used to giving those files the filename-suffix ‘.M4A’. Other people may at one time have been doing the same thing. But because that suffix was not widely recognized, Apple specifically may have started the trend of just naming these container files ‘.MP4-Files’ again, even though they contain no video. This may simply have helped their customers understand the file-formats better.

The AC-3 and AAC Sound Codecs both offer directionality in the sound, which was useful for movies, and which exceeds the degree of directionality that ‘MP3 Audio’ offers. And so, even though AAC produces small file-sizes, it has become popular for Music as well, because the way in which Advanced Audio Coding compresses its sound is ‘so smart’, that listeners tend to hear very high-quality sound anyway.



Understanding ADPCM

One concept which exists in Computing, is a primary representation of audio streams, as samples with a constant sampling-rate, which is also called ‘PCM’ – or, Pulse-Code Modulation. This is also the basis for .WAV-Files. But, everybody knows that the files needed to represent even the highest humanly-audible frequencies in this way, become large. And so, means have been pursued over the decades, to compress this format after it has been generated, or to decompress it before reading the stream. And as early as in the 1970s, a compression-technique existed, which is called ‘DPCM’ today: Differential Pulse-Code Modulation. Back then, it was just not referred to as DPCM, but rather as ‘Delta-Modulation’, and it first formed a basis for the voice-chips in ‘talking dolls’ (toys). Later, it became the basis for the first solid-state (telephone) answering machines.
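As a quick sense of scale for why uncompressed PCM files become large, here is a small calculation in Python. The CD-style parameters below are simply one common example, not anything the text above prescribes:

```python
# Uncompressed PCM size, for CD-quality audio:
# 44,100 samples per second, 16 bits per sample, 2 channels.
sample_rate = 44_100
bits_per_sample = 16
channels = 2

bytes_per_second = sample_rate * bits_per_sample * channels // 8
bytes_per_minute = bytes_per_second * 60

print(bytes_per_second)   # 176400 bytes per second
print(bytes_per_minute)   # 10584000 bytes -- roughly 10 MB per minute
```

So a single 4-minute song, stored as raw PCM at these parameters, occupies about 40 MB – which is exactly the pressure that motivated DPCM and its successors.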

The way DPCM works, is that instead of each sample-value being stored or transmitted, only the difference between two consecutive sample-values is stored. And this subject is sometimes explained, as though software engineers had two ways to go about encoding it:

  1. Simply subtract the previous sample-value from the current one, and output the difference,
  2. Keep a local copy of what the decoder would reconstruct, if the previous sample-differences had been decoded, and output the difference between the current sample-value, and what this local model regenerated.

What happens when DPCM is used directly, is that a smaller field of bits can be used as data – let’s say ‘4’ instead of ‘8’. But then a problem quickly becomes obvious: Unless the uncompressed signal was very low in higher-frequency components – frequencies above 1/3 the Nyquist-Frequency – a step in the 8-bit sample-values could take place, which is too large to represent as a 4-bit number. And given this possibility, it would seem that only approach (2) will give a correct result, which would be, that the decoded sample-values slew where the original values had a step, but slew back to an originally-correct, low-frequency value.
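To make the contrast concrete, here is a small Python sketch of 4-bit DPCM. All the function names and the test signal are my own, invented for illustration. The ‘naive’ encoder follows approach (1) above, and the ‘predictive’ encoder follows approach (2), keeping a local copy of the decoder’s state:

```python
def encode_dpcm_naive(samples):
    """Approach (1): difference against the previous *original* sample."""
    deltas, prev = [], 0
    for s in samples:
        d = max(-8, min(7, s - prev))   # clamp to a signed 4-bit field
        deltas.append(d)
        prev = s                        # tracks the original, not the decoder
    return deltas

def encode_dpcm_predictive(samples):
    """Approach (2): difference against what the decoder will reconstruct."""
    deltas, prev = [], 0
    for s in samples:
        d = max(-8, min(7, s - prev))
        deltas.append(d)
        prev += d                       # local copy of the decoder's state
    return deltas

def decode_dpcm(deltas):
    out, prev = [], 0
    for d in deltas:
        prev += d
        out.append(prev)
    return out

# A step too large for 4 bits:
signal = [0, 0, 50, 50, 50, 50, 50, 50, 50, 50]
print(decode_dpcm(encode_dpcm_naive(signal)))
print(decode_dpcm(encode_dpcm_predictive(signal)))
```

With this step signal, the naive version settles at 7 forever – a permanent error – while the predictive version slews upward, 7 per sample, until it reaches the correct value of 50, exactly the slewing behaviour described above.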

But then we’d still be left with the advantage, of fixed field-widths, and thus, a truly Constant Bitrate (CBR).

But because according to today’s customs, the signal is practically guaranteed to be rich in its higher-frequency components, a derivative of DPCM has been devised, which is called ‘ADPCM’ – Adaptive Differential Pulse-Code Modulation. When encoding ADPCM, each sample-difference is quantized, according to a quantization-step – aka scale-factor – that adapts to how high the successive differences are at any time. But again, as long as we include the scale-factor as part of (small) header-information for an audio-format, that’s organized into blocks, we can achieve fixed field-sizes and fixed block-sizes again, and thus also achieve true CBR.
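As a sketch of the adaptive idea in Python – not any real ADPCM format; real ones, such as IMA ADPCM, use fixed step-size tables, whereas the adaptation rule below is invented purely for illustration:

```python
def _adapt(step, code):
    """Invented rule: grow the step after large codes, shrink it after small ones."""
    if abs(code) >= 6:
        step *= 2
    elif abs(code) <= 1:
        step //= 2
    return max(1, min(1024, step))

def adpcm_encode(samples):
    codes, predicted, step = [], 0, 1
    for s in samples:
        diff = s - predicted
        code = max(-8, min(7, round(diff / step)))  # 4-bit quantized difference
        codes.append(code)
        predicted += code * step        # local model of the decoder
        step = _adapt(step, code)
    return codes

def adpcm_decode(codes):
    out, predicted, step = [], 0, 1
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code)
    return out

print(adpcm_decode(adpcm_encode([0, 1, 2, 3, 4])))  # tracks a slow ramp exactly
```

Because the encoder updates its step using only the transmitted codes, the decoder can regenerate the same step sequence without it being sent for every sample – and, as described above, a real block-organized format can additionally write the current scale-factor into each small block header, restoring fixed field-sizes and true CBR.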

(Updated 03/07/2018 : )


About Constant Bitrate Encoding

What most of us are used to, when we encode an MP3 File, is that we can set a bitrate – such as 192kbps – and the codec will produce an MP3 File with that bitrate. If that were all there was to it, we’d have Constant Bitrate encoding, aka ‘CBR’.

But in many cases, the actual encoding scheme is Variable Bitrate (‘VBR’), which has been modified to be Adaptive Variable Bitrate (‘AVBR’).

The way AVBR works, is that it nests the actual encoding algorithm inside a loop, on the premise that the user has nevertheless set a target bitrate. The loop then feeds the actual algorithm several quality-factors, to encode the same granule of sound in multiple attempts, in order to find the maximum quality-factor which does not cause the encoding to exceed the number of bits that have been allocated, for the algorithm’s encoding of 1 granule of sound.

This quality-factor is then also used to produce output. And, in case the actual number of bits output is less than the allocated number of bits, the difference is next added to the number of bits that act as a target, with which the next granule of sound is to be encoded.
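The loop just described can be sketched in Python as follows. Here, `encode_granule()` is only a stand-in for a real codec’s inner encoder, with an invented cost model, so that the surrounding search-and-carry-forward logic is visible:

```python
def encode_granule(granule, quality):
    """Stand-in for a real encoder: pretend higher quality costs more bits."""
    return b"x" * (len(granule) * quality)     # fake bitstream

def avbr_encode(granules, bits_per_granule, max_quality=10):
    output, reservoir = [], 0
    for granule in granules:
        budget = bits_per_granule + reservoir  # this granule's bit allocation
        best = b""
        for q in range(1, max_quality + 1):    # try rising quality-factors
            attempt = encode_granule(granule, q)
            if len(attempt) * 8 <= budget:
                best = attempt                 # highest quality that still fits
            else:
                break
        reservoir = budget - len(best) * 8     # unused bits roll forward
        output.append(best)
    return output

sizes = [len(g) for g in avbr_encode([[0] * 4] * 4, 200)]
print(sizes)   # [24, 24, 24, 28] -- the 4th granule affords a higher quality
```

The `reservoir` variable implements the carry-forward of unused bits: in the run above, three granules each leave 8 bits unspent, until the fourth granule’s enlarged budget admits the next quality-factor.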

Encoding schemes that are truly CBR, are often ones which are not compressed, plus perhaps ‘DPCM’… Most of the other schemes, such as ‘MP3’ and ‘OGG’, are really AVBR or VBR.

(Updated 03/11/2018 : )


3GP Video File Support

I have an app on my Android phone, which records videos in the .3GP File format. The most important fact to remember about video file-formats, is that the filename-extension identifies a container file, within which more than one video and audio CODEC may be supported.

Some users have reported problems in getting 3GP Video Files to play, or to import, on Linux computers. This is probably the case because this container format may, in some cases, store H.263 video and AMR audio. The problem lies in the fact that AMR audio is a proprietary CODEC, which cannot really be shipped for free, for licensing reasons.

In my case I am in luck, because my app stores video compressed with the H.264 Video CODEC, and with the AAC Audio CODEC. That means that my video files will play and be imported into projects without effort.

I am sorry if this provides no relief to other users.