A friend of mine once suggested that ‘a good way’ to compress a (2D) video stream would be to compute the per-pixel difference of each frame with respect to the previous frame, and then to JPEG-compress the result. This is the basic idea behind inter-frame compression, although it is not how MJPEG works: MJPEG simply JPEG-compresses every frame on its own, without reference to its neighbours. The up-to-date, state-of-the-art compression schemes go further than plain frame differencing in order to achieve smaller file sizes, and are often based on Macroblocks.
Also, my friend failed to notice that at some point within 2D video compression, ‘reference frames’ are needed, which are sometimes also referred to as key-frames. These key-frames should not be confused with the key-frames that video editing software, 2D and 3D, uses to control animations which the power-user wants to create. Reference frames are needed within 2D video compression if for no other reason than this: each ‘comparison frame’ is decompressed with a small amount of error, and because every such frame builds on the one before it, those errors accumulate. Without a periodic reference frame, the decoded content would deviate further and further from the intended, original content, eventually beyond what is acceptable even for a lossy stream.
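The way such errors build up can be illustrated with a toy simulation. This is only a sketch under loud assumptions: each ‘frame’ is a single integer pixel, the encoder takes differences against the previous original frame, the difference is made lossy by rounding it to a multiple of a hypothetical `step`, and reference frames are stored without loss. None of the names below come from a real codec.

```python
def quantize(value, step):
    """Round an integer to the nearest multiple of step; this is the lossy part."""
    return step * ((value + step // 2) // step)

def decode_errors(originals, step, key_interval=None):
    """Return the absolute reconstruction error of every decoded frame.

    Each 'frame' is one integer pixel, to keep the sketch small.
    key_interval=None means that only the first frame is a reference frame.
    """
    errors = []
    decoded = None
    for i, pixel in enumerate(originals):
        is_key = i == 0 or (key_interval is not None and i % key_interval == 0)
        if is_key:
            decoded = pixel  # reference frame: stored on its own (assumed lossless)
        else:
            diff = pixel - originals[i - 1]  # difference to the previous original
            decoded += quantize(diff, step)  # decoder adds the lossy difference
        errors.append(abs(pixel - decoded))
    return errors

frames = list(range(10, 22))  # a pixel that brightens by 1 per frame
drift = decode_errors(frames, step=4)
with_keys = decode_errors(frames, step=4, key_interval=4)

print(drift)      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(with_keys)  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
```

With only one reference frame the error grows without bound, while a reference frame every fourth frame caps it at 3 in this example.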
The concept behind Macroblocks can be stated quite easily. Any frame of a video stream can be subdivided into so-called “Transform Blocks”, which are typically 8×8 pixel-groups, and of which the Discrete Cosine Transform can be computed, in what would amount to the simple compression of each frame. The DCT coefficients are then quantized, as is familiar from JPEG. Because the video is also encoded in a Y’UV colour scheme, there are two possible resolutions at which the DCT could be computed: one for the Luminance values, and a lower one for the Chroma values, whose Transform Blocks each span double the number of pixels. A Macroblock groups the Luminance and Chroma Transform Blocks that cover the same small region of the frame. However, it is in the comparison of each frame with the previous frames that ‘good’ 2D video compression has an added aspect of complexity, which my friend did not foresee.
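To make the transform step concrete, here is a minimal sketch of the DCT of one 8×8 Transform Block followed by quantization. It assumes the usual type-II DCT written out as a plain double sum, and a single uniform quantization step; real codecs use a table of per-frequency steps and a much faster transform, so the function names and the `step` parameter are illustrative only.

```python
import math

N = 8  # Transform Block size used in the text

def dct_2d(block):
    """Type-II DCT of an N x N block, written as the plain double sum."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2.0 / N) * c(u) * c(v) * total
    return out

def quantize(coeffs, step):
    """Divide every coefficient by one uniform step and round (an assumption)."""
    return [[round(coef / step) for coef in row] for row in coeffs]

# A perfectly flat block: all of its energy lands in the DC coefficient.
flat = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=16)

print(levels[0][0])  # 50
print(sum(abs(v) for row in levels for v in row) - abs(levels[0][0]))  # 0
```

A flat block is the extreme case, but it shows why smooth image regions quantize to very few non-zero coefficients.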
Before the comparison is made, the preceding frame is translated in 2D by a vector that is encoded with each Macroblock, as an estimate of motion on the screen. Only after each subdivision of the image has been translated by an integer number of pixels along X and along Y does a sub-result form, against which the per-pixel difference of the present frame is computed. Where the estimate of motion is good, most of the resulting per-pixel values are zero or close to it, which makes it possible for all the DCT coefficients of an entire Transform Block to be zeroes.
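The following sketch shows the idea on a tiny synthetic frame. It assumes an exhaustive block-matching search that minimizes the sum of absolute differences (SAD), which is one common way to estimate motion but not necessarily what any given encoder does. The convention that the vector points from the current block to its source in the previous frame, and all of the names and sizes, are my own choices for illustration.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def best_vector(prev, cur, top, left, size, radius):
    """Try every integer (dy, dx) within radius; return the one with lowest SAD."""
    target = get_block(cur, top, left, size)
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(target, get_block(prev, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]

def residual(prev, cur, top, left, size, vector):
    """Per-pixel difference between the current block and the translated one."""
    dy, dx = vector
    reference = get_block(prev, top + dy, left + dx, size)
    target = get_block(cur, top, left, size)
    return [[p - q for p, q in zip(rt, rr)] for rt, rr in zip(target, reference)]

# Previous frame: a simple gradient. Current frame: same content moved right by 1.
prev_frame = [[16 * y + x for x in range(8)] for y in range(8)]
cur_frame = [[row[0]] + row[:-1] for row in prev_frame]

vector = best_vector(prev_frame, cur_frame, top=2, left=2, size=4, radius=2)
compensated = residual(prev_frame, cur_frame, 2, 2, 4, vector)
uncompensated = residual(prev_frame, cur_frame, 2, 2, 4, (0, 0))

print(vector)            # (0, -1)
print(compensated[0])    # [0, 0, 0, 0]
print(uncompensated[0])  # [-1, -1, -1, -1]
```

With the vector applied, the block's difference is zero everywhere, so its DCT coefficients are all zeroes; without it, every pixel of the difference is non-zero.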