## Is it valid that audio equipment from the 1970s sound better than modern equipment?

That depends on which piece of audio equipment from the 1970s, is being compared with which piece of equipment from today.

If the equipment consists of a top-quality turntable from the late 1970s, compared to the most basic MP3-player from today, and if we assume for the moment that the type of sound file which is being played on the Portable Audio Player, is in fact an MP3 File recorded at a bit-rate of 128kbps, then the answer would be Yes. Top-quality turntables from the late 1970s were able to outperform that.

OTOH, If the audio equipment from today is a Digital Audio Player, that boasts 24-bit sound, that only happens to be able to play MP3 Files, but that is in fact playing a FLAC File, then it becomes very difficult for even the better audio equipment from the 1970s to match that.

Top-Quality Audio Equipment from the late 1970s, would have cost over $1000 for one component, without taking into account, how many dollars that would have been equivalent to today. The type of Digital Audio Player I described cost me C$ 140.- plus shipping, plus handling, in 2018.

Also, there is a major distinction, between any sort of equipment which is only meant to reproduce an Electronic signal, and equipment which is Electromechanical in nature, including speakers, headphones, phonographs… ‘The old Electromechanical technology’ was very good, except for the basic limitation, that they could not design good bass-reflex speakers, which require computers to design well. With no bass-reflex speakers, the older generations tended to listen to stereo on bigger, expensive speakers. But their sound was good, with even bass.

## The Recent “OGG Opus” Codec

One of the uses which I’ve had for OGG Files has been, as a container-file for music, which has been compressed using the lossy “Vorbis” Codec. This has given me superior sound to what MP3 Files once delivered, assuming that I’ve set my Vorbis-encoded streams to a higher bit-rate than what most people set, that being 256kbps, or, Quality Level 8.

But the same people who invented the Vorbis Codec, have embarked on a more recent project, which is called “OGG Opus”, which is a Codec that can switch back and forth seamlessly, between a lossy, Linear Predictive Coding mode (“SILK”), and a mode based on the Type 4 Discrete Cosine Transform (‘DCT’), the latter of which will dominate, when the Codec is used for high-fidelity music. This music-mode is defined by “The CELT Codec”, which has a detailed write-up dating in the year 2010 from its developers, that This Link points to.

I have read the write-up and offer an interpretation of it, which does not require as much technical comprehension, as the technical write-up itself requires, to be understood.

Essentially, the developers have made a radical departure from the approaches used previously, when compressing audio in the frequency domain. Only the least of the changes is, that shorter sampling windows are to be used, such as the 512-sample window which has been sketched, as well as a possible 256-sample window, which was mentioned as well. In return, both the even and odd coefficients of these sampling windows – aka Frames – are used, so that only very little overlap will exist between them. Hence, even though there will still be some overlap, these are mainly just Type 4 Discrete Cosine Transforms.

The concept has been abandoned, that the Codec should reconstruct the spectral definition of the original sound as much as possible, minus the fact that it has to be simplified, in order to be represented with far fewer bits, than the original sound was defined as having. A 44.1kHz, 16-bit, stereo, uncompressed Wave-File consumes about 1.4Mbps, while compressed sampling rates as low as 64kbps are achievable, and music will still sound decently like music. The emphasis here seems to be, that only the subjective perception of the sound is supposed to remain accurate.

(Updated 8/03/2019,16h00 … )

## Some Trivia about Granules of Sound

One of the subjects which I’ve blogged about often, is the compression of sound, including Codecs which are based in the frequency-domain, rather than in the time-domain. What I’ve basically written is that in such cases, the time-domain samples of sound generate granules of frequency-domain coefficients, which are then in turn quantized. What tends to happen is that a new granule of sound is encoded every 576 time-domain samples, but that each time, a 1152-sample sampling window is used, and that due to the application of the “Modified Discrete Cosine Transform” (the ‘MDCT’), what amounts to all the odd coefficients of the Type 2 ‘DCT‘ are encoded, resulting in 576 coefficients being encoded each time. The present sampling window’s cosine function corresponds to the previous and next sampling window’s sine function, so that in a way that is orthogonal, these overlapping sampling windows also have the potential to preserve phase-information.

One observation which my readers may have about this, is the fact that while it does a good job at maintaining spectral resolution, this granule-size does not provide good temporal resolution. Therefore, a mechanism which MP3 compression introduced already, was ‘transient detection’. This feature can arbitrarily replace one of these full-length granules with 3 granules that only generate 192 frequency coefficients, and that recur as frequently.

The method by which transients are detected may be simple. For example, these short granules may tentatively have the stream subdivided all the time, but if any one of them contains more than average variance – which corresponds to signal energy – for example, if one shorter granule contains 1.5 times the average signal energy between the current 3, then this switch can take place.

What I do know is that when granules of sound – or rather, the quantized spectral information from granules of sound – are included in the stream, they include two extra bits each time, that define what the “Zone” of the present granule is. This can be one of four zones:

• A full-sized granule belonging to a stream of them,
• A shortened granule, belonging to a stream of them,
• A shortened granule, that precedes a full-sized granule,
• A shortened granule, that follows a full-sized granule.
• Because it’s inherent in MP3 compression that the entire current sampling window must overlap, partially with the preceding, and partially with the following one, there may be no special rule for how to shape a sampling window, that corresponds to a long granule, both preceded and followed by shortened ones. However, when that happens, both the preceding and following shortened granules will be encoded, to be followed and preceded respectively, by a long granule, for which reason those granules will already have long overlap-portions. Therefore, the current granule in such a case can be encoded as though it was just part of a sequence of long granules.

This information is ultimately non-trivial because it also affects the computation of sampling windows, i.e., it also affects the exact windowing function to be used when encoding. If the granule is followed or preceded by short granules, then either side of the windowing function must also be shortened. (:1)

Now, in the case of other Codecs, such as ‘OGG Vorbis’, a similar approach is taken. But I can well imagine that if specific ideals were simply implemented exactly as they were with MP3 sound, then eventually, the owners of the MP3 Codec might cry foul, over software patent violations. And yet, this problem can easily be sidestepped, let’s say by deciding that the shortened granules be made 1/2 the length of the full-sized granule, instead of 1/3 that length. And at that point the implementation would be sufficiently different from the original idea, that it would no longer constitute a patent violation.

## Deriving a workable entropy-encoding scheme, based on the official explanation of CABAC.

One of the subjects which I recently blogged about, is that when encoding video-streams, some Codecs use 8×8 sample Discrete Cosine Transforms, but as with many DCTs, the coefficients produced tend to be values, which would take up too much space to store, in a fixed-length format. And so a family of techniques which gets applied, is loosely referred to as ‘Entropy Encoding’, with the key idea being, that the Entropy Encoding used for compressed video, is different again, from the Entropy Encoding used for compressed audio. And the scheme used for video has as advantage, that the encoding itself is lossless. Apparently, there are two variants actually used with H.264-encoded videos, which some people group together as MPEG-4:

1. An unspecified form of variable-length encoding,
2. CABAC,

The latter of which promises better compression, at the cost of greater CPU-power required, both to encode and to decode. I’m going to focus on ‘CABAC’ in this posting. There is an official explanation for how CABAC works, which I will refer to. In order to understand my posting here, the reader will need to have read the documentation I just linked to.

From first impressions – yesterday evening was the first day on which I examined CABAC – I’d say that the official explanation contains an error. And I’ll explain why, by offering a version of Entropy-Encoding, which I know can work, based on the link above, but different from it:

• Integers are meant to be encoded, that are “Binarized”.
• The probability with which the first “Bin” has become (1) instead of (0) can be analyzed as described, resulting in a Context Model of one out of (0, 1, 2), as described.
• The next four Bins may not have individual probabilities computed, only resulting in Context Models (3, 4, 5, 6) when they are (1) instead of (0), which override the Context Model that the first Bin would generate.
• The resulting, one Context Model could be averaged over the previous Values.
• Using As a Pair of values, the Context Model (from the previous values) which was just computed, And the (present) Integer Value, a look-up can take place in a 2-dimensional table, of which sequence of bits to use, to encode (both).
• Because the decoder has chosen the integer value out of a known row in the same look-up table, it can also update the Context Model being used, so that future look-ups when decoding remain unambiguous.

The main problem I see with the official explanation is, that because up to 6 Context Models can be computed, each of which supposedly has its own probability, based on that, the lookup-table in which binary values (entropy encodings) are to be found, would effectively need to be a 6-dimensional table ! Officially, all the Context-Models found, have equal meaning. Software is much-more probable, which uses a 2D table, than software which uses a 6-dimensional table, although according to Theoretical Math, 6-dimensional tables are also possible.

But then, a property of Variable Length Coding which has been observed for some time, was that small integers, such as (0), (1) and (2), were assigned very short bit-sequences to be recognized, while larger integers, such as (16) or (17), were assigned recognizable bit-sequences, which would sometimes have been impractically long, and which resulted in poor compression, when the probability of the integer actually being (0), (1) or (2) decreased.

So, because we know that we can have at least one Context-Model, based on the actual, local probabilities, when the probabilities of very small integers become smaller, a series of entropy-encodings can be selected in the table, the bit-length of which can be made more-uniform, resulting in smaller encodings overall, than what straight Variable-Length Encoding would have generated, CABAC instead being adapted to probable, larger integers.

The fact will remain, that the smaller integers will require fewer bits to encode, in general, than the larger integers. But when the smallest integers become very improbable, the bit-lengths for all the integers can be evened out. This will still result in longer streams overall, as larger integers become more-probable, but in shorter streams than the streams that would result, if the encodings for the smallest integers remained the shortest they could be.