Dolby Atmos

My new Samsung Galaxy S9 smartphone exceeds the audio capabilities of the older S-series phones, and its sound chip has a feature called “Dolby Atmos”. Its main premise is that a movie may have had its audio encoded according to either Dolby Atmos or the older, analog ‘Pro Logic’ system, and that, using headphone spatialization, it can be played back with more or less correct positioning. Further, the playback of mere music can be made richer.

(Updated 11/25/2018, 13h30 … )

Rather than just to write that this feature exists and works, I’m going to use whatever abilities I have to analyze the subject, and to try to form an explanation of how it works.

In This earlier posting, I effectively wrote the (false) supposition, that sound compression which works in the frequency domain, fails to preserve the phase position of the signal correctly. I explained why I thought so.

But in This earlier posting, I wrote what the industry had done in practice, which can result in the preservation of phase-positions, of frequency components.

The latter of the above two postings is the more accurate. What follows from it is that if the resolution of the compressed stream is high, meaning that the quantization step is small, phase position is likely to be preserved well. If the resolution (of the sound) is poor, meaning that the quantization step is large and the resulting integers small, poor phase information will also result. It may be so poor as only to preserve the ±180° difference that follows from recorded, non-zero coefficients being signed values.

‘Dolby Atmos’ is a multi-track movie sound system that encodes actual speaker positions, but it is not based on the outdated Pro Logic boxes, which were based on analog wires coming in. In order to understand what was done with Pro Logic, maybe the reader should also read This earlier posting of mine, which explains some of the general principles. In addition, while Pro Logic 1 and 2 had physical speakers as outputs, Dolby Atmos on the S9 aims to use headphone spatialization to achieve a similar effect.

I should also state from the beginning, that the implementation of Dolby Atmos in the Samsung S9 phone, allows the user to select between three modes when active:

  1. Movies,
  2. Music,
  3. Voice.

In addition to the actual surround decoding, the Samsung S9 changes the equalizer settings – yes, it also has a built-in equalizer.

(Updated 11/30/2018, 7h30 … )


Threshold Elimination in Compressed Sound

I’ve written quite a few postings in this blog about sound compression based on the Discrete Cosine Transform. And mixed in with my thoughts about that – where I was still, basically, trying to figure the subject out – were my statements to the effect that frequency-coefficients that are below a certain threshold of perceptibility could be set to zeroes, thus reducing the total number of bits taken up, when Huffman-encoded.

My biggest problem in trying to analyze this is, the fact that I’m considering generalities, when in fact, specific compression methods based on the DCT, may or may not apply threshold-elimination at all. As an alternative, the compression technique could just rely on the quantization, to reduce how many bits per second it’s going to allocate to each sub-band of frequencies. ( :1 ) If the quantization step / scale-factor was high enough – suggesting the lowest quality-level – then many coefficients could still end up set to zeroes, just because they were below the quantization step used, as first computed from the DCT.

My impression is that the procedure which gets used to compute the quantization step remains straightforward:

  • Subdivide the frequencies into an arbitrary set of sub-bands – fewer than 32.
  • For each sub-band, first compute the DCTs to scale.
  • Take the (absolute of the) highest coefficient that results.
  • Divide that by the quality-level ( + 0.5 ) , to arrive at the quantization step to be used for that sub-band.
  • Divide all the actual DCT-coefficients by that quantization step, so that the maximum, (signed) integer value that results, will be equal to the quality-level.
  • How many coefficients end up being encoded to having such a high integer value, remains beyond our control.
  • Encode the quantization step / scale-factor with the sub-band, as part of the header information for each granule of sound.
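
The steps above can be sketched in a few lines of Python. This is only an illustration of my impression of the procedure, not the code of any actual codec. The names `quantize_sub_band` and `dequantize_sub_band` are hypothetical, and the use of truncation toward zero (rather than rounding) is an assumption, chosen because it makes the peak coefficient land exactly on the quality-level.

```python
def quantize_sub_band(coefficients, quality_level):
    """Quantize the DCT coefficients of one sub-band.

    Returns (step, integers). The step is what would be stored in the
    header as the scale-factor. Truncation toward zero is an assumption.
    """
    peak = max(abs(c) for c in coefficients)
    if peak == 0.0:
        # A silent sub-band: nothing to scale.
        return 0.0, [0] * len(coefficients)
    step = peak / (quality_level + 0.5)
    # int() truncates toward zero, so the peak maps to +/- quality_level.
    quantized = [int(c / step) for c in coefficients]
    return step, quantized


def dequantize_sub_band(step, quantized):
    """Decoder side: multiply the integers by the stored step."""
    return [q * step for q in quantized]


# Made-up coefficients for one sub-band, at a low quality-level.
coeffs = [0.9, -0.05, 0.3, -0.6, 0.02]
step, quantized = quantize_sub_band(coeffs, quality_level=3)
print(quantized)  # [3, 0, 1, -2, 0]
```

In this example, the two smallest coefficients become zeroes purely because they are smaller than the quantization step, without any explicit threshold having been applied.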

The sub-band which I speak of has nothing to do with the fact that additionally, in MP3-compression, the signal is first passed through a quadrature filter-bank, resulting in 32 sub-bands that are evenly-spaced in frequencies by nature, and that the DCT is computed of each sub-band. This latter feature is a refinement, which as best I recall, was not present in the earliest forms of MP3-compression, and which does not affect how an MP3-file needs to be decoded.

(Updated 03/10/2018 : )


Which of my articles might paraphrase frequency-domain-based sound compression best.

I have written numerous postings about sound-compression, in which I did acknowledge that certain forms of it are based on time-domain signal-processing, but where several important sound-compression techniques are based in the frequency-domain. Given numerous postings from me, a reader might ask, ‘Which posting summarizes the blogger’s understanding of the concept best?’

Many people directly pull up a posting which, as I explicitly stated, describes something that will not work, and which presents that concept only as a point-of-view to compare working concepts to. Instead of recommending that posting again, I would recommend this posting.

Dirk

 

I feel that standards need to be reestablished.

When 16-bit / 44.1kHz Audio was first developed, it implied a very capable system for representing high-fidelity sound. But I think that today, we live in a pseudo-16-bit era. Manufacturers have taken 16-bit components, but designed devices which do not deliver the full power or quality of what this format once promised.

It might be a bit of an exaggeration, but I would say that out of those indicated 16 bits of precision, the last 4 are not accurate. And one main reason this has happened is compressed sound. Admittedly, signal compression – which is often a euphemism for data reduction – is necessary in some areas of signal processing. But when the first forms of data-reduction were devised, one reason for which data-reduction was applied to sound had more to do with dialup-modems and their lack of signal-speed, and with the need to be able to download songs onto small amounts of HD space, than with any other purpose.

Even though compressed streams caused this, I would not say that the solution lies in getting rid of compressed streams. But I think that a necessary part of the solution would be consumer awareness.

If I tell people that I own a sound device that uses 2x over-sampling, but that I fear the interpolated samples are simply generated as a linear interpolation of the two adjacent, original samples, and those people answer “So what? Can anybody hear the difference?”, then this is not an example of consumer awareness. I can hear the difference between very-high-pitch sounds that are approximately correct, and ones which are greatly distorted.
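
To put a rough number on that fear, here is a minimal sketch, assuming that the device really does nothing more than average the two adjacent samples. For a pure sine wave, the averaged midpoint equals the true midpoint value multiplied by cos(π·f / fs), which follows from the identity (sin(x−d) + sin(x+d)) / 2 = sin(x)·cos(d). The function names are hypothetical, and no real device's firmware is being described.

```python
import math


def upsample_2x_linear(samples):
    """Insert one new sample midway between each pair of originals."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) / 2.0)  # linear interpolation at the midpoint
    out.append(samples[-1])
    return out


def midpoint_gain(freq_hz, sample_rate_hz):
    """Amplitude factor of the interpolated samples, for a pure sine."""
    return math.cos(math.pi * freq_hz / sample_rate_hz)


SAMPLE_RATE = 44100.0
for freq in (1000.0, 10000.0, 18000.0):
    print(freq, "Hz:", round(midpoint_gain(freq, SAMPLE_RATE), 3))
```

At 1kHz the factor is very close to 1, so the interpolated samples are nearly correct. At 18kHz it falls below 0.3, meaning that every second output sample is far from where the real waveform would have been.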

Also, if we were to accept for a moment that out of the indicated 16 bits, only the first 12 are accurate, but there exist sound experts who tell us that by dithering the least-significant bit, we can extend the dynamic range of this sound beyond 96dB, then I do not really believe that those experts know any less about digital sound. Those experts have just remained so entirely surrounded by their high-end equipment, that they have not yet noticed the standards slip, in other parts of the world.
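
For reference, the 96dB figure is just the ratio between full scale and one quantization step, at 16 bits. The small sketch below applies the same arithmetic to 12 and 24 bits. It uses the simple formula 20·log10(2^N), which is an idealization that ignores dithering and noise-shaping, and the function name is hypothetical.

```python
import math


def dynamic_range_db(accurate_bits):
    """Ratio of full scale to one quantization step, in decibels."""
    return 20.0 * math.log10(2.0 ** accurate_bits)


for bits in (12, 16, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
# 12 bits: 72.2 dB
# 16 bits: 96.3 dB
# 24 bits: 144.5 dB
```

So if only 12 of the 16 bits were accurate, the usable range by this measure would be closer to 72dB than to 96dB.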

Also, I do not believe that the answer to this problem lies in consumers downloading 24-bit, 192kHz sound-files, because my assumption would again be, that only a few of those indicated 24 bits will be accurate. I do not believe Humans hear ultrasound. But I think that with great effort, we may be able to hear 15-18kHz sound from our actual playback devices again – in the not-so-distant future.
