Some realizations about Digital Signal Processing

One realization I’ve come to recently about digital signal processing is that, when up-sampling a digital stream twofold just for the purpose of playing it back, simply performing a linear interpolation, to turn a 44.1kHz stream into an 88.2kHz stream, or a 48kHz stream into a 96kHz one, does less damage to the sound quality than I had previously thought. One reason I think this is the realization that doing so achieves the same thing that applying a (low-pass) Haar Wavelet would achieve, after each original sample had been doubled. After all, I had already said that Humans would have a hard time hearing that this has been done.
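Below is a minimal numpy sketch (my own illustration, not something from any DSP firmware) of that equivalence: doubling each sample and then applying the Haar low-pass kernel [0.5, 0.5] produces exactly the same result as 2x up-sampling by linear interpolation, i.e. keeping the original samples and inserting their midpoints.

```python
import numpy as np

# Hypothetical input samples, just to demonstrate the equivalence.
x = np.array([1.0, 4.0, 2.0, 5.0])

# Method 1: double each sample, then apply the Haar low-pass kernel [0.5, 0.5].
doubled = np.repeat(x, 2)                          # [1, 1, 4, 4, 2, 2, 5, 5]
haar = np.convolve(doubled, [0.5, 0.5])[1:-1]      # average each adjacent pair

# Method 2: linear interpolation - keep originals, insert midpoints between them.
lin = np.empty(2 * len(x) - 1)
lin[0::2] = x
lin[1::2] = (x[:-1] + x[1:]) / 2.0

print(haar)  # [1.  2.5 4.  3.  2.  3.5 5. ]
print(lin)   # [1.  2.5 4.  3.  2.  3.5 5. ]  - identical
```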

But then, given such an assumption, I think I’ve also come to more realizations about where I was having trouble understanding what exactly Digital Signal Processors do. It may be Mathematically true to say that a convolution can be applied to a stream after it has been up-sampled. But depending on how many elements the convolution is supposed to have, on whether a single DSP chip is supposed to decode both stereo channels or only one, and on whether that DSP chip is also supposed to perform other steps associated with playing back the audio, such as decoding whatever compression Bluetooth 4 or Bluetooth 5 has put on the stream, it may turn out that realistic Digital Signal Processing chips just don’t have enough MIPS – Millions of Instructions Per Second – to do all that.
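As a rough back-of-envelope sketch (my own figures, all hypothetical, and assuming roughly one instruction per multiply-accumulate), the cost of an FIR convolution scales directly with the sample rate, which is what makes the up-sampled case so much more demanding:

```python
# Hypothetical estimate of the multiply-accumulate load of an FIR convolution,
# assuming (roughly) one DSP instruction per multiply-accumulate.
def fir_macs_per_second(sample_rate_hz, taps, channels=2):
    return sample_rate_hz * taps * channels

TAPS = 512  # a hypothetical FIR length for an equalizer-style convolution

print(fir_macs_per_second(44_100, TAPS) / 1e6)  # ~45 million MACs/s at the original rate
print(fir_macs_per_second(88_200, TAPS) / 1e6)  # ~90 million MACs/s after 2x up-sampling
```

And that budget still has to leave room for the second channel’s decoding, the Bluetooth codec, and whatever else the chip is asked to do.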

Now, I do know that DSP chips exist that have more MIPS, but those chips may also measure 2cm x 2cm, and may require much of the circuit-board they are to be soldered onto. Chips of that type are unlikely to be built into a mid-price-range set of (Stereo) Bluetooth Headphones that has an equalization function.

But I can speculate further, that some combination or alteration of these ideas should work.

For example, the convolution could be computed on the stream before it has been up-sampled, and the stream could then be up-sampled ‘cheaply’, using the linear interpolation. The way I had it before, the half-used virtual equalizer bands would also have accomplished a kind of brick-wall filter, whereas performing the virtual equalizer function on the stream before up-sampling would make use of almost all the bands, and doing it that way would halve the number of MIPS that the DSP chip needs to possess. Doing it that way would also halve the frequency that linearly separates the bands, a spacing which would otherwise have created issues at the low end of the audible spectrum.
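A minimal sketch of that ordering, in Python (my own illustration, with a placeholder FIR kernel standing in for the equalizer convolution): the expensive filtering happens at the original sample rate, and only the cheap interpolation happens at the doubled rate.

```python
import numpy as np

def process_then_upsample(x, fir_kernel):
    # Costly step: FIR convolution at the original (e.g. 44.1 kHz) rate.
    filtered = np.convolve(x, fir_kernel, mode='same')

    # Cheap step: 2x up-sampling by linear interpolation.
    up = np.empty(2 * len(filtered) - 1)
    up[0::2] = filtered
    up[1::2] = (filtered[:-1] + filtered[1:]) / 2.0
    return up

# Hypothetical usage with a trivial 3-tap smoothing kernel standing in
# for the real equalizer convolution.
samples = np.random.randn(1024)
out = process_then_upsample(samples, np.array([0.25, 0.5, 0.25]))
```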

Alternatively, a digital 9- or 10-band equalizer, with the bands spaced an octave apart, could be implemented after up-sampling instead of before it, but again, much more cheaply in terms of the computational power required.
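For concreteness, here is a hedged sketch (my own, not anyone’s firmware) of such an octave-spaced equalizer, built from second-order ‘peaking’ biquads using the well-known RBJ cookbook formulas. Each band costs only a handful of operations per sample, regardless of whether it runs before or after up-sampling; the 31.25 Hz base frequency and the Q of 1.0 are assumptions of mine.

```python
import numpy as np
from scipy.signal import lfilter

def peaking_biquad(fs, f0, gain_db, q=1.0):
    """RBJ-cookbook peaking-EQ biquad coefficients (b, a), normalized."""
    a_gain = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * a_gain, -2 * np.cos(w0), 1 - alpha * a_gain])
    a = np.array([1 + alpha / a_gain, -2 * np.cos(w0), 1 - alpha / a_gain])
    return b / a[0], a / a[0]

def octave_equalizer(x, fs, gains_db):
    """Apply one peaking biquad per band, centre frequencies an octave apart."""
    for i, gain_db in enumerate(gains_db):
        f0 = 31.25 * 2 ** i            # 31.25 Hz, 62.5 Hz, ... (hypothetical spacing)
        if f0 < fs / 2:                # skip bands at or above Nyquist
            b, a = peaking_biquad(fs, f0, gain_db)
            x = lfilter(b, a, x)
    return x

# Hypothetical usage: 10 bands, +3 dB boost on the lowest two.
signal = np.random.randn(4096)
out = octave_equalizer(signal, 44_100, [3, 3, 0, 0, 0, 0, 0, 0, 0, 0])
```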

Dirk

Dolby Atmos

My new Samsung Galaxy S9 smart-phone exceeds the audio capabilities of the older S-series phones, and its sound-chip has a feature called “Dolby Atmos”. Its main premise is that a movie may have had its audio encoded either according to Dolby Atmos, or according to the older, analog ‘Pro Logic’ system, and that, using headphone spatialization, it can be played back with more or less correct positioning. Further, the playback of mere music can be made richer.

(Updated 11/25/2018, 13h30 … )

Rather than just write that this feature exists and works, I’m going to use whatever abilities I have to analyze the subject, and try to form an explanation of how it works.

In This earlier posting, I effectively wrote the (false) supposition that sound compression which works in the frequency domain fails to preserve the phase position of the signal correctly, and I explained why I thought so.

But in This earlier posting, I wrote about what the industry has done in practice, which can result in the preservation of the phase-positions of frequency components.

The latter of the above two postings is the more accurate. What follows from that is that if the resolution of the compressed stream is high, meaning that the quantization step is small, the phase position is likely to be preserved well. If the resolution (of the sound) is poor, meaning that the quantization step is large and the resulting integers small, poor phase information will also result, possibly so poor that it only preserves the ±180° difference which follows from the recorded, non-zero coefficients being signed values.
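A small illustration of this effect (my own, and deliberately simplified to a plain FFT rather than whatever transform a real codec uses): as the quantization step applied to the frequency-domain coefficients grows, the phase that can be recovered from them drifts away from the true phase, until little more than the sign of each coefficient survives.

```python
import numpy as np

# Hypothetical test tone chosen to land exactly on an FFT bin.
fs, n = 48_000, 1024
f0, true_phase = 937.5, 0.7            # 937.5 Hz = bin 20 at this fs and n
t = np.arange(n) / fs
x = np.cos(2 * np.pi * f0 * t + true_phase)

spectrum = np.fft.rfft(x)
bin_idx = np.argmax(np.abs(spectrum))

def quantize(z, step):
    """Round real and imaginary parts to the nearest multiple of 'step'."""
    return step * np.round(z.real / step) + 1j * step * np.round(z.imag / step)

for step in (1.0, 200.0, 700.0):       # small, large, and very large quantization steps
    q = quantize(spectrum, step)
    print(step, np.angle(q[bin_idx]))  # recovered phase drifts from 0.7 as step grows
```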

‘Dolby Atmos’ is a multi-track movie sound system that encodes actual speaker positions, but not one based on the outdated Pro Logic boxes, which relied on analog wires coming in. In order to understand what was done with Pro Logic, maybe the reader should also read This earlier posting of mine, which explains some of the general principles. In addition, while Pro Logic 1 and 2 had physical speakers as their outputs, Dolby Atmos on the S9 aims to use headphone spatialization to achieve a similar effect.

I should also state from the beginning that the implementation of Dolby Atmos in the Samsung S9 phone allows the user to select among three modes when active:

  1. Movies,
  2. Music,
  3. Voice.

In addition to the actual surround decoding, the Samsung S9 changes the equalizer settings – yes, it also has a built-in equalizer.

(Updated 11/30/2018, 7h30 … )
