My new Samsung Galaxy S9 smartphone exceeds the audio capabilities of the older S-series phones, and its sound chip has a feature called “Dolby Atmos”. The main premise is that a movie may have had its audio encoded either according to Dolby Atmos, or according to the older, analog ‘Pro Logic’ system, and that, using headphone spatialization, it can be played back with more or less correct positioning. Further, the playback of mere music can be made richer.
(Updated 11/25/2018, 13h30 … )
Rather than just writing that this feature exists and works, I’m going to use whatever abilities I have to analyze the subject, and to try to explain how it works.
In This earlier posting, I effectively wrote the (false) supposition that sound compression which works in the frequency domain fails to preserve the phase position of the signal correctly, and I explained why I thought so.
But in This earlier posting, I wrote what the industry had done in practice, which can result in the preservation of the phase positions of frequency components.
The latter of the above two postings is the more accurate. What follows from that is that, if the resolution of the compressed stream is high, meaning that the quantization step is small, phase position is likely to be preserved well. But if the resolution of the sound is poor, meaning that the quantization step is large and the resulting integers small, poor phase information will also result, possibly so poor that only the ±180° distinction survives, which follows from recorded, non-zero coefficients being signed values.
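As a toy illustration of that last point, here is what a large quantization step does to a frequency-domain coefficient. This is a sketch of the general idea only; real codecs use non-uniform quantizers and many other refinements:

```python
def quantize(coeff, step):
    """Round a frequency-domain coefficient to the nearest multiple of
    the quantization step, returning the integer that would be stored."""
    return round(coeff / step)

# A fine quantization step keeps the coefficient's detail:
fine = quantize(0.83, 0.01) * 0.01       # reconstructs ~0.83

# A coarse step reduces coefficients to small signed integers, so that
# little more than their sign (a 180-degree phase distinction) survives:
coarse_pos = quantize(0.83, 1.0) * 1.0   # reconstructs 1.0
coarse_neg = quantize(-0.27, 1.0) * 1.0  # reconstructs 0.0: even the sign is lost
```

With the coarse step, any coefficient between (-0.5) and (+0.5) collapses to zero, and everything else collapses to a small signed integer.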
‘Dolby Atmos’ is a multi-track movie sound system that encodes actual speaker positions, and it is not based on the outdated Pro Logic boxes, which relied on analog wires coming in. In order to understand what was done with Pro Logic, the reader may also want to read This earlier posting of mine, which explains some of the general principles. In addition, while Pro Logic 1 and 2 had physical speakers as outputs, Dolby Atmos on the S9 aims to use headphone spatialization to achieve a similar effect.
I should also state from the beginning that the implementation of Dolby Atmos on the Samsung S9 phone allows the user to select between three modes when it is active.
In addition to the actual surround decoding, the Samsung S9 changes the equalizer settings – yes, it also has a built-in equalizer.
(Updated 11/30/2018, 7h30 … )
(As of 11/25/2018, 13h30 : )
What I can see is that the sound settings can be applied regardless of what format the sound was stored in, and I can hear that the settings affect all sound playback, even if the sound file did not specifically contain any positioning information.
But what the sound settings do is not the same regardless of what type of sound file is being played. I have just experimented with an .MKV File, encoded for Dolby Atmos specifically. In order to play those on the S9, the user must use the bundled video player. As a result, the quality of the surround effect is superb, greatly exceeding what the old Dolby Pro Logic technology was able to do, even though Pro Logic possessed actual, physical speakers.
The effect of the sound-processing on Music, played from traditional, compressed sound files, should carry out the same sort of pseudo-quadraphonic decoding that the analog days of the 1970s already knew. In other words, if the music file had no special surround encoding, it would result in sound that seems to emanate from all around the listener, instead of always from the middle of his own head. But logically, it cannot then be said that the positioning the listener hears is accurate. It’s being guessed by Dolby Atmos, but better than most usual stereo headphone sound.
Apparently, 3rd-party video playback apps, with random CODECs, seem to forgo spatialization, and the video app that ships with the phone does not even bother with many older CODECs. 3rd-party playback apps seem to respond to an internal command to play back their audio streams in stereo.
I suppose that an underlying question to ask would be:
‘Once the media-file has been decoded into virtual speaker-positions, how can sound be made to play back as though from that set of positions, but as heard from the headphones?’
And the simplest roundabout answer I can give is that, after the actual decoding, 5 intermediate channels are fed to a matrix – that is, a linear transform which accepts a column vector as input and which, after matrix-multiplication of the time-domain sound samples, produces 4 outputs:
- The left channel,
- The right channel,
- A delay-line leading to the left channel,
- A delay-line leading to the right channel.
The matrix can contain positive, negative and zero coefficients, so that when the inputs are mixed into the 4 outputs, a set of 5.1 (simplified) virtual speaker-positions results.
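To make that concrete, here is a minimal sketch in Python of such a matrix feeding two delay-lines. The 4×5 coefficients, the 48 kHz sample rate, and the 0.35 ms delay are made-up values for illustration only, not anything Samsung or Dolby have published:

```python
# Hypothetical 4x5 mixing matrix: rows correspond to the outputs
# (L, R, feed-to-left-delay, feed-to-right-delay), columns to the 5
# virtual speakers (FL, FR, C, RL, RR).  Coefficients are invented.
M = [
    [0.7, 0.0, 0.5, 0.3, 0.0],   # direct left
    [0.0, 0.7, 0.5, 0.0, 0.3],   # direct right
    [0.0, 0.3, 0.0, 0.0, -0.4],  # feeds the left delay line
    [0.3, 0.0, 0.0, -0.4, 0.0],  # feeds the right delay line
]

RATE = 48000                 # assumed sample rate
DELAY = int(0.00035 * RATE)  # ~0.35 ms cross-feed delay, in samples

def spatialize(frames):
    """frames: list of 5-tuples (FL, FR, C, RL, RR), one per sample time.
    Returns a list of (left, right) stereo samples."""
    # Apply the matrix to every time-domain sample frame.
    mixed = [[sum(M[r][c] * f[c] for c in range(5)) for f in frames]
             for r in range(4)]
    out = []
    for t in range(len(frames)):
        left = mixed[0][t]
        right = mixed[1][t]
        if t >= DELAY:  # the delayed cross-feed arrives DELAY samples later
            left += mixed[2][t - DELAY]
            right += mixed[3][t - DELAY]
        out.append((left, right))
    return out
```

The essential point is only that each output sample is a weighted sum of the 5 inputs, and that two of the four weighted sums are heard after a short delay.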
But Dolby Atmos does not just aim to synthesize 5.1 speakers, with 4 of those speakers at 45° to the listener. Dolby Atmos actually aims to synthesize 7.1.2 speakers, meaning that the headphone spatialization specifically also needs to synthesize speakers orthogonally to the left and right, as well as two more speakers above. And the only way that can really be done is to add more time-delays.
For 45°-positioned speakers, a single time-delay of 0.35 milliseconds might seem appropriate. But in order to synthesize the orthogonal speakers, a time-delay of 0.5 milliseconds is better, while to synthesize sounds from above, additional time-delays of approximately 8–12 milliseconds, corresponding to a simulated ground wave, would be appropriate.
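Expressed as delay-line lengths in samples, and assuming 48 kHz playback (my guess at the phone’s internal rate, not a published fact), those delays work out as follows:

```python
RATE = 48000  # assumed playback sample rate

def delay_samples(ms):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(ms * RATE / 1000.0)

front_45 = delay_samples(0.35)  # 17 samples: 45-degree speakers
side_90  = delay_samples(0.5)   # 24 samples: orthogonal speakers
above_lo = delay_samples(8.0)   # 384 samples: 'above' speakers, low end
above_hi = delay_samples(12.0)  # 576 samples: 'above' speakers, high end
```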
My hearing is not good enough to tell me whether the Samsung S9 actually uses 6 delay-lines, which is feasible. My hearing only tells me that the effect is satisfyingly complete: Dolby Atmos exceeds the limits of what I can hear. In theory, the S9 could be simulating the speakers orthogonally to the left and right of the listener by just feeding them directly to the left and right headphone outputs, without using any special time-delays. If Samsung did that, it would not detract from the way sound is played back overall, and it would mean that ‘only’ 4 delay-lines were used.
One of the emerging facts about the S9, though, is that it does not implement the system of servos and steering which Dolby Pro Logic used – meaning that I do not hear any evidence of such servos being implemented in its sound chip. And that may also be why the correct decoding of AC3 Sound has been skipped.
My original posting on this subject, on 11/21/2018, seriously underestimated what this sound chip can do, because at that time I had only been listening to my music on-the-go with it. Now that I’ve actually done the exercise of trying a few video files, I have a clearer view of what the sound system of the Samsung Galaxy S9 can do.
There is another interesting implementation detail about how the S9 does Dolby Atmos. I wrote above that it essentially has 3 modes for playing back Stereo. Well, it also has a setting to switch between these playback modes automatically.
What the feature will not do to the listener is switch back and forth between playback modes in the middle of a single audio file. Some earlier (analog) appliances used to do that, and it can be annoying when it happens. But a situation which I’ve experienced as a result is that one of my .MP3 Files contained a lecture, apparently hosted as if on a radio station, where most of the track simply contained Voice. During the first few seconds of that audio file, however, there was music being played in Stereo. This resulted in the Voice which came later apparently being played back with some ill-suited reverb.
So what was happening here was that the sound-processing initially detected Music, and then kept this setting for the duration of the lecture, even though the rest of the audio file was Voice.
This situation doesn’t bother me much. But if it did bother other listeners, they would have the ability to pull down the Dolby Atmos settings from the Notification Bar of the S9 and change to a manual, Voice setting. It would just be important to change that setting back, either to Automatic or to a specific other manual setting, before listening to any movie soundtracks or music later on.
(Update 11/29/2018, 18h40 : )
I should also note that my description of Headphone Spatialization above is a gross oversimplification. In general, the way our outer ears receive waves that originated from the natural space around a listener is referred to as the ‘Head Transfer Function’. This function is defined in such a way that a different amplitude and phase-position result at each frequency, and its exact definition is actually slightly different from one person to another. Its proper definition may require the use of complex numbers, and it will then also act as a superset of time-delay functions. ( :1 )
When people refer to Headphone Spatialization, I take this to mean, ‘A simplification of the Head Transfer Function, which not only most listeners have in common, but which has been made simple enough actually to be implemented.’
But even Headphone Spatialization based on just two delay-lines may already be too simplified to work well. For example, it might be tempting just to apply a coefficient of (-0.5) to the signal that crossed over in front of the listener, and a coefficient of (+0.5) to the signal that crossed over behind the listener. Doing so would produce an effect that’s beautiful Mathematically, but that just does not correspond to what the Human Head’s aerodynamics would yield.
But an approach slightly more sophisticated than what I wrote above, yet still simpler than the Head Transfer Function, can be implemented if our components become a set of filters, a matrix, and two delay-lines, to implement the most essential 4 speakers at 45°.
Simply taking the front-left (virtual) speaker as an example, it could lead to 2 or more inputs to the matrix. One such input could contain the signal of this virtual speaker, filtered down to a frequency band for which one coefficient in the matrix is correct. But input from the same virtual speaker could also be filtered down to a complementary band of frequencies, which combines with the first band to yield full playback, but for which a different coefficient in the matrix might be more correct. And then, the simulation of 4 virtual speaker-positions might actually yield 8 inputs to the matrix, for which there would be 32 coefficients as constants, but again yielding only 4 outputs: the two output channels, and the two output channels before a delay-line has been applied.
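A crude way to form two such complementary bands can be sketched in Python. I’m assuming a one-pole low-pass filter here, chosen only because it’s the simplest filter that guarantees the two bands sum back to the original signal; a real sound chip would use better filters:

```python
def split_bands(samples, alpha=0.1):
    """Split a signal into a low band and a complementary high band,
    using a one-pole low-pass filter with smoothing factor alpha."""
    low, high, state = [], [], 0.0
    for x in samples:
        state += alpha * (x - state)  # one-pole low-pass
        low.append(state)
        high.append(x - state)        # whatever the low band missed
    return low, high

# Each of the 4 front virtual speakers, split this way, contributes
# 2 inputs to the matrix: 8 inputs x 4 outputs = 32 coefficients.
```

Because each band then gets its own column of matrix coefficients, the cross-fed, delayed signal can be given a different spectrum from the direct signal.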
And so, the fact that matrices can become large in the practical world, yet remain easy to implement as sequential computations on microchips that possess firmware, can end up being very useful.
Further, the Samsung Galaxy S9 smartphone already possesses a multi-band equalizer, which can be applied to all sound played back, and which therefore also outputs separated bands of frequencies for each of the input channels – bands which would normally just get added together, after each has been amplified or attenuated. Well, this weighted addition of filter-outputs can in fact be generalized as an example of the same sort of matrix, except that then, the coefficients that yield time-delayed outputs would all be set to zero.
One of the uses which I’ve put the sound system of the Galaxy S9 phone to is just to have music playing on the headphones as I’m walking around, and this music was recorded in plain stereo, with no additional positional information. That is, my music was not encoded to differentiate between the front-left and the rear-left directions of sound…
And I think that after more than a week of having the Dolby Atmos setting turned on, I have noticed a slight drawback. I think that Samsung did not put quite enough emphasis on making the stereo sound as though played consistently in front of the listener. I.e., when playing those old sound files I’m noticing that, more often than not, the positioning which the Dolby Atmos feature is guessing places the sound somewhere between the rear-left and rear-right directions, while also making it sound as though from all around the listener.
Well, one can’t ask for perfection. But I am inferring that, when an attempt is made to simulate the rear speakers, this can be achieved with less spectral differentiation of the delayed, crossed-over sound waves…
(Edit 11/30/2018, 7h30 :
This observation also strikes me as convenient, because it would seem to suggest that, when the playback system switches in favor of Dolby Atmos-encoded Movies, it may only need to separate the signals from the simulated front speakers into frequency-bands. There’s every possibility that, to simulate the rear or the ‘above’ speaker positions, no spectral differentiation is required, so that each of those channels represents only one input to the matrix.
The reason this is convenient is the fact that to implement 4 or even 6 filter-banks would be more expensive computationally than to implement the standard set of 2 filter-banks, and implementations that are less expensive, but that produce all the required effects, are also more likely to be used on the sound chip of the phone.
(… end of Edit, 11/30/2018, 7h30 )
If an attempt is made to simulate a time-delay using a Fourier Transform, then there are some limitations to the approach, which may also make it difficult to create Headphone Spatialization using outputs from a matrix, such that 1/2 the outputs correspond to the other 1/2 of the outputs, only time-delayed.
Specifically, the longer the simulated time-delay is to become, the closer together, in units of frequency, the phase-rotations in the frequency domain need to become. And, in order for any type of Fourier Transform to be computed with higher frequency-resolution, the sampling-window needs to become longer. In the end, the sampling-window will need to be at least as long as the time-delay one wanted to simulate.
And the result would only be the rotation of the time-domain waves within each sampling window, ‘in a way that wraps around’.
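That wrap-around can be demonstrated in a few lines of Python, using a naive DFT (certainly not how any real sound chip would compute it; this is only to show the limitation):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    """Naive inverse DFT, returning the real parts."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def delay_via_phase(x, d):
    """'Delay' x by d samples, by rotating each frequency bin's phase.
    Because the DFT treats the window as periodic, the delay wraps."""
    N = len(x)
    X = dft(x)
    rotated = [X[k] * cmath.exp(-2j * cmath.pi * k * d / N)
               for k in range(N)]
    return idft(rotated)

# An impulse near the end of the window does not fall off the edge;
# it reappears at the beginning -- the wrap-around described above.
x = [0.0] * 8
x[6] = 1.0
y = delay_via_phase(x, 3)  # the impulse lands at index (6 + 3) % 8 == 1
```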
But, in order to create Headphone Spatialization, the practical implementation may need to be in the time-domain, use fewer temporary variables, and work with only 1 input per octave of frequencies. For that reason, to have outputs from a matrix that are to be time-delayed may already result in per-frequency phase-shifts that are correct, plus Inter-Aural Time perception that is correct.