# Some Thoughts on Surround Sound

The way I seem to understand modern 5.1 Surround Sound, there exists a complete stereo signal, which for the sake of legacy compatibility, is still played directly to the front-left and the front-right speaker. But what also happens, is that a third signal is picked up, which acts as the surround channel, in a way that neither favors the left nor the right asymmetrically.

I.e., if people were to try to record this surround channel as being a sideways-facing microphone component, by its nature its positive signal would either favor the left or the right channel, and this would not count as a correct surround-sound mike. In fact, such an arrangement can best be used to synthesize stereo, out of geometries which do not really favor two separate mikes, one for left and one for right.

But, a single, downward-facing, HQ mike would do as a provider of surround information.

If the task becomes, to carry out a stereo mix-down of a surround signal, this third channel is first phase-shifted 90 degrees, and then added differentially between the left and right channels, so that it will interfere least with stereo sound.

In the case where such a mixed-down, analog stereo signal needs to be decoded into multi-speaker surround again, the main component of “Pro Logic” does a balanced summation of the left and right channels, producing the center channel, but at the same time a subtraction is carried out, which is sent rearward.
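The sum-and-difference step can be sketched in a few lines of Python. This is only the matrix arithmetic: the function names `encode_lt_rt` and `passive_decode` and the 0.5 gains are my own illustrative assumptions, and the 90-degree phase-shift of the surround channel is deliberately left out.

```python
def encode_lt_rt(left, right, surround):
    """Add the surround channel differentially to a stereo pair.

    Assumption: unit gains and no 90-degree phase shift.
    """
    lt = [l + s for l, s in zip(left, surround)]
    rt = [r - s for r, s in zip(right, surround)]
    return lt, rt


def passive_decode(lt, rt):
    """Balanced sum gives the center, the difference is sent rearward."""
    center = [0.5 * (l + r) for l, r in zip(lt, rt)]
    rear = [0.5 * (l - r) for l, r in zip(lt, rt)]
    return center, rear


# A source panned dead center (equal in left and right) plus a surround signal.
front = [0.25, -0.5, 0.75, 0.0]
surround = [0.125, 0.125, -0.25, 0.5]
lt, rt = encode_lt_rt(front, front, surround)
center, rear = passive_decode(lt, rt)
```

With a front source that is equal in both stereo channels, `center` gives back the front source and `rear` gives back the surround signal. Any front content that is not panned dead center would leak into `rear`.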

The advantage which Pro Logic II has over I, is that this summation first adjusts the relative gain of both input channels, so that the front-center channel has zero correlation with the rearward surround information, which has presumably been recovered from the adjusted stereo as well.
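One way to read that gain adjustment, as a sketch and not as a description of the actual circuit: for gains a and b, the correlation between (aL + bR) and (aL - bR) works out to a²⟨L,L⟩ - b²⟨R,R⟩, because the cross terms cancel. So the sum and the difference become uncorrelated when the two scaled channels have equal power. The helper names below are hypothetical, and the example assumes neither channel is silent.

```python
import math


def dot(a, b):
    """Sum of products, used here as an unnormalised correlation."""
    return sum(x * y for x, y in zip(a, b))


def balance_gains(left, right):
    """Return gains (a, b) that equalise the power of a*left and b*right."""
    a = 1.0 / math.sqrt(dot(left, left))
    b = 1.0 / math.sqrt(dot(right, right))
    return a, b


# A source panned mostly to the left: same waveform, unequal amplitudes.
source = [math.sin(0.3 * n) for n in range(200)]
left = [0.9 * x for x in source]
right = [0.3 * x for x in source]

# Without gain adjustment, the sum and the difference are strongly correlated.
leak_unbalanced = dot([l + r for l, r in zip(left, right)],
                      [l - r for l, r in zip(left, right)])

# With the gains balanced, the correlation collapses to (numerically) zero.
a, b = balance_gains(left, right)
center = [a * l + b * r for l, r in zip(left, right)]
rear = [a * l - b * r for l, r in zip(left, right)]
leak_balanced = dot(center, rear)
```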

Now, an astute reader will recognize, that if the surround-sound thus recovered, was ‘positive facing left’, its addition to the front-left signal will produce the rear-left signal favorably. But then the thought could come up, ‘How does this also derive a rear-right channel?’ The reason for which this question can arise, is the fact that a subtraction has taken place within the Pro Logic decoder, which is either positive when the left channel is more so, or positive when the right channel is more so.

(Edit 02/15/2017 : The less trivial answer to this question is, A convention might exist, by which the left stereo channel was always encoded as delayed 90 degrees, while the right could always be advanced, so that a subsequent 90 degree phase-shift when decoding the surround signal can bring it back to its original polarity, so that it can be mixed with the rear left and right speaker outputs again. The same could be achieved, if the standard stated, that the right stereo channel was always encoded as phase-delayed.

However, the obvious conclusion of that would be, that if the mixed-down signal was simply listened to as legacy stereo, it would seem strangely asymmetrical, which we can observe does not happen.

I believe that when decoding Pro Logic, the recovered Surround component is inverted when it is applied to one of the two Rear speakers. )

But what the reader may already have noticed, is that if he or she simply encodes the mixed-down stereo into an MP3 File, later attempts to use a Pro Logic decoder are for naught, and that some better means must exist to encode surround-sound onto DVDs or otherwise, into compressed streams.

Well, because I have exhausted my search for any way to preserve the phase-accuracy, at least within highly-compressed streams, the only way in which this happens, which makes any sense to me, is if in addition to the ‘joint stereo’, which provides two channels, a 3rd channel was multiplexed into the compressed stream, which as before, has its own set of constraints, for compression and expansion. These constraints can again minimize the added bit-rate needed, let us say because the highest frequencies are not thought to contribute much to human directional hearing…

(Edit 02/15/2017 :

Now, if a computer decodes such a signal, and recognizes that its sound card is only in stereo, the actual player-application may do a stereo mix-down as described above, in hopes that the user has a Pro Logic II -capable speaker amp. But otherwise, if the software recognizes that it has 4.1 or 5.1 channels as output, it can do the reconstruction of the additional speaker-channels in software, better than Pro Logic I did it.

I think that the default behavior of the AC3 codec when decoding, if the output is only specified to consist of 2 channels, is to output legacy stereo only.

The approach that some software might take, is simply to put two stages in sequence: First, AC3 decoding with 6 output channels, and Secondly, mixing down the resulting 6 channels to stereo in a standard way, such as with a fixed matrix. This might not be as good for movie-sound, but would be best for music.


1.0   0.0
0.0   1.0
0.5   0.5
0.5   0.5
+0.5  -0.5
-0.5  +0.5
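Applied to one sample frame, the fixed matrix above could be used as in the following Python sketch. I am assuming that its six rows follow the input order front-left, front-right, center, LFE, rear-left, rear-right, and that its two columns are the left and right stereo outputs. The name `mix_down` is hypothetical.

```python
# Rows follow the assumed input order FL, FR, C, LFE, RL, RR;
# columns are the two stereo outputs (left, right).
DOWNMIX = [
    (1.0, 0.0),
    (0.0, 1.0),
    (0.5, 0.5),
    (0.5, 0.5),
    (0.5, -0.5),
    (-0.5, 0.5),
]


def mix_down(frame):
    """Mix one 6-channel sample frame down to a (left, right) pair."""
    left = sum(x * row[0] for x, row in zip(frame, DOWNMIX))
    right = sum(x * row[1] for x, row in zip(frame, DOWNMIX))
    return left, right


center_only = mix_down([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
rear_left_only = mix_down([0.0, 0.0, 0.0, 0.0, 1.0, 0.0])
equal_rears = mix_down([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])
```

A center-only frame lands equally in both outputs, a rear-left-only frame lands with opposite polarity in the two outputs, and equal rear inputs cancel out entirely.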



If we expected our software to do the steering, then we might also expect, that software do the 90° phase-shift, in the time-domain, rather than in the frequency-domain. And this option is really not feasible in a real-time context.

The AC3 codec itself would need to be capable of 6-channel output. There is really no blind guarantee, that a 6-channel signal is communicated from the codec to the sound system, through an unknown player application... )

(Edit 02/15/2017 : One note which should be made on this subject, is that the type of matrix which I suggested above might work for Pro Logic decoding of the stereo, but that if it does, it will not be heard correctly on headphones.

The separate subject exists, of ‘Headphone Spatialization’, and I think this has become relevant in modern times.

A matrix approach to Headphone Spatialization would assume that the 4 elements of the output vector, are different from the ones above. For example, each of the crossed-over components might be subject to some fixed time-delay, which is based on the Inter-Aural Delay, after it is output from the matrix, instead of awaiting a phase-shift… )
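A minimal Python sketch of that idea follows, in which each crossed-over component is delayed by a fixed amount before being mixed into the opposite ear. The delay of 0.6 ms, the cross-over gain of 0.5 and the function names are illustrative assumptions of mine, not measured or standardized values.

```python
def delay(signal, samples):
    """Delay a list of samples by a whole number of samples, zero-padded."""
    if samples <= 0:
        return list(signal)
    return ([0.0] * samples + list(signal))[:len(signal)]


def headphone_crossfeed(left, right, rate=48000, delay_s=0.0006, gain=0.5):
    """Mix a delayed, attenuated copy of each channel into the opposite ear.

    Assumption: delay_s and gain are illustrative values only.
    """
    d = round(rate * delay_s)
    left_x = delay(left, d)
    right_x = delay(right, d)
    out_l = [l + gain * rx for l, rx in zip(left, right_x)]
    out_r = [r + gain * lx for r, lx in zip(right, left_x)]
    return out_l, out_r


# An impulse in the left channel only.
n = 64
impulse = [1.0] + [0.0] * (n - 1)
silence = [0.0] * n
out_l, out_r = headphone_crossfeed(impulse, silence)
```

The impulse reaches the left ear immediately, and reaches the right ear 29 samples later at half the amplitude.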

(Edit 03/06/2017 : After much thought, I have come to the conclusion that there must exist two forms of the Surround channel, which are mutually-exclusive.

There can exist a differential form of the channel, which can be phase-shifted 90° and added differentially to the stereo.

And there can exist a common-mode, non-differential form of it, which either correlates more with the Left stereo or with the Right stereo.

For analog Surround – aka Pro Logic – the differential form of the Surround channel would be used, as it would for compressed files.

But when an all-in-one surround-mike is implemented on a camcorder, this originally provides a common-mode Surround-channel. And then it would be up to the audio system of the camcorder, to provide steering, according to which this channel either correlates more with the front-left or the front-right. As a result of that, a differential surround channel can be derived. )

(Updated 11/20/2017 : )

(Text Deleted on 02/15/2017, Because it was edited too many times. )

Now, I do not know as much about 7.1, where it could be that the legacy stereo is played back to the orthogonally-left and the orthogonally-right speakers, and in which the front-left and front-right could be derived, in some more-complex scheme…

Dirk

(Edit 05/31/2016 : There is another important difference, between Pro Logic I and II, which I did not mention above, and which is relevant to how both versions work.

With Pro Logic II, there is a servo for each derived speaker. With Pro Logic I, there is not, and there may not even be a servo for the front-center speaker.

With Pro Logic I, the surround signal that is sent to the rear, was assumed to be heard in a larger theater, where the delay which individual listeners experience from each speaker varies, and where there could be reflections of sound inside the theater, which again affect the short-term sound arriving at the position of each listener. And so what they do, is to insert a time delay of a few milliseconds into the surround channel sent to the rear, that is long enough to make sure, that each listener cannot make out any short-term correlation in the timing and phase. In this case, it is also not useful, to phase-shift the signal which gets sent to the back either, since all the phase information has been destroyed.

Another reason for inserting the time-delay was, that some amount of front content would leak to the rear speakers, and if the rear-speaker output was delayed, this content would only be heard as coming from the front.

I.e., If a person receives sound from one direction, a few milliseconds before hearing the onset of the same sound from another, he will only seem to hear that sound as coming from the direction, it came from first.

Likewise, if the phase-position of sound is advanced from the front-left, with respect to the front-right, and if it is being played back in legacy stereo, the direction the listener will seem to hear that sound coming from, will jump to the left speaker, even if the amplitudes are the same.

This latter observation needs to be respected when schemes are devised to implement matrix surround, to take care not to cause the amplitude to hint a direction, which can sometimes be opposite the direction, which the phase-position would hint. The amplitude-difference will win out in surround-sound, but when the same signal is listened to in legacy stereo, the results could be opposite what they were in surround. )

In such a case it may only make sense, to invert the version of the decoded Surround channel, which is sent to the opposite rear channel, from the front channel that was taken as positive in the subtraction, as long as the polarity was also correct, with which this single Surround channel was added to the regular stereo signal, when encoding.

It became a feature of Pro Logic II, that an individual could be listening to the sound in a small setting, and benefit from ‘instantaneous sound’. But this process is only ideal, when at any instant in time, there is one main event taking place, which the servos are producing parameters for. I.e., at any one time, one servo can only produce one output value, and servos tend to have low-pass filters, that define over how long an interval of time their output value is correct.

Hence, If there is a conversation taking place front-center, Then the servos will produce ideal mixing parameters for that. OTOH, If there is a gun-shot being fired from the rear-left, And that still takes place slowly enough for the servos to react to it, Then the gun-shot can appear to come from the rear-left speaker with pinpoint accuracy.

What happens if the front-center conversation is fully concurrent, with the gun-shot from the rear-left? The Pro Logic II servos are only producing one set of parameters for mixing the sound sent to each speaker.

There is a phenomenon in the human perception of sound which helps out in this case. If the listener perceives one set of frequency components coming from the front-left speaker, and another set coming from the front-right, and a third set of frequency components coming ‘from the rear’, then human hearing will already try to correlate what is coming from the rear, with the frequency components coming from known, front positions. So human hearing can often infer, that the frequency components of this hypothetical gun-shot sound, seem to correlate more with those from the front-left speaker, than they do with those from the front-right speaker. And so we will seem to hear this hypothetical gun-shot as coming from the rear-left, not the rear-right.

But this would then no longer be due, strictly to the decoding.

And this (type of reinforcement, of the front stereo content) would also be, why Pro Logic I already seemed to work.

(Text Deleted on 02/15/2017 …)

One fact which I should acknowledge:

The existence of compressed surround-sound, does not contradict the fact, that during the mixing of multi-channel sound, a format exists, in which each channel is separately defined. And, a possible misconception which old-timers like me could have, would be that our default channel setup should be, let us say 4-channel sound, so that our AC3 movie-sound encoder can do its magic on that.

(Edit 12/31/2016 : It is my impression that these days, Movie Theaters will forgo Pro Logic II, and just provide 6-channel sound where available. Yet, this poses questions as to how movie-sound is actually compressed. And the audience will sometimes see an ad, for whichever scheme was being used: “THX”, “DTS”, etc. Privately, we can just encode 6-channel sound with OGG Vorbis whenever we feel like it… )

The fact is, that audio channels are stated in A Standard Sequence Of 6. The first two are front-left and front-right, after which is the center, after which is the LFE, after which we have rear-left and rear-right.

In such programs as “Audacity” this needs to be set up, and then under the ‘Preferences’ dialog, the section that states ‘Export Preferences…’ needs to be switched from its default, which is “Always mix down to Stereo”, to “Custom”, and the box should remain checked off, to show the meta-data with which channels are to be fed to the (compatible) export plugin.

After that, it is up to the AC3 encoder, if available, to do what it does.

This posting of mine is merely speculation, on what exactly the AC3 format may do. It is also speculation on how the all-in-one 5.1 surround microphones may work, that are built-in to high-end camcorders.

The most important fact to know about any compressed format, is that it must be implemented when decoding, completely compatible with how it was implemented encoding. And in my world, the only way to achieve that is to use each plugin exactly as-is.

I should add that the way I have set up Audacity, one of the many effect plugins available, does surround-to-stereo matrix encoding. I can visualize that when this plugin simulates the rear-left channel, it does so by mixing it into the real left stereo, plus into a temporary surround channel which I spent this whole posting to say exists. But I would be overlooking all the optimizations which experts have already put, into the AC3 codec, and into that effect plugin. And then for that reason, such a simplistic assumption can turn into an error.

According to the way I understand Pro Logic decoding, the two stereo channels must be mixed into a center channel, so that this center channel can act as a phase-reference, from which the presumptive surround channel is supposed to be phase-shifted 90 degrees.

In order for that to be possible, the panning to the front speakers, of a simulated rear speaker, cannot be 100%. It may be possible to have 50% or 75% panning taking place, so that when decoding, the servo for the center-front speaker can un-pan the stereo, and produce an additive component, such that the subtractive component is 90 degrees out with that.

But if, with my amateur assumptions, I had made the mistake of crossing over the left-rear speaker 0%, it would not be possible for the Pro Logic servo to do that. And likewise, there are numerous other pitfalls, which a guy like me could fall into, but which the professionals who designed the codecs do not fall into.

There exists some divergence, in how some mainstream experts define “Matrix Stereo”. According to one definition, it is the stereo which results, from a separate L+R and L-R representation. But my recent look at Audacity tells me that this term has a broader meaning, which stems from how Matrices work in Linear Algebra.

According to Linear Algebra, it is possible to premultiply a column-vector with a matrix, and to obtain a resulting column-vector.

For the sake of argument, the resulting vector can have 4 elements:


Left
Right
Left, phase-delayed by 90 degrees
Right, phase-delayed by 90 degrees.



The matrix could be a 4×6 (4 rows by 6 columns), so that the original vector could have had 6 elements.

Some linear combination of the 4 output elements, can represent each of my 6 input elements. Each of these combinations will form 1 column, of the matrix, which defines the system of representation.

(Edit 12/31/2016 : I guess that if input elements 5 and 6 are the simulated rear-left and rear-right speakers, phase-delayed, we should have something like this as the bottom-right 2×2 corner of the matrix:


-0.5 +0.5
+0.5 -0.5



This would cause equal input from the rear channels to cancel out. According to what I have been posting, the last two columns of this matrix should contain something like


+0.616 +0.25
+0.25  +0.616
-0.5   +0.5
+0.5   -0.5



So that determining whether the rear-left or the rear-right speaker was doing this, will depend on the polarity of the Surround channel.
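To make the cancellation concrete, here is a small Python sketch that applies only the last two columns suggested above. The coefficients are the ones from this posting, which are themselves guesses, and the name `rear_contribution` is hypothetical.

```python
# Last two columns of the assumed encoding matrix (inputs 5 and 6,
# the rear-left and rear-right channels). Rows are the four outputs:
# Left, Right, Left phase-delayed, Right phase-delayed.
REAR_COLUMNS = [
    (0.616, 0.25),
    (0.25, 0.616),
    (-0.5, 0.5),
    (0.5, -0.5),
]


def rear_contribution(rear_left, rear_right):
    """Contribution of the two rear inputs to the four output elements."""
    return [a * rear_left + b * rear_right for a, b in REAR_COLUMNS]


both = rear_contribution(1.0, 1.0)
left_only = rear_contribution(1.0, 0.0)
right_only = rear_contribution(0.0, 1.0)
```

With equal rear inputs, the two phase-delayed elements come out as zero. With only one rear input active, those elements are non-zero, and their signs swap depending on which rear channel was the source.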

In the case of Steve Harris’ LADSPA Plugins, specifically the Matrix Surround Encoder, the input channel sequence is Left, Right, Center, Surround. (Edit 12/31/2016 : And the Surround input channel is oriented +L-R. )

(Edit 12/31/2016 : ) There is a detail about Pro Logic I which I did not yet think to mention. In this archaic form of decoding, only the surround component actually gets sent to the rear speakers, while with Pro Logic II, the effort was introduced in practice, to mix the surround component with one of the stereo channels, and with a separate servo, for each rear speaker.

Also, much of this steering just does not work for music listening, so that the Pro Logic I amplifiers needed to have a separate setting for that, in which the servos are effectively off, and which corresponds to a static matrix being applied. Another way to say the same thing would be, that if the surround component is just fed along with constant gain at each stage, and then either mixed with the stereo again or not so, there would be a matrix that describes the set of parameters being used.

Further, at first glance it might seem that with such codecs as Steve Harris’ LADSPA Plugin, the availability of the Center input channel is redundant, as if Steve Harris might just add that at 0.5 to Left and Right. But this could again be a case where the professional design of the codec is better than first impressions.

Since the Surround channel is later to be recovered in a way as well-isolated as possible, it may make most sense to try to ensure it will also be 90° out-of-phase with Center. While it gives little advantage to meddle with the phase-positions of both Legacy Stereo channels, the Surround channel, after being given the correct polarity and phase-shifted – not in real time, may go through an additional correlation-cancellation loop, that modifies it truly to be out-of-phase with Center.

Therefore, many of the virtual sounds that are supposed to belong to the front, panned perceptual space, should probably offer some contribution to the real Center input to this plugin.

I have really mentioned two ways in which “Audacity” can generate 5.1 surround-sound:

1. Via an Exported Format Codec, which needs to be fed 6 channels, and the focus of which is on compressing the stream into the usual frequency-domain, DCT-based formats that are popular.
2. Via Steve Harris’ LADSPA Plugin, which does not have an LFE input, and only has 1 Surround input-channel, the focus of which is on generating true stereo output, that carries the multi-channel input.

The availability of an LFE input-channel in case (1) above is meaningful as separate from the Center channel, in that a band of lower frequencies has been deemed outside human perception of direction, and does not need to be included in the sound granules of the other three, complex channels. It only needs to be encoded by itself once, and can therefore also be repeated for a lower rate of sampling intervals.

And case (2) above requires that its 2 output channels effectively not be put through lossy compression, just as analog stereo is not, so that they can carry the surround-sound as phase-information.

I expect that case (1) does not need to ensure that its Surround channel is out-of-phase with the Left, Right and Center channels, because it encodes Surround into a separate stream, which needs to be assigned in MPEG-2 or AC3 Files. Once it decodes said channel again, using an Inverse Discrete Sine Transform will ensure this, If stereo output is asked for.

Because case (2) is receiving its Surround channel in-situ, it has no assurance that it will record to a stereo output stream as anything separate from the stereo input channels. And so I expect that case (2) requires a correlation-cancellation loop, to keep these streams from interfering. Taking one input as-is, and simply phase-shifting the other 90°, will not ensure this.

Whenever I have written that a digital system is to compute correlation, I am referring to the simple exercise, in which an integration is performed over time, but in which the previous accumulated sample is multiplied by a factor (k) that will cause it to decay, as if a high-pass filter had also been applied, where

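$$
\begin{aligned}
h &= \text{sampling rate} \\
\omega &= 2 \pi F \\
k &= 2^{12} \, h / (h + \omega) \\
c_0 &= r_0 \, s_0 \, 2^{-15} \\
c_i &= r_i \, s_i \, 2^{-15} + k \, c_{i-1} \, 2^{-12} \\
cr_i &= c_i \, \omega / h
\end{aligned}
$$
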
(ω) must be chosen, to represent the time-constant (speed) with which this integral is supposed to react, and the number of bits of precision of (c), 32, needs to exceed those of (r) and (s), 16, accordingly, to allow the multiplications and safe summation. (k) is assumed to be an unsigned binary fraction, maybe to 12 bits of precision. (cr) is supposed to have the correct feedback-gain.
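The recurrence can be sketched in Python as follows. The function names and the default values for the sampling rate and for (F) are assumptions of mine, and since Python integers do not overflow, the 32-bit limit on (c) is not modelled here.

```python
import math


def make_k(rate, freq):
    """k = 2^12 * h / (h + omega), truncated to a 12-bit binary fraction."""
    omega = 2.0 * math.pi * freq
    return int((1 << 12) * rate / (rate + omega))


def correlate(r, s, rate=44100, freq=10.0):
    """Leaky running correlation of two 16-bit integer streams."""
    omega = 2.0 * math.pi * freq
    k = make_k(rate, freq)
    c = 0
    out = []
    for ri, si in zip(r, s):
        # c_i = r_i * s_i * 2^-15 + k * c_(i-1) * 2^-12
        # (c starts at 0, so the first pass gives c_0 = r_0 * s_0 * 2^-15)
        c = ((ri * si) >> 15) + ((k * c) >> 12)
        # cr_i = c_i * omega / h
        out.append(c * omega / rate)
    return out


tone = [int(12000 * math.sin(2 * math.pi * 440 * n / 44100))
        for n in range(20000)]
inverted = [-x for x in tone]
same = correlate(tone, tone)
opposite = correlate(tone, inverted)
```

A stream correlated with itself settles on a positive value, while the same stream correlated with its inverted copy settles on a negative one.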

Note: The harder way to implement a broadband 90° phase-delay with digital pipelines, is:

• Subdivide the signal into overlapping sampling windows, with adequate windowing-functions,
• Compute the DCT of each sampling window (the Discrete Cosine Transform), in order to put it from time-domain into frequency-domain.
• Invert the frequency-domain data back into the time-domain, using the IDST (the Inverse Discrete Sine Transform).
• Recompose the stream, using adequate windowing-functions.
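The middle two steps can be illustrated on a single block in Python. This is a simplification under stated assumptions: the windowing and overlap steps are left out, the transforms are naive and unnormalised, the function names are hypothetical, and the result is only exact for a component that falls exactly on one DCT bin.

```python
import math


def dct(block):
    """Unnormalised DCT-II of one block."""
    n = len(block)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                for i, x in enumerate(block))
            for k in range(n)]


def sine_resynthesis(coeffs):
    """Rebuild the block from DCT coefficients on a sine basis instead.

    The k = 0 (DC) term has no sine counterpart and is dropped.
    """
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.sin(math.pi / n * (i + 0.5) * k)
                          for k in range(1, n))
            for i in range(n)]


N = 64
K0 = 5  # hypothetical test frequency, chosen to sit exactly on a DCT bin
cosine = [math.cos(math.pi / N * (i + 0.5) * K0) for i in range(N)]
shifted = sine_resynthesis(dct(cosine))
expected = [math.sin(math.pi / N * (i + 0.5) * K0) for i in range(N)]
```

The cosine that went in comes back as a sine of the same frequency, which is the same wave delayed by 90°. The naive transforms here cost on the order of N² operations per block, which hints at why this route is the harder one.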

I could understand that maybe some programmers might feel that sound-processing is in-situ, and that therefore, natural phase-correlations are rare. Therefore, the simple act of passing the Surround channel through a correlation-cancellation loop might be seen as enough of a measure, to put it out-of-phase with the Center channel.

This would simplify computation tremendously, but opens up the risk, that in a deterministic environment, using a DAW (a Digital Audio Workstation), several streams might have been decompressed exactly the same way, and are in fact naturally in-phase, because they are in-vitro. If that risk could be ruled out, then the analysis of the LADSPA Plugin might be complete.

If this problem was to happen to you, while using a DAW, then one pragmatic solution would be, to perform an explicit user-operation to phase-shift the Surround channel 45°, before running the Plugin.

(Edit 02/15/2017 : )

I have written here about Pro Logic and the AC3 codec, both of which may actually be considered obsolete, depending on how trendy the reader is. Specifically, AC3 was used with MPEG-2 video streams, and on DVDs. For use with .MP4 Video Files and Blu-Ray, the required codec is called AAC – ‘Advanced Audio Coding’. Further, this codec can be used just for sound by itself, which most-properly leads to files that end with ‘.M4A’ .

Linux is particularly creative at being able to recognize a file-type, even if the extension in its name is not 100% correct. So I have been able to name sound files that use ‘AAC’, .AAC Files, and still play them.

Because AAC is one step ahead of AC3, it would not surprise me, that AAC may have a separate Surround channel for each of the rear speakers.

(Edit 11/20/2017 : )

There is a certain gist in what I wrote above, which I was actually describing in a roundabout way, but which I was failing to highlight clearly.

While certain analog-age thinkers might deem sound-processing ideal, which phase-shifts a surround-component 90° and then adds it to one channel while subtracting it from another, that approach is likely to have two drawbacks:

1. Sound recorded in-situ does not start out as being in-phase, so that to perform a 90° phase-shift on it, instead of putting it out-of-phase, could accidentally bring it in-phase, just as easily,
2. While analog technology was efficient at phase-delaying a signal by some arbitrary amount, without emphasizing lower or higher frequency-components, there is an absence of equivalent digital algorithms, unless we really do want to perform DCTs on the signal, which does not promise real-time results.

This posting expresses my suspicion, that what consumer-quality, digital technology does instead, is to decorrelate the surround-channel with respect to the center-channel, resulting in a derived surround-channel, such that if it was multiplied continuously with the center channel, doing so would yield a signal with an integral that stays near zero.
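As an illustration of what ‘decorrelated’ means here, the following Python sketch removes from the surround-channel whatever part of it correlates with the center-channel. I have swapped in a one-shot, whole-block projection in place of a running correlation-cancellation loop, purely to keep the example short. The function names are hypothetical.

```python
import math


def dot(a, b):
    """Sum of products, standing in for the integral of the product."""
    return sum(x * y for x, y in zip(a, b))


def decorrelate(surround, center):
    """Subtract from `surround` the part of it that correlates with `center`."""
    energy = dot(center, center)
    if energy == 0.0:
        return list(surround)
    leak = dot(surround, center) / energy
    return [s - leak * c for s, c in zip(surround, center)]


center = [math.sin(0.2 * n) for n in range(500)]
ambience = [math.sin(0.07 * n + 1.0) for n in range(500)]
# A surround signal contaminated with 40% of the center channel.
surround = [a + 0.4 * c for a, c in zip(ambience, center)]
derived = decorrelate(surround, center)
```

Multiplying `derived` with `center` and summing gives a value near zero, while doing the same with the original `surround` does not.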

This decorrelated channel would then be used, as if it contained components that are both 90° phase-advanced as well as 90° phase-delayed, with respect to the center-channel.

When encoding to a compressed format, this derived channel could then be used as-is.

But when decoding, this surround-channel needs to be applied differentially, which also means that when it was encoded, its polarity must have been unambiguous.

If the surround-channel started out as the result of a subtraction between rear-left and rear-right mikes, then this latter detail takes care of itself. But if it was derived from a single, common-mode mike, then this can be accomplished by computing the correlation of the surround-channel with the (L-R) stereo component, and whenever the correlation goes negative, the polarity of the surround-channel can be inverted. And what this would accomplish, is to present the surround channel consistently as though it has (+L-R) polarity as it was encoded.
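That polarity rule could be sketched in Python like this. The running correlation is a simple decaying sum in floating-point, and both the `decay` factor and the name `fix_polarity` are assumptions of mine.

```python
import math


def fix_polarity(surround, left, right, decay=0.999):
    """Flip the surround channel whenever it correlates negatively with L-R.

    Assumption: `decay` is an arbitrary smoothing factor for the running sum.
    """
    corr = 0.0
    out = []
    for s, l, r in zip(surround, left, right):
        # Decaying running correlation of the surround channel with (L - R).
        corr = decay * corr + s * (l - r)
        out.append(-s if corr < 0.0 else s)
    return out


tone = [math.sin(0.1 * n) for n in range(300)]
left = tone
right = [0.0] * len(tone)
# A surround channel that arrives with (-L+R) polarity.
flipped = [-0.5 * x for x in tone]
restored = fix_polarity(flipped, left, right)
unchanged = fix_polarity(restored, left, right)
```

The channel that arrived inverted is turned back to (+L-R) polarity, and a channel that already has the right polarity passes through unchanged. With real program material the first samples, before the correlation has settled, could still come out with the wrong sign.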