There exists an argument against Headphone Spatialization.

Headphone Spatialization is also known as ‘Binaural Sound’ or ‘Binaural Audio’. It is based on the idea that when people hear direction, they do not only take into account the relative amplitudes of Left versus Right – aka panning – but that somehow, they also take into account the time-delay that sound requires to reach the more-distant ear, with respect to the closer ear. This time-delay is also known as the Inter-Aural Delay.
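To give a sense of the magnitudes involved, the classical Woodworth approximation estimates this delay from the head radius and the azimuth of the source. This is a standard textbook formula, not something from the posting itself, and the head radius used here is a typical average, not a measured value:

```python
import math

def woodworth_itd(azimuth_rad, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate inter-aural time delay (seconds) for a spherical head.

    Woodworth's formula: ITD = (a / c) * (theta + sin(theta)), where
    theta is the azimuth of the source, a the head radius, and c the
    speed of sound in air.
    """
    return (head_radius_m / speed_of_sound) * (azimuth_rad + math.sin(azimuth_rad))

# A source directly to one side (90 degrees) yields roughly 0.66 ms:
print(round(woodworth_itd(math.pi / 2) * 1000, 2))
```

So the delays in question are sub-millisecond, which is part of what makes the question of how the auditory system resolves them interesting.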

Quite frankly, if people are indeed able to do this, I would like to know how, because the actual cochlea cannot. The cochlea only perceives frequency-energies, and in each hemisphere the brain is able to make a close comparison of those energies, between the sound perceived by both ears.

If such an ability exists, it may be due to what happens in the middle ear. And this could be because sound from one source reaches each cochlea over more than one pathway…

But what this also means is that if listeners simply put on headphones and listen to stereo, they too are relatively unable to make out positioning, unless that stereo is very accurate – which it is not, after it has been MP3-compressed.

So technology exists, hypothetically, that will take explicit surround-sound and encode it into stereo which is not meant to be re-compressed afterward, but which allows for spatial perception.

There exist valid arguments against the widespread use of such technology. The way each person interprets the sound from his two ears is an individual skill which he learns; in many cases people can only hear direction by moving their heads slightly; and the head of each person is slightly different anatomically. So there might not be any unified way to accomplish this.

What I find is that, when there are subtle differences in how this works over a large population, there is frequently a possible simplification that does not correspond 100% to how any one individual interprets sound, but that works better in general than applying no effect at all.

Therefore, I would think of this as a “Surround Effect”, rather than as ‘Surround Sound’, the latter of which is meant to be heard over speakers, and where the ability of an individual to make out direction falls back on his ability also to make out the direction of his own speakers.




5 thoughts on “There exists an argument against Headphone Spatialization.”

  1. Yes, we can detect inter-aural delay and that has been well known for a very long time. It is the primary cue used to detect direction at low frequencies.

    You can try it yourself using Matlab or any other programming language. Just take a signal, make a copy and delay that copy a bit. Play the signals through headphones (the undelayed to one ear and the delayed to the other) and you can easily hear the sound stage go to the side that is not delayed.
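A minimal sketch of that experiment, in Python rather than Matlab (the frequency, duration, and delay chosen here are arbitrary illustrative values, not from the comment). Writing the two channels out to a stereo file and listening over headphones is left as an exercise:

```python
import math

SAMPLE_RATE = 44100  # samples per second

def make_itd_pair(freq_hz, duration_s, delay_ms):
    """Return (left, right) sample lists, where the right channel is a
    copy of the left, delayed by delay_ms (zero-padded at the start).
    Played over headphones, the sound stage should shift toward the
    undelayed (left) channel."""
    n = int(SAMPLE_RATE * duration_s)
    left = [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]
    delay_samples = int(SAMPLE_RATE * delay_ms / 1000)
    right = [0.0] * delay_samples + left[:n - delay_samples]
    return left, right

# 0.5 ms of delay is on the order of a source well off to one side:
left, right = make_itd_pair(440.0, 1.0, 0.5)
```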

    Clearly, the ear is sensitive to more than just frequency and energy. If this were not the case, then time-domain masking would not occur, and a song played backwards would sound identical to one played forwards, since a signal and its time-reversed copy have exactly the same magnitude Fourier transforms.

    1. All fine and good. I’ve already observed that on a Windows 7 system (which I’m no longer running), I had a set of Sennheiser Gaming Headphones which emulated 5.1 or 7.1 surround-sound as ‘virtual’ speakers, but in such a way that I could hear the results played out over the 2 ‘real’ channels worn on my head.

      (Edit:

      In the posting you commented on, my real question was rather, ‘Is it worth the while devising special compression-schemes that increase the amount of headphone-directionality, but that also cut the number of bits per second, to the same degree that other Codecs already did?’ And I think that a tentative answer today, to my own question then, would be, ‘M4A / MP4 Sound already accomplishes that. Further, high-quality devices that play this back are more commonplace today than they were at the time of my posting.’ )

      The big question this leaves unanswered in my own mind is how this information gets communicated after it reaches our ears. We do know that the cochlea itself has as output one auditory nerve, which seems to be organized by frequency.

      Does this time-awareness include only the onset of a pulse, or even a fixed phase-position of one continuous sine-wave? And if this awareness includes the latter, would this be because some form of sound is transmitted between the ears, as if by the middle ear? We are known to have Eustachian Tubes. Is it communicated through the actual bone of the skull? Or does the auditory nerve encode information which Neurologists have failed to explain?


    2. You see, I’ve tried this in two ways. In the case of temporally complex streams, it works. As a stream, I had the sound of a horse’s hooves clopping, as though a horse were trotting, and a Game Engine played it through my binaural headset, so that the sound would travel in circles around the player. This was accompanied by a visual cue: a model of some sort traveling in 3D around the player’s perspective.

      But I did another experiment in the past, which consisted of several audio-tracks, each of which contained an uncompressed stereo-signal of a pure sine-wave of the same (mid-range) frequency, playing for about 20 seconds, followed by several seconds of silence. And the tracks would alternate between being in-phase and being 180° out-of-phase.

      Additionally, each track would contain an interruption in the signal at an unpredictable point in time, on one channel but not the other, lasting for about 10 ms. I specifically wanted to find out one thing:

      Would the 10 ms gap of silence affect what I heard in the ear that did not receive the gap of silence?

      I actually found two observations:

      1) I could not discern to begin with, which tracks were in-phase, and which were out-of-phase.

      2) The moment of silence in one track, either way, had no effect on what I heard coming from the other track.

      Further, I had a final track with the two sine-waves at a 90° phase-shift. And when listening to the recording, I found that track particularly uncomfortable to listen to, but was unable to hear what was causing the discomfort.

      So in the case of a continuous sine-wave, with a phase-shift, directly to the headphones, the IAD doesn’t seem to play a role. But, ‘Sounds’ as such can’t be counted. How would you explain that?

      We can listen to streams being played to our ears, and our human interpretation of those streams will tell us that they consist of recognizable ‘events’. But according to pure Physics, each stream is just a stream. Whatever our brains have the ability to recognize needed to start with the way our ears process continuous streams.
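The test tracks described above can be sketched roughly as follows. This is my reconstruction, not the original code; the frequency, durations, and gap position are placeholder choices:

```python
import math

SAMPLE_RATE = 44100  # samples per second

def phase_test_track(freq_hz, duration_s, phase_shift_rad, gap_start_s, gap_ms):
    """Two channels of the same sine-wave, the right channel shifted by
    phase_shift_rad, with a short gap of silence cut into the left
    channel only (the one-channel interruption from the experiment)."""
    n = int(SAMPLE_RATE * duration_s)
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    left = [math.sin(w * i) for i in range(n)]
    right = [math.sin(w * i + phase_shift_rad) for i in range(n)]
    g0 = int(SAMPLE_RATE * gap_start_s)
    g1 = g0 + int(SAMPLE_RATE * gap_ms / 1000)
    for i in range(g0, g1):
        left[i] = 0.0
    return left, right

# In-phase, 90° and 180° variants, each with a 10 ms one-channel gap:
tracks = {deg: phase_test_track(500.0, 2.0, math.radians(deg), 0.8, 10.0)
          for deg in (0, 90, 180)}
```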


    3. My basis of thinking on this subject goes a little further. My own ability to hear finely-tuned, ‘Classical-style’ Music, as in Beethoven’s 9th, Mozart, etc., is so poor that it cannot offer much help. A lot of that was actually Renaissance-era Music. But I once knew a person, also a friend of mine, who would turn on some music of that sort, not compressed but in the best of stereophonic sound. And he was able to hear the positioning of instrument-groups in the orchestra, due to the timbre of the sound. So it would seem that, rather than relying on temporal details, he could rely on the rich spectral details of sound which was temporally uniform, to make out positioning.

      This same person told me that when listening to MP3-compressed sound, he could not hear this, but could only hear one-dimensional positioning, left-to-right, as well as being able to hear ‘from the outside’ as opposed to ‘from the inside’.

      There was an experiment which I always wanted to carry out with him: to compress some orchestral music – namely, Beethoven’s 9th as recorded by the Berliner Philharmoniker under Carlo Maria Giulini in 1990, of which I have an uncompressed Audio CD – according to my own parameters, using OGG Vorbis at a high bit-rate (less compression). The quality of this CD exceeds my own ability to appreciate it.

      I wanted to know whether my friend would be able to hear the orchestration better than what he was complaining about from MP3 compression.

      But unfortunately, my friend passed away, before I was able to try this with him.

      He would first have required some form of OGG-player of matching quality, as this 80-year-old man did not possess anything newer than his ($2000.-) stereo-set. And I had gotten stuck on the question of what sort of OGG-player I might best have gifted him – and, diplomatically, how. I fear he would have felt insulted by such a gift.

      It was in my own devious nature to know that if I only FLAC-compressed the CD, which was feasible at the time, he would certainly have obtained all the quality, since FLAC is lossless. But I would have OGG-compressed the music anyway, just to gain some more info myself on what types of spectral damage affect our ability to hear directionally.

      I think that the availability of high-quality FLAC-players is much better today than it was, even at the time of the posting.

