A Word Of Compliment To Audacity

One of the open-source applications that can be used as a sound editor is named ‘Audacity’. In an earlier posting, I had written that such an application may apply certain effects by first performing a Fourier Transform of some sort on sampling-windows, then manipulating the frequency-coefficients, and finally inverting the Fourier Transform, to arrive at time-domain sound samples again.

On closer inspection of Audacity, I’ve recently come to realize that its programmers have avoided going that route as often as possible. As a result, they may have designed effects which sound more natural, because they follow how traditional analog methods used to process sound.

In some places, this has actually led to criticism of Audacity, let’s say because users have discovered that a low-pass or a high-pass filter does not maintain phase-constancy. But in traditional audio work, low-pass and high-pass filters always introduced phase-shifts. Audacity simply brings this behaviour into the digital realm.
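As a small numerical illustration of that behaviour (a sketch using SciPy, not anything taken from Audacity’s own code), a first-order IIR low-pass filter, the digital counterpart of a simple analog RC filter, has a phase response that varies with frequency:

```python
import numpy as np
from scipy import signal

# First-order Butterworth low-pass, cutoff at 0.1 of the Nyquist frequency.
b, a = signal.butter(1, 0.1)

# Evaluate the complex frequency response at 8 points from DC to Nyquist.
w, h = signal.freqz(b, a, worN=8)

phase = np.angle(h)
# The phase is zero only at DC; at higher frequencies the filter
# shifts the phase, i.e. it is not phase-constant.
print(phase)
```

The same frequencies could be attenuated with constant (zero) phase by filtering in the frequency domain, but the IIR filter’s sloping phase is exactly what its analog ancestors did.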

I just seem to remember certain other sound editors that used Fourier Transforms extensively.



An Observation about Modifying Fourier Transforms

A concept which seems to exist is that certain standard Fourier Transforms do not produce desired results, and that they must therefore be modified for use with compressed sound.

What I have noticed is that often, when we modify a Fourier Transform, the result is only a special case of an existing standard Transform.

For example, we may start with a Type 4 Discrete Cosine Transform that has a sampling interval of 576 elements, but want it to overlap by 50%, therefore wanting to double the number of samples taken in, without doubling the number of Frequency-Domain samples output. One way to accomplish that is to adhere to the standard Math, but simply to extend the array of input samples, allowing the reference-waves to continue into the extension of the sampling interval, at unchanged frequencies.

Because the Type 4 applies a half-sample shift to its output elements as well as to its input elements, this is really equivalent to what we would obtain if we computed a Type 2 Discrete Cosine Transform over a sampling interval of 1152 elements, but kept only the odd-numbered coefficients. All the output elements then count as odd-numbered ones, with each index k mapping to coefficient (2k+1) of the longer Transform.
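This equivalence is easy to check numerically. The sketch below (plain NumPy, using the sizes of 576 and 1152 from the example above) computes the extended Type 4 sums directly, and compares them against the odd-numbered coefficients of a Type 2 sum over 1152 elements:

```python
import numpy as np

N = 576                                   # number of output coefficients
x = np.random.default_rng(0).standard_normal(2 * N)   # 1152 input samples
n = np.arange(2 * N)

# Extended Type-4 DCT: keep the N reference frequencies (k + 1/2) of a
# 576-point DCT-IV, but let each cosine continue across all 1152 inputs.
ext4 = np.array([np.dot(x, np.cos(np.pi / N * (n + 0.5) * (k + 0.5)))
                 for k in range(N)])

# Type-2 DCT over the full 1152 samples, keeping only odd coefficients m = 2k+1.
dct2_odd = np.array([np.dot(x, np.cos(np.pi / (2 * N) * (n + 0.5) * m))
                     for m in range(1, 2 * N, 2)])

print(np.allclose(ext4, dct2_odd))
```

The two arrays agree because cos(π/(2N)·(n+1/2)·(2k+1)) is term-by-term the same function as cos(π/N·(n+1/2)·(k+1/2)).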

The only really new information I have on Frequency-Based sound-compression, is that there is nevertheless an advantage gained, in storing the sign of each coefficient.

(Edit 08/07/2017 : )


A single time-delay can also be expressed in the frequency-domain.

Another way to state that a stream of time-domain samples has been given a time-delay, is simply to state that each frequency-coefficient has been given a phase-shift, which depends both on the frequency of the coefficient and on the intended time-delay.

A concern that some readers might have with this is the fact that a number of samples need to be stored, in order for a time-delay to be executed in the time-domain. But as soon as differing values of the coefficients of a Fourier Transform are spaced closer together – indicating, in this case, a longer time-delay – its computation also requires that a longer interval of time-domain samples be combined.
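A minimal sketch of this equivalence, using NumPy’s FFT (and a circular delay, since the DFT treats the sampling interval as periodic):

```python
import numpy as np

N = 256
x = np.random.default_rng(1).standard_normal(N)
d = 5                                    # delay, in samples

# Time-domain version: a circular delay of d samples.
delayed_time = np.roll(x, d)

# Frequency-domain version: multiply each coefficient by
# exp(-2j*pi*f*d/N), a phase-shift proportional to both the
# frequency f of the coefficient and the delay d.
X = np.fft.fft(x)
f = np.fft.fftfreq(N) * N                # signed bin frequencies
delayed_freq = np.fft.ifft(X * np.exp(-2j * np.pi * f * d / N)).real

print(np.allclose(delayed_time, delayed_freq))
```

Both methods produce the same stream of samples; the frequency-domain form simply encodes the delay as a phase-slope across the coefficients.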

Now, if the reader would like to visualize what this would look like, as an analogy to a graphical equalizer, then he would need to imagine a graphical equalizer whose sliders can be made negative – i.e. one that can command that one frequency come out inverted. If he were then to set his sliders into the accurate shape of a sine-wave, going both positive and negative in its settings, he should obtain a simple time-delay.

But there is one more reason for which this analogy would be flawed. The type of Fourier Transform best suited for this would be the Discrete Fourier Transform, not one of the Discrete Cosine Transforms. The reason is the fact that the DFT accepts complex numbers as its terms. And so the reader would also have to imagine that his equalizer not only has sliders that move up and down, but sliders with little wheels on them, with which he can give a phase-shift to one frequency, without changing its amplitude. Obviously, graphical equalizers for music are not made that way.
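In terms of the Math, the hypothetical ‘wheel’ on each slider would just multiply one complex coefficient by a unit-magnitude complex number:

```python
import numpy as np

# One DFT coefficient, as a complex number: the slider position is its
# amplitude, and the imagined "wheel" is its phase.
c = 3.0 + 4.0j
print(abs(c))                            # the amplitude

# Turning the wheel by 90 degrees multiplies by a complex number of
# magnitude 1, shifting the phase while leaving the amplitude unchanged.
rotated = c * np.exp(1j * np.pi / 2)
print(abs(rotated))                      # same amplitude as before
```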


A Concept about Directionality In Sound Perception

We all understand that, given two ears, we can hear panning when we listen to reproduced stereo, as well as, perhaps, that sounds seem to come ‘from outside’ as opposed to ‘from inside’, corresponding to out-of-phase as opposed to in-phase signals. But the reality of human sound perception is that we are supposed to be capable of more subtle perception of the location of the origin of sounds. I will call this more subtle perception of directions ‘complete stereo-directionality’.

One idea which some people have pursued is that we do not just hear amplitudes associated with frequencies, but that we might be able to perceive phase-vectors associated with frequencies as well. This idea seems to agree with the fact that at least part of our complete stereo-directionality seems to be based on Inter-Aural Time-Differences, as a basis for perceiving direction. It also seems to agree well with the fact that, in Science and with Machines, the amplitude of any frequency component can be represented by a complex number.

But this idea does not seem to agree well, with the fact that our ultimate organ to perceive sound is not the outer ear, nor the middle ear, but the inner ear, which is also known as the cochlea. As I understand it, the cochlea is capable of differentiating along frequency-mappings incredibly precisely, but not along phase-relationships.

Now, some reason may exist to think, that the middle ear and the skull carry out some sort of mixing of sounds, that enter the outer ear, before those sounds reach the cochlea. But for the moment, I am going to regard this detail as secondary.

I think that what ultimately happens, is that on the cerebral cortex, just as it goes with the optical lobes, the aural lobes have a mapping of fingerprint-like ‘ridges’. The long-range mapping may be according to frequency, but the short-range mapping may be such, that one set of ridges corresponds to input from one ear, while the negative of that same pattern of ridges, represents the input of the opposite ear.

And so what the cerebral cortex can do, is make very precise differentiations in its short-range neural systems, between what any one frequency-component has as amplitude, as perceived by one cochlea differently from the other cochlea.

When sound events reach our ears, they can follow many paths, and may also be mixed by our middle ear, so that real phase positions lead to subtle amplitude-differences, as sensed by our cochlea, and as interpreted by our cerebral cortex with its ridged mappings. Inter-Aural Time-Differences may also lead to subtle differences in per-frequency amplitudes, by the time the sound reaches the cochlea.

And I suspect that the latter is what leads to our ‘complete stereo-directionality’.

What this would also mean is that, in lossy sound compression, if the programmers decided to compute a Fourier Transform of each stereo channel first – and the Discrete Cosine Transform is one type of Fourier Transform – and then to store the differences between the absolute amplitudes that result, they may quite accidentally have processed the sound closer to how human hearing processes it.

If instead, the programmers chose to compute the L-R component in the time-domain first, and only then to perform some Fourier Transform of L+R and L-R, they may have been intending to capture more information than the other way can. But they may have captured information with this method, that human hearing is not able to interpret well.

This would be especially true then, in cases where L and R mainly cancel, so that the amplitude of L+R is low, while the Fourier Amplitude of L-R would be high.
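A numerical sketch of this cancelling case (a toy sine-wave in plain NumPy; none of this is taken from any actual codec):

```python
import numpy as np

# Toy case from the text: L and R cancel completely (fully out of phase).
n = np.arange(512)
L = np.sin(2 * np.pi * 7 * n / 512)
R = -L

def spectrum(s):
    """Magnitudes of the frequency components of a real signal."""
    return np.abs(np.fft.rfft(s))

# Method 1: transform each channel, then difference of magnitudes.
diff_of_mags = spectrum(L) - spectrum(R)     # essentially zero everywhere

# Method 2: difference in the time-domain, then transform.
mag_of_diff = spectrum(L - R)                # large peak at bin 7

print(np.max(np.abs(diff_of_mags)), np.max(mag_of_diff))
```

The first method reports no inter-channel difference at all, while the second preserves the full amplitude of the cancelled component – exactly the information which, by the argument above, human hearing may not interpret well.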

This might sound fascinating, due to whatever our middle ear next does with it, but it does not lead to meaningful interpretations of ‘where that sound even supposedly comes from’. Hence, while this could be psychedelic, it would not enhance our ‘complete stereo-directionality’.

Also, our brain may apply the idea that, relative to whatever sound we are focusing on, ‘all the other sounds’ form a continuous background noise, such that the sound we are focusing on may seem to have negative amplitudes, because real amplitudes locally become lower than the virtual noise levels. And while this may allow us to derive some sort of perception of phase-cancellation, it may not actually be due to our cochlea having picked up phase-cancellation.