## A Word Of Compliment To Audacity

One of the open-source applications that can be used as a sound editor is named ‘Audacity’. In an earlier posting, I had written that this application may apply certain effects by first performing a Fourier Transform of some sort on sampling-windows, then manipulating the frequency-coefficients, and then inverting the Fourier Transform to obtain time-domain sound samples again.
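To make that frequency-domain route concrete, here is a minimal Python sketch of such an effect applied to a single sampling-window. It transforms the window, zeroes the coefficients above a cutoff, and transforms back. It uses a naive O(N²) DFT so that it needs no libraries; a real editor would typically use an FFT on overlapping, tapered windows. The sample rate, window length, tones and cutoff are arbitrary illustrative choices.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2)), fine for a short window."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(spectrum):
    """Inverse DFT, keeping only the real part, since the input here is real audio."""
    n = len(spectrum)
    return [(sum(spectrum[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)) / n).real
            for k in range(n)]

def brick_wall_lowpass(window, fs, cutoff):
    """Transform one window, zero every coefficient above cutoff (Hz), transform back."""
    n = len(window)
    spectrum = dft(window)
    for m in range(n):
        freq = min(m, n - m) * fs / n  # bins above n/2 mirror the negative frequencies
        if freq > cutoff:
            spectrum[m] = 0
    return idft(spectrum)

fs, n = 8000, 64  # both tones below fall exactly on DFT bins (2 and 16)
tone = [math.sin(2 * math.pi * 250 * k / fs) for k in range(n)]
window = [t + 0.5 * math.sin(2 * math.pi * 2000 * k / fs) for k, t in enumerate(tone)]
filtered = brick_wall_lowpass(window, fs, cutoff=1000)
print(max(abs(a - b) for a, b in zip(filtered, tone)))  # tiny: only the 250 Hz tone remains
```

Because a whole number of cycles fits in this window, the ideal case shows no leakage. With arbitrary audio, windows have to be tapered and overlapped to avoid artifacts at their edges.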

On closer inspection of Audacity, I’ve recently come to realize that its programmers have avoided going that route as often as possible. They may have designed effects that sound more natural as a result, and that follow the way traditional analog methods used to process sound.

In some places, this has actually led to criticism of Audacity, for example because users have discovered that a low-pass or a high-pass filter does not maintain phase-constancy. But in traditional audio work, low-pass and high-pass filters always introduced phase-shifts. Audacity simply brings this behavior into the digital realm.
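As an illustration of the analog behavior that Audacity inherits, this Python sketch evaluates the ideal first-order RC low-pass and high-pass responses. Neither keeps its phase constant: at the corner-frequency the low-pass lags by 45° and the high-pass leads by 45°. The first-order topology and the 1 kHz corner are assumptions; Audacity's actual filters may be of higher order.

```python
import cmath
import math

def rc_lowpass(f, fc):
    """Analog first-order (RC) low-pass: H = 1 / (1 + j*f/fc)."""
    return 1 / (1 + 1j * f / fc)

def rc_highpass(f, fc):
    """Analog first-order (CR) high-pass: H = (j*f/fc) / (1 + j*f/fc)."""
    x = f / fc
    return 1j * x / (1 + 1j * x)

def phase_deg(h):
    """Phase of a complex response, in degrees."""
    return math.degrees(cmath.phase(h))

fc = 1000.0  # hypothetical corner frequency
for f in (100, 500, 1000, 2000, 10000):
    print(f"{f:6d} Hz   low-pass {phase_deg(rc_lowpass(f, fc)):+6.1f} deg"
          f"   high-pass {phase_deg(rc_highpass(f, fc)):+6.1f} deg")
```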

I seem to remember certain other sound editors that used Fourier Transforms extensively.

Dirk

## A Practical Application that Calls for a Uniform Phase-Shift: SSB Modulation

A concept that exists in radio-communications, derived from amplitude-modulation and further derived from balanced modulation, is single-sideband modulation. This concept existed even back in the 1970s. Its earliest implementations required that a low-frequency signal be passed to a balanced modulator, which would produce an upper sideband (the USB) as well as an inverted lower sideband (the LSB), but zero carrier-energy. Next, the brute-force approach to achieving SSB entailed using a radio-frequency filter to separate either the USB or the LSB.
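In the ideal case, a balanced modulator can be modeled as a multiplier. This Python sketch multiplies a 1 kHz tone by a 10 kHz stand-in carrier and measures the result at three frequencies. The two frequencies and the 48 kHz simulation rate are hypothetical. Energy appears at the difference (LSB) and sum (USB) frequencies, but none appears at the carrier.

```python
import cmath
import math

def tone_amplitude(x, f, fs):
    """Amplitude of the component at frequency f (Hz), from a single DFT bin.
    Exact when the window holds a whole number of cycles of every tone present."""
    acc = sum(v * cmath.exp(-2j * math.pi * f * n / fs) for n, v in enumerate(x))
    return 2 * abs(acc) / len(x)

fs = 48000         # hypothetical simulation sample rate
f_audio = 1000     # modulating (LF) tone
f_carrier = 10000  # stand-in carrier, scaled down so it fits in the simulation
N = 4800           # 0.1 s: a whole number of cycles for every frequency used

# Ideal balanced modulator: the product of the LF signal and the carrier.
out = [math.cos(2 * math.pi * f_audio * n / fs) * math.cos(2 * math.pi * f_carrier * n / fs)
       for n in range(N)]

for label, f in (("LSB", f_carrier - f_audio), ("carrier", f_carrier), ("USB", f_carrier + f_audio)):
    print(f"{label:8s} {f:6d} Hz  amplitude {tone_amplitude(out, f, fs):.3f}")
```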

The sheer encumbrance of such high-frequency filters, especially when this method is to be used at RF frequencies higher than those of the old ‘CB Radio’ sets, sent engineers looking for a better approach to SSB modulation and demodulation.

One approach that has existed since the onset of SSB was to operate two balanced modulators. One balanced modulator would modulate the original LF signal. The second balanced modulator would be fed an LF signal that had been phase-delayed by 90°, as well as a carrier that had been given either a +90° or a -90° phase-shift with respect to whatever the first balanced modulator was being fed.

The concept being exploited here is that in the USB, where the frequencies add, the phase-shifts also add, while in the LSB, where the frequencies subtract, the phase-shifts also subtract. Thus, when the outputs of the two modulators were mixed, one sideband would be in-phase, while the other would be 180° out-of-phase. If the carrier had been given a +90° phase-shift, the LSB would end up 180° out-of-phase and cancel. If the carrier had been given a -90° phase-shift, the USB would end up 180° out-of-phase and cancel.
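The phasing scheme can be checked numerically, with ideal multipliers standing in for the two balanced modulators. This Python sketch follows the description above: the second modulator receives the audio delayed by 90° and a carrier shifted by +90° or -90°. With +90°, the sum equals a pure tone at the sum frequency (USB only). With -90°, it equals a pure tone at the difference frequency (LSB only). The frequencies are hypothetical.

```python
import math

fs = 48000         # hypothetical simulation sample rate
f_audio = 1000     # LF tone
f_carrier = 10000  # stand-in carrier

def phasing_ssb(n, carrier_shift_deg):
    """Sum of two ideal balanced modulators (multipliers) at sample n.
    Modulator 1: audio x carrier.
    Modulator 2: audio delayed 90 deg x carrier shifted by carrier_shift_deg."""
    t = n / fs
    wa = 2 * math.pi * f_audio * t
    wc = 2 * math.pi * f_carrier * t
    m1 = math.cos(wa) * math.cos(wc)
    m2 = math.cos(wa - math.pi / 2) * math.cos(wc + math.radians(carrier_shift_deg))
    return m1 + m2

def max_error(shift_deg, f_expected, n_samples=4800):
    """Largest deviation of the mixed output from a single unit tone at f_expected."""
    return max(abs(phasing_ssb(n, shift_deg) - math.cos(2 * math.pi * f_expected * n / fs))
               for n in range(n_samples))

print(max_error(+90, f_carrier + f_audio))  # near 0: only the USB survives
print(max_error(-90, f_carrier - f_audio))  # near 0: only the LSB survives
```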

This idea hinges on one ability: to phase-shift an audio-frequency signal spanning several octaves so that a uniform phase-shift results, while the amplitude of the derived signal also stays consistent over the required frequency-band. The audio signal could be filtered to reduce the number of octaves that need to be phase-shifted, but then it would need to be band-limited before being used twice, once for each modulator.

So a question arises as to how this was achieved historically, using analog filters.

My best guess would be that one stage used a high-pass and a low-pass filter acting in parallel, with the same corner-frequency, whose outputs were subtracted, with the high-pass filter negative, for -90°. At the corner-frequency, the two phase-shifts would have been ±45°. This stage would achieve an approximately uniform amplitude-response, as well as its ideal phase-shift of -90° at the one center-frequency. However, this would also imply that the stage reaches -180° (full inversion) at higher frequencies, because there, the high-pass component that takes over is still being subtracted!
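The stage guessed at above can be evaluated directly. Take ideal first-order sections sharing the corner-frequency fc, and let x = f/fc. The low-pass minus the high-pass then works out to (1 - jx)/(1 + jx). That is a first-order all-pass: its magnitude is exactly 1, and its phase is -2·arctan(f/fc), passing -90° at fc and approaching -180° far above it. This Python sketch checks that numerically; the corner-frequency is an arbitrary choice.

```python
import cmath
import math

def lp_minus_hp(f, fc):
    """First-order low-pass minus first-order high-pass, same corner fc (analog).
    LP = 1/(1+jx), HP = jx/(1+jx), with x = f/fc."""
    x = f / fc
    lp = 1 / (1 + 1j * x)
    hp = 1j * x / (1 + 1j * x)
    return lp - hp

fc = 1000.0  # hypothetical corner frequency
for f in (62.5, 250, 1000, 4000, 16000):
    h = lp_minus_hp(f, fc)
    print(f"{f:8.1f} Hz  |H| = {abs(h):.3f}  phase = {math.degrees(cmath.phase(h)):7.1f} deg")
```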

( … ? … )

What can in fact be done is to feed a multi-band signal to a bank of 2nd-order band-pass filters spaced 1 octave apart. The fact that the original signal can be reconstructed from their outputs derives partially from the following. At one center-frequency, an attenuated version of the signal also passes through the filter one octave up, with a phase-shift of +90°, and a matching attenuated version also passes through the filter one octave down, with a phase-shift of -90°. This means that the two vestigial signals passing through the adjacent filters are at ±180° with respect to each other, and cancel out at the present center-frequency.
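The phase of a 2nd-order band-pass filter one octave from its center can be computed from its standard analog transfer function. In this Python sketch, the Q of 2 is a hypothetical choice, since the text does not state one. The two neighboring filters come out with equal gain and opposite phase, but that phase only approaches ±90° as Q grows. Only at exactly ±90° would the two vestigial signals cancel.

```python
import cmath
import math

def bandpass(f, f0, q):
    """Analog 2nd-order band-pass with unity gain at f0:
    H(jw) = j(x/Q) / (1 - x^2 + j(x/Q)), where x = f/f0."""
    x = f / f0
    return (1j * x / q) / (1 - x * x + 1j * x / q)

f0 = 1000.0  # the signal's frequency = center of the "present" filter
Q = 2.0      # hypothetical Q; the text does not state one
for name, center in (("one filter up", 2 * f0), ("own filter", f0), ("one filter down", f0 / 2)):
    h = bandpass(f0, center, Q)
    print(f"{name:16s} gain {abs(h):.3f}  phase {math.degrees(cmath.phase(h)):+6.1f} deg")
```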

If the output from each band-pass filter was to be phase-shifted, this would need to happen in a way that is not frequency-dependent. So it might seem to make sense to put an integrator at the output of each band-pass filter, with its time-constant chosen to give unit gain at the center-frequency of that band. But I also know that doing so will deform the amplitude frequency-response of the one band. What I do not know is whether this blends well with the other bands.
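Following the integrator idea, this Python sketch multiplies a hypothetical band-pass response (again Q = 2, an assumption) by an ideal integrator scaled for unit gain at the band's center. The integrator contributes -90° everywhere, but its 1/f gain tilts the band. One octave below center, the gain is doubled relative to the band-pass alone; one octave above, it is halved.

```python
import cmath
import math

def bandpass(f, f0, q):
    """Analog 2nd-order band-pass with unity gain at f0 (x = f/f0)."""
    x = f / f0
    return (1j * x / q) / (1 - x * x + 1j * x / q)

def integrator(f, f0):
    """Ideal integrator with time-constant 1/(2*pi*f0): unit gain at f0, -90 deg at every f."""
    return 1 / (1j * f / f0)

f0, Q = 1000.0, 2.0  # hypothetical band center and Q
for f in (500.0, 1000.0, 2000.0):
    h = bandpass(f, f0, Q) * integrator(f, f0)
    print(f"{f:7.1f} Hz  gain {abs(h):.3f}  phase {math.degrees(cmath.phase(h)):+7.1f} deg")
```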

If this were even to produce a semi-uniform -45° shift, the next thing to do would be to subtract the original input-signal from the combined output.

(Edit 11/30/2017:

It’s important to note that the type of filter I’m contemplating does not fully achieve a phase-shift of ±90° at ±1 octave. This is just a simplification I use to help me understand filters. According to my most recent calculation, this type only achieves a phase-shift of ±74° when the signal is ±1 octave from its center-frequency.)

Recently, my main thought has been whether and how this problem could be solved digitally. An application could still exist in which many SSB signals are to be packed into some very high, microwave frequency-band. There, the type of filter that will not work is one that separates a single audio-frequency sideband out of such high frequencies.

As my earlier posting might suggest, the main problem I see is that the discretized versions of the low-pass and high-pass filters available to real-time digital processing become unpredictable, both in their frequency-response and in their phase-shifts, close to the Nyquist Frequency. Hypothetically, the only solution I can see is to oversample the audio-frequency band first, at least 2x, so that the discretized filters become well-behaved enough to be used in such a context. The corner-frequencies of each filter will then lie at 1/2 the Nyquist Frequency and lower, where their behavior starts to become acceptable.
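One way to quantify the problem near Nyquist is to look at frequency warping. This assumes the filters are discretized with the bilinear transform, a common method, though the text does not name one. Under that transform, a digital frequency f corresponds to the analog frequency (fs/π)·tan(π·f/fs). This Python sketch prints how far that mapping departs from 1:1. At half the Nyquist Frequency the ratio is already about 1.27, and it grows without bound as f approaches Nyquist.

```python
import math

def bilinear_analog_freq(f_digital, fs):
    """Analog frequency (Hz) that the bilinear transform maps onto digital frequency f_digital."""
    return fs / math.pi * math.tan(math.pi * f_digital / fs)

def warp_ratio(fraction_of_nyquist, fs):
    """How many times higher the corresponding analog frequency is than the digital one."""
    f_digital = fraction_of_nyquist * fs / 2
    return bilinear_analog_freq(f_digital, fs) / f_digital

fs = 48000.0  # hypothetical sample rate; the ratio depends only on the fraction of Nyquist
for frac in (0.1, 0.25, 0.5, 0.75, 0.9, 0.99):
    print(f"{frac:4.2f} x Nyquist: analog/digital = {warp_ratio(frac, fs):.3f}")
```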

The reality of modern technology could well be that the need for this technique no longer exists. For example, a Quadrature Mirror Filter could be used instead, to obtain a number of sub-bands that is a power of two. Whether each sub-band ends up inverted or not could be made arbitrary, and instead of producing 2^n sub-bands at once, the QMF could just as easily be optimized to target one specific sub-band at a time.
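As a minimal illustration of the QMF idea, this Python sketch uses the Haar pair, the shortest two-band QMF. Its frequency separation is poor, and practical designs use much longer filters. Each analysis stage splits a signal into a decimated low band and high band. Applying it recursively yields 2^n sub-bands, and the synthesis stage reconstructs the input exactly. After decimation, the high band of a two-band QMF comes out spectrally inverted.

```python
import math

S = 1 / math.sqrt(2)  # Haar filter coefficient

def analyze(x):
    """Two-band Haar QMF analysis: scaled pairwise sums (low band) and
    pairwise differences (high band), each decimated by 2. len(x) must be even."""
    low = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def synthesize(low, high):
    """Inverse of analyze(): perfect reconstruction for the Haar pair."""
    out = []
    for a, d in zip(low, high):
        out += [S * (a + d), S * (a - d)]
    return out

def split(x, levels):
    """Apply analyze() recursively to every band, giving 2**levels sub-bands."""
    bands = [x]
    for _ in range(levels):
        bands = [b for band in bands for b in analyze(band)]
    return bands

x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
low, high = analyze(x)
print(synthesize(low, high))  # reconstructs x (up to rounding)
print(len(split(x, 3)))       # 8 sub-bands, one sample each
```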

## A Note on FLAC-Compressing 24-bit

One point I had commented on before my blog began was that if authors decide to capture sound at 96k samples/second, the resulting sound should compress well using FLAC.

But now that I have experimented with ‘QTractor’ and an external sound card, I have realized that we will probably also be capturing that sound in 24-bit sample-format instead of 16-bit. And the sad fact is that FLAC will not compress the 24-bit format as well as it did 16-bit.

The reason seems clear. Using ‘Linear Predictive Coding’ means that FLAC will be able to predict the next sample from a set of so-many preceding ones, to maybe 8 bits of precision, except that the actual sample will always deviate from this prediction by a small residual. So 8-bit sound should compress brilliantly.

But with 16-bit, the accuracy of the prediction stays the same. Again, the ‘LPC’ is really only accurate to 8 bits at best, meaning that we get a larger residual. The size of that residual is what makes up most of a FLAC file.

At 24-bit, again, the LPC will only predict the next sample accurately to within 8 bits. So the residual is likely to need twice as many bits as it did with 16-bit (16 instead of 8), this time to make up the full 24-bit accuracy. We are not left with much compression then.
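Here is a rough way to see the effect on the residual, sketched in Python. It quantizes the same hypothetical signal at 16 and at 24 bits; the signal is a tone plus a fixed amount of background noise, standing in for the air-conditioner. It then applies a FLAC-style fixed order-2 predictor, x[n] ≈ 2·x[n-1] - x[n-2], and estimates the residual's size in bits. The signal, the noise level and the bit estimate are all assumptions. Under them, the residual grows by a factor of about 256, or about 8 more bits per sample, when moving from 16 to 24 bits.

```python
import math
import random

def quantize(x, bits):
    """Map a float in [-1, 1) to a signed integer sample of the given width."""
    scale = 2 ** (bits - 1)
    return max(-scale, min(scale - 1, round(x * scale)))

def fixed_order2_residual(samples):
    """Residual of a fixed order-2 predictor: x[n] - (2*x[n-1] - x[n-2])."""
    return [samples[n] - (2 * samples[n - 1] - samples[n - 2])
            for n in range(2, len(samples))]

def mean_residual_bits(samples):
    """Rough cost estimate: log2 of the mean absolute residual, plus one sign bit."""
    res = fixed_order2_residual(samples)
    mean_abs = sum(abs(r) for r in res) / len(res)
    return math.log2(mean_abs) + 1

# Hypothetical test signal: a 1 kHz tone at half scale plus fixed analog-style noise.
random.seed(1)
fs = 96000
analog = [0.5 * math.sin(2 * math.pi * 1000 * n / fs) + random.gauss(0, 1e-3)
          for n in range(fs // 10)]

for bits in (16, 24):
    pcm = [quantize(x, bits) for x in analog]
    print(f"{bits}-bit: ~{mean_residual_bits(pcm):.1f} bits/sample of residual")
```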

When I recorded my 14-second sound session the other day, I selected FLAC as my capture file format. I had a noisy air-conditioner running in the background. Additionally, the compression level defaults to ‘Fastest’, because the file needs to be written in real-time and not chewed on.

At 96 kHz, 24-bit stereo, raw audio takes up about 4.6 Mbps. At 44.1 kHz, 16-bit stereo, raw audio takes up about 1.4 Mbps.

I was capturing to a stereo FLAC file, but was only using one of the two channels. The resulting FLAC file had a bit-rate of 2.3 Mbps. This means that FLAC recognized the silent track and used ‘Run-Length Encoding’ on it, but that was about all this CODEC could do for me.
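The raw bit-rates quoted above are just sample-rate × bits per sample × channels. This short Python check reproduces them and compares the 2.3 Mbps FLAC capture with the raw 96 kHz, 24-bit stereo rate:

```python
def raw_pcm_bitrate(sample_rate, bits, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate * bits * channels

hi_res = raw_pcm_bitrate(96000, 24, 2)  # 4,608,000 bit/s
cd = raw_pcm_bitrate(44100, 16, 2)      # 1,411,200 bit/s
print(f"{hi_res / 1e6:.2f} Mbps, {cd / 1e6:.2f} Mbps")

flac_capture = 2.3e6  # bit-rate of the FLAC capture reported in the text
print(f"FLAC / raw = {flac_capture / hi_res:.2f}")
```

The ratio of about 0.50 is consistent with FLAC having saved essentially only the silent channel.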

Now, we do have a command-line tool that will re-compress that file:


    $ flac -8 infile.flac -o outfile.flac
    $ flac -8 infile.flac --channels=1 -o outfile.flac
    $ flac -8 infile.flac --channels=1 --blocksize=8192 -o outfile.flac

The option -8 selects maximum compression.

For me, the bit-rate went down to 2.2 Mbps in each case.

It still beats using a raw format, because with the latter, nothing would have detected my silent stereo channel, and the file would have been twice as large.

Dirk

## Testing of USB Sound Device Complete

According to my previous posting, I needed to do a more thorough test of the USB sound card I had bought, which is a “Focusrite Scarlett 2i2”.

In particular, I needed to resolve the discrepancy whereby the Linux JACK daemon reports capture at 32 bits, while the sound card's specifications state a 24-bit sample format.

I also needed to confirm that it would run as well at 96 kHz as it already did at 48 kHz.

According to my more complete test, the 32-bit sample-format that ‘QJackCtl’ shows me, which can be viewed in its Messages box, states the ALSA parameters and not the JACK internals. Therefore, JACK has after all chosen to capture and/or play back audio at a physical 32 bits, at the 96 kHz sample-rate. This is not, after all, a statement of JACK's internal behavior.

Since I am using Linux, and since the manufacturer chose to rate this capture device as only capable of 24-bit capture, I must assume that for hardware reasons the device uses 32-bit registers, but that only the first, most-significant 24 of those bits are accurate. Therefore, when I open ‘QTractor’, the Digital Audio Workstation / Tracker application, it is best to truncate its capture format to 24 bits as well, which is most probably what the Windows or Mac drivers for this device do.
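Truncating such samples to 24 bits amounts to an arithmetic right-shift by 8. This Python sketch assumes the samples arrive as little-endian signed 32-bit words (ALSA's S32_LE layout; an assumption, since the text only says 32 bits). As the text assumes, it also takes only the top 24 bits to be meaningful.

```python
def s32le_to_s24(frame_bytes):
    """Decode little-endian signed 32-bit samples and keep their 24 most-significant bits.
    Assumes, as the text does, that the low 8 bits carry no useful signal."""
    samples = []
    for i in range(0, len(frame_bytes), 4):
        s32 = int.from_bytes(frame_bytes[i:i + 4], "little", signed=True)
        samples.append(s32 >> 8)  # arithmetic shift: the sign is preserved
    return samples

raw = (0x7FFFFF00).to_bytes(4, "little", signed=True) + (-256).to_bytes(4, "little", signed=True)
print(s32le_to_s24(raw))  # [8388607, -1]
```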

Aside from that, using QTractor next to capture a 96 kHz, 24-bit, stereo FLAC file was easy and uneventful. Further, the stability of my software suggests that I can play with the GUIs as much as I need to in order to figure them out, without screwing anything up.

After I closed JACK, I imported this FLAC file, which plays for 14 seconds, into “Audacity”, which has been set up to use the default sound settings (‘PulseAudio’) and which performs on-demand re-sampling of the FLAC file.

The on-demand FLAC playback is not filtered well by Audacity, but since the file runs at 96 kHz, compared with the 44.1 kHz that the laptop's internal sound device runs at, this observation is not surprising.
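The arithmetic behind that observation, sketched in Python: converting 96 kHz to 44.1 kHz is a ratio of 147/320. A rational resampler would therefore conceptually upsample by 147 and downsample by 320. Everything the file contains between 22.05 kHz and 48 kHz would also have to be filtered out before output. This says nothing about how Audacity's own resampler is implemented.

```python
from fractions import Fraction

src_rate, dst_rate = 96000, 44100
ratio = Fraction(dst_rate, src_rate)  # reduced to lowest terms automatically
up, down = ratio.numerator, ratio.denominator
print(f"upsample by {up}, then downsample by {down}")
print(f"content from {dst_rate / 2:.0f} Hz to {src_rate / 2:.0f} Hz must be filtered out")
```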

The captured sound clip simply contains what I spoke into my microphone.

Dirk