When Audacity Down-Samples a Track

In This Posting, the reader may have seen me struggle to interpret what the application ‘QTractor’ actually does, when told to re-sample a 44.1 kHz audio clip into a 48 kHz audio clip. The conclusion I reached was that, at most, the source track can be over-sampled 4x, after which its highest frequencies are much lower than the new Nyquist Frequency, so that if a Polynomial Filter is applied to pick out points sampled at 48 kHz, minimal distortion will take place.

If the subject is instead how the application ‘Audacity’ down-samples a 48 kHz clip into a 44.1 kHz clip, the problem is not the same. Because the Nyquist Frequency of the target sample-rate is then lower than that of the source, the source can contain frequencies which are too high for the target. And so an explicit attempt must be made to get rid of those frequency components.

The reason Audacity is capable of that is that a part of its framework causes a Fourier Transform to be computed for each track, for which that track is subdivided into overlapping sampling windows. The necessary manipulation can be performed on the Fourier Transform, which can then be inverted and merged back into a resulting track in the time-domain.

So for Audacity just to remove certain frequency ranges, before actually re-sampling the track, is trivial.
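To make the idea concrete, here is a minimal sketch of removing everything above the target Nyquist Frequency in the frequency-domain. It is not Audacity's actual code. It assumes one short clip treated as a single window (no overlapping windows), a naive DFT instead of an FFT, and a crude brick-wall cut; the names `dft` and `remove_above` are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def remove_above(samples, sample_rate, cutoff_hz):
    """Zero every frequency bin above cutoff_hz, then return to the time-domain."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins past n/2 represent negative frequencies.
        freq = (k if k <= n // 2 else k - n) * sample_rate / n
        if abs(freq) > cutoff_hz:
            spectrum[k] = 0
    return [v.real for v in dft(spectrum, inverse=True)]

# A 48 kHz clip holding one tone below and one tone above 22.05 kHz.
rate, n = 48000, 64
low = [math.sin(2 * math.pi * 3000 * t / rate) for t in range(n)]
high = [math.sin(2 * math.pi * 23250 * t / rate) for t in range(n)]
mixed = [a + b for a, b in zip(low, high)]
cleaned = remove_above(mixed, rate, 22050)
```

After this step, `cleaned` holds only the 3 kHz tone, and the clip could be re-sampled to 44.1 kHz without the 23.25 kHz component aliasing.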

If my assumption is, that QTractor does not have this as part of its framework, then perhaps it would be best for this application only to offer to re-sample from 44.1 kHz to 48 kHz, and not the other way around…

Dirk

 

A Note on Sample-Rate Conversion Filters

One type of (low-pass) filter which I had learned about some time ago, is a Sinc Filter. And by now, I have forgiven the audio industry, for placing the cutoff frequencies of various sinc filters, directly equal to a relevant Nyquist Frequency. Apparently, it does not bother them that a sinc filter will pass the cutoff frequency itself, at an amplitude of 1/2, and that therefore a sampled audio stream can result, with signal energy directly at its Nyquist Frequency.
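The claim about the amplitude of 1/2 can be checked numerically. The following is a minimal sketch, assuming a Hamming-windowed sinc filter with 101 taps applied to a 4x oversampled 44.1 kHz stream; the window, the tap count and the function names are my own choices for illustration.

```python
import cmath
import math

def windowed_sinc(num_taps, cutoff_hz, sample_rate_hz):
    """Low-pass FIR taps: an ideal sinc response shaped by a Hamming window."""
    fc = cutoff_hz / sample_rate_hz          # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def gain_at(taps, freq_hz, sample_rate_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))

# Cutoff placed directly at the Nyquist Frequency of a 44.1 kHz stream.
taps = windowed_sinc(101, 22050, 176400)
print(gain_at(taps, 0, 176400))      # close to 1: the passband
print(gain_at(taps, 22050, 176400))  # close to 0.5: the cutoff itself
```

The second figure is the point being made: a component sitting exactly at the cutoff is only halved, not removed.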

There are more details about sinc filters to know, that are relevant to the Digital Audio Workstation named ‘QTractor’, as well as to other DAWs. Apparently, if we want to resample an audio stream from 44.1 kHz to 48 kHz, in theory this corresponds to a “Rational” filter of 147:160, which means that if our Low-Pass Filter is supposed to be a sinc filter, it would need to have 160 * (n) coefficients in order to work ideally.

But, since audio experts are usually not serious about devising such a filter, what they will try next in such a case is just to oversample the original stream by some reasonable factor, such as by a factor of 4 or 8, then to apply the sinc filter at this sample-rate, and after that to achieve a down-sampling by just picking samples out, the sample-numbers of which have been rounded down. This is also referred to as an “Arbitrary Sample-Rate Conversion”.
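The 147:160 figure follows from reducing the two sample-rates by their greatest common divisor. A small sketch, with a hypothetical helper name:

```python
from math import gcd

def conversion_ratio(rate_in, rate_out):
    """Return (up, down): upsample by `up`, low-pass, then keep every `down`-th sample."""
    g = gcd(rate_in, rate_out)
    return rate_out // g, rate_in // g

up, down = conversion_ratio(44100, 48000)
print(up, down)            # 160 147
print(44100 * up)          # 7056000, the common rate at which the sinc filter would run
```

The common rate of 7.056 MHz is what makes the ideal rational filter so long: every 44.1 kHz sample interval is divided into 160 steps.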

Because 1 oversampled interval then corresponds to only 1/4 or 1/8 the real sampling interval of the source, the artifacts can be reduced in this way. Yet, this use of a sinc filter is known to produce some loss of accuracy, due to the oversampling, which sets a limit in quality.
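The oversample-then-pick scheme can be sketched as follows. This is only an illustration under my own assumptions (zero-stuffing, a 63-tap Hamming-windowed sinc, a factor of 4, hypothetical function names), not what QTractor or any other DAW actually does.

```python
import math

def lowpass_taps(num_taps, fc):
    """Hamming-windowed sinc; fc is the cutoff as a fraction of the sample rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        taps.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))))
    return taps

def oversample(samples, factor, num_taps=63):
    """Zero-stuff by `factor`, then low-pass at the original Nyquist Frequency."""
    stuffed = [0.0] * (len(samples) * factor)
    stuffed[::factor] = samples
    # Gain of `factor` restores the amplitude lost by inserting zeros.
    taps = [factor * t for t in lowpass_taps(num_taps, 0.5 / factor)]
    half = (num_taps - 1) // 2
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k          # centred convolution, so no net delay
            if 0 <= j < len(stuffed):
                acc += t * stuffed[j]
        out.append(acc)
    return out

def resample_round_down(samples, rate_in, rate_out, factor=4):
    """Pick, for each output sample, the oversampled sample at the rounded-down position."""
    dense = oversample(samples, factor)
    count = (len(samples) * rate_out) // rate_in
    return [dense[(m * factor * rate_in) // rate_out] for m in range(count)]

tone = [math.sin(2 * math.pi * 500 * n / 44100) for n in range(400)]
converted = resample_round_down(tone, 44100, 48000)
```

Each picked sample can be early by up to one oversampled interval, which is the residual error that a higher oversampling factor would shrink.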

Now, I have read that a type of filter also exists, which is called a “Farrow Filter”. But personally, I know nothing about Farrow Filters.

As an alternative to cherry-picking samples in rounded-down positions, it is possible to perform a polynomial smoothing of the oversampled stream (after applying a sinc filter if set to the highest quality), and then to ‘pick’ points along the (now continuous) polynomial that correspond to the output sampling rate. This can be simplified into a system of linear equations, where the exponents of the input-stream positions conversely become the constants, multipliers of which reflect the input stream. At some computational penalty, it should be possible to reduce output artifacts greatly.
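One simple way to realize such a polynomial is a cubic through the four oversampled points that surround each output position. The sketch below assumes Lagrange interpolation, which is my choice for illustration and not necessarily what any DAW uses; the function names are hypothetical, and the input is taken to be already oversampled and filtered.

```python
import math

def cubic_point(y0, y1, y2, y3, t):
    """Cubic through points at x = -1, 0, 1, 2, evaluated at x = t (0 <= t < 1)."""
    return (
        y0 * (-t * (t - 1) * (t - 2) / 6)
        + y1 * ((t + 1) * (t - 1) * (t - 2) / 2)
        + y2 * (-(t + 1) * t * (t - 2) / 2)
        + y3 * ((t + 1) * t * (t - 1) / 6)
    )

def resample_cubic(dense, rate_dense, rate_out):
    """Evaluate the local cubic at each output instant instead of rounding down."""
    last = len(dense) - 1
    count = (last * rate_out) // rate_dense + 1
    out = []
    for m in range(count):
        i, rem = divmod(m * rate_dense, rate_out)   # whole and fractional position
        t = rem / rate_out
        # Indices are clamped at the ends of the stream.
        window = [dense[min(max(i + d, 0), last)] for d in (-1, 0, 1, 2)]
        out.append(cubic_point(*window, t))
    return out

# A 1 kHz tone at 4 x 44.1 kHz, converted to 48 kHz.
dense = [math.sin(2 * math.pi * 1000 * n / 176400) for n in range(800)]
converted = resample_cubic(dense, 176400, 48000)
```

The weights in `cubic_point` depend only on the fractional position `t`, so for a fixed pair of sample-rates they could be precomputed, which is where the computational penalty can be kept modest.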


Audio Disks Disk-At-Once

I have been taking a trip down memory lane, into the subject of Audio CDs, which some people today do not even recognize. And one type of Audio CD which users were once able to record was the ‘DAO’, or “Disk-At-Once”, recording.

I should just explain what that means. Similarly to the old vinyl records, the first Audio CDs produced by big companies, had a single track from beginning to end. This track wound in a dense coil, in the middle of each Music Track, producing a surface on the CD that was noticeably matte. But between music tracks, this optical track was much less densely-wound. There would be a gap on the surface of the disk, which was less than a millimeter wide, which was more reflective, and which the early CD players used, when commanded by the listener to ‘skip in’ to Track 5, let us say. But, if playback was already underway on Track 4, near the end of Track 4, there would be ‘pre-gap’ audio played, as the playback continued into Track 5. Skipping in to Track 5 would bypass this introductory, pre-gap audio.

The first CD-Rs burned by PC users, were not Disk-At-Once, but rather had lasers which would switch off between tracks, thus leaving those non-continuous. This is now referred to as ‘TAO’. But the modern hardware and software seeks to mimic, what the big companies were able to do, by providing DAO Audio Disks.

I am exploring how I would need to use K3b to do all this.

There has been some misconception on public forums, as evidenced by users asking why K3b does not insert silence, when on earlier versions of this application they had set the pre-gap time to 2 seconds. ( :1 )

That pre-gap time was never meant to insert any silence. In the above example, all it meant was that the first 2 seconds of Track 5 were supposed to hold the pre-gap, as supplied by the user who is acting as artist. It would also tell the CD Player, to start the counter at -2 seconds, instead of at 0, when playing through. If the artist wanted for there to be 2 seconds of silence, the responsibility would still have been his, to make sure each of his Audio Tracks began with 2 seconds of silence.

The first 2 seconds of any Audio Track so programmed, would again be skipped by the CD player, when the player was instructed to skip ahead to said track.

What the developers of K3b soon realized, was that most people have Audio Tracks on which the songs start immediately. And so in some later version, they designed K3b so that the ‘post-gap’ of the current Track could be set to something other than 0, if there was a following Track.

K3b-Post-Gap-1

Thus, instead of having a pre-gap timing on Track 5 set to -2 seconds, it is now possible to have a post-gap timing for Track 4, set to 2 seconds, which does the same thing which an assumed pre-gap on Track 5 used to do.

The post-gap setting on Track 4 will now tell the timer of the CD player to stop counting time belonging to Track 4, and to jump back to -2 seconds, as continuous playback continues, within the last 2 seconds of Track 4, and also to continue at the beginning of Track 5, by which time the timer should have reached 0.
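The behavior of the timer can be modeled in a few lines. This is only a sketch of the counting rule as I describe it above, not of real player firmware; the function name and its parameters are hypothetical.

```python
def displayed_time(position_s, track_length_s, post_gap_s):
    """Counter shown while playing through one track, in seconds.

    position_s is measured from the start of the track. Inside the post-gap,
    the counter runs from -post_gap_s up to 0, reaching 0 where the next track begins.
    """
    gap_start = track_length_s - post_gap_s
    if position_s >= gap_start:
        return position_s - track_length_s
    return position_s

# Track 4 is 180 seconds long and has a 2-second post-gap.
print(displayed_time(100.0, 180.0, 2.0))   # 100.0
print(displayed_time(178.0, 180.0, 2.0))   # -2.0
print(displayed_time(179.5, 180.0, 2.0))   # -0.5
```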

So now, if the artist wants for there to be an actual 2 seconds of silence, he would need to edit that into the last 2 seconds of Track 4, before adding Track 4 to his project. Also, with a post-gap timing set for Track 4, if the listener decides to skip in to Track 5 on his player, he will no longer be directed to 2 seconds, inside Track 5. (Never was.) Instead, his player will now take him directly to the exact beginning of Track 5, because the logical pre-gap of Track 5, is now the post-gap of Track 4, according to the new system. (Always was.)

The purpose in doing this, is that the gap between the tracks could be something other than silence. For example, specifically between Track 4 and 5, it could be the intention of the artist to put a 10-second effect, which will not play if the CD player is advanced directly to Track 5. In this case, the artist would need to edit this effect into the ending of Track 4, and set the ‘post-gap’ of Track 4 to 10 seconds.

Dirk

(Edit 08/07/2016 : ) If it is honestly the intention of the author, to have a sound-effect acting as an intro to the following Audio Track, that plays for more than ?4 seconds? , there would be nothing preventing him from editing that into the ending of the preceding track, but still to leave the post-gap set to 2 seconds.

Also, authors have run into the problem from time to time, that they hear a popping or crackling sound when they try to create a DAO Disk. This could be due to some malfunction of their disk-authoring software, or of their CD-R drive. But there can also be a more innocent cause for this.

The fact that the audio streams supplied to the authoring software are digital, does not preclude the possibility that they could be storing some type of DC (Direct-Current) Offset. Most disk-authoring software makes no attempt to smooth the transition from one Track to the next. I.e., we might be tempted to think, that because the amplitude of the signal could be zero at the moment of transition, the last sample of Track 4 in my example should also be exactly equal to the first sample of Track 5. Well, they may not be equal, and if they are not, this will also produce a crackling or popping effect.

In the case of silence, audio editing software that computes DC offset, which is nothing but the average of all the samples of a Track, may offer to zero that, let us say as part of a normalization step (Audacity example shown).

K3b-Post-Gap-2
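What such an editor computes can be sketched briefly. The following assumes samples stored as floating-point numbers in a plain list, and the function names are hypothetical; it is not Audacity's own code.

```python
def dc_offset(samples):
    """The DC offset is nothing but the average of all the samples."""
    return sum(samples) / len(samples)

def remove_dc(samples):
    """Subtract the average, so that silence really sits at zero."""
    offset = dc_offset(samples)
    return [s - offset for s in samples]

def boundary_step(track_a, track_b):
    """Size of the jump heard when track_b follows track_a with no smoothing."""
    return abs(track_b[0] - track_a[-1])

# 'Silent' ending of Track 4 with an offset, followed by a truly silent Track 5.
track_4_end = [0.01] * 100
track_5_start = [0.0] * 100
print(boundary_step(track_4_end, track_5_start))             # 0.01, a potential click
print(boundary_step(remove_dc(track_4_end), track_5_start))  # effectively 0
```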

In case music or sound should play continuously over the transition / gap, the only way really to make sure that 2 successive samples match, is to start with a single Track, and to Split that, and to do no further editing of the resulting, shorter Tracks.

And yet, this is also a possible situation where either the software or the CD-R drive may not cooperate.

(Note : ) If all you are looking for is an easy way to insert 2 extra seconds of silence between your songs, the best suggestion I would come up with, would be to use a sound editor such as ‘Audacity’, to create a single 2-second sound track, which has been formatted correctly, and which contains silence.

Then, within ‘K3b’, if for example you had 10 Audio Tracks to start with, you could insert your silence Track as the even-numbered Tracks 2-18, such that the original music Tracks become the odd-numbered ones from 1-19. Then, you can use the context menus within K3b itself, to merge each of these odd-numbered tracks with the even-numbered one which follows it, so that you have 10 Audio Tracks again. Tada.

1: ) ( 06/12/2016 ) I should really not be so presumptuous. There can easily be other disk-burning applications, which do insert those 2 seconds of silence. Further, ‘K3b’ versions earlier than 0.12 also used to do so. K3b version 0.12 started to change the behavior of the ‘pre-gap’, but it was still referred to as a pre-gap. The version of K3b which I now have, which I am basing this posting on, is version 2.0.2. And it was one of the more recent versions such as 2.0.2, which started to reorganize the pre-gap as the ‘post-gap’.

(Edit 08/07/2016 : ) I mentioned the earliest CD Players, which needed an actual gap on the disk which was more reflective, to recognize a track-switch in commercially-made CDs. Well, one reason for which the earliest DAO disks had a pre-gap, pertained to the fact that the medium itself had grooves, with a constant spacing. Users might imagine that their disk-burners are able to work very autonomously, but in fact they require that the surfaces have depressions – or other patterns – manufactured into them, into which they burn their tracks.

I think that with the earliest software, TAO disks may have actually been recorded in such a way, that the player would still see them as being ‘more reflective’, but this would be because the disk had spent numerous revolutions, with no content in that groove.

What any more recent player will do, when instructed to skip ahead to ‘Track 5’, is compute approximately how many turns into the disk this will be found, and then start reading the track…

A certain bit encoded into the track actually signifies that it belongs to the pre-gap, and that as such it still belongs to the track preceding the one sought (belonging to #4). Playback will then not start, until this bit becomes zero.

But when playing through from ‘Track 4’ in such a case, the other bits are still allowed to contain audio, assuming modern players.

 

Some Thoughts on Surround Sound

The way I seem to understand modern 5.1 Surround Sound, there exists a complete stereo signal, which for the sake of legacy compatibility, is still played directly to the front-left and the front-right speaker. But what also happens, is that a third signal is picked up, which acts as the surround channel, in a way that neither favors the left nor the right asymmetrically.

I.e., if people were to try to record this surround channel as being a sideways-facing microphone component, by its nature its positive signal would either favor the left or the right channel, and this would not count as a correct surround-sound mike. In fact, such an arrangement can best be used to synthesize stereo, out of geometries which do not really favor two separate mikes, one for left and one for right.

But, a single, downward-facing, HQ mike would do as a provider of surround information.

If the task becomes, to carry out a stereo mix-down of a surround signal, this third channel is first phase-shifted 90 degrees, and then added differentially between the left and right channels, so that it will interfere least with stereo sound.

In the case where such a mixed-down, analog stereo signal needs to be decoded into multi-speaker surround again, the main component of “Pro Logic” does a balanced summation of the left and right channels, producing the center channel, but at the same time a subtraction is carried out, which is sent rearward.
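The sum-and-difference idea can be shown on single sample values. This is a deliberately simplified sketch: it leaves out the 90 degree phase-shift and any steering logic, the gain of 0.5 is an arbitrary value chosen for illustration, and the function names are hypothetical. It is not the actual Pro Logic algorithm.

```python
def mix_down(left, right, surround, gain=0.5):
    """Add the surround signal differentially to the two stereo channels."""
    return left + gain * surround, right - gain * surround

def passive_decode(lt, rt):
    """Balanced summation gives the center; the subtraction is sent rearward."""
    center = 0.5 * (lt + rt)
    rear = 0.5 * (lt - rt)
    return center, rear

# Pure surround content cancels in the sum and survives in the difference.
print(passive_decode(*mix_down(0.0, 0.0, 1.0)))   # (0.0, 0.5)
# Content common to left and right lands in the center only.
print(passive_decode(*mix_down(1.0, 1.0, 0.0)))   # (1.0, 0.0)
```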

The advantage which Pro Logic II has over I, is that this summation first adjusts the relative gain of both input channels, so that the front-center channel has zero correlation with the rearward surround information, which has presumably been recovered from the adjusted stereo as well.

Now, an astute reader will recognize, that if the surround-sound thus recovered, was ‘positive facing left’, its addition to the front-left signal will produce the rear-left signal favorably. But then the thought could come up, ‘How does this also derive a rear-right channel?’ The reason for which this question can arise, is the fact that a subtraction has taken place within the Pro Logic decoder, which is either positive when the left channel is more so, or positive when the right channel is more so.

(Edit 02/15/2017 : The less trivial answer to this question is, A convention might exist, by which the left stereo channel was always encoded as delayed 90 degrees, while the right could always be advanced, so that a subsequent 90 degree phase-shift when decoding the surround signal can bring it back to its original polarity, so that it can be mixed with the rear left and right speaker outputs again. The same could be achieved, if the standard stated, that the right stereo channel was always encoded as phase-delayed.

However, the obvious conclusion of that would be, that if the mixed-down signal was simply listened to as legacy stereo, it would seem strangely asymmetrical, which we can observe does not happen.

I believe that when decoding Pro Logic, the recovered Surround component is inverted when it is applied to one of the two Rear speakers. )

But what the reader may already have noticed, is that if he or she simply encodes the mixed-down stereo into an MP3 File, later attempts to use a Pro Logic decoder are for naught, and that some better means must exist to encode surround-sound onto DVDs or otherwise, into compressed streams.

Well, because I have exhausted my search for any way to preserve the phase-accuracy, at least within highly-compressed streams, the only way in which this happens, which makes any sense to me, is if in addition to the ‘joint stereo’, which provides two channels, a 3rd channel was multiplexed into the compressed stream, which as before, has its own set of constraints, for compression and expansion. These constraints can again minimize the added bit-rate needed, let us say because the highest frequencies are not thought to contribute much to human directional hearing…

(Edit 02/15/2017 :

Now, if a computer decodes such a signal, and recognizes that its sound card is only in stereo, the actual player-application may do a stereo mix-down as described above, in hopes that the user has a Pro Logic II-capable speaker amp. But otherwise, if the software recognizes that it has 4.1 or 5.1 channels as output, it can do the reconstruction of the additional speaker-channels in software, better than Pro Logic I did it.

I think that the default behavior of the AC3 codec when decoding, if the output is only specified to consist of 2 channels, is to output legacy stereo only.

The approach that some software might take, is simply to put two stages in sequence: first, AC3 decoding with 6 output channels; secondly, mixing down the resulting stereo in a standard way, such as with a fixed matrix. This might not be as good for movie-sound, but would be best for music.

 


 1.0   0.0
 0.0   1.0
 0.5   0.5
 0.5   0.5
+0.5  -0.5
-0.5  +0.5
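Applying such a fixed matrix amounts to one multiply-and-add per output channel. In the sketch below, the columns are taken to be the left and right stereo inputs, and the row labels (front-left, front-right, center, LFE, rear-left, rear-right) are my assumption, since the matrix itself only lists the numbers.

```python
# Each row: (assumed output channel, weight of left input, weight of right input).
MATRIX = [
    ("front_left",  1.0,  0.0),
    ("front_right", 0.0,  1.0),
    ("center",      0.5,  0.5),
    ("lfe",         0.5,  0.5),
    ("rear_left",   0.5, -0.5),
    ("rear_right", -0.5,  0.5),
]

def apply_matrix(left, right):
    """Derive all six output samples from one stereo sample pair."""
    return {name: a * left + b * right for name, a, b in MATRIX}

# A sound present equally in both inputs produces nothing in the rear channels.
print(apply_matrix(1.0, 1.0))
# A sound only in the left input reaches the two rear channels with opposite signs.
print(apply_matrix(1.0, 0.0))
```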

 

If we expected our software to do the steering, then we might also expect, that software do the 90° phase-shift, in the time-domain, rather than in the frequency-domain. And this option is really not feasible in a real-time context.

The AC3 codec itself would need to be capable of 6-channel output. There is really no blind guarantee, that a 6-channel signal is communicated from the codec to the sound system, through an unknown player application... )

(Edit 02/15/2017 : One note which should be made on this subject, is that the type of matrix which I suggested above might work for Pro Logic decoding of the stereo, but that if it does, it will not be heard correctly on headphones.

The separate subject exists, of ‘Headphone Spatialization’, and I think this has become relevant in modern times.

A matrix approach to Headphone Spatialization would assume that the 4 elements of the output vector, are different from the ones above. For example, each of the crossed-over components might be subject to some fixed time-delay, which is based on the Inter-Aural Delay, after it is output from the matrix, instead of awaiting a phase-shift… )

(Edit 03/06/2017 : After much thought, I have come to the conclusion that there must exist two forms of the Surround channel, which are mutually-exclusive.

There can exist a differential form of the channel, which can be phase-shifted 90° and added differentially to the stereo.

And there can exist a common-mode, non-differential form of it, which either correlates more with the Left stereo or with the Right stereo.

For analog Surround – aka Pro Logic – the differential form of the Surround channel would be used, as it would for compressed files.

But when an all-in-one surround-mike is implemented on a camcorder, this originally provides a common-mode Surround-channel. And then it would be up to the audio system of the camcorder, to provide steering, according to which this channel either correlates more with the front-left or the front-right. As a result of that, a differential surround channel can be derived. )

(Updated 11/20/2017 : )
