Caveats when using ‘avidemux’ (under Linux).

One of the applications available under Linux that can help edit video / audio streams, and that has been popular for many years – if not decades – is called ‘avidemux’. But in recent years it has become questionable whether this GUI- and command-line-based application is still useful.

One of the rather opaque questions surrounding its main use is highlighted in the official usage Wiki for Avidemux. A Linux user may either want to split off the audio track from a video / audio stream, or want to give the stream a new audio track. What the user is expected to do is to navigate to the Menu entries ‘Audio -> Save Audio’ or ‘Audio -> Select Track’, respectively.

(Screenshot: Screenshot_20190314_131933)

(Screenshot: Screenshot_20190314_132014)

What makes the usage of the GUI less than straightforward is what the manual entries next state, and what my personal experiments confirm:

  • External audio can only be added from ‘AC3’, ‘MP3’, or ‘WAV’ streams by default,
  • The audio track that gets Saved cannot be played back, if it is in the format of an ‘OGG Vorbis’, an ‘OGG Opus’, or an ‘AAC’ track, because such exported audio tracks lack the header information which playback apps would need in order to play them. In those cases specifically, only the raw bit-stream is saved.

The first problem with this sort of application is that the user needs to perform a memorization exercise, about which non-matching formats he may or may not Export To or Import From. I don’t like having to memorize meaningless details about every GUI application I have, and in this case the details can only be learned through detailed research on the Web. They are not hinted at anywhere within the application.

(Updated 3/23/2019, 15h35 … )

(As of 3/14/2019, 13h30 : )

As it happens, if we subscribe to the Debian / Stretch Multimedia Repository, then the version of ‘avidemux’ that we can install from the package manager is already version ‘2.6.20’. What this means is that this version has added support for the ‘AAC’ audio formats as external files. For me personally, this reduces the amount of memorization I need to do. I just need to remember that I cannot Export or Import either of the two ‘OGG’ formats, those being ‘Vorbis’ and ‘Opus’.

Simultaneously, the pattern of which audio formats can be imported follows the generalized idea that the ‘AC3, AAC, MP3, and PCM (WAV)’ formats are already associated with the ‘MPEG-2’ or ‘MP4’ video / audio container formats, according to a specific pattern, so that their choice has some sort of logical significance.

Well, these days, if I wanted to split the audio from a video stream and export it to OGG Vorbis format, there is an ‘ffmpeg’ command-line which already works:

 


ffmpeg -i Sample.avi -vn -ar 44100 -ac 2 -ab 192k -acodec libvorbis Sample.ogg

 

Because this command works, I don’t need a GUI to accomplish the same thing.

OTOH, the modern versions of ‘avidemux’ do seem to have a redeeming feature, in that instead of only Saving the video / audio stream to the .AVI container file format, they can now also save to other formats, such as MP4. Only, if the user chooses that, then he must also decide to convert both the video and the audio to ‘H.264’ and ‘AAC’ format respectively, out of the personal knowledge that these are the formats used in MP4 Files.

I still question to what degree this GUI application is useful today.

 


 

N.B.

What some people have suggested is that, although ‘ffmpeg’ is a good Swiss Army Knife – one that covers many formats – its implementation of OGG Vorbis audio encoding is particularly low in sound quality. I don’t really know whether this only applied to ‘Linux, the way it existed in the Year 2000’, or whether it still applies to ‘Linux Today’. But in case it still applies, a better way to conserve the quality of the generated OGG File would be:

 


ffmpeg -i Sample.avi -vn -ar 44100 -ac 2 -filter_type kaiser -kaiser_beta 2.0 -filter_size 64 -phase_shift 14 -cutoff 0.952 -acodec pcm_s16le Sample.wav
oggenc -q 8 -o Sample.ogg Sample.wav

 

This assumes that ‘oggenc’ is an installed command.

(Update 3/14/2019, 23h50:)

In case the reader is curious, ‘ffmpeg’ is a suitable tool for resampling audio because it applies a low-pass filter for any sample-rate conversion. By default, it uses ‘libswresample’, whose default behaviour is to apply 32 sample-weights to the input stream. Above, I have increased this to 64 weights, to achieve a sharper frequency cutoff. I take this to mean that there is 4x super-sampling, and that, if there were no actual conversion of the sample-rate, there would be 8 computed zero-crossings on each side of the sinc function. ffmpeg then sets a cutoff frequency that is 0.95 times the output Nyquist Frequency, which should leave the sinc function with 7 zero-crossings, in case a 48kHz – 44.1kHz conversion is to be performed. In case a rational relationship cannot be found between the input sample-rate and the output sample-rate – which is often the case – the default behaviour is to use linear interpolation between sub-samples.
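
A quick sanity check of that arithmetic, as a minimal Python sketch. The assumptions are mine: that the 64 weights are spread over a 4x super-sampled grid, and that the sinc function’s zero-crossing spacing scales inversely with the cutoff frequency; this is not a transcription of libswresample’s actual code.

filter_size   = 64            # the '-filter_size 64' weights
supersample   = 4             # assumed 4x super-sampling
cutoff        = 0.95          # default: 0.95 x the output Nyquist Frequency
fs_in, fs_out = 48000, 44100

# Half of the interval, measured in input samples:
samples_per_side = filter_size / supersample / 2              # 8.0

# The cutoff, re-expressed as a fraction of the *input* Nyquist Frequency:
effective_cutoff = cutoff * (fs_out / fs_in)                  # ~0.873

print(samples_per_side)                                       # 8 zero-crossings per side, if no conversion
print(round(samples_per_side * effective_cutoff, 2))          # ~6.98, i.e. about 7, for 48kHz -> 44.1kHz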

Much of this can be changed to non-default behaviour, but why change a good thing?


 

One of the facts which I had not mentioned much in earlier postings about sinc filters is that it’s possible to design one, the corner frequency of which is not always a negative power of two. Instead, I had mainly written about half-band filters. If every sinc filter were a half-band filter, and a sample-rate conversion was needed between two rates that did not have a rational relationship, then I think it would be necessary to super-sample whichever sample-rate is the lower of the two. At that (doubled) sample-rate, a linear interpolation would take place, and then cutting the frequency-response in half would best reduce the noise that results from that.
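
For readers unfamiliar with the term, here is a minimal Python sketch (my own illustration) of why half-band filters are attractive: with a corner frequency of exactly 1/2, every second sinc coefficient other than the centre one lands on a zero-crossing, so half the multiplications can be skipped.

import numpy as np

taps = 33                                   # an odd-length, centred kernel
n = np.arange(taps) - (taps - 1) // 2
h = 0.5 * np.sinc(0.5 * n)                  # corner frequency = 1/2 of Nyquist

print(h[::2])   # the even-offset weights: the centre is 0.5, the rest are ~0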

However, if sinc filters are accepted that have corner frequencies arbitrarily lower than 1/2 their input Nyquist Frequency, then it should also be generally feasible just to super-sample the input sample-rate. But I do think that such filters require at least 4x super-sampling.
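
Below is a minimal Python sketch of such a kernel, under my own assumptions rather than taken from any actual resampler: a truncated sinc at 4x super-sampling, whose corner frequency (0.873 of the input Nyquist Frequency, roughly the 48kHz – 44.1kHz case above) is clearly not a negative power of two.

import numpy as np

def sinc_kernel(corner, taps=64, supersample=4):
    # 'corner' is a fraction of the input stream's Nyquist Frequency;
    # 'taps' is the total number of weights; 'supersample' is how many
    # weights fall within one original input-sample interval.
    n = (np.arange(taps) - (taps - 1) / 2.0) / supersample
    h = corner * np.sinc(corner * n)        # np.sinc(x) = sin(pi x) / (pi x)
    return h / np.sum(h)                    # normalize the weights to sum to 1

h = sinc_kernel(0.873, taps=64, supersample=4)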

(Update 3/15/2019, 7h10 : )

An observation which I made just last night was that such a sinc filter would need to have super-sampling of at least 4, but that the ‘ffmpeg’ command-line default for the filter size is only 32. What this implies is that the sampling interval is much too short to allow a sharp frequency cutoff. Therefore, some of the complaints that I have read about the way ‘ffmpeg’ resamples seem plausible. In that case, the maximum number of zero-crossings on either side of the sinc function, if no actual conversion was taking place and if the Resampling Phase-Shift was also zero, would be 4, and such a small number of zero-crossings will not allow a sinc filter to do its job properly. Now maybe, because the resampling phase-shift defaults to 10 and not zero, the situation is not quite as bad as it could be. But, since the filter size can simply be increased, my suggested command-line above does so.

I just think that, when computing a series of coefficients that corresponds to a sinc filter, the corner frequency of which is not a negative power of two, and which has 4x super-sampling, care must be taken explicitly to make sure that the series ends directly after a zero-crossing.
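
As a minimal Python sketch of that idea (my own illustration, not anybody’s actual implementation): keep only the portion of the coefficient series that lies between its outermost zero-crossings.

import numpy as np

corner, supersample, taps = 0.873, 4, 64
n = (np.arange(taps) - (taps - 1) / 2.0) / supersample
h = corner * np.sinc(corner * n)            # the truncated sinc coefficients

# Indices at which the coefficients change sign, i.e. the zero-crossings:
crossings = np.where(np.diff(np.sign(h)) != 0)[0]

# Discard the weights that lie beyond the outermost zero-crossings:
h_trimmed = h[crossings[0] + 1 : crossings[-1] + 1]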


I suppose that the reader might also want to know what ‘Resampling Phase-Shift’ is all about. Specifically, when designing sinc filters, the default assumption is that phase-linearity is to be achieved, in spite of a brick-wall frequency cutoff. This contributes to the lack of immediacy in high-frequency sound pulses.

Depending on how the sinc filter is implemented, this may also introduce a time-shift into the output stream.

A completely different issue with resampling in general is that an output-sample may only be generated as soon as the entire sampling interval is filled with input-values. This can result in fewer output-samples than there were input samples. To help correct this situation, a type of phase-shift can also be defined. If this is the intent of the parameter, then it specifies how many zeroes to precede the first input audio-sample with, thus delaying the output.
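
A minimal Python sketch of that bookkeeping, using a placeholder kernel rather than a real sinc filter: a ‘valid’ convolution only emits an output sample once the whole filter interval is filled, so the output comes out shorter than the input, and pre-padding the input with zeroes restores the missing samples at the cost of a delay.

import numpy as np

taps = 64
x = np.random.randn(1000)                    # stand-in for an input audio stream
h = np.ones(taps) / taps                     # placeholder kernel, not a real sinc

y_short  = np.convolve(x, h, mode='valid')   # only 1000 - 63 = 937 output samples
x_padded = np.concatenate([np.zeros(taps - 1), x])
y_full   = np.convolve(x_padded, h, mode='valid')   # 1000 samples again, but delayed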

OTOH, if the same assumption is applied to a row of pixels belonging to a 2D image, then any virtual pixels must either be equal to the first real pixel they precede, or equal to the last real pixel they follow.
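
In Python / NumPy terms, that edge treatment would look something like this (a sketch, not taken from any specific image library):

import numpy as np

row = np.array([10, 20, 30, 40], dtype=np.uint8)     # one row of real pixels
padded = np.pad(row, 3, mode='edge')                 # [10 10 10 10 20 30 40 40 40 40]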

Humans will not appreciate the playback of the highest audible frequencies the most, just because the response curve is maximally rectangular below the frequency cutoff. Instead, Humans will tend to like the playback most, if pulses near that frequency are maximally short and immediate.

Well, as long as the phase-response needs to be linear as well, the response to a pulse will also precede that pulse with ‘ringing’, whereas according to analog technology, a short pulse will be followed by ringing, and not preceded by it. Being preceded by ringing makes the pulse not-immediate.

Even in the design of some sinc filters, this problem may be remedied partially, just by repositioning the sinc function within its sampling interval, closer to the beginning of that interval. The onset will then be faster than the decay, and the subjective sound of high-frequency pulses will be more immediate. But the phase-response will no longer be linear.
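
A minimal Python sketch of that repositioning, again as my own illustration: the same truncated sinc, but with its peak placed a number of weights earlier than the centre of the interval, so that fewer ringing lobes precede the peak and more follow it.

import numpy as np

taps, supersample, corner = 64, 4, 0.873
shift = 8                                    # hypothetical: the peak sits 8 weights early

centre_linear  = (taps - 1) / 2.0            # linear phase: peak at the centre
centre_shifted = centre_linear - shift       # faster onset, slower decay

h_linear  = corner * np.sinc(corner * (np.arange(taps) - centre_linear)  / supersample)
h_shifted = corner * np.sinc(corner * (np.arange(taps) - centre_shifted) / supersample)
# h_shifted has fewer lobes before its peak than h_linear, but it is no longer
# symmetric, so its phase-response is no longer linear.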

‘ffmpeg’ has a setting for that, and its default phase-shift corresponds to an integer command-line parameter of (10/30). I can only make a guess as to what the units are within which this setting is defined, but I basically just wrote two of my guesses above.


 

(Erratum 3/16/2019, 19h15 : )

Sometimes I make assumptions about how software works that just aren’t consistent with how other people implemented it. One example of that could be my idea that, at the edges of this sinc function, coefficients which surpass the last zero-crossing should just be set to zero.

What the coders may have done is to apply a cutoff frequency that is (0.97) times the output Nyquist Frequency, just so that when down-sampling from 48kHz to 44.1kHz, with an interval of 32, the sinc function will have ~3.5 zero-crossings on each side.

The nice result from that would be the full suppression of the Nyquist Frequency of the input sampling rate. But then it could be the responsibility of a non-trivial choice for the phase-shift, to place the outermost zero-crossings as close as possible to the ends of the interval over which the filter operates. I.e., the phase-shift could be any non-zero even integer that isn’t a multiple of 4. Thus, if the user simply wanted to improve the selectivity without increasing processing time, he could merely set this phase-shift to 2.

If that were the case, then it would still be true that my doubling of the filter-size leaves 7 zero-crossings on each side. But if I next specified a phase-shift of 16, this would just reposition the sinc function by another 3.5 zero-crossings. To avoid that trap, it might then be better to choose a phase-shift of 14.


 

(Update 3/21/2019, 15h05 : )

I suppose that I should add, for emphasis, that my assumption so far has been that a video stream is being used as source, the audio of which is sampled at 48kHz.

If the sample-rates are equal, then the default behaviour of ‘ffmpeg’ is not to resample, even if parameters were given on the command-line.

If instead the exercise was some sort of irrational up-conversion (to 22.05kHz), then I’d assume that ‘ffmpeg’ has been programmed sanely enough to use sensible parameters by default. In that case, no special resampling parameters should need to be given.

For all we know, the user could have a video to start with that has audio sampled at 16kHz…

 


ffmpeg -i Sample.avi -vn -ar 22050 -ac 2 -acodec pcm_s16le Sample.wav
oggenc -q 8 -o Sample.ogg Sample.wav

 


 

(Update 3/23/2019, 11h40 : )

According to what I most recently read, ‘ffmpeg’ applies either a Kaiser Window or a Blackman-Nuttall Window to the sinc function by default.

This completely takes care of the question of what happens to non-zero weights belonging to the sinc function at the edges of the sampling interval.
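
As a minimal Python sketch of what such a window accomplishes (my own illustration, using NumPy’s Kaiser window rather than libswresample’s code): the window tapers the sinc’s weights toward the edges of the interval, which softens the effect of truncating the series at an arbitrary point.

import numpy as np

taps, supersample, corner = 64, 4, 0.873
beta = 2.0                                   # as in the '-kaiser_beta 2.0' above

n = (np.arange(taps) - (taps - 1) / 2.0) / supersample
h = corner * np.sinc(corner * n)             # the truncated sinc
w = np.kaiser(taps, beta)                    # the Kaiser window
h_windowed = h * w                           # scales the weights down toward the
                                             # ends of the interval (gently, for beta = 2.0)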

Additionally, the ‘-phase_shift’ parameter may not reposition the sinc function within this window, but may only reposition the output-stream relative to the input stream, making the trivial choice of a parameter harmless.

Because, as I have just found, ‘ffmpeg’ can resample its audio with one out of several filter-types, I think that explicit care needs to be taken to make sure that a ‘cubic’ filter is not selected by default. Hence, I have revised the command-lines above to select one of the filters that uses the sinc function.

Dirk

 
