When 16-bit / 44.1kHz Audio was first developed, it implied a very capable system for representing high-fidelity sound. But I think that today, we live in a pseudo-16-bit era. Manufacturers have taken 16-bit components, but designed devices which do not deliver the full power or quality of what this format once promised.
It might be a bit of an exaggeration, but
I would say that out of those indicated 16 bits of precision, the last 4 are not accurate. And one main reason this has happened, is compressed sound. Admittedly, signal compression – which is often a euphemism for data reduction – is necessary in some areas of signal processing. But one reason why data-reduction was applied to sound, had more to do with dialup-modems and their lack of signal-speed, and with the need to be able to download songs onto small amounts of HD space, than it served any other purpose, when the first forms of data-reduction were devised.
Even though compressed streams caused this, I would not say that the solution lies in getting rid of compressed streams. But I think that a necessary part of the solution would be consumer awareness.
If I tell people that I own a sound device which uses 2x over-sampling, but that I fear the interpolated samples are simply generated as a linear interpolation of the two adjacent, original samples, and those people answer “So what? Can anybody hear the difference?”, then this is not an example of consumer awareness. I can hear the difference between very-high-pitch sounds that are approximately correct, and ones which are greatly distorted.
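As a small sketch of why I worry about this, assuming the naive device really does just average the two neighbouring samples, one can measure how much of a high-pitch tone survives that kind of 2x over-sampling. The numbers below come from this toy model, not from any actual device:

```python
import math

def midpoint_rms(freq_hz, rate_hz=44100.0, n=4096):
    """Sample a sine wave, then fabricate the 2x-oversampled midpoints
    by linear interpolation, and return the RMS of those midpoints.
    An ideal interpolator would return ~0.707 at any audible frequency."""
    x = [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]
    mids = [0.5 * (x[i] + x[i + 1]) for i in range(n - 1)]
    return math.sqrt(sum(m * m for m in mids) / len(mids))

low = midpoint_rms(1000.0)    # ~0.705 : almost no loss at 1kHz
high = midpoint_rms(20000.0)  # ~0.103 : the 20kHz midpoints are gutted
```

At 20kHz the adjacent samples are nearly in antiphase, so their average is close to zero, which is exactly the sort of gross distortion of very-high-pitch sound I mean.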
Also, if we were to accept for a moment that out of the indicated 16 bits, only the first 12 are accurate, but there exist sound experts who tell us that by dithering the least-significant bit, we can extend the dynamic range of this sound beyond 96dB, then I do not really believe that those experts know any less about digital sound. Those experts have just remained so entirely surrounded by their high-end equipment, that they have not yet noticed the standards slip, in other parts of the world.
Also, I do not believe that the answer to this problem lies in consumers downloading 24-bit, 192kHz sound-files, because my assumption would again be, that only a few of those indicated 24 bits will be accurate. I do not believe Humans hear ultrasound. But I think that with great effort, we may be able to hear 15-18kHz sound from our actual playback devices again – in the not-so-distant future.
I do not believe that the answer itself lies in wireless headphones. I just happen to own a set, of which I know the filtering is better than a linear interpolation between the over-sampled audio-stream. If my headphones already have a wavelet stored in their chips, which enables them to receive Hi-Fi sound from my phone, it stands to reason that their manufacturer will have just used the same wavelet one more time, in service of the over-sampling.
My wireless headphones just might have 32-Ohm drivers, while the cheaper wired headphones may only have 8-Ohm drivers. I guess that helps too.
But most of all, I think that the experts who invented 16-bit sound, should be aware that their standard is no longer being referred to, when cheap products use 16-bit components. If those inventors are even still alive.
There is some possibility that Apple might be slightly better at this, than certain other companies. But this is not even a claim I can be certain of, let alone prove. If Apple has in fact been touting 192kHz sound, this just seems to suggest further separation from ‘the real world’ of cheap audio.
We can add a random displacement to the voltage being sampled, which barely spans one quantization unit, and which makes sure that the digital signal always has an amplitude of at least one quantization unit. This variance is to be spread over the entire spectrum, so that it resembles white noise.
If a smaller-amplitude signal was added to that, before quantization, then its signal-energy will be focused in some part of the spectrum. This means that after D/A reconversion, some concentration of signal-energy should be observable, in the part of the spectrum this smaller-amplitude signal once occupied, in analog form.
This is particularly easy to apply in software, if a recording has been stored, say, in 24-bit format, but is being exported to 16-bit, because anywhere between 2 and 4 randomized bits, in the positions just below the first 16, can be added to the 24-bit signal before it is rounded.
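A minimal sketch of that software step might look as follows. I am assuming here that the dither is flat and spans one full 16-bit quantization step, which is 256 of the 24-bit units; the function name and the exact dither depth are hypothetical, not taken from any real exporter:

```python
import random

def to_16_bit(sample_24):
    """Requantize one 24-bit sample value to 16 bits. One 16-bit step
    equals 256 of the 24-bit units; flat dither spanning that step is
    added before truncating, so the quantization error becomes noise
    rather than correlated distortion. (Flat, one-step dither is an
    assumption; the text suggests randomizing only 2-4 discarded bits.)"""
    dither = random.uniform(0.0, 256.0)
    return int((sample_24 + dither) // 256)

# A 24-bit value exactly halfway between two 16-bit steps should round
# up about half the time, preserving its average level.
random.seed(0)
outputs = [to_16_bit(128) for _ in range(10000)]
average = sum(outputs) / len(outputs)
```

Without the dither, that halfway value would always truncate to zero, and the small signal it belonged to would simply vanish from the 16-bit export.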
If the listener is expecting to hear 20kHz sound-pulses well, the fact that typical audio is only sampled at 44.1kHz may get in his way. This is not because he can hear ultrasound. This is because the temporal resolution of those pulses, that have a carrier-frequency of 20kHz, is akin to amplitude-modulation.
The proximity of a 20kHz wave to the Nyquist Frequency of 22.05kHz prohibits it from being modulated more often than 2,050 times per second.
We do not actually hear ‘sound events’ at that high a rate. But whenever a vinyl record passed a scratch, its 20kHz pulse went to full intensity, within (1 / 10 000) of a second, or ~ 10 000 times per second.
What this seems to do is stimulate certain hairs in our cochlea more vividly, than a constant 20kHz wave would.
If that was the definition of the problem, then it has been solved, by increasing the sample-rate to 48kHz, because the Nyquist Frequency associated with that, is now 24kHz. So we should be able to hear our 20kHz pulse gain full amplitude within (1 / 8000) of a second, or flash on and off completely 4000 times per second.
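The arithmetic behind these figures is simply the headroom between the carrier and the Nyquist Frequency, which can be written out as a one-line helper (the function name is my own):

```python
def envelope_bandwidth_hz(carrier_hz, sample_rate_hz):
    """How fast a carrier's amplitude envelope can change is limited by
    the headroom between that carrier and the Nyquist Frequency
    (half the sample-rate)."""
    return sample_rate_hz / 2.0 - carrier_hz

at_44100 = envelope_bandwidth_hz(20000.0, 44100.0)  # 2050.0 Hz
at_48000 = envelope_bandwidth_hz(20000.0, 48000.0)  # 4000.0 Hz
```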
Now, if a person instead thinks that he should hear a pulse with a center-frequency of 22kHz, not 20kHz, I guess we would be back to square one…
(Edit 03/09/2017 : One weakness with the back-of-the-envelope estimations I have written here, is an assumption that the Nyquist Frequency itself is passed at full amplitude.
What a Sinc-Filter does instead, is pass the Nyquist Frequency, but only at 1/2 amplitude. Therefore, all these estimations about achievable Sound Immediacy, are overly-optimistic. Expect just a little less in each case. )
This subject also connects with the question of what the ideal number of coefficients is, for the Sinc-Filter. If the intent was that a 20kHz wave should not alias, even at a sample-rate of only 44.1kHz, then the filter needs a larger number of coefficients on each side of the center-point, such as maybe 6 or 7. But doing so also degrades the temporal resolution by as much. One relevant observation becomes, that the listener may appreciate his high-frequency sound, but not be picky about whether that is 20kHz or merely 18kHz.
And, the programmer may actually want the user to resort to a 48kHz sample-rate, in order to deal with the highest frequencies properly, in which case only 5 coefficients on each side of the center-point seem more-correct for his filter.
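To make the coefficient-count trade-off concrete, here is a hedged sketch of such a midpoint filter, with a Hann taper as an assumed window choice; a real design might window, truncate and normalize differently:

```python
import math

def midpoint_filter(taps_per_side):
    """Windowed-sinc coefficients for estimating the sample midway
    between two existing ones (2x over-sampling), using taps_per_side
    original samples on each side of the midpoint. Hann-tapered and
    normalized to unit DC gain; a sketch, not a production design."""
    m = taps_per_side
    coeffs = []
    for k in range(-m, m):
        t = k + 0.5  # distance from the midpoint, in original samples
        sinc = math.sin(math.pi * t) / (math.pi * t)
        taper = 0.5 * (1.0 + math.cos(math.pi * t / m))  # Hann window
        coeffs.append(sinc * taper)
    total = sum(coeffs)
    return [c / total for c in coeffs]

# Interpolate a 1kHz sine at the midpoint between samples 100 and 101,
# using 5 coefficients on each side of the center-point.
h = midpoint_filter(5)
f, rate = 1000.0, 44100.0
est = sum(h[k + 5] * math.sin(2 * math.pi * f * (101 + k) / rate)
          for k in range(-5, 5))
true = math.sin(2 * math.pi * f * 100.5 / rate)
```

At low frequencies the 5-per-side filter recovers the midpoint almost exactly; its shortcomings only show up near the top of the band, which is where the coefficient count starts to matter.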
Often, the thermal noise of the analog amplifier preceding the A/D converter, is already high enough in amplitude, to ensure that the overall signal level is at least one quantization unit. The exception lies in very high-quality amplifier stages.
If the dithering exceeds one quantization unit, it becomes detrimental to overall resolution and not helpful, because it can exceed that tiny peak which the smaller signal forms, in some localized part of the spectrum. And, if the dithering does not have uniform frequencies, as certain types of electronic noise sources do not, it can easily overpower this ( < 1 Quantization Unit ) of smaller signal, in the hotter parts of its spectrum.
Triangular (2-bit) dither could have 2 advantages over just-random (3-bit) dither:
- The temporal resolution could be better, since a triangle-wave is completed in shorter intervals,
- Its signal energy will be concentrated at the higher parts of the spectrum, where a presumed smaller signal is not.
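Assuming the triangular dither is generated as the first difference of a uniform stream, which is one common way of obtaining a high-frequency-weighted triangular PDF, the second advantage can be made visible by measuring the correlation between consecutive dither samples:

```python
import random

def rectangular_dither(n, rng):
    """Independent uniform draws: a flat (white) spectrum."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]

def triangular_dither(n, rng):
    """Triangular-PDF dither built as the first difference of a uniform
    stream; the differencing tilts its energy toward the top of the
    spectrum, away from where a small low-frequency signal would sit."""
    u = [rng.uniform(-0.5, 0.5) for _ in range(n + 1)]
    return [u[i + 1] - u[i] for i in range(n)]

def lag1_correlation(x):
    """Correlation between consecutive samples: ~0 means white noise,
    negative means energy concentrated at the higher frequencies."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean)
              for i in range(len(x) - 1)) / (len(x) - 1)
    return cov / var

rng = random.Random(1)
flat = lag1_correlation(rectangular_dither(20000, rng))    # ~ 0
tilted = lag1_correlation(triangular_dither(20000, rng))   # ~ -0.5
```

The strongly negative correlation of the differenced stream is exactly the spectral tilt claimed above: its noise energy sits where the presumed smaller signal is not.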