An exercise in converting an arbitrary video clip into ASCII-art.

One of the throwback activities in Computing, which has existed since at least the 1990s, is so-called ‘ASCII-Art’, in which regular text characters represent an image.

When this form of Art is created by a Human, it can look quite nice. But, if a mere computer program is given a sequence of images to convert into characters as a batch process, the results are usually inferior, because all the program can do is translate each cell of the image into an ASCII character, whose brightness is supposed to represent the original brightness of that cell. The complex shape of the actual text characters is not taken into account – at least, not by any programs I have access to – and those shapes also interfere with the viewer’s ability to recognize the intended image, because they amount to random ‘noise’; plain grey-scale tiles would probably have made the image easier to recognize.
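
Just to make that batch process concrete, below is a minimal sketch in C of what such a program boils down to. The character ramp and the cell size are arbitrary choices of mine, and the frame is assumed to have been decoded into an 8-bit grey-scale buffer already:

#include <stdio.h>

/* Characters ordered from visually densest to lightest; the exact
   ramp is an arbitrary choice. */
static const char ramp[] = "@%#*+=-:. ";

/* Convert one grey-scale frame (8 bits per pixel, row-major) into
   lines of ASCII characters, one character per cell. */
void frame_to_ascii(const unsigned char *pixels, int width, int height,
                    int cell_w, int cell_h)
{
    int levels = (int)(sizeof ramp) - 2;    /* highest index into ramp[] */

    for (int cy = 0; cy + cell_h <= height; cy += cell_h) {
        for (int cx = 0; cx + cell_w <= width; cx += cell_w) {
            /* Average the brightness over the cell. */
            long sum = 0;
            for (int y = 0; y < cell_h; ++y)
                for (int x = 0; x < cell_w; ++x)
                    sum += pixels[(cy + y) * width + (cx + x)];
            int avg = (int)(sum / (cell_w * cell_h));    /* 0..255 */

            /* Dark cells map to dense characters, bright cells to sparse ones. */
            putchar(ramp[(avg * levels) / 255]);
        }
        putchar('\n');
    }
}

Notice that this embodies exactly the weakness described above: only the average brightness of each cell survives, while the shapes of the chosen characters contribute nothing but noise.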

In spite of recognizing this, I have persevered, and converted an arbitrary video-clip of mine into ASCII-art, programmatically. The following is the link by which it can be viewed:

(Link within my Web-site.)

And yes, the viewer would need to enable JavaScript from my site in order to obtain an actual animation, because that is what advances the ‘iframe’.

(Updated 6/26/2021, 14h45… )


Sound Fonts: How something that I blogged was still not 100% accurate.

Sometimes it can happen that explaining a subject 100% accurately would seem to require writing an almost endless amount of text, and that, with short blog postings, I’m limited to always posting a mere approximation of the subject. The following posting would be a good example:

(Link to an earlier posting.)

Its title clearly states that there are exactly two types of interpolation for use in resampling (audio). After some thought, I realized that a third type of interpolation might exist, and that it might be especially useful for Sound Fonts.

According to my posting, a situation can exist in which the spacing of (interpolated) output samples stands in an irrational relationship to that of the (Sound Font) input samples. Plausibly, the approach would then be to derive a polynomial’s actual coefficients from the input sample-values (which would be the result of one matrix multiplication), and to compute the value of the resulting polynomial at (x = t), (0.0 <= t < 1.0).
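
To show what I mean, here is a sketch of that first approach in C, using the well-known Catmull-Rom matrix as just one possible choice of polynomial:

/* Derive a cubic's coefficients from four neighbouring input samples
   (one matrix multiplication, in effect), then evaluate the polynomial
   at the fractional position t between the middle two samples. */
float cubic_interp(float p0, float p1, float p2, float p3, float t)
{
    /* [a b c d] = M * [p0 p1 p2 p3], with M being Catmull-Rom here */
    float a = -0.5f * p0 + 1.5f * p1 - 1.5f * p2 + 0.5f * p3;
    float b =         p0 - 2.5f * p1 + 2.0f * p2 - 0.5f * p3;
    float c = -0.5f * p0              + 0.5f * p2;
    float d =                p1;

    /* Evaluate at (x = t), (0.0 <= t < 1.0), via Horner's rule. */
    return ((a * t + b) * t + c) * t + d;
}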

But there is an alternative, which is that the input samples could be up-sampled, arriving at a set of sub-samples with fixed positions, after which every output sample could arise as a mere linear interpolation between two sub-samples.

It would seem, then, that a third way to interpolate is feasible, even when the spacing of output samples is irrational with respect to that of the input samples.
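
A sketch of that third way follows, again in C. Here, ‘sub’ is assumed to hold sub-samples at twice the input sample rate, however they were computed, and the position being read from may be irrational with respect to the input spacing:

/* Fetch one output sample by linear interpolation between two
   fixed-position sub-samples. */
float read_upsampled(const float *sub, double pos_in_input_samples)
{
    double pos = pos_in_input_samples * 2.0;  /* position in sub-samples */
    long   i   = (long)pos;                   /* integer part            */
    float  t   = (float)(pos - (double)i);    /* fractional part         */

    return sub[i] + t * (sub[i + 1] - sub[i]);  /* linear interpolation  */
}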

Also, with Sound Fonts, the possibility presents itself that the Sound Font could have been recorded professionally at a sample rate considerably higher than 44.1kHz – maybe 96kHz – just so that, if the Sound Font Player did rely on linear interpolation, doing so would not mess up the sound as much as if the Sound Font had been recorded at 44.1kHz.

Further, specifically with Sound Font Players, the added problem presents itself that the virtual instrument could get bent upward in pitch, even though its recording already had frequencies approaching the Nyquist Frequency, so that those frequencies could end up being pushed higher than the output Nyquist Frequency, thereby resulting in aliasing – i.e., getting reflected back down to lower frequencies – even though each output sample might have been interpolated super-finely by itself.
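
To put a number on that reflection: at an output sample rate of (Fs), any component that has been pushed to a frequency (f), where (Fs/2 < f < Fs), will be heard at (Fs − f) instead. For example, with 44.1kHz output, a partial bent up to a nominal 30kHz comes back down as a 14.1kHz partial, which is musically unrelated to the note being played.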

These are the main reasons why, as I have experienced, playing a sampled sound at a very high, bent pitch actually just results in ‘screeching’.

Yet, the Sound Font Player could again be coded cleverly, so that it would also over-sample its own output. I.e., if expected to play the virtual instrument at a sample rate of 44.1kHz, it could actually compute interpolated samples closer together than that, corresponding to 88.2kHz, and then compute each ‘real output sample’ as the average between two ‘virtual, over-sampled output samples’. This would effectively insert a low-pass filter, which would flatten the screeching that results from frequencies higher than 22kHz being reflected below 22kHz, and eventually all the way back down to 0kHz. And admittedly, the type of (very simple) low-pass filter such an arrangement implies, would be the Haar Wavelet again. :oops:
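
In code, that clever arrangement might look like the following sketch, where interp_at() is a hypothetical name standing in for whatever per-sample interpolation the player already performs:

/* Render one block of output at the requested rate, by computing
   virtual samples at twice that rate and averaging adjacent pairs --
   the Haar-style low-pass described above. */
extern float interp_at(double sample_pos);   /* hypothetical helper */

void render_block(float *out, int n_out, double start_pos, double step)
{
    for (int i = 0; i < n_out; ++i) {
        double pos = start_pos + (double)i * step;
        float a = interp_at(pos);               /* first virtual sample        */
        float b = interp_at(pos + step * 0.5);  /* second, half a period later */
        out[i] = 0.5f * (a + b);                /* the averaging filter        */
    }
}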

If you asked me what the best was that a Soundblaster sound card from 1998 would have been able to do, I’d say, ‘Just compute each audio sample as a linear interpolation between two Sound Font samples.’ Doing so would have required an added lookup into an address in (shared) RAM, a subtraction, a multiplication, and an addition. In fact, basing this speculation on my estimate of how much circuit-complexity such an early Soundblaster card just couldn’t have had, I’d say that those cards would need to have applied integer arithmetic, with a limited number of fractional bits – maybe 8 – to state which Sound Font sample-position a given audio sample was being ‘read from’. It would have been up to the driver to compute the approximate, integer position fed to the hardware. And then, if that sound card was poorly designed, its logic might have stated, ‘Just truncate the Sound-Font sample-position being read from, to a whole sample.’
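
Such a linear interpolation, with 8 fractional bits of position, would boil down to something like this – my own reconstruction, of course, not actual Soundblaster logic:

#include <stdint.h>

/* Fetch one audio sample from a 16-bit Sound Font, given a fixed-point
   sample position with 8 fractional bits: one added lookup, one
   subtraction, one multiplication and one addition.  (Assumes the
   compiler's right-shift of negative values is arithmetic, as is usual.) */
int16_t font_sample_lerp(const int16_t *font, uint32_t pos_fixed)
{
    uint32_t i    = pos_fixed >> 8;      /* integer sample position */
    int32_t  frac = pos_fixed & 0xFF;    /* fractional bits, 0..255 */

    int32_t s0 = font[i];
    int32_t s1 = font[i + 1];            /* the added lookup        */

    /* s0 + (frac / 256) * (s1 - s0), in pure integer arithmetic */
    return (int16_t)(s0 + ((frac * (s1 - s0)) >> 8));
}

The poorly designed variant mentioned above would simply return font[i] and ignore frac altogether.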

In contrast, when Audio software is being programmed today, one of the first things the developer will insist on is to apply floating-point numbers wherever possible…

Also, if a hypothetical, superior Sound Font Player did have as its logic, ‘If the sample rate of the loaded Sound Font is (< 80kHz), up-sample it 2x; if that sample rate is actually (< 40kHz), up-sample it 4x…’, just to simplify the logic to the point of making it plausible, then this up-sampling would only take place once, when the Sound Font is actually being loaded into RAM. By contrast, the over-sampling of the output of the virtual instrument, as well as the low-pass filter, would need to be applied in real time… ‘If the output sample rate is (>= 80kHz), replace adjacent Haar Wavelets with overlapping Haar Wavelets.’
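
The load-time half of that logic is simple enough to state in C directly:

/* Choose an up-sampling factor once, when the Sound Font is loaded,
   so that its effective sample rate lands at or above 80kHz. */
int choose_upsample_factor(double font_rate_hz)
{
    if (font_rate_hz < 40000.0)
        return 4;    /* e.g., 22.05kHz or 32kHz material */
    if (font_rate_hz < 80000.0)
        return 2;    /* e.g., 44.1kHz or 48kHz material  */
    return 1;        /* already fast enough as-is        */
}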

Food for thought.

Sincerely,
Dirk

How to compute the sine function on a CPU with no FPU.

There exists a maxim in the publishing world, which is, ‘Publish or Perish.’ I guess it’s a good thing I’m not a publisher, then. In any case, it’s been a while since I posted anything, so I decided to share with the community some wisdom that existed in the early days of computing – and when I say that, I really do mean ‘back in the early days’. This is something that might have been used on minicomputers, or on the computers in certain special applications, before PCs as such existed.

A standard capability which should exist is to compute a decently accurate sine function. And one of the lamest reasons could be the fact that an audio file has been encoded with a peak amplitude, but that a decoder, or a speech-synthesis chip, might only need to play back a sine-wave that has that encoded peak amplitude. However, it’s not always a given that a ‘CPU’ (“Central Processing Unit”) actually possesses an ‘FPU’ (a “Floating-Point Unit”). In such situations, programmers back in the day devised a trick.

It’s already known that a table of pre-computed sine values could be made part of a program, numbering maybe 256 entries, but that, if all a program did was look up sine values from such a table once, ridiculously poor accuracy would result. But it was also known that, as long as the interval of one sine-wave was from (zero) to (two-times-pi), the derivative of the sine function was the cosine function. So the trick, really, was to make not one lookup into the table, but at least two: one to fetch an approximate sine value, and the next to fetch an approximate cosine value, the latter being the derivative of the sine at the same point. The fractional part of the parameter, between table entries, could then be multiplied by this derivative, and the result added to the sine value, thus yielding a closer approximation to the real sine value. (:3)
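
Below is a minimal sketch of that trick in C, using integer arithmetic throughout, as the premise requires. This is my own illustration, not the code from the archive linked further down; sine_tab[] is assumed to hold 256 pre-computed values of sin(2·pi·i/256), scaled so that 32767 represents (+1.0):

#include <stdint.h>

#define TAB_SIZE   256                     /* entries in the table     */
#define FRAC_BITS  8                       /* fractional bits of angle */

extern const int16_t sine_tab[TAB_SIZE];   /* pre-computed sine values */

/* 'angle' is in fixed-point table units: one full wave = 256 << 8.
   (Assumes arithmetic right-shift of negative values, as is usual.) */
int32_t sine_1st_order(uint32_t angle)
{
    uint32_t i = (angle >> FRAC_BITS) & (TAB_SIZE - 1);      /* index    */
    int32_t  t = (int32_t)(angle & ((1 << FRAC_BITS) - 1));  /* fraction */

    int32_t s = sine_tab[i];                                 /* ~sine    */
    /* The cosine is just the sine, a quarter-wave further along. */
    int32_t c = sine_tab[(i + TAB_SIZE / 4) & (TAB_SIZE - 1)];

    /* Add the fractional distance times the derivative.  One table
       step spans 2*pi/256 radians, and round((2*pi/256) * 65536) = 1608. */
    return s + ((((t * c) >> 8) * 1608) >> 16);
}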

But a question which readers might next have could be: ‘Why does Dirk not just look up two adjacent sine-values, subtract to get the delta, and then multiply the fractional part by this delta?’ And the answer is: ‘Because one can not only apply the first derivative, but also the second derivative, by squaring the fractional part and halving it (:1), before multiplying the result from that by the negative of the sine function!’ One obtains a section of a parabola, and results from a 256-element table that are close to 16 bits accurate!
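
In the sketch above, that amounts to replacing the return statement with the following, where the new constant derives from ((2·pi/256)² / 2) · 65536 ≈ 20:

    /* Second-order correction: minus half the squared fraction, times
       the (scaled) sine itself -- a small parabolic bend. */
    int32_t t2 = (t * t) >> 8;         /* squared fractional part */
    return s + ((((t * c) >> 8) * 1608) >> 16)
             - ((((t2 * s) >> 8) * 20) >> 16);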

The source code can be found in my binaries folder, which is:

https://dirkmittler.homeip.net/binaries/

And, in that folder, the compressed files of interest would be ‘IntSine.tar.gz’ and ‘IntSine.zip’. They are written in C. The variance that I get, from established values, in (16-bit) integer units squared, is “0.811416” “0.580644” (:2). Any variance lower than (1.0) should be considered ‘good’, since (±1) is actually the smallest-possible, per-result error.


(Updated 12/04/2020, 11h50… )


There can be curious gaps in what some people understand.

One of the concepts which once dominated CGI was that textures assigned to 3D models needed to include a “Normal-Map”, so that, even early in the days of 3D gaming, textured surfaces would seem to have ‘bumps’. These normal-maps were more significant than displacement-maps – i.e., height- or depth-maps – because shaders were actually able to compute lighting subtleties more easily using the normal-maps. But additionally, it was always quite common that ordinary 8:8:8 (R,G,B) texel-formats needed to store the normal-maps, just because images could more easily be prepared and loaded with that pixel-format. (:1)

The old-fashioned way to code that was that the 8-bit integer (128) was taken to symbolize (0.0), that (255) was taken to symbolize a maximally positive value, and that the integer (0) was decoded to (-1.0). The reason for this, AFAIK, was the old graphics cards’ use of the 8-bit integer as a binary fraction.

In the spirit of recreating that, and because it’s sometimes still necessary to store an approximation of a normal-vector using only 32 bits, code like the following has been offered:

 


// Encode: pack three bytes (0..255) into one float, as (x/256 + y + 256*z)
Out.Pos_Normal.w = dot(floor(normal * 127.5 + 127.5), float3(1 / 256.0, 1.0, 256.0));

// Decode: recover each byte as a fraction, and re-expand to [-1.0 .. +1.0)
float3 normal = frac(Pos_Normal.w * float3(1.0, 1 / 256.0, 1 / 65536.0)) * 2.0 - 1.0;

 

There’s an obvious problem with this backwards-emulation: It can’t seem to reproduce the value (0.0) for any of the elements of the normal-vector. (For an element equal to (0.0), floor(127.5) yields the byte (127), which decodes back to (-0.0078125), not (0.0).) And then, what some people do is throw their arms in the air and say: ‘This problem just can’t be solved!’ Well, what about:

 


//  Assumed:
normal = normalize(normal);

// Encode: (0.0) now maps to the byte (128) exactly, using only bytes (1..255)
Out.Pos_Normal.w = dot(floor(normal * 127.0 + 128.5), float3(1 / 256.0, 1.0, 256.0));

 

A side effect of this will definitely be that no uncompressed value belonging to the interval [-1.0 .. +1.0] will lead to a compressed series of 8 zeros, since floor(normal * 127.0 + 128.5) now ranges from (1) to (255).

Mind you, because of the way the resulting value was just decoded, the question of whether zero can actually result is not as easy to address. And one reason is the fact that, for all the elements except the first, additional bits after the first 8 fractional bits have not been removed. But that’s just a problem owing to the one-line decoding that was suggested. That could be changed to:

 


// Scale so that each channel's byte lands just above the binary point,
// and floor away the bits below it...
float3 normal = floor(Pos_Normal.w * float3(256.0, 1.0, 1 / 256.0));
// ...then keep only that byte, as a fraction, and map the byte (128) to (0.0)
normal = frac(normal * (1 / 256.0)) * (256.0 / 127.0) - (128.0 / 127.0);

 

Suddenly, the impossible has become possible.

N.B.  I would not use the customized decoder unless I was also sure that the input floating-point value came from my customized encoder. It can easily happen that the shader needs to work with texture images prepared by an external program, and then, because of the way their channel-values get normalized today, I might use this as the decoder:

 


// Map normalized texels (v/255) to ((v - 128) / 128), so that (128) -> (0.0)
float3 normal = texel.rgb * (255.0 / 128.0) - 1.0;

 

However, if I did, a texel-value of (128) would still be required to result in a floating-point value of (0.0).

(Updated 5/10/2020, 19h00… )
