Getting Pulseaudio to schedule real-time threads under Debian / Stretch and beyond.

On the computer which I name ‘Phosphene’, I have Debian 9 / Stretch installed, and, going the popular route, the Debian maintainers have decided that ‘Pulseaudio’ should be the main sound server, while in certain cases, it can be made to coexist with the ‘Jack’ / ‘Jack v2′ daemon, the latter for more demanding needs. In my case, starting Jack suspends Pulse.

In addition, Debian systems I’m familiar with run ‘pulseaudio’ as user, and not as root.

But, the way my Pulse daemon is configured, includes a ‘loopback’ module, that sends audio that has been input and sampled from the analog input of my (“Creative X-Fi”) sound card, back to its analog output, so that I can listen to an old-fashioned-but-modern radio receiver, over my speakers, without having to connect anything special to do so.

The problem that I was experiencing was, buffer underruns, and sound drop-outs, due to the way this default sound server was operating the sound card. And, surely enough, it had something to do with a by-now famous issue, of allowing Pulse to schedule certain threads to run with real-time priority. This can be enabled in the config file ‘/etc/pulse/daemon.conf‘, but fails to become actual when just enabled so.

Trying to find an answer to why this attempt was failing, sent me on a trip to many Web-sites and BB-postings, most of which had as disadvantage, the fact that with Pulse, there is an old way and a new way to accomplish this, and the fact that most of the BB-postings assumed the old way, being dated to the year 2016 and earlier.

Rest assured, under Debian 9 / Stretch, authorizing Pulse to schedule some threads with RT priority, is managed according to ‘the new way’. Just to recap how ‘the old way’ worked:

The file ‘/usr/bin/pulseaudio‘ needed to be set with ‘SETUID root’, and when run this way, if the sound server detected that the real user (not the effective user) belonged to the group ‘pulse-rt‘, would schedule the threads it needed to run with RT, and would then drop privileges, so that according to the process-monitor ‘top’, it would never even give a hint that anything had in fact been scheduled with RT.

This is how ‘the new way’ is supposed to work:

If the user needs for Pulse specifically to run with RT, he must actually install the package ‘rtkit‘, which runs as a system user, and which sets certain threads to have RT priority in a curated way, that’s meant to be ‘safer’, than it still is for any one user to belong to a group, that simply allows him or her to run arbitrary processes with RT. In fact, ‘the old way’ was also meant as a similar safeguard.

I had installed the specified package, rebooted my system – with Pulse given its new startup parameters, and found alas, that the file ‘/var/log/syslog‘, which was benefiting from an elevated logging level set by me, revealed numerous messages that I did not understand.

Trying to figure out what was happening resulted in more than one day’s frustration, and then finally, to some peace of mind…

(Updated 4/16/2020, 13h15 … )

Continue reading Getting Pulseaudio to schedule real-time threads under Debian / Stretch and beyond.

Why some Linux devs still use spinlocks.

There is An interesting article, on another BB, which explains what some of the effects are, of spinlocks under Linux. But, given that at least under Linux, spinlocks are regarded as ‘bad code’ and constitute ‘a busy-wait loop’, I think that that posting does not explain the subject very well of why, in certain cases, they are still used.

A similar approach to acquiring an exclusive lock – that can exist as an abstract object, which does not really need to be linked to the resource that it’s designed to protect – could be programmed by a novice programmer, if there were not already a standard implementation somewhere. That other approach could read:

 

from time import sleep
import inspect
import thread

def acquire( obj ):
    assert inspect.isclass(obj)
    myID = thread.get_ident()
    if obj.owner is None:
        # Replace .owner with a unique attribute name.
        # In C, .owner needs to be declared volatile.
        # Compiler: Don't optimize out any reads.
        obj.owner = myID
    while obj.owner != myID:
        if obj.owner == 0:
            obj.owner = myID
        sleep(0.02)

def release( obj ):
    assert inspect.isclass(obj)
    if obj.owner is None:
        # In C, .owner needs to be declared volatile.
        obj.owner = 0
        return
    if obj.owner == thread.get_ident():
        obj.owner = 0

 

(Code updated 2/26/2020, 8h50…

I should add that, in the example above, Lines 8-12 will just happen to work under Python, because multi-threading under Python is in fact single-threaded, and governed by a Global Interpreter Lock. The semantics that take place between Lines 12 and 13 would break in the intended case, where this would just be pseudo-code, and where several clock cycles elapse, so that the ‘Truth’ which Line 13 tests may not last, just because another thread would have added a corresponding attribute on Line 12. OTOH, adding another sleep() statement is unnecessary, as those semantics are not available, outside Python.

In the same vein, if the above code is ported to C, then what matters is the fact that in the current thread, several clock-cycles elapse between Lines 14 and 15. Within those clock cycles, other threads could also read that obj.owner == 0, and assign themselves. Therefore, the only sleep() instruction is dual-purpose. Its duration might exceed the amount of time the cache was slowed down to, to execute multiple assignments to the same memory location. After that, one out of possibly several threads would have been the last, to assign themselves. And then, that thread would also be the one that breaks out of the loop.

However, there is more that could happen between Lines 14 and 15 above, than cache-inefficiency. An O/S scheduler could cause a context-switch, and the current thread could be out of action for some time. If that amount of time exceeds 20 milliseconds, then the current thread would assign itself afterwards, even though another thread has already passed the retest of Line 13, and assumes that it owns the Mutex. Therefore, better suggested pseudo-code is offered at the end of this posting…

)


 

This pseudo-code has another weakness. It assumes that every time the resource is not free, the program can afford to wait for more than 20 milliseconds, before re-attempting to acquire it. The problem can crop up, that the current process or thread must acquire the resource, within microseconds, or even within nanoseconds, after it has become free. And for such a short period of time, there is no way that the O/S can reschedule the current CPU core, to perform any useful amount of work on another process or thread. Therefore, in such cases, a busy-wait loop becomes The Alternative.

I suppose that another reason, for which some people have used spinlocks, is just due to bad code design.


 

Note: The subject has long been debated, of what the timer interrupt frequency should be. According to kernel v2.4 or earlier, it was 100Hz. According to kernel v2.6 and later, it has been 1000Hz. Therefore, in principle, an interval of 2 milliseconds could be inserted above (in case the resource had not become free). However, I don’t really think that doing so would change the nature of the problem.

Explanation: One of the (higher-priority, hardware) interrupt requests consists of nothing but a steady pulse-train, from a programmable oscillator. Even though the kernel can set its frequency over a wide range, this frequency is known not to be terribly accurate. Therefore, assuming that the machine has the software installed, that provides ‘strict kernel marshalling’, every time this interrupt fires, the system time is advanced by a floating-point number, that is also adjusted over a period of hours and days, so that the system time keeps in-sync with an external time reference. Under Debian Linux, the package which installs that is named ‘ntp’.

 

There exist a few other tricks to understand, about how, in practice, to force an Operating System which is not an RTOS, to behave like a Real-Time O/S. I’ve known a person who understood computers on a theoretical basis, and who had studied Real-Time Operating Systems. That’s a complicated subject. But, given the reality that his operating system was not a Real-Time O/S, he was somewhat stumped by why then, it was able to play a video-clip at 24 FPS…

(Updated on 3/12/2020, 13h20 …)

Continue reading Why some Linux devs still use spinlocks.