Why some Linux devs still use spinlocks.

There is An interesting article, on another BB, which explains what some of the effects are, of spinlocks under Linux. But, given that at least under Linux, spinlocks are regarded as ‘bad code’ and constitute ‘a busy-wait loop’, I think that that posting does not explain the subject very well of why, in certain cases, they are still used.

A similar approach to acquiring an exclusive lock – that can exist as an abstract object, which does not really need to be linked to the resource that it’s designed to protect – could be programmed by a novice programmer, if there were not already a standard implementation somewhere. That other approach could read:

 

from time import sleep
import inspect
import thread

def acquire( obj ):
    assert inspect.isclass(obj)
    myID = thread.get_ident()
    if obj.owner is None:
        # Replace .owner with a unique attribute name.
        # In C, .owner needs to be declared volatile.
        # Compiler: Don't optimize out any reads.
        obj.owner = myID
    while obj.owner != myID:
        if obj.owner == 0:
            obj.owner = myID
        sleep(0.02)

def release( obj ):
    assert inspect.isclass(obj)
    if obj.owner is None:
        # In C, .owner needs to be declared volatile.
        obj.owner = 0
        return
    if obj.owner == thread.get_ident():
        obj.owner = 0

 

(Code updated 2/26/2020, 8h50…

I should add that, in the example above, Lines 8-12 will just happen to work under Python, because multi-threading under Python is in fact single-threaded, and governed by a Global Interpreter Lock. The semantics that take place between Lines 12 and 13 would break in the intended case, where this would just be pseudo-code, and where several clock cycles elapse, so that the ‘Truth’ which Line 13 tests may not last, just because another thread would have added a corresponding attribute on Line 12. OTOH, adding another sleep() statement is unnecessary, as those semantics are not available, outside Python.

In the same vein, if the above code is ported to C, then what matters is the fact that in the current thread, several clock-cycles elapse between Lines 14 and 15. Within those clock cycles, other threads could also read that obj.owner == 0, and assign themselves. Therefore, the only sleep() instruction is dual-purpose. Its duration might exceed the amount of time the cache was slowed down to, to execute multiple assignments to the same memory location. After that, one out of possibly several threads would have been the last, to assign themselves. And then, that thread would also be the one that breaks out of the loop.

However, there is more that could happen between Lines 14 and 15 above, than cache-inefficiency. An O/S scheduler could cause a context-switch, and the current thread could be out of action for some time. If that amount of time exceeds 20 milliseconds, then the current thread would assign itself afterwards, even though another thread has already passed the retest of Line 13, and assumes that it owns the Mutex. Therefore, better suggested pseudo-code is offered at the end of this posting…

)


 

This pseudo-code has another weakness. It assumes that every time the resource is not free, the program can afford to wait for more than 20 milliseconds, before re-attempting to acquire it. The problem can crop up, that the current process or thread must acquire the resource, within microseconds, or even within nanoseconds, after it has become free. And for such a short period of time, there is no way that the O/S can reschedule the current CPU core, to perform any useful amount of work on another process or thread. Therefore, in such cases, a busy-wait loop becomes The Alternative.

I suppose that another reason, for which some people have used spinlocks, is just due to bad code design.


 

Note: The subject has long been debated, of what the timer interrupt frequency should be. According to kernel v2.4 or earlier, it was 100Hz. According to kernel v2.6 and later, it has been 1000Hz. Therefore, in principle, an interval of 2 milliseconds could be inserted above (in case the resource had not become free). However, I don’t really think that doing so would change the nature of the problem.

Explanation: One of the (higher-priority, hardware) interrupt requests consists of nothing but a steady pulse-train, from a programmable oscillator. Even though the kernel can set its frequency over a wide range, this frequency is known not to be terribly accurate. Therefore, assuming that the machine has the software installed, that provides ‘strict kernel marshalling’, every time this interrupt fires, the system time is advanced by a floating-point number, that is also adjusted over a period of hours and days, so that the system time keeps in-sync with an external time reference. Under Debian Linux, the package which installs that is named ‘ntp’.

 

There exist a few other tricks to understand, about how, in practice, to force an Operating System which is not an RTOS, to behave like a Real-Time O/S. I’ve known a person who understood computers on a theoretical basis, and who had studied Real-Time Operating Systems. That’s a complicated subject. But, given the reality that his operating system was not a Real-Time O/S, he was somewhat stumped by why then, it was able to play a video-clip at 24 FPS…

(Updated on 3/12/2020, 13h20 …)

Continue reading Why some Linux devs still use spinlocks.

Multitasking

During this posting from some time ago, I wrote at length about the subject of interrupt prioritization. I wanted to demonstrate that I really did study System Software, and that therefore, I know something which I can pass along, about the subject of multitasking, which is a different subject.

The perspective from which I appreciate multitasking, is the perspective of the device-driver. According to what I was taught – and it is a bit dated – a device-driver had a top half and a bottom half. The top half was invoked by user-space processes, in a way that required a system-call, while the bottom half was invoked by interrupt requests, from the hardware. This latter detail did not change if instead of using purely interrupt-driven I/O, the hardware and driver used DMA.

A system-call is also known as a software-interrupt.

The assumption which is made is that a user-space process can request a resource, thus invoking the top half of the device driver, but that the resource is recognized by the device driver as being busy, and that processes on the CPU run much faster than the I/O device. What the device-driver will do is change the state of the process which invoked it from ‘running’ to ‘blocked’, and it will make an entry in a table it holds, which identifies which process had requested the resource, as well as device-specific information, regarding what exactly the process asked for. Then, the top half of the device-driver will send whatever requests to the device that are needed, to initiate the I/O operation, and to ensure that at some future point in time, the resource and piece of data which were asked for, will become available. The top half of the device-driver then exits, and the process which invoked it will typically not be in a ‘running’ state anymore, unless for some reason the exact item being requested was immediately available. So as far as the top half is concerned, what usually needs to happen next is that some other process, which is in the ‘ready’ state, needs to be made ‘running’ by the kernel.

The bottom half of the device driver responds to the interrupt request from the device, which has signaled that something has become available, and looks up in the table belonging to the device driver, which process asked for that item, out of several possible processes which may be waiting on the same device. The bottom half then changes the state of the process in question from ‘blocked’ to ‘ready’, so that whenever a ‘ready’ process is about to be made ‘running’, the previously-blocked process will have become a contender.

The O/S kernel is then free to schedule the ‘ready’ process in question, making it ‘running’.

Now, aside from the fact that processes can be ‘running’, ‘ready’, or ‘blocked’, a modern O/S has a differentiation, between ‘active’ and ‘suspended’, that applies to ‘ready’ and ‘blocked’ processes. My System Software course did not go into great detail about this, because in order to understand why this is needed, one also needs to understand that there exists virtual memory, and that processes can be swapped out. The System Software course I took, did not cover virtual memory, but I have read about this subject privately, after taking the course.

(Update 2/28/2020, 12h30 :

Another reason for which my course did not cover the subject about Active versus Suspended states, is probably the fact that the course still assumed that the Running process would use the (deprecated) ‘Yield()’ instruction, which was roughly equivalent to issuing ‘sleep(0.0)’. With this instruction, the state of a Running process could directly be made Ready-Active, with no need for Ready-Suspended. The instructors may well have thought, that much of the detail which follows below, would actually have been too much information to add to their course, which was focussed more on device-drivers.)

The kernel could have several reasons to suspend a process, and the programmer should expect that his process can be suspended at any time. But one main reason why it would be suspended in practice, would be so that it can be swapped out. Only the kernel can make a ‘suspended’ process ‘active’ again. (:1)

(Updated 2/27/2020, 22h50 … )

Continue reading Multitasking