In a posting from some time ago, I wrote at length about the subject of interrupt prioritization. Here, I'd like to pass along some of what I studied in System Software about a different subject: multitasking.
The perspective from which I appreciate multitasking is that of the device driver. According to what I was taught – and it is a bit dated – a device driver had a top half and a bottom half. The top half was invoked by user-space processes, in a way that required a system call, while the bottom half was invoked by interrupt requests from the hardware. This latter detail did not change if, instead of using purely interrupt-driven I/O, the hardware and driver used DMA.
A system-call is also known as a software-interrupt.
The assumption being made is that a user-space process can request a resource, thus invoking the top half of the device driver, but that the resource is recognized by the device driver as being busy, and that processes on the CPU run much faster than the I/O device. What the device driver will do is change the state of the process which invoked it from ‘running’ to ‘blocked’, and it will make an entry in a table it holds, which identifies which process had requested the resource, as well as device-specific information regarding what exactly the process asked for. Then, the top half of the device driver will send whatever requests to the device are needed to initiate the I/O operation, and to ensure that at some future point in time, the resource and piece of data which were asked for will become available. The top half of the device driver then exits, and the process which invoked it will typically no longer be in a ‘running’ state, unless for some reason the exact item being requested was immediately available. So as far as the top half is concerned, what usually needs to happen next is that some other process, which is in the ‘ready’ state, needs to be made ‘running’ by the kernel.
The bottom half of the device driver responds to the interrupt request from the device, which has signaled that something has become available, and looks up in the table belonging to the device driver, which process asked for that item, out of several possible processes which may be waiting on the same device. The bottom half then changes the state of the process in question from ‘blocked’ to ‘ready’, so that whenever a ‘ready’ process is about to be made ‘running’, the previously-blocked process will have become a contender.
The O/S kernel is then free to schedule the ‘ready’ process in question, making it ‘running’.
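The top-half / bottom-half flow described above can be sketched as a toy simulation. This is not real kernel code; the class names, the request table, and the state strings are all my own invention for illustration:

```python
# Toy simulation of the top-half / bottom-half flow described above.
# The class names, the request table, and the state strings are all
# invented for illustration; no real kernel structures are shown.

RUNNING, READY, BLOCKED = "running", "ready", "blocked"

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.state = RUNNING

class Driver:
    def __init__(self):
        self.pending = {}            # request -> the process waiting on it

    def top_half(self, proc, request):
        """System call: record who asked for what, then block the caller."""
        self.pending[request] = proc
        proc.state = BLOCKED         # the resource is busy; the caller waits
        # ...a real driver would now program the device, or start DMA...

    def bottom_half(self, request):
        """Interrupt handler: look up the waiting process and unblock it."""
        proc = self.pending.pop(request)
        proc.state = READY           # now a contender to be scheduled again

p = Process(42)
d = Driver()
d.top_half(p, "sector-7")
assert p.state == BLOCKED            # top half exits with the caller blocked
d.bottom_half("sector-7")
assert p.state == READY              # bottom half made it ready again
```

The table keyed by request is the essential piece: it is what lets the bottom half find the one waiting process, out of several that may be waiting on the same device.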
Now, aside from the fact that processes can be ‘running’, ‘ready’, or ‘blocked’, a modern O/S has a differentiation between ‘active’ and ‘suspended’, that applies to ‘ready’ and ‘blocked’ processes. My System Software course did not go into great detail about this, because in order to understand why this is needed, one also needs to understand that there exists virtual memory, and that processes can be swapped out. The System Software course I took did not cover virtual memory, but I have read about this subject privately, after taking the course.
(Update 2/28/2020, 12h30 :
Another reason for which my course did not cover the subject of Active versus Suspended states is probably the fact that the course still assumed that the Running process would use the (deprecated) ‘Yield()’ function, which was roughly equivalent to issuing ‘sleep(0.0)’. With this function, the state of a Running process could directly be made Ready-Active, with no need for Ready-Suspended. The instructors may well have thought that much of the detail which follows below would have been too much information to add to their course, which was focussed more on device-drivers.)
The kernel could have several reasons to suspend a process, and the programmer should expect that his process can be suspended at any time. But one main reason why it would be suspended in practice, would be so that it can be swapped out. Only the kernel can make a ‘suspended’ process ‘active’ again. (:1)
(Updated 2/27/2020, 22h50 … )
(As of 2/11/2017 : )
In order to appreciate this distinction between ‘active’ and ‘suspended’, I suppose that the reader should also consider the Sleep() command, with which a currently ‘running’ process surrenders the CPU. In the distant past, we used a Yield() command, which was nonspecific about how long the process was going to yield the CPU, but on any modern computer, the use of Yield() is deprecated (not only when we are programming in Java).

The Sleep() command takes two arguments, which become parameters in its implementation. The first states for how many seconds the process wishes to yield the CPU, while the second states a much shorter time unit – typically milliseconds. A sensible question to ask would be, ‘Why would the System Software specialists not use a single parameter?’
The reason for this is that, for shorter time-intervals, it makes no sense to swap out the process, while for longer time-intervals it is feasible to do so. Thus, if the Sleep() command has a non-zero value as its first, slower parameter, the process will be ‘suspended’ in practice, while otherwise, in order to become inactive for only some number of milliseconds, it will remain a ‘ready-active’ process, in queue to be scheduled to ‘running’, but allowing other processes to run for the moment.
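That decision rule can be sketched in a few lines, assuming the two-argument Sleep() described above (the function name and the state strings are hypothetical):

```python
def classify_sleep(seconds, millis):
    """Hypothetical sketch of the rule described above: a non-zero
    'seconds' argument makes the process a candidate for suspension
    (and hence for being swapped out), while a milliseconds-only
    sleep keeps it ready-active, merely yielding the CPU for now."""
    if seconds > 0:
        return "ready-suspended"
    return "ready-active"

assert classify_sleep(2, 0) == "ready-suspended"   # long sleep: swap-eligible
assert classify_sleep(0, 50) == "ready-active"     # short sleep: stays in queue
```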
The way in which ‘ready-active’ processes are scheduled to ‘running’ on modern computers, has something to do with the fact that modern computers receive timing pulses as one of their hardware-interrupts. The interrupt service routine for a timing interrupt performs a miscellany of tasks, one of which is to look at ‘ready-active’ processes, and to analyze what their priority is, as well as
whether their sleep milliseconds have elapsed.
This priority is not an interrupt priority level, nor an interrupt vector, with which kernel-space processes run, but a software-priority level, with which a computer is multitasking.
What this means is that in reality, the amount of time for which a process sleeps is not exact, and will in fact exceed the value it specified, because it will not get scheduled until, according to a timing interrupt, the elapsed milliseconds exceed the specified Sleep() parameter. Programmers who need to write code that runs more-or-less in real time need to take this into consideration. If there is more than one ‘ready-active’ process that fits this description, the process with the higher priority will be scheduled. But there could be two or more of them that all have the same priority as well. And in this last case, there is an arbitrary order in which they form a queue. This order should at least allow all the processes to run eventually.
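One hedged way to picture what the timer interrupt's scheduling pass might do is the sketch below; the dictionary fields for each process entry are my own invention:

```python
def pick_next(ready_active, now_ms):
    """On a timer tick, choose which ready-active process to schedule:
    only entries whose sleep deadline has passed are eligible; among
    those, the highest priority wins, and equal priorities keep their
    arbitrary queue order (the earlier entry wins, since max() returns
    the first maximal element)."""
    eligible = [p for p in ready_active if now_ms >= p["wake_ms"]]
    if not eligible:
        return None
    return max(eligible, key=lambda p: p["priority"])["pid"]

queue = [{"pid": 1, "priority": 5, "wake_ms": 100},
         {"pid": 2, "priority": 9, "wake_ms": 300},
         {"pid": 3, "priority": 5, "wake_ms": 100}]
assert pick_next(queue, 150) == 1   # pid 2 still asleep; tie goes to queue order
assert pick_next(queue, 400) == 2   # highest priority, now that it is eligible
```

Note how the sleep interval is only ever checked on a tick, which is exactly why the actual sleep time exceeds the requested one.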
The use of Real-Time Operating Systems has largely been replaced with this sort of logic, especially in cases where real-time scheduling is considered ‘preferable but not critical’. This includes cases in which users are just watching a video stream, and where the player-process can simply be made higher-priority than processes which may run in the background and do not run live. In fact, the terminology “Real-Time” has become distorted in consumer-grade operating systems, to refer to a single process which will have a higher priority than all other processes, and which will be scheduled to ‘running’ within its requested time-interval on that basis – delayed maximally by the period of the timing pulses.
The competition between ‘ready-active’ processes to become scheduled to ‘running’ is not a rescheduling event. Rescheduling involves possible ‘TLB Shootdowns’, as well as swapping processes out of physical RAM and back in. The competition between ‘ready-active’ and ‘ready-suspended’ processes does involve rescheduling, and therefore involves remapping virtual memory, and therefore also presents the most efficient time at which processes may be swapped in or out.
Technically, the remapping of virtual memory is called a ‘context-switch’, while actually swapping pages of memory in or out is distinct. The two may not go together, if what is being swapped are pages of data instead of code.
And data or code which has been swapped out, also represents a resource which is temporarily unavailable, when other code asks for it. All virtual data must at some point get swapped back into real, physical RAM, before it can be used. But the request for virtual memory additionally involves the ‘Translation Lookaside Buffer’, the ‘TLB’, and the way that works does not belong to the subject of this posting.
(Update 2/23/2020, 14h55 : )
In this posting, I was consciously using terminology a bit differently from how the mainstream uses it by now. I had loosely stated that changing the state of a process from Ready to Running, or from Running to Blocked (which actually accompanies that process's either running or not, on any given CPU core), was not the same type of change in state as one which will remap virtual memory, that latter change in state requiring more overhead.
According to today’s terminology, the former change in state is called ‘rescheduling’ anyway. Thus, to change its state from Ready to Running, even though virtual memory did not need to be remapped, is referred to today, as ‘To schedule that process’.
And of course, multiple threads are handled the same way in which multiple processes are, except that threads belonging to the same process share their Data and Instruction Segments but each possesses its own Stack Segment.
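This sharing can be illustrated in any threading runtime. In the Python sketch below, `counter` plays the role of a variable in the shared data segment, while `local` lives on each thread's private stack:

```python
import threading

counter = 0                 # shared: one copy in the process's data segment
lock = threading.Lock()

def worker(n):
    global counter
    local = 0               # private: lives on this thread's own stack
    for _ in range(n):
        local += 1
    with lock:              # serialize updates to the shared variable
        counter += local

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 4000      # every thread saw and updated the same 'counter'
```

Each thread's `local` variable is independent precisely because each thread has its own stack; only `counter` is visible to all of them.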
(Update 2/26/2020, 19h00 : )
There is a bit of a correction which I must make to what I wrote here. What I did study in System Hardware and System Software had two basic limitations, that are difficult for me to surmount, ‘just by contemplating’. Yet, when there are gaps in one's knowledge, they lead to consistency issues, which lead to contemplation nonetheless.
- The ‘Sleep()’ function which once existed, that accepted two integer arguments, has been replaced with a different API function, actually named ‘sleep()’. Under POSIX, ‘sleep()’ accepts one integer number of seconds (with functions such as ‘nanosleep()’ covering sub-second intervals), while some language runtimes accept one floating-point number of seconds. This has been cross-platform progress.
- My System Software course really failed to point out the Active / Suspended differentiation of the processes' and threads' state, where threads are really just processes that share a code segment as well as their data segment (but not their stack segment).
According to my contemplation, there should be another important reason, why processes end up Suspended: They executed the ‘sleep()’ function! If they did so, it would make most sense to me, if they next ended up in the Ready-Suspended state.
It is my present belief that, as the ‘sleep()’ function's time intervals expire, processes are migrated by the kernel from Ready-Suspended to Ready-Active. But, along with any processes that end up Ready-Active from the Blocked-Active state, which would be the responsibility of the device drivers, I think they jump into a queue that puts them directly behind any processes that have a higher (or an equal) priority. (:2) At the same time, if a process had asked the device driver for a resource which was busy, thus putting it in the Blocked-Active state, and if those processes were next migrated to Blocked-Suspended, and if the device driver next found that their requested resource had become ready for them, then the device driver will simply migrate them to Ready-Suspended.
To translate this into letters, which the Linux ‘top’ or ‘ps’ command will output:
- Ready-Suspended receives the letter ‘S’.
- Blocked-Active receives the letter ‘D’.
- Because Blocked processes are not running, they do not have the chance to execute the ‘sleep()’ function, and are thus rarely migrated to Blocked-Suspended in the real world.
This rather intricate hypothesis, of which I’m not 100% sure, poses an important question:
- When processes are Running, if they neither make a system call nor execute the ‘sleep()’ function, is there a way for them to get ‘evicted’, say, ‘Because a Ready-Active process has been waiting long enough to be made Running’? The answer to this question, valid for Linux, is:
- As soon as the process is in the Ready-Active state, IF its priority is higher than that of at least one actually-running process, that lower-priority Running process is “preempted” directly, so that the new process becomes Running. In practice, this means that processes will mainly stay in the Ready-Active state, because their priority is lower than that of all Running processes.
- Additionally, each running process was given a ‘Time-Slice’ when made Running, and when that Time-Slice expires, it is also preempted (see link provided above). Accordingly, the other main reason a process would be in the Ready-Active queue is the possibility of its Time-Slice being exhausted.
- A “Preemptive Scheduler” will preempt running processes, as part of the timer interrupt.
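The time-slice half of this can be sketched as follows; the field names and the tick function are my own invention, and priority preemption is deliberately left out:

```python
def on_timer_tick(proc, now_ms):
    """Sketch of the time-slice rule above (field names invented):
    a Running process whose slice has expired is demoted back to
    Ready-Active; priority preemption would be handled separately."""
    if proc["state"] == "running" and now_ms >= proc["slice_ends_ms"]:
        proc["state"] = "ready-active"
        return True              # the process was preempted
    return False

p = {"state": "running", "slice_ends_ms": 200}
assert on_timer_tick(p, 150) is False    # slice not yet exhausted
assert on_timer_tick(p, 200) is True     # slice expired: preempted
assert p["state"] == "ready-active"
```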
I think that this detail is really a way in which the ‘RT’ process priority is only an extension of the conventional ones. Within an ‘RTOS’, if a process has ‘RT’ priority and a ‘sleep()’ interval that has expired, it will not only be in the Ready-Active queue, but will actually preempt a Running process, so that the ‘RT’ process's ‘sleep()’ interval is not exceeded. In this sense, ‘RT’ priority is simply a priority level higher than any numbered one.
(Update 2/27/2020, 14h30 : )
Even though it might seem reasonable at first glance to visualize queues with entries whose order can be changed, it is usually not necessary to implement a queue in such a fashion. Often, there exists a polling method that will accomplish the same thing that a reordering queue would have accomplished. For example, if the three pathways by which a process can enter the Ready-Active state are:
- Because the scheduler preempted a Running process,
- Because a device-driver migrated a process from the Blocked-Active state, after a hardware feature which was once busy signalled, via a hardware interrupt, that it has become ready,
- Because the kernel has determined that the ‘sleep()’ interval of a process, in the Ready-Suspended state, has elapsed.
It can just as easily be implemented that their actual order in the Ready-Active queue is left unchanged throughout. But then, a hypothetical type of polling logic could be applied which, if executed, does the following:
1. For each idle CPU core, schedule the first non-exhausted Ready-Active process with the highest priority, if there is one, to Running on this core,
2. Remember what the lowest ‘Priority’ of any Running process was, as well as the ‘ID’ of one process with that priority level, that is Running on the lowest CPU ‘core number’,
3. Set an ‘ID’ to be Null,
4. Scan the queue for the first non-exhausted process which has a higher ‘Priority’ than what Step (2) above determined, and than the priority of the previous process thus found in the queue. Hence, there should be one variable that holds either the ‘Priority’ from Step (2) above, or the priority of any found process in the queue from this step, whichever is higher. Every time such a queued process is found, store its ‘ID’ in the variable set by Step (3) above,
5. If the ‘ID’ from Step (3) or Step (4) above is not Null, preempt the Running process with the ID from Step (2) above, and schedule the process with the ‘ID’ from Step (4) above to become Running on the ‘core number’ from Step (2) above,
6. Continue from Step (3) above until no more processes have been scheduled to Running, in other words, until the ‘ID’ variable from Step (3) above remains Null.
7. If there is still an idle CPU core, and there are only exhausted processes in the queue, give all the queued processes new Time-Slices, so that they are not exhausted anymore. …?
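The scan-and-preempt core of those steps (roughly, Steps 2 through 6) can be sketched as below. The data layout, with ‘running’ as a core-to-process map and the queue as a list of tuples, is my own invention, and the queue's order is never changed, as described:

```python
def poll_ready_queue(running, ready):
    """Hedged sketch of the polling logic above.  'running' maps a
    core number to a (pid, priority) pair; 'ready' is a list of
    (pid, priority, exhausted) tuples whose order is never changed.
    Repeatedly preempt the lowest-priority running process whenever
    the queue holds a non-exhausted process of strictly higher
    priority."""
    while True:
        # Step (2): find the lowest-priority running process
        core = min(running, key=lambda c: running[c][1])
        low_pid, low_prio = running[core]
        # Steps (3)-(4): scan the queue for the best strictly-higher candidate
        best = None
        best_prio = low_prio
        for pid, prio, exhausted in ready:
            if not exhausted and prio > best_prio:
                best, best_prio = pid, prio
        # Steps (5)-(6): preempt, or stop once nothing qualifies
        if best is None:
            return running
        running[core] = (best, best_prio)
        # the preempted process joins the back of the queue
        ready = [t for t in ready if t[0] != best] + [(low_pid, low_prio, False)]

run = {0: ("a", 2), 1: ("b", 7)}
rq = [("c", 5, False), ("d", 9, True), ("e", 3, False)]
result = poll_ready_queue(dict(run), list(rq))
assert result == {0: ("c", 5), 1: ("b", 7)}   # "d" was exhausted, so "c" won
```

The loop terminates because every preemption strictly raises one core's priority, so the sum of running priorities can only increase a finite number of times.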
What the reader should be able to recognize, however, is that if the logic which I wrote above, to resolve the competition between Ready-Active processes to become Running, both needs to be fast and needs to follow synchronously from the timer interrupt handler, then it would seem appropriate that logic which swaps a process's data back in should not be made part of it. That swapping logic should run before the kernel migrates processes to Ready-Active.
(Update 2/27/2020, 22h50 : )
The number of processes actually Running is absolutely limited by the number of CPU cores. Further, the number of processes that are Ready-Active (not Running) also needs to be kept reasonable, in relation to how many cores there are. In order to do so, the kernel can Suspend threads belonging to processes. Yet, if that was the reason for which a thread was Suspended (externally), the kernel will need to set a time, just like the time that would be set by the ‘sleep()’ function from within a process.
According to an article which I read, even though UNIX generically defines a ‘pthread_suspend()’ and a ‘pthread_resume_np()’ function, which would allow one user thread to suspend another user thread, Linux specifically never implemented them. This would also be why, under Linux, there is no such state as ‘Blocked-Suspended’. It would arise if a thread were ‘Suspended’ which just happened to be ‘Blocked’ at the time. And this would also explain why the terminology under Linux does not refer to ‘Ready-Suspended’ processes, instead just referring to them as ‘Sleeping’. Also, Linux does a few other things that generic UNIX would not do, such as to define a (Z)ombie and a s(T)opped state, where processes are killed by being zombied first. (Zombies must generally be reaped by their parent process.) Yet, the world beyond Linux must also be understood, and it is only different to some degree. I think a s(T)opped process is really just the Linux way of referring to a Ready-Suspended set of threads that have no time-limit after which they would automatically be rescheduled to Ready-Active. And, they will seem to stem from the O/S.
What I also read was that, under Linux, there is a ‘pause()’ function. However, the main way in which it digresses from this posting is the fact that it must be called from within the thread being suspended; and, because a blocked thread cannot be executing code, again, the Blocked-Suspended state will not result.
The way in which pages of memory are swapped in is in fact tied to how Virtual Memory works, and therefore to how the TLB works, which, as I wrote earlier, I would not go into in this posting.
Swapping a page in happens because the CPU attempts to access its virtual address, either to read or to write, but the physical location of that virtual address is not in RAM, having been swapped out. What actually happens is that the process causes a page-fault, which invokes a subroutine in the kernel (with low, system-call priority), the long-term effect of which is to put the process into the Blocked-Active state, waiting for the resource – which in this case is the page of memory to be in physical memory, but which also depends on an I/O operation being performed. Thus, one good place where a process's memory pages are swapped back in is while the process is in the Blocked-Active state, before being migrated to Ready-Active.
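That page-fault path can be reduced to a small sketch; the function, the set of resident pages, and the state strings are invented for illustration:

```python
def touch(page, resident, proc):
    """Sketch of the page-fault path described above (names invented):
    accessing a non-resident page faults and blocks the process; the
    page is brought in while the process waits, after which the
    process is migrated back to ready-active."""
    if page in resident:
        return "hit"
    proc["state"] = "blocked-active"   # fault: wait for the I/O to finish
    resident.add(page)                 # the page is swapped back in
    proc["state"] = "ready-active"     # bottom half unblocks the process
    return "fault"

proc = {"state": "running"}
ram = {"p1"}
assert touch("p1", ram, proc) == "hit"     # resident page: no fault
assert touch("p2", ram, proc) == "fault"   # non-resident: blocked, then ready
assert proc["state"] == "ready-active"
```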
What that should imply is that the kernel may in fact migrate the process from Ready-Suspended to Ready-Active with the understanding, at programming time, that some of the data it will want to access may in fact be swapped out. And then, as soon as that process tries to access such data, it goes to Blocked-Active.
One question which I do not know the answer to is, ‘At what point is the decision actually made to swap pages of memory out?’ The book that I read only describes what must happen to swap them in. A page of memory could get swapped out just in response to some other page being swapped in, because there is limited physical memory available. But, just as it is with processes running on the CPU cores, pages of memory could also get swapped out preemptively, because the kernel detects a shortage of free physical memory.
I suppose then that swapping out pages preemptively should start from the Ready-Suspended processes, and potentially proceed until the kernel panics, and the whole system just freezes. I think that the system will freeze before preemptive swapping proceeds to the Ready-Active processes.
The kernel may only have one subroutine for swapping out pages of memory, that needs to be called preemptively. Yet, the kernel would then take into consideration how many pages have been requested to be swapped back in, when doing so. In that case, pages would never be swapped back in, unless there is enough physical memory available to do so.