In a posting from some time ago, I wrote at length about the subject of interrupt prioritization. I wanted to demonstrate that I really did study System Software, and that I therefore know something which I can pass along about a different subject: multitasking.
The perspective from which I appreciate multitasking is that of the device-driver. According to what I was taught – and it is a bit dated – a device-driver had a top half and a bottom half. The top half was invoked by user-space processes, in a way that required a system-call, while the bottom half was invoked by interrupt requests from the hardware. This latter detail did not change if, instead of using purely interrupt-driven I/O, the hardware and driver used DMA.
A system-call is also known as a software-interrupt.
The assumption which is made is that a user-space process can request a resource, thus invoking the top half of the device driver, but that the resource is recognized by the device driver as being busy, and that processes on the CPU run much faster than the I/O device. What the device-driver will do is change the state of the process which invoked it from ‘running’ to ‘blocked’, and it will make an entry in a table it holds, which identifies which process had requested the resource, as well as device-specific information, regarding what exactly the process asked for. Then, the top half of the device-driver will send whatever requests to the device that are needed, to initiate the I/O operation, and to ensure that at some future point in time, the resource and piece of data which were asked for, will become available. The top half of the device-driver then exits, and the process which invoked it will typically not be in a ‘running’ state anymore, unless for some reason the exact item being requested was immediately available. So as far as the top half is concerned, what usually needs to happen next is that some other process, which is in the ‘ready’ state, needs to be made ‘running’ by the kernel.
The bottom half of the device driver responds to the interrupt request from the device, which has signaled that something has become available, and looks up in the table belonging to the device driver, which process asked for that item, out of several possible processes which may be waiting on the same device. The bottom half then changes the state of the process in question from ‘blocked’ to ‘ready’, so that whenever a ‘ready’ process is about to be made ‘running’, the previously-blocked process will have become a contender.
The O/S kernel is then free to schedule the ‘ready’ process in question, making it ‘running’.
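The hand-off between the two halves can be sketched as a toy simulation. Everything here (the process name, the request table, the state strings) is invented purely for illustration, and stands in for real kernel data structures:

```python
# Toy sketch of the top-half / bottom-half hand-off described above.
# Names and structures are hypothetical, not any real kernel's API.

process_state = {"P1": "RUNNING"}
request_table = {}          # maps a device resource to the process waiting on it

def top_half(pid, resource):
    """Invoked via a system call: record the request and block the caller."""
    request_table[resource] = pid
    process_state[pid] = "BLOCKED"
    # ...a real driver would now program the device to start the I/O...

def bottom_half(resource):
    """Invoked from the interrupt handler: wake whichever process asked."""
    pid = request_table.pop(resource)
    process_state[pid] = "READY"     # now a contender for the CPU again

top_half("P1", "disk-sector-42")
assert process_state["P1"] == "BLOCKED"
bottom_half("disk-sector-42")
assert process_state["P1"] == "READY"
```

In a real driver, the top half would also initiate the I/O operation before returning, and the bottom half would run in interrupt context, not as ordinary code.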
Under Linux specifically, when access to devices is requested by user-space processes through device-files in the directory ‘/dev/*’, this, too, ends up calling the top half of a device driver.
Now, aside from the fact that processes can be ‘running’, ‘ready’, or ‘blocked’, a modern O/S has a differentiation between ‘active’ and ‘suspended’, which applies to ‘ready’ and ‘blocked’ processes. My System Software course did not go into great detail about this, because in order to understand why it is needed, one also needs to understand that virtual memory exists, and that processes can be swapped out. The course did not cover virtual memory, but I have read about the subject privately, after taking the course.
(Update 2/28/2020, 12h30 :
Another reason for which my course did not cover the subject of Active versus Suspended states is probably the fact that the course still assumed that the Running process would use the (deprecated) ‘Yield()’ instruction, which was roughly equivalent to issuing ‘sleep(0.0)’. With this instruction, the state of a Running process could directly be made Ready-Active, with no need for Ready-Suspended. The instructors may well have thought that much of the detail which follows below would have been too much information to add to their course, which was focused more on device-drivers.)
The kernel could have several reasons to suspend a process, and the programmer should expect that his process can be suspended at any time. But one main reason why it would be suspended in practice, would be so that it can be swapped out. Only the kernel can make a ‘suspended’ process ‘active’ again. (:1)
(Updated 4/16/2020, 8h30 … )
(As of 2/11/2017 : )
In order to appreciate this detail, of ‘active’ as opposed to ‘suspended’, I suppose that the reader should also consider the Sleep() command, with which a currently ‘running’ process surrenders the CPU. In the distant past, we used a Yield() command, which was nonspecific about how long the process was going to yield the CPU, but on any modern computer, the use of Yield() is deprecated (not only when we are programming in Java). (:3)

The Sleep() command takes two arguments, which become parameters in its implementation. The first states for how many seconds the process wishes to yield the CPU, while the second states a much shorter time unit – typically milliseconds. A sensible question to ask would be, ‘Why would the System Software specialists not use a single parameter?’

The reason for this is that for shorter time-intervals, it makes no sense to swap out the process, while for longer time-intervals, it is feasible to do so.
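To illustrate the idea, one could combine the two parameters into one duration and apply a coarse threshold for when swapping out is worthwhile. The function names and the one-second threshold below are my own hypothetical choices, not any real API:

```python
def sleep_args_to_seconds(seconds, milliseconds):
    """Combine the two historical Sleep() parameters into one duration."""
    return seconds + milliseconds / 1000.0

# A kernel could use the coarse 'seconds' field to decide whether
# swapping the sleeping process out is worthwhile (assumed threshold):
SWAP_THRESHOLD = 1.0

def worth_swapping_out(seconds, milliseconds):
    return sleep_args_to_seconds(seconds, milliseconds) >= SWAP_THRESHOLD

assert sleep_args_to_seconds(2, 500) == 2.5
assert worth_swapping_out(2, 0)          # long sleep: swapping is feasible
assert not worth_swapping_out(0, 50)     # 50 ms: not worth the overhead
```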
The way in which ‘ready-active’ processes are scheduled to ‘running’ on modern computers has something to do with the fact that modern computers receive timing pulses as one of their hardware-interrupts. The interrupt service routine for a timing interrupt performs a miscellany of tasks, one of which is to look at ‘ready-active’ processes and to schedule them to ‘running’.
This priority of processes and threads is not an interrupt priority level, nor an interrupt vector, with which kernel-space processes run, but a software-priority level, with which a computer is multitasking.
What this means is that, in reality, the amount of time for which a process sleeps is not exact, and will in fact exceed the value it specified, because it will not get scheduled until, according to a timing interrupt, the elapsed milliseconds exceed the specified Sleep() parameter. Programmers who need to write code that runs ‘more-or-less in real-time’ need to take this into consideration. If there is more than one ‘ready-active’ process that fits this description, the process with the higher priority will be scheduled. But there could be two or more of them that all have the same priority as well. In this last case, they form a queue, with the process most recently migrated from ‘ready-suspended’ to ‘ready-active’ placed later than any processes already in this queue that have an equal or higher priority. This order should at least allow all the processes to run eventually.
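This overshoot is easy to observe directly. The following sketch (using the modern, one-argument ‘sleep()’ discussed further down) measures how long a short sleep actually takes; the elapsed time meets or exceeds the request, and never undershoots it:

```python
import time

requested = 0.01   # ask to sleep for 10 ms
start = time.monotonic()
time.sleep(requested)
elapsed = time.monotonic() - start

# The process only becomes Running again at some timer tick after the
# interval expires, so the measured duration is never less than requested.
assert elapsed >= requested
print(f"asked for {requested*1000:.0f} ms, got {elapsed*1000:.2f} ms")
```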
The use of Real-Time Operating Systems has largely been replaced by this sort of logic, especially in cases where real-time scheduling is considered ‘preferable but not critical’. This includes cases in which users are just watching a video stream, and where the player-process can simply be made higher-priority than processes which may run in the background and do not run live. In fact, the terminology “Real-Time” has become distorted in consumer-grade operating systems, to refer to a single process which will have a higher priority than all other processes, and which will be scheduled to ‘running’ within its requested time-interval on that basis.
Under a true RT O/S, if a process has issued a ‘sleep()’ instruction, it is guaranteed to be made Running again within the time specified.
The competition between ‘ready-active’ processes to become scheduled to ‘running’ is not an event involving possible ‘TLB Shootdowns’. However, the competition between ‘ready-active’ and ‘ready-suspended’ processes does involve TLB Shootdowns, and therefore involves remapping virtual memory, which in turn presents the most efficient moment at which processes may be swapped out.
Technically, the remapping of virtual memory is called a ‘context-switch’, while actually swapping pages of memory in or out is distinct. The two may not go together, if what is being swapped are pages of data instead of code.
And data or code which has been swapped out, also represents a resource which is temporarily unavailable, when other code asks for it. All virtual data must at some point get swapped back into real, physical RAM, before it can be used. But the request for virtual memory additionally involves the ‘Translation Lookaside Buffer’, the ‘TLB’, and the way that works does not belong to the subject of this posting.
(Update 2/23/2020, 14h55 : )
In this posting, I was consciously using terminology a bit differently from how the mainstream is using it by now. I had loosely stated that changing the state of a process from Ready to Running, or from Running to Blocked (the change which actually accompanies that process either running or not, on any given CPU core), was not the same type of change in state as one which will remap virtual memory, that latter change in state requiring more overhead.
According to today’s terminology, the former change in state is called ‘rescheduling’ anyway. Thus, to change its state from Ready to Running, even though virtual memory did not need to be remapped, is referred to today, as ‘To schedule that process’.
And of course, multiple threads are handled the same way in which multiple processes are, except that threads belonging to the same process share their Data and Instruction Segments but each possesses its own Stack Segment.
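This sharing can be illustrated from user space. In the sketch below, every thread appends to one shared list (the shared data), while each computes its own local total in its own local variables (which live on that thread's own stack):

```python
import threading

shared = []            # one object, visible to every thread (shared data)

def worker(n):
    local_total = 0    # lives on this thread's own stack
    for i in range(n):
        local_total += i
    shared.append(local_total)

threads = [threading.Thread(target=worker, args=(5,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread computed its own local_total independently,
# but all of them appended to the single shared list.
assert shared == [10, 10, 10]
```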
(Update 4/16/2020, 6h35 : )
There is a bit of a correction which I must make to what I wrote here. What I did study in System Hardware and System Software had two basic limitations, that are difficult for me to surmount, ‘just by contemplating’. Yet, when there are gaps in one’s knowledge, they lead to consistency issues, which leads to contemplation nonetheless.
- The ‘Sleep()’ function which once existed, that accepted two integer arguments, has been replaced with a different API function, actually named ‘sleep()’, which accepts one floating-point number, the unit of which is the second. This has been cross-platform progress.
- My System Software course really failed to point out the Active / Suspended differentiation of the processes’ and threads’ states, where threads are really just processes that share a code segment as well as their data segment (but not their stack segment).
According to my contemplation, there should be another important reason, why processes end up Suspended: They executed the ‘sleep()’ function! If they did so, it would make most sense to me, if they next ended up in the Ready-Suspended state.
It is my present belief that, as the ‘sleep()’ function’s time intervals expire, processes are migrated by the kernel from Ready-Suspended to Ready-Active. But, along with any processes that end up Ready-Active from the Blocked-Active state (which would be the responsibility of the device drivers), I think they join a queue, placed later than any processes that have a higher or an equal priority. (:2) If a process had asked the device driver for a resource which was busy, thus putting it in the Blocked-Active state, and if that process were next migrated to Blocked-Suspended, and if the device driver next found that its requested resource has become ready, then the device driver will simply migrate it to Ready-Suspended.
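The queue ordering I am describing can be sketched as follows. The tie-breaking rule (an arrival counter, consulted after priority) is my own guess at the behaviour, not a documented Linux algorithm:

```python
import bisect

# Toy ready queue: entries sort on (negated priority, arrival counter),
# so a newly woken process lands behind everything of equal or higher priority.
ready = []
arrivals = 0

def wake(pid, priority):
    """Migrate a process into the ready queue at its proper position."""
    global arrivals
    arrivals += 1
    bisect.insort(ready, ((-priority, arrivals), pid))

wake("A", 5)
wake("B", 5)      # same priority as A, arrived later: queued behind A
wake("C", 9)      # higher priority: jumps ahead of both
assert [pid for _, pid in ready] == ["C", "A", "B"]
```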
To translate this into letters, which the Linux ‘top’ or ‘ps’ command will output:
- Ready-Suspended receives the letter ‘S’.
- Blocked-Active receives the letter ‘D’.
- Because Blocked processes are not running, they do not have the chance to execute the ‘sleep()’ function, and are thus rarely migrated to Blocked-Suspended in the real world.
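These letters can be read directly on a Linux system: the third field of ‘/proc/&lt;pid&gt;/stat’ is the same one-letter state that ‘ps’ and ‘top’ report. A process that reads its own entry will, of course, see itself as (R)unning:

```python
# Linux-specific: read this process's own state letter from procfs.
with open("/proc/self/stat") as f:
    stat = f.read()

# The second field (the command name) is parenthesised and may contain
# spaces, so split after the *last* ')' to find the state letter reliably.
state = stat.rsplit(")", 1)[1].split()[0]
assert state == "R"      # a process reading its own entry is Running
print("state:", state)
```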
This scheme poses an important question:
- When processes are Running, and they neither make a system call nor execute the ‘sleep()’ function, is there a way for them to get ‘evicted’, say, ‘Because a Ready-Active process has been waiting long enough to be made Running’? The answer to this question, valid for Linux, is:
- If the process is at the front of the Ready-Active queue during a timer interrupt, and if one of the Running processes’ time-slices has expired, that Running process is “preempted”, so that the new process becomes Running.
- A “Preemptive Scheduler” will preempt running processes as part of the timer interrupt, as soon as their time-slices have expired, placing them at the tail of the queue.
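A preemptive, round-robin, time-sliced scheme of this kind can be sketched as a toy loop; the two-tick slice and the process names are arbitrary choices for illustration:

```python
from collections import deque

# Toy preemptive round-robin: each timer tick decrements the running
# process's remaining time-slice; on expiry it is preempted and re-queued.
ready = deque(["A", "B", "C"])
SLICE = 2                      # ticks per time-slice (assumed)
order = []                     # which process ran on each tick

running, remaining = ready.popleft(), SLICE
for tick in range(6):
    order.append(running)
    remaining -= 1
    if remaining == 0:         # time-slice expired: preempt
        ready.append(running)  # back to the tail of the queue
        running, remaining = ready.popleft(), SLICE

assert order == ["A", "A", "B", "B", "C", "C"]
```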
I think that this detail is really a way in which the ‘RT’ process priority is only an extension of the conventional ones. If a process has an ‘RT’ priority, and a ‘sleep()’ interval that has expired, it will actually jump to the front of that queue, because any ‘RT’ priority level is also higher than any regular priority level.
However, under Linux specifically, there is a difference between the RT scheduling policies ‘SCHED_FIFO’ and ‘SCHED_RR’, the latter of which stands for ‘Round-Robin’. Processes with ‘SCHED_FIFO’ do not have time-slices that expire, while processes with ‘SCHED_RR’ do.
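Under Linux, Python’s ‘os’ module exposes these policies, and one can verify that the real-time priority range sits entirely above the single static priority of ordinary (‘SCHED_OTHER’) processes:

```python
import os

# Linux-specific: query the static priority ranges of the scheduling
# policies. The RT policies typically span 1..99; SCHED_OTHER is fixed at 0.
fifo_max = os.sched_get_priority_max(os.SCHED_FIFO)
rr_max = os.sched_get_priority_max(os.SCHED_RR)
other_max = os.sched_get_priority_max(os.SCHED_OTHER)

assert fifo_max == rr_max          # both RT policies share one range
assert fifo_max > other_max        # any RT priority beats any regular one
print("RT max:", fifo_max, " regular max:", other_max)
```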
(Update 4/16/2020, 0h30 : )
(Update 2/27/2020, 22h50 : )
The number of processes actually running, is absolutely limited by the number of CPU cores. Further, the number of processes that are Ready-Active (not Running), also needs to be kept reasonable, in relationship to how many cores there are. In order to do so, the kernel can Suspend threads belonging to processes. Yet, if that was the reason for which a thread was Suspended (externally), the kernel will need to set a time, just like the time that would be set by the ‘sleep()’ instruction from within a process.
According to an article which I read, even though UNIX generically defines a ‘pthread_suspend()’ and a ‘pthread_resume_np()’ function, which would allow one user thread to suspend another user thread, Linux specifically never implemented them. This would also be why, under Linux, there is no such state as ‘Blocked-Suspended’; it would arise if a thread were ‘Suspended’ which just happened to be ‘Blocked’ at the time. And this would also explain why the terminology under Linux does not refer to ‘Ready-Suspended’ processes, instead just referring to them as ‘Sleeping’. Also, Linux does a few other things that generic UNIX would not do, such as to define a (Z)ombie and a s(T)opped state (Zombies must generally be reaped by their parent process). Yet, the world beyond Linux must also be understood, and it is only different to some degree. I think a s(T)opped process is really just the Linux way of referring to a Ready-Suspended set of threads that have no time-limit after which they would automatically be rescheduled to Ready-Active. And they will seem to stem from the O/S.
What I also read was that, under Linux, there is a ‘pause()’ function. However, its main way of digressing from this posting is the fact that it must be called from within the thread being suspended; and, because a blocked thread cannot be executing code, the Blocked-Suspended state will again not result.
The way in which pages of memory are swapped in is in fact tied to how Virtual Memory works, and therefore also to how the TLB works, the latter being a subject which, as I wrote above, I would not go into in this posting.
Swapping a page in happens because the CPU attempts to access a virtual address, either to read or to write, but the physical location of that virtual address is not in RAM, having been swapped out. What actually happens is that the process causes a page-fault, which invokes a subroutine in the kernel (with low, system-call priority), the long-term effect of which is to put the process into the Blocked-Active state, waiting for the resource, which in this case is the page of memory being present in physical RAM, but which also depends on an I/O operation being performed.
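Page faults can actually be counted from user space, via ‘getrusage()’. Touching freshly allocated memory produces minor faults, which are serviced without I/O; a major fault would be the kind described above, requiring a swap-in:

```python
import resource

# Count this process's minor page faults before and after touching
# freshly allocated memory (Unix-specific; ru_minflt is maintained by Linux).
before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt

buf = bytearray(8 * 1024 * 1024)     # allocate 8 MiB
for i in range(0, len(buf), 4096):   # touch one byte per 4 KiB page
    buf[i] = 1

after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
assert after > before                # the touched pages had to be faulted in
print(f"minor faults: {before} -> {after}")
```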
What that should imply is that the kernel may in fact migrate the process from Ready-Suspended to Ready-Active, with the understanding, at programming time, that some of the data it will want to access may in fact be swapped out. Then, as soon as that process tries to access such data, it goes to Blocked-Active.
One question which I do not know the answer to is, ‘At what point is the decision actually made, to swap pages of memory out?’ The book that I read only describes what must happen to swap them in. A page of memory could get swapped out just in response to some other page being swapped in, because there is limited physical memory available. But, just as it is with processes running on the CPU cores, pages of memory could also get swapped out preemptively, because the kernel detects a shortage of free physical memory.
I suppose then that swapping out pages preemptively should start from the Ready-Suspended processes, and potentially proceed until the kernel panics and the whole system just freezes. I think that the system will freeze before preemptive swapping proceeds to the Ready-Active processes.
The kernel may only have one subroutine for swapping out pages of memory, which needs to be called preemptively. Yet the kernel would then take into consideration how many pages have been requested to be swapped back in, when doing so. In that case, pages would never be swapped back in unless there is enough physical memory available to do so.
(Update 4/16/2020, 8h30 : )
To my surprise, a modern-day equivalent of the ‘Yield()’ instruction still gets used, mainly by processes scheduled with real-time priority, either as ‘SCHED_RR’ or as ‘SCHED_FIFO’. This function is called ‘sched_yield()’. When a process calls this function, it gets placed at the tail of the queue.
Making this function available is sensible, because if a process with RT priority was instead just to call ‘sleep(0.0)’, it would put itself into the ‘ready-suspended’ state, from where it would get put directly back at the front of the ‘ready-active’ queue, because RT priority is always higher than regular priority levels. And then, this process would just get scheduled to ‘running’ again, without really having given other processes in the ‘ready-active’ queue a chance to run.
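Under Linux, the same system call is available from Python as ‘os.sched_yield()’; the call simply returns once the process has been scheduled to ‘running’ again:

```python
import os

# os.sched_yield() wraps the sched_yield() system call: the caller goes
# to the back of the run queue for its own priority level, without
# entering any sleep state.
result = os.sched_yield()
assert result is None      # the call just returns once rescheduled
```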