One of the advantages of having a Linux computer is the fact that users can find a wealth of detailed information about their hardware, without having to buy any add-on software. ‘Phoenix’, the PC which also acts as my Web-server, is a Debian 8.5 system, i.e. a Debian / Jessie Linux system, which I think is well-suited to being my server. Yet ‘Phoenix’ is old hardware, and used to exist as my computer ‘Thunderbox’, before I completely wiped it and resurrected it as ‘Phoenix’. This machine was bought in 2008. It is a dual-core, 64-bit system with 4GB of RAM, and with a top clock speed of 2.6 GHz.
One subject we can educate ourselves about is how interrupt controllers work today. Back when I was still studying, we had a notion of interrupt controllers which was quite simplistic, and which supported a maximum of 16 interrupt request lines, numbered 0 to 15. Today, that sort of configuration would be called ‘a legacy interrupt controller’, and is not really supported anymore. ( :2 ) What we have today is the ‘APIC’, which stands for ‘Advanced Programmable Interrupt Controller’. And one of the things which a present-day Linux user can do, is view the complete history of interrupt requests since the last boot. When I do this, this is what I get to see:
We see that the type of interrupt request named “IO-APIC-edge” is at most available for IRQ lines 0-15, even though IRQ9 is already something different: “IO-APIC-fasteoi”. This ‘EOI’ thing needs some explaining.
First of all, my IRQ history looks this small because this computer is extremely old. The same view on the laptop named ‘Klystron’ reveals a more complex picture.
The way interrupt controllers work today requires that Interrupt Service Routines end with an instruction named ‘EOI’, which simply stands for End Of Interrupt. This CPU instruction sends a message to the interrupt controller, which allows the controller to update a local register, so that the interrupt controller knows, separately from the CPU, what the current Interrupt Priority Level is. The interrupt controller needs to know this because the Interrupt Priority Encoder is part of the interrupt controller, and is only supposed to pass Interrupt Requests to the CPU which have a higher priority than the current priority level. And the fact has not changed, that the lowest-numbered IRQs are also the highest in priority.
In the days of legacy interrupt controllers, Interrupt Service Routines did not need to end with an EOI instruction, because no explicit message needed to be sent to the controller to tell it that the current Interrupt Service Routine was done. This routine would just exit, and upon doing so, return as a subroutine to a level of control that was either a lower-priority service routine, or that existed unpredictably in user-space, since interrupt requests can interrupt user processes as well as Interrupt Service Routines already running. When a physical interrupt request was processed, the first thing that would happen was that the instruction pointer would get pushed onto the stack, and then the microcode of the CPU would dereference the interrupt vector table with the IRQ number to be jumped into. The dereferenced Interrupt Service Routine would get executed with the status register’s supervisory bit set, which could be different from how the status register was before, when running a user-space process.
By the time the return address was popped off the stack, the status register already had to be restored, so that a user-space process could resume, but not inherit the supervisory status that the Interrupt Service Routine was running with, in kernel-space.
Also, any Interrupt Service Routine needs to be coded so that it pushes onto the stack whatever registers it is going to use, and so that near the end of its life, these registers are popped back off in the reverse order they were first pushed, by the Interrupt Service Routine itself. An actual Return from Subroutine instruction finally acts to pop the instruction pointer, which lands the CPU in the exact part of the process that was interrupted. ( :1 )
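As a toy model, the legacy entry-and-exit sequence just described might be sketched as follows. All the names here are my own, and the sketch is only meant to show the ordering of pushes and pops, not any real instruction set:

```python
SUPERVISOR = 1 << 15  # an assumed position for the supervisory bit

def take_interrupt(cpu, stack, irq, vector_table):
    """What the hardware does when it accepts an interrupt request."""
    stack.append(cpu['ip'])        # push the instruction pointer
    stack.append(cpu['sr'])        # push the status register
    cpu['sr'] |= SUPERVISOR        # the handler runs with the supervisory bit set
    cpu['ip'] = vector_table[irq]  # dereference the interrupt vector table

def return_from_interrupt(cpu, stack):
    """What the end of a legacy handler does, in order."""
    cpu['sr'] = stack.pop()        # restore the caller's status register first...
    cpu['ip'] = stack.pop()        # ...then pop the return address

# A user-space process at ip=100 is interrupted by IRQ3:
cpu = {'ip': 100, 'sr': 0}
stack = []
take_interrupt(cpu, stack, 3, {3: 0x2000})
assert cpu['sr'] & SUPERVISOR          # now in kernel-space
return_from_interrupt(cpu, stack)
assert cpu == {'ip': 100, 'sr': 0}     # back exactly where we started
```

The footnote below refines this: in reality, it was not the handler's job to push the status register, and popping it could retire the return address in the same step.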
All of this basically did and still does require that the interrupt controller be able to filter interrupt requests, so that only the ones with higher priority than the current priority, and which were not masked, would get translated into a physical signal from the controller to the CPU, to execute an Interrupt Service Routine.
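A minimal sketch of that filtering rule, under my own naming, where a lower IRQ number means a higher priority, and where a set bit in the enable-mask means the line is allowed through:

```python
def should_signal_cpu(irq, current_level, enable_mask):
    """Decide whether the controller should raise this IRQ to the CPU.

    irq:           the request number (lower number = higher priority)
    current_level: the Interrupt Priority Level currently running
    enable_mask:   bit n set means IRQn is allowed through
    """
    if not (enable_mask & (1 << irq)):
        return False                # the request has been masked out
    return irq < current_level      # only strictly higher priority passes

# While the IRQ5 handler runs, IRQ3 passes but IRQ7 must wait:
assert should_signal_cpu(3, 5, 0xFFFF) is True
assert should_signal_cpu(7, 5, 0xFFFF) is False
assert should_signal_cpu(3, 5, 0xFFFF & ~(1 << 3)) is False  # IRQ3 masked
```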
An APIC is assumed to be separated from the core CPU more than a legacy interrupt controller was, in that an APIC does not have access to the status register, but still has to filter the IRQ signals from peripherals, towards the same goal. And so with the use of the APIC, the Interrupt Service Routine needs to send an EOI signal, so that the APIC can unwind its local copy of the current interrupt level.
To help the APIC accomplish this, this component, which is not an intimate part of the CPU, additionally needs to have something called an “In-Service Register”, which has 1 bit set for every interrupt level that is currently in the process of running, with the assumption that only the highest-priority of those is truly running, and that if more than 1 bit of the In-Service Register is set, the lower-priority Interrupt Service Routines have all been interrupted by higher-priority ones.
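The book-keeping which that paragraph describes can be modelled with a few lines of Python. This is only my own sketch of the idea, not any real APIC's layout:

```python
class InServiceRegister:
    """One bit per interrupt level. The lowest-numbered set bit is the
    handler actually running; any higher-numbered set bits belong to
    handlers it has interrupted."""

    def __init__(self):
        self.bits = 0

    def dispatch(self, irq):
        """Mark this interrupt level as in-service."""
        self.bits |= 1 << irq

    def running(self):
        """Return the level truly running: the lowest set bit, or None."""
        if self.bits == 0:
            return None
        return (self.bits & -self.bits).bit_length() - 1

# IRQ9's handler starts, then IRQ3 interrupts it:
isr = InServiceRegister()
isr.dispatch(9)
isr.dispatch(3)
assert isr.running() == 3   # only the highest-priority one truly runs
```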
Further, it is usually assumed that a given Interrupt Service Routine should not be interrupted by additional requests for the same one, until the present instance has returned. And this detail also needs to be assured in an explicit way…
The APIC needs to have its In-Service Register precisely because it does not have access to any of the CPU registers, including the status register, or any interrupt mask which the CPU itself may be storing. If an explicit need occurs to mask one of the interrupt request numbers, and if those come after IRQ15, then this needs to be sent to the APIC explicitly, and in a way that is no longer fast.
Okay. But when I look at the interrupt history of my own machine, I do so as part of a health-check. And one fact which I always see on ‘Phoenix’ is that there is a history of 1 official hardware error. Presently, ‘Phoenix’ has been running for 2 days. But this machine can frequently run for 10 days, or even for 30 days, before it either needs a reboot or is struck by a power failure. Even if it has been running for 30 days, I still see that there has been 1 official hardware error.
What this tells me is that the error in question happens early in the boot process, and always so. During boot-up, a Linux system starts with all its interrupt request lines masked, and a tedious process starts of detecting hardware, loading drivers, and activating the Interrupt Service Routines associated with those drivers.
Apparently these are chaotic moments, followed by ‘normal operating time’, during which no further errors are supposed to occur.
What I also make sure to note is that the total number of Non-Maskable Interrupts exactly equals the number of Performance Monitoring Interrupts. This is reassuring.
And the rest is just fun stuff to look at.
1: ) If the reader assumes that the Interrupt Service Routine may simply unset the supervisory bit directly before doing a Return from Subroutine, this is a casual error. The service routine instance has no static way of knowing whether it has just interrupted a user-space process, or another Interrupt Service Routine. And thus there is no static way to code whether to unset the supervisory bit or not.
Along with that, if the reader assumes that the Interrupt Service Routine may simply push the status register at the beginning, then he has overlooked the fact that control is being handed to it with the supervisory bit set, regardless of whether that bit was set or unset prior to the current hardware interrupt taking place.
And so it would seem that the status register must be popped, but that it is not up to the service routine to push it, any more than it was up to the service routine to push the instruction pointer…
(Edit 02/10/2017 : I left it to the reader to figure out the following, but am now stating this logic explicitly:
Popping the status-register is the second-to-last thing a non-EOI Interrupt Service Routine does. After that, its protected code only executes a Return From Subroutine.
Probably, an EOI instruction integrates this.
The reader may observe that the following can happen after the status register has been popped, but before the Return From Subroutine is executed: popping the status register can drop the current priority level below that of a waiting Interrupt Request. The waiting Interrupt Request can end up being executed before the Return From Subroutine is, which forms the last instruction of the legacy-type Interrupt Service Routine, and which is stored in protected memory.
This situation used to resolve itself because, after the waiting, higher-priority Interrupt Service Routine returned control to the one which was ‘almost finished’, the only thing the now-lower-priority subroutine would end up doing, is to execute its immediately-following Return From Subroutine, one more time.
The other Interrupt Service Routine, that was waiting for the current one to finish, also popped the status register first, and executed a Return From Subroutine last. So briefly, control could be in the hands of a subroutine, which could also return control to a higher-priority one.
It was generally allowed to execute a Return From Subroutine, out of protected memory, without the supervisory bit set.
Now, because of the 8-bit vector numbering, the reader might expect that the APIC will allow for a lowest-priority interrupt level equal to 255. But I expect that this will never happen, and that the lowest-priority level ever possible will be 254.
The reason for this should be, that System Hardware specialists would like for the coding of system-calls – aka software-interrupt handlers – to end in a single EOI instruction, just as the hardware-interrupt handlers do. Normally, an EOI instruction also sends the described hardware-signal to the APIC. A return-priority of 255 would signal the exit of a system call, and should not result in any hardware-signal being sent to the physical APIC. )
(Edit 02/11/2017 : On closer inspection, if the instruction to pop the status register was in fact followed by a Return From Subroutine instruction, this last instruction would simply not be executed, because the instruction before it should pop the instruction-pointer automatically.
There are certain instructions which a CPU only allows the program to execute, if the supervisory bit is set. And obviously, to pop the status register would be one of them, since this instruction will also potentially change the value of the supervisory bit. )
(Edit 07/10/2016 : ) In comparison, this is a screen-shot of what my GUI utility shows me, about the Interrupt Request history on my laptop ‘Klystron’:
There is a new type of interrupt listed here, labeled ‘PCI-MSI’. This type of interrupt is known as a Message-Signaled Interrupt. MSI interrupts were not covered by the System Hardware courses back when I was studying, and one reason for that is the fact that I am as old as a dinosaur. The idea seems to be that a peripheral can send data in-line with the regular data it sends, but in such a way that the data signals the fact that an interrupt is being requested. A specific memory location / DMA address is involved, but apparently does not give greater detail about the interrupt being requested, other than to identify the interrupt.
In the above example, MSI interrupts assigned to vectors 24, 37 and 38 actually seem to have been used by existing hardware.
On the side: About EOI interrupts, I recall having read somewhere that those come in two flavors: EOI signals that state which interrupt request number has finished, and EOI signals that do not state this information, but which are simply fed to the APIC as an EOI signal.
This subject has caused me much thought. If a CPU was always a single-core, then it would follow that indeed, the higher-priority interrupt request has always interrupted the lower-priority one. And then, if the APIC simply receives an EOI, all it needs to do is assume that the highest-priority interrupt has just finished, thus avoiding any ambiguity in which bit of the In-Service Register to reset.
But with a quad-core CPU, the fact needs to be acknowledged, that indeed up to 4 interrupt handlers could be running concurrently. And so, because Computer Science is very precise at eliminating ambiguities, it would seem that in such a case the CPU must send the type of EOI signal to the APIC, which states which interrupt request number has in fact finished.
In any case, if the APIC simply receives the kind of EOI that does not state an interrupt request number, then I suspect its behavior will be just to knock off the highest-priority entry, with the lowest interrupt request number, in its In-Service Register, come what may.
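If I model the In-Service Register as a plain bit-field, the two flavors of EOI I have been describing would differ like this. Again, this is only a sketch of my reading, not documented APIC behavior:

```python
def eoi_nonspecific(isr_bits):
    """EOI carrying no vector: assume the highest-priority (lowest-numbered)
    in-service handler is the one that finished, and clear its bit."""
    if isr_bits == 0:
        return isr_bits
    return isr_bits & ~(isr_bits & -isr_bits)

def eoi_specific(isr_bits, irq):
    """EOI naming its vector: clear exactly that bit, come what may."""
    return isr_bits & ~(1 << irq)

# With handlers 3 and 9 both in-service:
bits = (1 << 3) | (1 << 9)
assert eoi_nonspecific(bits) == 1 << 9   # knocks off IRQ3 first, come what may
assert eoi_specific(bits, 9) == 1 << 3   # can retire IRQ9 out of order
```

On a single-core CPU the two would behave the same, which is exactly the ambiguity that a multi-core CPU breaks.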
(Edit 07/10/2016 : ) Even though a modern CPU is massively complex, a reasonable assumption would be that Engineers try not to make it so complex as to render it completely incomprehensible. And so ‘reasonable’ would also mean that some simplification exists in the logic, even though the simplification is not ultimately necessary. It would be practically called-for.
There is a possible simplification to how APIC, EOI interrupts work, which I would like to elaborate on, even though I am not 100% sure it is in place.
For every interrupt vector / request number, the APIC may store more bits of data, but not for every invocation of the same interrupt handler. The first bit of data should tell it whether or not to wait for an EOI signal to come back from the CPU, before resetting the In-Service Register bit. The reason this should exist is the fact that some of the interrupts numbered from 0 to 15 are not EOI-based, rather being similar in logic to legacy interrupts. The coding of their handlers could in fact be the same as for legacy interrupt controllers.
For those 16 IRQ lines, it would be up to the CPU to manage certain details, much as it was in the days I was studying.
For dual-core CPUs it seems plausible that hardware on the CPU keeps track of which core, if any, is processing an EOI-type interrupt request, and that the CPU hardware feeds any overlapping requests of that type to the same core. Non-EOI interrupt requests could then run concurrently, on the other core.
This notion of EOI would no longer be efficient, on a 16-core CPU.
My thinking suggests, that on ‘Klystron’, I have 1 core for each supported type of interrupt, plus 1 more, which means that if several interrupts are fired, some chance exists that more than 1 core would end up serving interrupts concurrently.
I will not assume that interrupt-types are mapped to cores statically, especially since my GUI tool states otherwise. But dynamically, some such mapping takes place, when one of the interrupt-types goes from being dormant to being active.
(Edit 02/10/2017 : One way in which the above logic needs to be extended, is by acknowledging that it only applies to hardware interrupts, not software interrupts. Simply looking at my ‘gkrellm’ widget shows me that all my cores could be running in kernel-time at once. Further, while the legacy interrupt-mask was assumed to be shared by all cores, each core has its own status register, complete with its own interrupt priority level bits.
It seems casually possible, that most of the kernel-time is taken up by software-interrupts, especially since those do not involve the APIC.
So a more-correct way to say what I wrote above would be that the maximum number of cores which multiple interrupt requests can reserve, would be one for the EOI type that does not state which interrupt request is exiting, plus one for each other type of hardware interrupt, plus any number of cores for software-interrupts.
I have just written that EOI-interrupts have higher priority than MSI-interrupts, so that the latter do not always need their own core.
And if we could only assume that legacy-interrupts do not conflict with EOI-interrupts – which they do – then we could get all 3 types of hardware-interrupts to run on one core.
Well, a single-core CPU also has only one status-register, and would thus be capable of exposing its current interrupt-priority level, so that a hypothetical, matching APIC would be able to suppress sending it lower-priority, EOI-interrupts. )
(Edit 07/11/2016 : ) There is another possible way in which the type of EOI could have been implemented, which sends to the APIC the interrupt request number which has just finished. If that type of EOI signal is enabled, it could be that all EOI signals must be of that type, but that otherwise, all EOI signals must be of the simpler type.
If this is the case, my GUI utility will not show me two types of EOI interrupts, since they would all be the same.
2: ) ( 07/10/2016 ) I suppose that one more digression I should make, would be to state, how a ‘Legacy Interrupt Controller’ used to work.
In the days of old, clunky ICs, there existed a type of circuit block which was simply called “an encoder”, without having to state priority levels or having to manage interrupt requests. This was a logic circuit which accepted a number of parallel inputs, each of which could be high or low, such as maybe 16 Interrupt Request lines. Because 16 equals 2 to the power of 4, the output of an encoder would simply be the 4-bit binary number of the highest-numbered or lowest-numbered input line that was logically high. In the case of interrupts, it is only useful to go by the lowest-numbered line, because it would also have the highest priority.
The 16 interrupt request lines, IRQ0 through IRQ15, would need to pass by a set of parallel AND gates first, which acted as an interrupt mask, so that the highest-priority IRQ would get encoded afterward, which had not been filtered out by this mask.
The resulting 4-bit number was presented to the CPU, where it was compared with the current priority level, which to the best of my understanding was stored in the status register. If the presented IRQ number was lower than that, the CPU would acknowledge the request.
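Putting those three stages together, the legacy mask-and-encode logic could be sketched like this, with bit n of each field standing for IRQn. The names are mine:

```python
def encode_pending(irq_lines, enable_mask):
    """A 16-to-4 priority encoder with the AND-gate mask stage in front.

    irq_lines:   bit n high when IRQn is being requested
    enable_mask: bit n high when IRQn may pass its AND gate
    Returns the lowest-numbered (highest-priority) surviving request,
    or None when nothing is pending."""
    pending = irq_lines & enable_mask & 0xFFFF   # the parallel AND gates
    if pending == 0:
        return None
    return (pending & -pending).bit_length() - 1  # lowest set bit wins

# IRQ3 and IRQ5 both request; IRQ3 wins unless it is masked out:
assert encode_pending(0b101000, 0xFFFF) == 3
assert encode_pending(0b101000, 0xFFFF & ~(1 << 3)) == 5
```

The CPU-side comparison against the current priority level, described above, would then happen on the 4-bit number this stage produces.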
A separate subject which I was taught at length, was a kind of ‘hand-shaking’ which took place between the controller and the peripheral, in which the latter would keep its ‘IRQ’ line high, and wait to be served, until the controller raised the corresponding, parallel ‘IACK’ line, to tell the peripheral that its time had come. After the IACK signal was received by the peripheral, it was free to lower its IRQ signal.
I suppose I should mention, that although a High condition usually signifies Logical True / Active, many real circuits have inverted Pins, where a Low condition actually indicates True / Active. I have been ignoring this fact throughout.
Legacy interrupt controllers only sent an IACK to the peripheral, after the CPU had sent a corresponding signal to the controller.
The actual handler in those days could prevent the same peripheral from re-triggering it before it was done once, just by masking the interrupt request it was reacting to, as its first action. Changes written to the 16-bit interrupt mask by the CPU were also sent out to the controller before the next instruction was executed. And before such a handler exited, it would again unmask its own IRQ.
Handlers could mask and unmask each other, if the need arose.
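That first-mask-yourself, last-unmask-yourself pattern might be sketched like this, using a mask field where a set bit means the line is enabled. The handler body here is hypothetical:

```python
def legacy_handler(irq, state, do_work):
    """The masking pattern described above: a legacy handler blocks its own
    IRQ line as its first action, and re-enables it as its last.
    'state' is a dict holding a 16-bit mask; a set bit enables the line."""
    state['mask'] &= ~(1 << irq)   # first action: block re-triggering
    do_work()                      # the device-specific work, whatever it is
    state['mask'] |= 1 << irq      # last action: unmask our own IRQ again

# Observe the mask while the (stand-in) work runs:
state = {'mask': 0xFFFF}
events = []
legacy_handler(4, state, lambda: events.append(state['mask']))
assert events == [0xFFFF & ~(1 << 4)]  # IRQ4 was blocked during the work
assert state['mask'] == 0xFFFF         # and re-enabled afterward
```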
An important detail was the fact that lower-priority IRQs would wait until the higher-priority handlers had exited, at which point the status register was also restored. Then either the current priority level number could become greater than the presented priority number of a waiting IRQ, or the CPU could have landed back in user-space, which meant that the waiting IRQ was suddenly higher in priority, and would thus get acknowledged in turn by the CPU.
(Edit 07/11/2016 : ) In order to simulate this fully, modern CPUs actually handle an exception, to track each IRQ that was previously refused because, on the CPU, the priority of the IRQ coming in was too low to be served. Hence, the “Deferred Error APIC Interrupts” entry above.
Well, one reason for which this type of legacy controller cannot be used anymore, is the fact that the clock speeds and propagation delays will prevent changes to the interrupt mask from being communicated out to the controller, fast enough to prevent the peripheral from triggering the same handler again… Another would be the need to map the IRQ to a single core – because each core has its own status register – for as long as its supervisory bit stays set.
And so with APIC, the In-Service Register accomplishes the same thing. Always. Also, APIC assumes that if an IRQ has passed its filters locally, the CPU will act on it, such that APIC can acknowledge the interrupt to the peripheral – in theory – without having to wait for an acknowledgment from the CPU first. But what might get registered on the CPU instead is the ‘Deferred APIC Error’.
What this means is that if we have 2 types of interrupts supported, as I do on ‘Phoenix’, effectively there are 2 parts to the controller acting in parallel, one offering legacy-like operation, and the other offering EOI-type operation. IRQ0-IRQ15, when not set to EOI mode, can then still wait for an acknowledgment from the CPU, while all the IRQs operating in EOI mode can proceed according to the norms of that mode.
The controller belonging to ‘Klystron’ would seem to have 3 separate parts, to offer its 3 types of interrupts.
If the CPU is to send the type of EOI signal to the APIC that states which IRQ has just finished, then it follows that its CPU cores each have an 8-bit field in their status register, in place of the legacy 4-bit field, to state their current priority when in kernel-space, so that this information does not need to be programmed into the handler statically, but rather follows from the status register of 1 core, when the EOI opcode is executed. But in that case, which is not even evidenced on ‘Klystron’, numerous cores could be responding to IRQs, as I just stated.
Oddly, according to legacy interrupt logic, IRQ15 trumps IRQ15, and IRQ0 trumps IRQ0. Hence, when the core was running a software interrupt, which still exists, its priority level was also 15, but any hardware interrupt would have beaten it. This was partly why the interrupt mask was so important.
With APIC, the In-Service Register explicitly differs from that, in that if IRQ9 is In-Service, further signals from IRQ9 are masked.
(Edit 07/11/2016 : ) If I can safely assume, that all EOI-type interrupts will be of higher priority than the ‘PCI-MSI’ interrupts, as it is shown above for ‘Klystron’, then it may follow, that the number of EOI interrupts the APIC can send to the CPU concurrently, equals the number of cores minus 1, since 1 core must always be ready for legacy-like interrupts, and since all the EOI interrupts can be processed then, over MSI interrupts already running.
This would mean that the APIC needs to keep track of the 3 highest-priority running interrupt numbers (i.e., the 3 lowest numbers) in the case of a quad-core, and that it must keep track of the 15 highest priority running interrupt numbers in the case of a 16-core, and still be able to avoid ‘Deferred Error APIC Interrupts’.
I.e., even though the APIC must still keep track of a potentially higher number of interrupts with its In-Service Register, and even though 4 of them can end up running at one time, a new interrupt request would only be sent to the CPU then if it ranked among the top 3, assuming the CPU was a quad-core.
If that were to be the 4th-ranking request, just according to the APIC, there would exist the risk that a higher-priority job, belonging to the non-EOI type, was already running on one of the cores, and that therefore, when this request arrived at the CPU, it could be lower-priority than all those already running, thereby resulting in the ‘Deferred APIC Error’ condition.
(Edit 07/12/2016 : ) Even though some readers may already know this, I should spell out that the Interrupt Priority Levels I have been writing about, have nothing to do with the priority levels of individual processes, which Windows users see in their Task Manager, and which Linux users can also view, using the famous ‘
Any multitasking operating system will juggle numerous processes, by moving them between the Running, Ready-Active, Ready-Suspended, Blocked-Active, and Blocked-Suspended states. On a single-core system, only one task can be Scheduled by the kernel, to be Running at one time. On a multi-core system, aka a Multi-Processing system, more than one task can be scheduled to Running at once.
To help determine which tasks will be scheduled to Running, if more than one of them is Ready-Active at once, they have a priority level managed by the kernel, as well as a parameter to the sleep() function call, which states for how long they should sleep. But all this applies to processes that run in user-space. Even processes that run as user ‘root’ are still only running in user-space, except that the kernel grants these more permissions than it grants processes belonging to other users.
There exist threads which run in kernel-space, and which result from software interrupts – aka function-call interrupts, aka system calls – and from hardware interrupts. Those threads have a much more limited differentiation as far as priorities go, while they all have higher priority than user-space processes.
The priority of a hardware interrupt results in an Interrupt Service Routine being launched which has that priority level. When control is returned to a user-space process, that priority level, and any status register bits which might have recorded it, no longer have any meaning.
Threads which run in kernel-space have the supervisory bit set, while user-space processes do not.
In many cases, interrupt handlers do not cause a rescheduling or a context switch, the latter of which would change which processes are Ready-Active as opposed to Ready-Suspended; instead they run more briefly, as if they were just subroutine calls that eclipse a user-space process on one core. Yet, there also exist interrupts which do cause a rescheduling of user-space processes, which is what makes the system multi-tasking.
Also, more recent advances have introduced asynchronous system calls, which will fork from the user-space process that spawned them, and return control to the user-space process even though this type of system call has not yet finished doing what it was meant to do.