One of the pieces of news which many people have heard recently, but which few people fully understand, is that a vulnerability has been discovered by researchers in Intel CPUs in particular, but also to a lesser degree in AMD CPUs, and even in some ARM CPUs (as used in Android devices). It comes in two flavors: ‘Meltdown’ and ‘Spectre’. What do these vulnerabilities do?
Well, modern CPUs have a feature which enables them to execute multiple CPU-instructions concurrently. I learned how this works when I was taking a System Hardware course some time ago. It is meant to make up for the fact that executing one CISC-Chip instruction typically takes considerably more than 1 clock-cycle. So what a CISC-Chip CPU does is start execution on instruction 1, but during the very next clock-cycle, already fetch the opcode belonging to instruction 2. Instruction 1 is at that point in the 2nd clock-cycle of its own execution. One clock-cycle later, Opcode 3 gets fetched by the CPU, while instruction 2 is in the 2nd clock-cycle, and instruction 1 is in the 3rd clock-cycle – if there is any – of each of their executions.
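To illustrate the arithmetic behind that overlap, here is a toy calculation in Python. The stage-count and instruction-count are hypothetical, not taken from any real CPU:

```python
# Hypothetical illustration: total clock-cycles for N instructions, each
# needing S cycles, executed strictly one-after-another versus overlapped.

def sequential_cycles(n_instructions, stages):
    # Each instruction must fully finish before the next one starts.
    return n_instructions * stages

def pipelined_cycles(n_instructions, stages):
    # The first instruction takes 'stages' cycles; each subsequent
    # instruction completes one cycle after the previous one.
    return stages + (n_instructions - 1)

n, s = 100, 3
total_sequential = sequential_cycles(n, s)   # 300 cycles
total_pipelined = pipelined_cycles(n, s)     # 102 cycles
```

With 100 instructions of 3 cycles each, full overlap brings the total from 300 cycles down to 102 – close to the ideal of 1 instruction per clock-cycle.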
This pushes CISC-Chip CPUs closer to the ideal goal of executing 1 instruction per clock-cycle, even though that ideal is never fully reached. But programs contain branches, where a condition is tested first, and where, if the non-default outcome of that test happens to be true, the program ‘branches off’ to another part within the same program, according to the true logic of the CPU-instructions. The behavior of the CPU under those conditions has also been made more concurrent, than a first-glance appraisal of the logic might suggest.
When a modern CISC-Chip CPU reaches a branching instruction in the program, it will continue to fetch opcodes, and to execute the instructions which immediately follow the conditional test, according to the default assumption of what the outcome of that test is likely to be. But if the test brings about a non-default logical result, which causes the program to continue in some completely different part of its code, the work already done on the partially-executed instructions is discarded, in a way that is not supposed to affect the logical outcome, because program flow will continue at the new address within the code. At that moment, the execution of code no longer benefits from concurrency.
This concurrent execution, of the instructions that immediately follow a conditional test, is called “Speculative Execution”.
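As a rough sketch of that bookkeeping – and only a sketch, since Python cannot show real CPU behavior – one can model it like this. The function name and the window of 4 instructions are made up:

```python
# Toy model of Speculative Execution. The core always assumes the default
# (fall-through) outcome of a branch and keeps executing; if the test later
# resolves the other way, all of that work is squashed.

def run_with_speculation(branch_taken, speculative_window=4):
    """Return (useful_work, squashed_work) for one branch, in instructions."""
    if not branch_taken:
        # The default assumption was right: the speculative work all counts.
        return speculative_window, 0
    # The default assumption was wrong: every speculatively-executed
    # instruction is discarded, and execution restarts at the branch target.
    return 0, speculative_window

useful, squashed = run_with_speculation(branch_taken=True)   # (0, 4)
```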
The problem is that Complex Instruction-Set CPUs are in fact extremely complex in their logic, that this logic has been burned as such into the transistors – into the hardware – of the CPU itself, and that even the highly-skilled Engineers who design CPUs are not perfect. So far, we’ve been astounded by how reliably and faithfully actual, physical CPUs execute their intricate logic, apparently without error. But now, for the first time in a long time, an error has been discovered, and it seems to span a wide range of CPU-types.
This error has to do with the fact that modern CPUs are multi-featured, and that in addition to having concurrent execution, they also possess Protected Memory, as well as Virtual Memory. Apparently, cleverly-crafted code can exploit Speculative Execution, together with how Virtual Memory works, in order to bypass the feature known as Protected Memory.
Modern computers do not assume that, just because a program has been made to run on your computer – or your smart-phone – it should be allowed ‘to do anything’. Instead, Protected Memory is a feature that blocks user-space programs from accessing memory that does not belong to them. It’s part of the security framework actually built into the hardware that makes up the CPU.
More importantly, user-space programs are never supposed to be able to access kernel memory.
(Updated 01/11/2018 : )
Our O/S Kernel consists of CPU-instructions, just like user-space programs, but it resides in a part of our RAM – our System Memory – which the kernel has reserved for itself. The kernel effectively polices everything the user-space programs do. If a user-space program was able to interfere with kernel-space addresses of RAM, it would ultimately be able to defeat every aspect of application-level security which an O/S offers.
( … Content removed 01/07/2018 Because Erroneous … )
(Edit 01/07/2018 : )
The way protected memory works may be integrated with how the data-cache works. This would mean that if the protection-bit of a line in the cache is not set for any reason, during normal operation, the current process may read from that line, or write data to it. If the process writes data to the line of the cache, that line is also marked ‘dirty’, which is to say that its contents are no longer in sync with the contents of actual RAM.
At some later point in time, a replacement will take place, when that line of the cache is needed by some other process – or by other code belonging to the same process. When that happens and the line has been marked ‘dirty’, the O/S kernel no longer keeps track of why it was marked dirty. And so, its changed contents would also be written back to real RAM as part of the replacement operation, before data from the new range of RAM addresses, associated with the given line in the cache, is read-in to it.
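A minimal sketch of that ‘dirty line’ behavior, with everything except the dirty-bit logic stripped away (the class and its methods are hypothetical, not any real kernel’s or CPU’s code):

```python
# One write-back cache line with a dirty bit (simplified; real caches
# also track tags, protection bits, etc.).

class CacheLine:
    def __init__(self, ram, address):
        self.ram = ram
        self.address = address
        self.data = ram[address]   # fill from "real RAM"
        self.dirty = False

    def write(self, value):
        self.data = value
        self.dirty = True          # no longer in sync with RAM

    def evict(self, new_address):
        # Replacement: write back to RAM only if the line was dirtied.
        if self.dirty:
            self.ram[self.address] = self.data
        self.address = new_address
        self.data = self.ram[new_address]
        self.dirty = False

ram = {0x1000: 7, 0x2000: 9}
line = CacheLine(ram, 0x1000)
line.write(42)       # RAM still holds 7 at 0x1000
line.evict(0x2000)   # the write-back happens now: RAM[0x1000] becomes 42
```

The point of the design is that writing back only dirty lines saves memory traffic: a line that was only read from never needs to be copied back to RAM.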
One detail which my own education was sketchy about, is whether, during normal operation, memory-protection has 1 bit per page of memory, or 1 bit per line of cache. Likewise, I’m sketchy on whether memory-protection is normally supposed to come between the cache and ‘real RAM’, or whether it gets implemented within the cache…
If memory-protection were implemented ‘downstream from the cache’, this would imply that in order to switch a core from running one process to running another, the kernel would need to flush the cache each time, after changing the protection bits. And doing so would cost time. But the kernel’s code would be capable of doing so, because kernel-code runs with the supervisory bit set. When set, the supervisory bit allows a CPU-core to ignore various security measures, such as protected memory. And unsetting this supervisory bit is literally the last thing the kernel-code does, before handing control of the CPU-core back to a user-space process.
- ‘Meltdown’ is a part of the vulnerability, by which a user-space process can gain access to running kernel processes’ data.
- ‘Spectre’ is a part of the vulnerability, by which a malicious program can gain access to other, running, user-space processes’ data.
As it turns out, mainly Intel CPUs are vulnerable to ‘Meltdown’, but a wider variety of CPUs – including AMD CPUs, and certain ARM CPUs used in Android devices, which are RISC rather than CISC designs but which also perform speculative execution – are actually vulnerable to ‘Spectre’.
And one of the really serious implications of all this is that a proper, long-term fix may take some time to put into place. It might require that CPUs be redesigned, and then new computers would need to have those CPUs on their motherboards, because after all, the logic of the CPUs is hardware – it can’t, by default, be reprogrammed. Some CPUs today may actually have firmware, but this is not the default assumption. ( :2 )
So advisories have been issued on how O/S programmers are to “Mitigate” the vulnerabilities. That means that even if the advisories are followed and applied, the core of the vulnerability will still exist at the hardware-level, but the way the kernel has been reprogrammed, will make it harder for malicious, user-space programs, to exploit these vulnerabilities in any way, that can amount to an attack.
In fact, in certain cases the hardware-optimizations may need to be disabled, specifically when System Calls are made, in other words, whenever a user-space program is asking the kernel to do anything for it. And this may actually slow down how the computer generally works, because those optimizations have contributed to why, our computers have been as fast as they’ve been, in recent decades. ( :1 )
- This vulnerability was found by researchers, apparently before having been found by hackers, and
- Just as it’s taking time for the O/S programmers to roll out patched kernels, it would take time for hackers to roll out exploits.
- Even though this affects what user-space programs can do, which they are not supposed to be able to do, it does not help the hacker to get his malware running on our computers. Hackers would still need to resort to Viruses, Trojans, Droppers etc., as they’ve been doing, in order to get potential user-space programs to run on the targeted computers, so that these programs could then exploit ‘Meltdown’ or ‘Spectre’.
In other words, with some luck, it could take a while for a real attacker to exploit these vulnerabilities, which also gives the kernel-hackers – i.e., the O/S programmers – time to roll out the patches.
Google has the patch in its January 5 Security Patch Level. But many other O/S programmers may take just a bit longer with this.
Programmers will need to understand the advisories, so that they can modify the source-code of kernels, then they’ll need to compile those modifications successfully, then they may have to test the resulting kernels, to make sure that bugs are not published, and then the programmers will be able to upload the packages, which contain the patches, so that end-users like me will find the new packages as part of the routine way we maintain our software.
1: ) Even though I’ve taken 1 System Software and 1 System Hardware course, my level of knowledge is not deep enough, to know exactly where these vulnerabilities become actually-exploitable. From what I’ve read,
it has something to do with how the “Page Global Directory” gets traversed, as well as with how the TLB – the “Translation Lookaside Buffer” – is updated, so that addresses which reside in the TLB will effectively point to incorrect physical addresses.
(Edit 01/07/2018 :
The way the vulnerabilities manifest themselves is that, even though the Speculative Execution of code rewinds – back to the branch where the condition decided that the already-processed instructions are not to be completed – data which those instructions read from their assigned memory-regions remains in the data-caches of the CPU. On some CPUs, Speculative Execution can extend into code that runs with the supervisory bit set – i.e., it can pass through a system call. Further, in order for a malicious program to be able to read data from those lines in the cache, the former state of the memory-protection bits must also have been ‘forgotten’ somewhere.
After that, other methods can be employed by malicious programs, to determine the addresses of data left in the cache, and ultimately, to access that data.
Hence, it would seem that the way to trigger the problem is to perform, behind a conditional test that is never actually met – so that the instructions only execute speculatively – either:
- Unless (True) – Make a System Call, or
- Unless (True) – ‘Sleep( 0, 1 )’
Either way, a chunk of data will remain in the CPU’s data-caches, the addresses of which the hacker does not control, and which are therefore of dubious value to a hacker. ) ( :3 )
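The principle by which such left-behind cache contents can later be read out is a timing difference: a cached access is fast, an uncached one is slow. The following is a pure simulation of that principle – Python cannot mount the real attack, and the latency numbers and names here are invented:

```python
# Simulation only: model "cached = fast, uncached = slow" and recover
# which of 256 possible cache lines the victim code touched.

CACHED_LATENCY, UNCACHED_LATENCY = 10, 200  # hypothetical cycle-counts

def victim_touches(secret_byte, cache):
    # Speculatively-executed code indexes a probe-array by the secret,
    # pulling exactly one line into the cache as a side-effect.
    cache.add(secret_byte)

def attacker_recovers(cache):
    # "Time" an access to each of the 256 possible lines; the fast one
    # reveals which index the victim touched.
    timings = {i: (CACHED_LATENCY if i in cache else UNCACHED_LATENCY)
               for i in range(256)}
    return min(timings, key=timings.get)

cache = set()
victim_touches(0x5A, cache)
recovered = attacker_recovers(cache)   # 0x5A
```

In the real attacks, the ‘victim’ access happens during speculative execution, and the timing is measured with a high-resolution counter rather than simulated.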
Now, there’s one aspect to how Debian works – and maybe to how certain other Linux distributions work – that could confuse some Linux-users. We may have a meta-package installed, which is named something like:
And, thinking half-asleep, we might conclude ‘This bug mainly affects Intel chip-sets. Because the package-name itself suggests an AMD chip-set, it might not receive an update.’
But in reality, this mental slip has as its root a misreading: the ‘-amd64’ designation only indicates that the standard, 64-bit architecture is intended. This package was fully expected to be installed on 64-bit Intel chip-sets as well.
The designation is only supposed to indicate that the binary was not meant for 32-bit 586 or 686 CPUs, nor for 32-bit CPUs with PAE (“Physical Address Extension” – aka ‘bigmem’), nor for ARM CPUs, etc.
(Edit 01/07/2018 :
In spite of this designation, the kernel-images are supposed to get an update, in response to ‘Spectre’. OTOH, it seems that the ‘Meltdown’ vulnerability has already been “fixed”, in the kernel-version designed for Debian / Stretch, according to the ‘Security’ repository. However, further reading on my part suggests that this fix will only become effective if users also install an Intel Microcode Update. I am uncertain whether the microcode-update required will be the existing one, or an upcoming one. )
2: ) I suppose that I should also explain, what “Microcode” is. Certain CPUs are so complex, that their ability to perform their tasks is not left entirely up to the transistor-logic, acting in parallel. Instead, CPUs exist which have microcode.
What this means is that on those CPUs, a machine-language instruction that seems atomic to an application-programmer, to the kernel-code, or within an assembler-program, may exist as though it were itself a subroutine, defined by the microcode of the CPU.
This roughly parallels how certain other, large chips on the motherboard may also have firmware.
Firmware, or microcode, is like a computer program that executes on the chip it enables, rather than residing in RAM for the CPU itself to execute.
In fact, the existence of microcoded CPUs on mainframe computers, actually preceded the use of firmware on Input / Output devices historically.
Whether a CPU uses upgradable microcode or not is specific to each CPU. Thus, if we’re lucky enough to possess a CPU that does, then what the microcode consists of will again be specific to that one CPU, and the ‘Meltdown’ or ‘Spectre’ vulnerabilities may actually be repairable just through a change to the microcode.
When chips use firmware, they usually ship with a hard-coded firmware-version, as well as a tiny amount of memory into which updates to the firmware can be loaded by the kernel at boot-time. And if we happen to be using Linux, then some firmware is not open-source, but rather comes from the chip-manufacturer, in the form of what Linux-programmers call ‘blobs’. A blob is an arbitrary piece of binary data, which the Linux-programmer cannot even analyze.
( … )
But Linux System Programmers need to code for the large number of cases in which the CPUs do not have upgradable microcode – CPUs of the type that became popular in the earlier years of personal computing. This means, in reality, that kernel-programmers need to release patches to the kernel for those cases, even if the microcode of one specific CPU could also be patched.
So I guess that in some cases, the behavior of the CPU can be corrected. I just have no way of knowing whether the CPUs in my computers are such cases. I know that older PC CPUs do not have upgradable microcode, so that their H/W behavior cannot be corrected.
3: ) One question which an astute reader might have about my loose description – according to which the vulnerability may be triggered with a ‘Sleep( 0, 1 )’ call – would concern the fact that generally, a ‘Sleep()’ call of any kind might only return control to the calling process after some time, and after data belonging to other processes has been forgotten by the H/W cache. But there is an aspect to how I think ‘Sleep()’ works, which suggests maybe otherwise.
The version of this function which I seem to remember, is fed two parameters, the first of which tells the O/S for how many seconds, and the second of which tells the O/S for how many milliseconds, the current process is to sleep.
It is not as if the use of two parameters was just redundant, and as if the same request could be made with one parameter.
The reason for this is the fact that multitasking can be of two essential types:
- Non-Rescheduling, or

- Rescheduling.

This is due to how the O/S manages the queues of processes, as being either Running, Ready-Active, or Ready-Suspended. Rescheduling (in my terminology) is the management of processes as becoming Ready-Suspended, as opposed to Ready-Active. All the processes in the Ready-Active queue are in principle ready to run on very short notice, because they are already residing in physical memory.
I.e., I distinguish between processes taking turns at running on a CPU-core, that all remain loaded in physical memory while doing so, and processes which are not supposed to run for a second or more, and which may therefore first be Suspended, and then swapped out of physical memory. The latter type must first be swapped back in, and then Resumed by the kernel, before they can become Ready-Active again, and finally before they can be Running again.
If the first parameter to ‘Sleep()’ is a zero, then I assume the request to be non-rescheduling. And in that case, every effort has been made in the design of the O/S kernel, to make this instruction as fast as possible – if possible, almost as fast as a subroutine-call. And so there is some reason to think, that ‘Sleep( 0, 1 )’ may actually leave something in data-cache, that did belong to another user-space process. But, because this (speculative) system call would be ‘retired’ / ‘rewound’ due to the context of this posting, the other process itself never ran.
When considering this latter topic, I actually need to get out my old textbook which I studied from in my System Software course, because in reality, there also exist the states Blocked-Active and Blocked-Suspended. But reassuringly, my textbook confirms, that whether a process is Blocked or not, is a behavior of a device driver, where the process asked for input or output, and where the device was not ready. As soon as an I/O -device or -datum becomes ready, the ‘bottom half’ of the device-driver is supposed to change the state of the process that asked for the I/O, from Blocked to Ready.
OTOH, if the operating system is deciding to perform a context-switch, or to swap out processes, it will pick from the Suspended queues.
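The process states just described can be summarized as a small transition table. This is only a sketch under my reading of the terminology – the exact set of transitions, and the event names used here, vary from one O/S textbook to another:

```python
# A few of the process-state transitions described above (simplified).

TRANSITIONS = {
    ("Running", "preempt"): "Ready-Active",
    ("Ready-Active", "dispatch"): "Running",
    ("Running", "wait_for_io"): "Blocked-Active",
    ("Blocked-Active", "io_ready"): "Ready-Active",     # driver's 'bottom half'
    ("Ready-Active", "suspend"): "Ready-Suspended",     # may then be swapped out
    ("Ready-Suspended", "resume"): "Ready-Active",      # swapped back in first
    ("Blocked-Active", "suspend"): "Blocked-Suspended",
    ("Blocked-Suspended", "io_ready"): "Ready-Suspended",
}

def step(state, event):
    return TRANSITIONS[(state, event)]

s = step("Running", "wait_for_io")   # the process asked for I/O
s = step(s, "io_ready")              # device ready: back to Ready-Active
```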
(Edit 01/10/2018 :
When a security expert submits a vulnerability, he actually needs to supply “proof-of-concept”, to prove that the vulnerability might actually be exploited by a hacker – which the security expert is assumed not to be.
And the way in which the proof of concept worked, for ‘Meltdown’, was that the malicious program doesn’t just make one speculative system call; rather, the attacker repeats this hundreds of times. Each time, the supposed attacker would sleep his code for a certain number of milliseconds – first for 1 millisecond, then for 2 milliseconds, then for 3, etc.. And each time, the malicious program would examine the contents left behind in the cache, so that in principle, the memory could be mapped out, that exists ‘on the other side of the security barrier’.
The reason why this will not appear in the kernel-logs, is the fact that according to the CPU’s own logic, the speculative execution did not take place. Apparently the Engineers who designed the CPU felt safe executing the speculative code, simply because in the end, the speculative execution ‘did not take place’. But in order to take place hypothetically, that execution needed to pull certain data into the cache, which the malicious program had no business reading.
What programming experts tend to do these days, is jargonize all that. “Indirect Branching” refers to a type of branch, the target-address of which is not written in the code, but where the CPU first fetches an address from an indexed table, and then branches to that address.
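A rough analogy for Indirect Branching, in Python: the destination is fetched from an indexed table first, and only then ‘branched to’. Nothing here models the CPU’s prediction of such targets, which is what Spectre abuses – this only shows the indirection itself, and all the names are made up:

```python
# An indirect branch is like dispatching through a table of targets,
# rather than jumping to an address that is hard-coded in the program.

def target_a():
    return "A"

def target_b():
    return "B"

jump_table = [target_a, target_b]   # the "indexed table" of addresses

def indirect_branch(index):
    # Fetch the destination from the table, then "branch to" it.
    return jump_table[index]()

result = indirect_branch(1)   # "B"
```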
One detail which I’m not clear on myself would be this: the initial system call was hypothetical, but it was followed by an instruction to sleep for a small number of milliseconds. Why, after the current process resumes running, is the stolen data still in the cache? I suppose that the chip-designers got very confident with their ability to do all this, thinking it was safe to do so.
I should add, that some type of ‘stop condition’ must have been defined in CPUs, which would have been a logical situation under which the speculative execution doesn’t continue. But depending on how enthusiastic the Engineers of the CPU were, that may only come very late. For example, that might just happen once a machine-instruction has told the CPU, to write a piece of data to RAM – since each module within the CPU, could theoretically have its own versions of numerous registers. And such register-versions may be more than one per core.
But then, it’s my own interpretation of the differences between ‘Meltdown’ and ‘Spectre’, that maybe AMD CPUs were microcoded more-prudently?
(Edit 01/11/2018 :
If the kernel-hackers – i.e., the O/S programmers – are working on ‘KPTI’, which stands for Kernel Page-Table Isolation, I suppose the first question the reader should be asking himself is, ‘What is a page-table?’
The page-tables that are stored in kernel-memory, form a hierarchy of tables, from which indexed values can be retrieved, starting from the first address in the Page Global Directory, which is at the top of the hierarchy.
They form a complete map of the virtual memory of the computer, and it’s from page-table entries, that the Translation Lookaside Buffer – the TLB – is given values by the kernel, every time a process asks for a piece of virtual memory, for which the TLB doesn’t have a page-frame stored.
A type of repeating pattern which plays out on CPUs that have virtual memory, is that the addresses seen by the CPU cores are virtual addresses. And somewhere between the data-cache and ‘real RAM’, there exists a type of cache called the Translation Lookaside Buffer. What it does is store the real, physical address in RAM which follows from a virtual address. And a big limitation of the TLB is that it can only store the page-frames of a small number of pages of memory.
If the CPU asks the TLB for a virtual address whose entry is not currently loaded, the result is a TLB miss, and the page frame for the virtual address being sought must then be looked up in the page tables – depending on the CPU, either by the hardware itself, or by the kernel. If no valid, present mapping exists there, a “Page Fault” results, which suspends the process that made the request, while the kernel resolves it.
These page frames are stored in a hierarchy of page tables. Actually, the highest-level table is called the Page Global Directory, and its entries point to Page Middle Directories, which finally point to Page Tables, that contain physical addresses / page-frames.
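That three-level walk can be sketched as nested tables. The 4-bit index widths and the addresses here are made up for brevity – real x86 page tables use 9- or 10-bit indices per level:

```python
# Sketch of a three-level page-table walk: split the virtual page number
# into one index per level, then descend PGD -> PMD -> Page Table.

PGD = {0x1: {0x2: {0x3: 0xABCD}}}   # Page Global Directory at the top

def walk(virtual_page):
    """Split a 12-bit virtual page number into three 4-bit indices
    and walk the hierarchy to find the physical page-frame."""
    pgd_index = (virtual_page >> 8) & 0xF   # index into the PGD
    pmd_index = (virtual_page >> 4) & 0xF   # index into a Page Middle Directory
    pt_index = virtual_page & 0xF           # index into a Page Table
    return PGD[pgd_index][pmd_index][pt_index]

frame = walk(0x123)   # 0xABCD
```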