Understanding ‘Meltdown’ and ‘Spectre’ in Layman’s Terms

One of the pieces of news which many people have heard recently, but which few fully understand, is that a vulnerability has been discovered by researchers in Intel CPUs in particular, but also to a lesser degree in AMD CPUs, and even in some ARM CPUs (the kind found in Android phones, among other devices). It comes in two flavors: ‘Meltdown’ and ‘Spectre’. What do these vulnerabilities do?

Well, modern CPUs have a feature which enables them to execute multiple CPU-instructions concurrently – a technique known as pipelining. I learned about how this works when I was taking a System Hardware course some time ago. What happens is meant to make up for the fact that executing one CISC-Chip instruction typically takes considerably more than 1 clock-cycle. So what a CISC-Chip CPU does is start execution of instruction 1, but during the very next clock-cycle, already fetch the opcode belonging to instruction 2. Instruction 1 is, at that point, in the 2nd clock-cycle of its own execution. And one clock-cycle later, Opcode 3 gets fetched by the CPU, while instruction 2 is in the 2nd clock-cycle of its execution, and instruction 1 is in the 3rd – if there is one.
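
To make the overlap visible, here is a minimal sketch in Python, using a made-up 3-stage pipeline (Fetch, Decode, Execute), which prints the stage that each instruction occupies during each clock-cycle:

```python
# A toy 3-stage pipeline: each instruction needs Fetch, Decode and Execute,
# one clock-cycle per stage, but the stages of successive instructions overlap.
STAGES = ["Fetch", "Decode", "Execute"]
instructions = ["Instr 1", "Instr 2", "Instr 3"]

total_cycles = len(instructions) + len(STAGES) - 1
for cycle in range(total_cycles):
    busy = []
    for i, instr in enumerate(instructions):
        stage = cycle - i          # which stage instruction i is in, if any
        if 0 <= stage < len(STAGES):
            busy.append(f"{instr}:{STAGES[stage]}")
    print(f"Cycle {cycle + 1}: " + ", ".join(busy))
```

Once the pipeline has filled, one instruction completes per clock-cycle, which is exactly the ideal described next.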

This pushes CISC-Chip CPUs closer to the ideal goal of executing 1 instruction per clock-cycle, even though that ideal is never fully reached. But CPU-instructions contain branches, where a condition is tested first, and where, if the non-default outcome of this test happens to be true, the program ‘branches off’ to another part within the same program, according to the true logic of the CPU-instructions. The behavior of the CPU under those conditions has also been made more concurrent than a first-glance appraisal of the logic might suggest.

When a modern CISC-Chip CPU reaches a branching instruction in the program, it will continue to fetch opcodes, and to execute the instructions which immediately follow the conditional test, according to the default assumption of what the outcome of that test is likely to be. But if the test produces the non-default logical result, which causes the program to continue in some completely different part of its code, then the work which has already been done on the partially-executed instructions must be discarded, in a way that is not supposed to affect the logical outcome, because program flow will continue at the new address within the code. At that moment, the execution of code no longer benefits from concurrency.

This concurrent execution, of the instructions that immediately follow a conditional test, is called “Speculative Execution”.
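
Purely as a thought experiment (real Speculative Execution happens in the CPU's hardware, not in software), the following Python sketch models a CPU that predicts a branch as not-taken, executes ahead of the test, and discards the speculative work when the prediction turns out to be wrong:

```python
# A toy model of speculative execution: the "CPU" assumes the branch is not
# taken, runs the instructions that follow it, and rolls its registers back
# if the conditional test turns out the other way.
def run_branch(registers, condition, speculative_path, actual_path):
    snapshot = dict(registers)       # save state, in case work must be discarded
    speculative_path(registers)      # execute ahead of the conditional test
    if condition():                  # the non-default outcome: a misprediction
        registers.clear()
        registers.update(snapshot)   # throw the speculative work away
        actual_path(registers)       # resume at the branch target instead

regs = {"a": 1, "b": 0}
run_branch(
    regs,
    condition=lambda: True,                              # the branch IS taken
    speculative_path=lambda r: r.update(b=r["a"] + 10),  # work done ahead of time
    actual_path=lambda r: r.update(b=-1),                # the branch target's work
)
print(regs)   # {'a': 1, 'b': -1}: the speculative result was discarded
```

What Meltdown and Spectre revealed is that, on real silicon, this discarded work is not entirely traceless: it leaves measurable side-effects in the CPU's caches.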

The problem is that Complex Instruction-Set CPUs are in fact extremely complex in their logic, that this logic has been burned as such into the transistors – into the hardware – of the CPU itself, and that even the highly-skilled Engineers who design CPUs are not perfect. Until now, we have been astounded by how reliably and faithfully actual, physical CPUs execute their intricate logic, apparently without error. But, for the first time in a long time, an error has been discovered, and it seems to exist across a wide span of CPU-types.

This error has to do with the fact that modern CPUs are multi-featured, and that in addition to having concurrent execution, they also possess Protected Memory, as well as Virtual Memory. Apparently, cleverly-crafted code can exploit Speculative Execution, together with how Virtual Memory works, in order to bypass the feature which is known as Protected Memory.

Modern computers are not designed on the assumption that, just because a program has been made to run on your computer – or your smart-phone – it should be allowed ‘to do anything’. Instead, Protected Memory is a feature that blocks user-space programs from accessing memory that does not belong to them. It’s part of the security framework actually built-in to the hardware that makes up the CPU.

More importantly, user-space programs are never supposed to be able to access kernel memory.
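
Conceptually (and this is only a sketch of the idea, not of any real CPU's page tables), Protected Memory amounts to a check like the following being made on every memory access:

```python
# A toy model of protected memory: every page of virtual memory carries a
# flag saying whether it belongs to the kernel, and a user-space access to
# a kernel page raises a fault instead of returning data.
class ProtectionFault(Exception):
    pass

PAGE_TABLE = {
    0x1000: {"data": b"user data",  "kernel_only": False},
    0xF000: {"data": b"secret key", "kernel_only": True},
}

def read_page(address, running_in_kernel_mode):
    page = PAGE_TABLE[address]
    if page["kernel_only"] and not running_in_kernel_mode:
        raise ProtectionFault(f"user-space access to {hex(address)} denied")
    return page["data"]

print(read_page(0x1000, running_in_kernel_mode=False))   # allowed
try:
    read_page(0xF000, running_in_kernel_mode=False)      # blocked
except ProtectionFault as fault:
    print(fault)
```

Meltdown's trick, in effect, was that speculative execution could perform the read before this check took effect, and leak the result through the cache.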



Hybrid Encryption

If the reader is the sort of person who sometimes sends emails to multiple recipients, and who uses the public key of each recipient, thereby actively using encryption, he may be wondering why it’s possible for him to specify more than one encryption key for the same mass-mailing.

The reason this works is a practice called Hybrid Encryption. If the basis for encryption were only RSA – let’s say with a 2048-bit modulus – then one problem becomes apparent immediately: not every possible 2048-bit block of cleartext can be encrypted. Even if we assume that the highest bit of the modulus is a 1, many of its less-significant bits will be zeroes, which means that eventually a 2048-bit block will arise that exceeds the modulus. At that point, a value which is mathematically meaningful only within the modulus gets wrapped around. As soon as we try to encode the number (m), modulo (m), what we obtain is (zero), modulo (m).
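
This wrap-around is easy to demonstrate with textbook RSA and deliberately tiny numbers; the values below (p = 61, q = 53, so n = 3233, with e = 17 and d = 2753) are a standard classroom example:

```python
# Textbook RSA with tiny numbers: n = 61 * 53 = 3233, e = 17, d = 2753.
n, e, d = 3233, 17, 2753

m_ok = 42                                # cleartext SMALLER than the modulus
assert pow(pow(m_ok, e, n), d, n) == 42  # round-trips correctly

m_big = 3300                             # cleartext LARGER than the modulus
recovered = pow(pow(m_big, e, n), d, n)
print(recovered)                         # 67, i.e. 3300 mod 3233: it wrapped
```

And encrypting the modulus itself, pow(3233, 17, 3233), gives exactly the (zero) mentioned above.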

But we know that strong, symmetrical encryption techniques exist, which may work on blocks of data only 256 bits long, which have 256-bit keys, and which are at least as strong in practice as a 2048-bit RSA key-pair.

What gets applied in emails is that the sender generates a 256-bit symmetrical encryption key – one that only ever serves this single message, BTW – and that only this encryption key is encrypted, using multiple recipients’ public keys, once per recipient, while the body of the email is encrypted only once, using the symmetrical key. Each recipient can then use his private key to decrypt the symmetrical key, and can then decrypt the message, using the symmetrical key.
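
Here is a minimal sketch of that flow, using Python's third-party cryptography package. The choice of AES-GCM as the symmetrical algorithm is my assumption for illustration; real email clients implement this inside the OpenPGP or S/MIME formats:

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Stand-ins for three recipients' key-pairs; in reality, the sender would
# only hold each recipient's PUBLIC key, from a keyring or certificate.
recipients = [rsa.generate_private_key(public_exponent=65537, key_size=2048)
              for _ in range(3)]

session_key = AESGCM.generate_key(bit_length=256)   # the per-message key
nonce = os.urandom(12)
body = b"The message body, encrypted only once."
ciphertext = AESGCM(session_key).encrypt(nonce, body, None)

# The 256-bit session key is wrapped once per recipient, using RSA-OAEP.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped = [r.public_key().encrypt(session_key, oaep) for r in recipients]

# Any single recipient unwraps the session key with his private key,
# and then decrypts the one shared ciphertext.
recovered = recipients[0].decrypt(wrapped[0], oaep)
assert AESGCM(recovered).decrypt(nonce, ciphertext, None) == body
```

Note that the message body passes through the symmetrical cipher only once, no matter how many wrapped copies of the session key accompany it.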

This way, it is also easy to recognize whether the decryption was successful or not, because if the private key used was incorrect, a 2048- or 2047-bit binary number would result, while the correct decryption is supposed to reveal a 256-bit key, preceded by another 1792 zeroes. I think that, if the crypto-software recognizes the correct number of leading zeroes, it will assume that what it has obtained is a correct symmetrical key, which could also be called a temporary, or per-message, key.
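
Expressed as code, that acceptance test might look like the hypothetical helper below; real software actually relies on structured padding schemes such as PKCS#1 for this, rather than counting raw zeroes:

```python
# Hypothetical check, mirroring the raw-RSA description above: a decrypted
# 2048-bit block is taken to be a valid 256-bit session key only when its
# top 1792 bits are all zeroes, i.e. when its value fits within 256 bits.
def looks_like_session_key(decrypted: int, key_bits: int = 256) -> bool:
    return 0 <= decrypted < (1 << key_bits)
```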

Now, the reader might think that this subject is relevant to nothing else, but quite the contrary. A similar scheme exists in many other contexts, such as SSL and Bluetooth Encryption, in which complex algorithms such as RSA are used to generate a temporary, per-session key, or a per-pairing key, which can then be applied in a consistent way by a Bluetooth Chip – if that’s the kind of Bluetooth Chip that speeds up communication by performing the encryption of the actual stream by itself.

What all this means is that, even if hardware-encryption is being used, the actual I/O chip only applies the per-session or per-pairing key to the data-stream, so that the chip can have logic circuits which only ‘know’ – which only implement – one strong, symmetrical encryption algorithm. The way in which this temporary key is generated could be made complicated to the n-th degree, even using RSA if we like. But then this Math, to generate one temporary or per-pairing key, will still take place on the CPU, and not on the I/O chip.
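
To picture that division of labor (this is a made-up model, not any real chip's interface), the 'chip' below implements exactly one symmetrical algorithm and nothing else, while the key generation stays on the CPU:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

class StreamChip:
    """Toy model of an I/O chip that only 'knows' one symmetrical algorithm
    (AES in CTR mode here) and applies it to the data stream by itself."""
    def load_key(self, key, nonce):
        # Done once, by the CPU-side driver, after pairing.
        self._enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    def send(self, data):
        # Every outgoing packet: the CTR key-stream simply continues.
        return self._enc.update(data)

# The per-pairing key is computed on the CPU, by whatever complicated Math
# we like (real Bluetooth pairing derives it from a Diffie-Hellman exchange);
# here it is simply random, for illustration.
pairing_key = os.urandom(32)
chip = StreamChip()
chip.load_key(pairing_key, os.urandom(16))
packet = chip.send(b"stream data")   # the chip's one and only job
```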



Why a Hard-Boot is often Overkill.

In order to understand this posting, the reader first needs to understand something about how the (volatile) memory in a computer is organized. Through the use of virtual addresses, it gets organized into user-space and kernel-space. User-space includes programs that run with elevated privileges, but also the GUI-programs that display widgets and the like on the screen. Kernel-space is reserved for the kernel of the O/S, as well as for kernel-modules, which act as device drivers in practice.

But in addition to that, modern architectures also have I/O chips – which the kernel-modules are responsible for – that run “firmware”. These more-complex I/O chips will often perform complex tasks, such as the encryption built-in to Bluetooth, without requiring that those tasks run on the CPU, or in RAM. In order to do that, an I/O chip capable of doing so needs a smaller program of instructions that actually runs on the I/O chip. This program is typically no longer than 1-2 KB.

(Edit 06/09/2017 :

I suppose the fact should also be acknowledged that, in order for the firmware actually to do anything, the I/O chip should also have a somewhat larger region of memory – call it a Buffer – which stores data and not code. In the case of an intelligent Bluetooth chip, that buffer would logically store whatever encryption keys are currently being applied to the data being streamed to and from the chip… )

So in practice, a situation which the user can run into is that, by unpairing all his (external) Bluetooth devices, then turning Bluetooth Off from the settings GUI, and next turning BT back On, he can fail to reset the Bluetooth system to its original state. The user only gets to see what the GUI is showing him – which is being controlled by programs running in user-space.

What the user may not realize is that, given the way kernel-space is often organized, turning Bluetooth Off in user-space will fail to unload the kernel-module itself – the module responsible for operating the Bluetooth Chip which, for the sake of argument, has firmware running on it.

The actual kernel-modules first load when the computer boots up, and the first thing they do, if they are of the sort that use firmware, is to load that firmware onto the I/O chip – and it’s often patches to the firmware that get loaded, because the default firmware is burned-in to the I/O chip.
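
Purely as an illustration of that ordering (the class names and steps below are invented, not taken from any real driver), the difference between a GUI toggle and a module reload can be pictured like this:

```python
class ToyBluetoothChip:
    """Invented model: a chip with burned-in default firmware, a region for
    firmware patches, and a data buffer holding the keys currently in use."""
    def __init__(self):
        self.firmware_patches = None   # volatile: loaded by the kernel-module
        self.buffer = {}               # volatile: e.g. current encryption keys
        self.radio_on = False

class ToyKernelModule:
    def load(self, chip):
        # Happens at boot, or on an explicit module reload: the patches are
        # uploaded and the chip comes up in its original state.
        chip.firmware_patches = b"\x90" * 1024
        chip.buffer.clear()

def gui_toggle_bluetooth(chip, on):
    chip.radio_on = on                 # ...but firmware and buffer persist!

chip = ToyBluetoothChip()
ToyKernelModule().load(chip)
chip.buffer["pairing_key"] = b"stale key material"

gui_toggle_bluetooth(chip, on=False)   # the user 'turns Bluetooth off'
gui_toggle_bluetooth(chip, on=True)
print(chip.buffer)                     # the stale state survived the toggle

ToyKernelModule().load(chip)           # a module reload (or reboot) resets it
print(chip.buffer)                     # {}
```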
