Understanding ‘Meltdown’ and ‘Spectre’ in Layman’s Terms

One piece of news which many people have heard recently, but which few fully understand, is that researchers have discovered a vulnerability – in Intel CPUs in particular, but to a lesser degree also in AMD CPUs, and even in some of the ARM CPUs that power Android devices – which comes in two flavors: ‘Meltdown’ and ‘Spectre’. What do these vulnerabilities do?

Well, modern CPUs have a feature which enables them to execute multiple CPU-instructions concurrently – a technique known as pipelining. I learned about how this works when I was taking a System Hardware course some time ago. What happens is meant to make up for the fact that executing one CISC-Chip instruction typically takes considerably more than 1 clock-cycle. So what a CISC-Chip CPU does is start execution on instruction 1, but during the very next clock-cycle, already fetch the opcode belonging to instruction 2, while instruction 1 is in the 2nd clock-cycle of its own execution. One clock-cycle later, opcode 3 gets fetched by the CPU, while instruction 2 is in the 2nd clock-cycle of its execution, and instruction 1 is in the 3rd clock-cycle of its own – if there is one.
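As a simplified worked example, suppose that each instruction takes 3 clock-cycles in total (the real number varies from instruction to instruction):

                 cycle 1   cycle 2   cycle 3   cycle 4   cycle 5
  instruction 1:  fetch    execute   execute
  instruction 2:            fetch    execute   execute
  instruction 3:                      fetch    execute   execute

Executed strictly one-at-a-time, these 3 instructions would need 9 clock-cycles; overlapped as above, all 3 are finished after only 5.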

This pushes the CISC-Chip CPUs closer to the ideal goal of executing 1 instruction per clock-cycle, even though that ideal is never fully reached. But CPU-instructions contain branches, where a condition is tested first, and where, if the non-default outcome of this test happens to be true, the program ‘branches off’ to another part within the same program, according to the true logic of the CPU-instructions. The behavior of the CPU under those conditions has also been made more concurrent than a first-glance appraisal of the logic might suggest.

When a modern CISC-Chip CPU reaches a branching instruction in the program, it will continue to fetch opcodes and to execute the instructions which immediately follow the conditional test, according to the default assumption of what the outcome of the test is likely to be. But if the test brings about the non-default logical result – so that the program must continue in some completely different part of its code – then the work which has already been done on the partially-executed instructions is discarded, in a way that is not supposed to affect the logical outcome, because program flow continues at the new address within the code. At that moment, the execution of code no longer benefits from concurrency.

This concurrent execution of the instructions that immediately follow a conditional test is called “Speculative Execution”.
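To make this concrete, here is a hypothetical fragment of C++ – the names are mine, chosen only for illustration – showing the kind of branch a CPU speculates past:

  #include <cstddef>
  #include <cstdint>

  // A sketch of a conditional test, followed by an instruction which the
  // CPU may begin to execute speculatively, before the test has resolved.
  std::uint8_t read_checked(const std::uint8_t* array,
                            std::size_t array_size, std::size_t x) {
      if (x < array_size) {    // the conditional test
          return array[x];     // may already be executing, speculatively
      }
      return 0;                // the branch taken if the test fails
  }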

The problem is that Complex Instruction-Set CPUs are in fact extremely complex in their logic, that this logic has been burned as such into the transistors – into the hardware – of the CPU itself, and that even the highly-skilled Engineers who design CPUs are not perfect. So far, we’ve been astounded by how reliably and faithfully actual, physical CPUs execute their intricate logic, apparently without error. But now, for the first time in a long time, an error has been discovered, and it seems to take place across a wide span of CPU-types.

This error has to do with the fact that modern CPUs are multi-featured: in addition to offering concurrent execution, they also possess Protected Memory, as well as Virtual Memory. Apparently, cleverly-crafted code can exploit Speculative Execution, together with how Virtual Memory works, in order to bypass the feature known as Protected Memory.

Modern computers do not assume that, just because a program has been made to run on your computer – or your smart-phone – it should be allowed ‘to do anything’. Instead, Protected Memory is a feature that blocks user-space programs from accessing memory that does not belong to them. It’s part of the security framework actually built into the hardware that makes up the CPU.

More importantly, user-space programs are never supposed to be able to access kernel memory.
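While the real attacks are considerably more involved, the following C++ sketch suggests the shape of the trick. This is simplified, illustrative code, not a working exploit, and all the names in it are hypothetical:

  #include <cstddef>
  #include <cstdint>

  // One probe slot per possible byte value, spaced far enough apart
  // that each slot lands in its own region of the cache.
  std::uint8_t probe[256 * 4096];

  void gadget(const std::uint8_t* array, std::size_t array_size,
              std::size_t x) {
      if (x < array_size) {                // CPU may mispredict this as true
          std::uint8_t secret = array[x];  // speculative out-of-bounds read
          volatile std::uint8_t t =
              probe[secret * 4096];        // touches a cache line whose
          (void)t;                         // index depends on the secret
      }
  }

Even though the speculative work is discarded once the mis-prediction is detected, the cache retains a footprint: by timing accesses to probe[], an attacker can infer which line was loaded, and hence the value of the secret byte.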


How the Obsolescence of Dalvik Does Not Even Represent an Inconsistency

There was a development with Android, which slipped my attention, and which took place during or after the advent of Android 4.4 (KitKat).

In general, it has always been a fact that Android application-programmers were encouraged to write source-code which was identical in its syntax to Java, and which was compiled by their IDE into a kind of bytecode. Only, for many years this was not officially the Java platform: the bytecode, and the virtual machine which executed it, were officially named Dalvik. Exactly as it goes with Java, this bytecode was then interpreted by a central component of the Android O/S (although there exist computers on which the JVM is not central to the O/S).
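In outline, the classic build pipeline was as follows – where ‘javac’ and ‘dx’ are the actual tool names from the Java and Android SDKs:

  Java source (.java) → javac → JVM bytecode (.class) → dx → Dalvik bytecode (.dex) → interpreted on the device by the Dalvik virtual machine.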

This means devs were discouraged, but not forbidden, from writing their apps in C++, which would be compiled directly into native code. In fact, the Native Development Kit (NDK) has always been an (optional) part of “Android Studio”.
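As a sketch of what such native code looks like, here is a minimal NDK-style function, assuming a hypothetical Java class com.example.Demo which declares the method public static native int add(int a, int b):

  #include <jni.h>

  // A minimal NDK-style native method, compiled directly to machine code.
  // The class com.example.Demo and its method ‘add’ are hypothetical,
  // chosen only for illustration.
  extern "C" JNIEXPORT jint JNICALL
  Java_com_example_Demo_add(JNIEnv*, jclass, jint a, jint b) {
      return a + b;   // executes as native code; no bytecode is involved
  }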

But since Android 5+ (Lollipop), this interpreter has been replaced with something called the Android Runtime (ART). This fact does not present me with much of an inconsistency – only a late awareness of progress.

Even when I was learning some of the basics of programming in Java, one piece of technology which had a name was a so-called “Flash Compiler”. This was in fact distinct from a “JIT Compiler”: a JIT compiler translates bytecode into native code at run-time, as the program is executing, compiling the parts of it that actually run, while a flash-compiler only needs the bytecode ahead of time, in order to derive native code from it before the program runs at all.

So, if the newer Android Runtime flash-compiles the bytecode, this does not change the fact that devs are writing source-code which is by default still Java, and which is initially compiled by their IDE into bytecode.
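If I name the tools involved – where ‘dex2oat’ is the actual name of ART’s ahead-of-time compiler, run when an app is installed – the newer pipeline looks like this:

  Java source → IDE compiles to Dalvik bytecode (.dex) → dex2oat compiles the .dex into native code at install-time → the device executes native code thereafter.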

Clearly though, there is a speed-improvement in flash-compiling the bytecode and then executing the resulting native code, over interpreting the bytecode.

Yet, the speed-improvement which was once thought to exist in RISC-Chip CPUs has largely vanished over the History of Computing. One original premise behind RISC-Machines was that they could be made to run at a higher clock-speed than Complex Instruction Set (CISC) Computers, and that due to this increase in clock-speed, the RISC-Machines could eventually be made faster.

In addition, early CISC-Machines failed to use concurrency well – that is, to execute a number of operations simultaneously. By exploiting it, modern CISC-Machines also obtain low numbers of clock-cycles per instruction. But I think that this subject will remain in debate, as long as practical CISC-Machines have not exploited concurrency as much as theory should permit.

An optimizing compiler generally has the option of compiling source-code into simpler instructions even when targeting a CISC-Machine, whereas when targeting a RISC-Machine it has no other choice. It therefore follows that realistic source-code needs to be compiled into longer sequences of instructions for a RISC-Machine.
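As a hedged sketch of this difference, here is one C++ statement, followed by the instruction sequences a compiler might plausibly emit for it – the mnemonics are illustrative, and the exact syntax varies by assembler:

  int counter = 0;

  void increment() {
      counter += 1;
      // CISC (x86-style): possibly a single read-modify-write instruction:
      //     add dword ptr [counter], 1
      // RISC (ARM-style): a longer sequence of simpler instructions:
      //     ldr  r0, [address_of_counter]
      //     add  r0, r0, #1
      //     str  r0, [address_of_counter]
  }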

This debate began before the days when a CPU had become one chip. Since the CPU is by now a single chip, communication at the speed of light across such short distances permits a CISC-Machine to have as high a clock-speed as a RISC-Machine. OTOH, the power consumption of a RISC Chip may still be slightly better.

And, as long as the CPU of an Android device only needed to execute a Bytecode Interpreter, that one piece of native code could itself be optimized to run well on a RISC-Machine. But if we compare the speeds of modern RISC and CISC -Machines running optimized, native code – i.e., compiled C++ – I think it’s clear that the CISC-Machines, including the ones meant to run Windows or Linux, outperform the RISC-Machines.

(Edit 10/09/2017 : )

I believe that there exist specific situations in which a RISC-Machine runs faster, and that a lot of that depends on what language of source-code is being compiled. In the case of RISC-Machines, the compiler has fewer options to optimize the code than it would have with CISC-Machines.
