A common-sense observation about CPU caching policies.

One of the subjects I wrote about recently concerned CPU caching, and the special considerations that apply to multi-threaded processes, in which more than one thread, running on different cores, needs to share information. What I had written was that, if a CPU core merely writes data to its L1 cache, the corresponding line in the cache is only marked ‘dirty’ – not flushed to RAM. A common-sense question my readers could have about that would be: ‘Why such a policy? Why not flush that line of the cache immediately?’ I’d give a 2-part answer to that question:

  1. When a CPU core writes data to a line of cache – that is, to a range of memory addresses, as the program sees things – it will typically do so numerous times in succession. The cache only speeds up the operation of the CPU if the replacement (write-back) operations are considerably less frequent than the individual reads and writes the core performs. (A small sketch of this idea follows the list below.)
  2. Most of the time, the CPU core that wrote the data will also be the core that needs to read that data back ‘from memory’. The only real exception arises in a multi-threaded program.
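
To make point 1 concrete, here is a minimal sketch in C of how a single write-back cache line could be modelled in software. The names (CacheLine, cache_write, cache_flush) and the 64-byte line size are my own, purely for illustration: repeated writes only touch the cached copy, and the slower ‘RAM’ copy is updated once, when the line is flushed or evicted.

```c
/* Minimal, simplified model of one write-back cache line.
 * All names are invented for illustration only. */
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 64

typedef struct {
    unsigned long tag;             /* which block of RAM the line holds     */
    unsigned char data[LINE_SIZE];
    int valid;                     /* line holds meaningful data            */
    int dirty;                     /* line was written, RAM copy is stale   */
} CacheLine;

static unsigned char ram[LINE_SIZE];  /* stand-in for one block of main memory */

/* A write only touches the cached copy and marks the line dirty. */
static void cache_write(CacheLine *line, size_t offset, unsigned char value)
{
    line->data[offset] = value;
    line->dirty = 1;               /* no RAM traffic here                   */
}

/* Flushing (or evicting) the line is what finally writes RAM. */
static void cache_flush(CacheLine *line)
{
    if (line->valid && line->dirty) {
        memcpy(ram, line->data, LINE_SIZE);
        line->dirty = 0;
    }
}

int main(void)
{
    CacheLine line = { .tag = 0, .valid = 1, .dirty = 0 };

    for (int i = 0; i < 1000; ++i)     /* many successive writes, zero RAM writes */
        cache_write(&line, 0, (unsigned char)i);

    cache_flush(&line);                /* one write-back covers all of them */
    printf("ram[0] = %d\n", ram[0]);
    return 0;
}
```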

What I went on to write, however, was that once a replacement operation is performed on a dirty line of the cache, or once the core explicitly asks to flush that line, its data is written out towards RAM. Specifically in the second scenario, the operation presumably propagates from the L1 cache to the L2 cache, if there is one, and then to the L3 cache as well… There is an added observation to make about such propagation: if there is more than one level of cache, it also needs to be bidirectional to some extent. The reason is an intent that is seldom stated outright – that flushing a line of cache should make any changes written to its data visible to the other cores of the same CPU.
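
As an aside, on x86 a program can in fact ask for a specific line to be flushed out of the cache hierarchy. The fragment below is only a sketch, using the SSE2 intrinsic _mm_clflush together with memory fences; it assumes an x86 CPU and GCC or Clang, and demonstrates nothing more than the ‘core explicitly asks to flush that line’ case mentioned above.

```c
/* Sketch: explicitly flushing one cache line on x86 (SSE2), GCC/Clang. */
#include <immintrin.h>
#include <stdio.h>

/* Align the variable so it sits at the start of its own 64-byte line. */
static _Alignas(64) volatile int shared_value;

int main(void)
{
    shared_value = 42;                        /* write lands in L1, line goes dirty */

    _mm_mfence();                             /* order the store before the flush   */
    _mm_clflush((const void *)&shared_value); /* push the line out toward RAM       */
    _mm_mfence();                             /* wait until the flush has completed */

    printf("%d\n", shared_value);             /* this read misses and refills cache */
    return 0;
}
```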

What this should also mean is that, when an L2 cache line being written to has corresponding lines in other L1 caches, or an L3 cache line being written to has corresponding lines in other L2 caches, write-back needs to take place, so that ultimately an L1 cache serving a different core will not go on holding stale data. And the easiest way to accomplish that might be simply to make sure that the affected lines of cache are left cleared, so that once their respective cores try to read the same memory addresses again, the updated data gets replaced into them.

Effectively, if the caching policy is inclusive, and if two separate L1 caches hold dirty lines corresponding to the one L2 cache line being written to, then one of those L1 lines is orphaned… That line of cache may best just be cleared – too bad.
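
Below is a toy model, with invented names again, of that ‘just clear the other line’ policy: when one core writes its L1 copy of a block, any other L1 copy of the same block is simply invalidated, so the other core has to re-fetch the updated data on its next read. Real coherence protocols (MESI and its relatives) are considerably more involved; this only illustrates the reasoning.

```c
/* Toy model of write-invalidate between two L1 caches holding the same block.
 * Real hardware does this with a coherence protocol, not software calls. */
#include <stdio.h>

typedef struct {
    unsigned long tag;
    int valid;
    int dirty;
    int value;            /* stand-in for the line's 64 bytes of data */
} L1Line;

/* The writing core updates its own L1 copy; the sibling core's copy is cleared. */
static void write_with_invalidate(L1Line *writer, L1Line *sibling, int value)
{
    writer->value = value;
    writer->dirty = 1;

    if (sibling->valid && sibling->tag == writer->tag)
        sibling->valid = 0;   /* the orphaned line is simply dropped */
}

int main(void)
{
    L1Line core0 = { .tag = 7, .valid = 1, .dirty = 0, .value = 1 };
    L1Line core1 = { .tag = 7, .valid = 1, .dirty = 0, .value = 1 };

    write_with_invalidate(&core0, &core1, 2);

    /* Core 1 must now treat its copy as a miss and re-read the block. */
    printf("core1 line valid? %s\n", core1.valid ? "yes" : "no");
    return 0;
}
```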

An entirely separate question would be whether an L3 cache line then also needs to be cleared, given that all the cores of a CPU map through it. It could simply be marked ‘dirty’ in turn.

But such realizations also make the real-world design of CPU caches a nightmare that only the highest-ranking Electronics Engineers can tackle.


A disadvantage in running Linux, on a multi-core CPU that’s threaded.

One of the facts about modern computing is that the hardware may include a multi-core CPU whose number of virtual cores differs from its number of full cores. Such CPUs were once called “Hyper-Threaded”, but are now usually just called “Threaded”.

If the CPU has 8 virtual cores but is threaded as only 4 full cores, then there will only be a clear speed advantage when running 4 fully-busy processes. But because processes are sometimes multi-threaded, each of those 4 processes could consist of 2 fully-busy threads, and benefit from a further increase in speed, because each full core offers 2 virtual cores.

It’s really a feature of Windows to exploit this fully, while Linux tends to ignore it. When Linux runs on such a CPU, it only ‘sees’ the maximum number of virtual cores, as the logical number of cores the hardware has, without taking into account that they could be paired in some way, resulting in a smaller number of full cores.
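
As a small illustration of what the kernel presents to a program: the count returned by the sketch below, on Linux, is the number of online logical (virtual) cores, with no statement about how they pair up into full cores. The division by 2 assumes a threading factor of 2, purely to mirror the arithmetic above.

```c
/* Sketch: asking Linux how many cores it 'sees'.  The reported value is the
 * number of online logical (virtual) cores; the division by 2 only assumes
 * a 2-way threading factor for the sake of the example. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long logical = sysconf(_SC_NPROCESSORS_ONLN);  /* e.g. 8 on the CPU described */
    if (logical < 1) {
        perror("sysconf");
        return 1;
    }

    long assumed_full = logical / 2;               /* assumed SMT factor of 2 */
    printf("logical cores reported: %ld\n", logical);
    printf("full cores, if threaded 2-way: %ld\n", assumed_full);
    return 0;
}
```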

And to a certain extent, the Linux kernel is justified in doing so, because unlike under Windows, it’s actually about as cheap for a Linux computer to run a large number of separate processes as it is to run fewer processes with the same total number of threads. Two threads share a code segment as well as a data segment (heap), but have two separate stack segments and different register values. This makes them ‘lightweight processes’. Yet they only really run faster under Windows (or maybe under OS X).
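
That sharing arrangement can be seen directly with POSIX threads. In the sketch below (my own example, assuming pthreads and a build with -pthread), both threads update the same global counter in the shared data segment, while each keeps a private counter on its own stack.

```c
/* Two threads share the program's code and data (here, a global counter),
 * but each gets its own stack, so 'local' below is private to each thread.
 * Build with:  gcc demo.c -pthread */
#include <pthread.h>
#include <stdio.h>

static int shared_counter;                 /* lives in the shared data segment */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int local = 0;                         /* lives on this thread's own stack */
    for (int i = 0; i < 100000; ++i) {
        ++local;
        pthread_mutex_lock(&lock);
        ++shared_counter;                  /* both threads update the same data */
        pthread_mutex_unlock(&lock);
    }
    printf("thread %ld: local = %d\n", (long)(size_t)arg, local);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)(size_t)1);
    pthread_create(&t2, NULL, worker, (void *)(size_t)2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared_counter = %d\n", shared_counter);  /* 200000 */
    return 0;
}
```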

Under Linux it’s entirely feasible just to create many processes instead, so the bulk of the programming work doesn’t rely as much on multi-threading. Of course, even under Linux, code is sometimes written to be multi-threaded, for reasons I won’t go into here.
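
For contrast, here is a sketch of the ‘many processes’ alternative on a POSIX system: fork() gives each worker its own copy of the address space, so nothing needs to be locked, at the cost that nothing is shared without extra machinery (pipes, shared memory, and so on).

```c
/* Sketch: parallelism with separate processes instead of threads (POSIX). */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS 4

int main(void)
{
    for (int i = 0; i < WORKERS; ++i) {
        pid_t pid = fork();
        if (pid == 0) {                    /* child: do its share of the work */
            printf("worker %d running as pid %d\n", i, (int)getpid());
            _exit(0);
        }
    }
    for (int i = 0; i < WORKERS; ++i)      /* parent: reap all the children */
        wait(NULL);
    return 0;
}
```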

But then, under Linux, there was also never much effort put into having the kernel recognize two of its logical cores as belonging to the same full core.

(Updated 2/19/2019, 17h30 … )
