One of the subjects I wrote about recently concerned CPU caching, and the special considerations that apply to multi-threaded processes, in which more than one thread, running on different cores, needs to share information. What I had written was that, if a CPU core merely writes data to its L1 cache, the corresponding cache line is merely marked ‘dirty’ – not flushed to RAM. A common-sense question my readers could ask about that would be ‘Why such a policy? Why not flush that line of the cache immediately?’ And I’d give a 2-part answer to that question:
- When a CPU core writes data to a line of cache – that is, to a range of memory addresses, as the program sees things – it will typically do so numerous times in succession. The cache will only speed up the operation of the CPU if replacement operations are considerably less frequent than the reads and writes the core performs.
- Most of the time, the CPU core that wrote the data will also be the core that needs to read that data back ‘from memory’. The only real exception arises in a multi-threaded program.
What I went on to write, however, was that once a replacement operation is performed on a dirty line of the cache, or once the core explicitly asks to flush that line, its data is written out, towards RAM. Then, specifically in the second scenario, the operation presumably propagates from the L1 cache to the L2 cache, if there is one, and then to the L3 cache as well… There is an added observation to make about such propagation. If there is more than one cache, it also needs to be bidirectional to some extent. The reason for this would be the infrequently stated intent that flushing a line of cache should make any changes written to its data visible to the other cores of the same CPU.
What this should also mean is that, when other L1 caches hold lines corresponding to the L2 cache line being written to, or other L2 caches hold lines corresponding to the L3 cache line being written to, write-back needs to take place, so that ultimately, an L1 cache serving a different core will not contain any stale data. And the easiest way to accomplish that might be simply to invalidate the affected lines of cache, so that once their respective cores try to read the same memory addresses again, the updated data can be fetched into them.
Effectively, if the caching policy is inclusive, and if two separate L1 caches hold dirty lines corresponding to the one L2 cache line being written to, one of those L1 caches’ lines is orphaned… That line of cache may best just be invalidated, too bad.
An entirely separate question is whether an L3 cache line then also needs to be cleared, given that all the cores of a CPU map through it. It could simply be marked ‘dirty’ in turn.
But such realizations also make the real-world design of CPU caches a nightmare that only the highest-ranking electronics engineers can tackle.