In previous postings I wrote that I had replaced the open-source ‘Nouveau’ graphics drivers with the proprietary ‘nVidia’ drivers, which offer more capabilities, on the computer which I name ‘Plato’. In one of those postings, I described a bug that had developed between these recent graphics drivers and ‘xscreensaver’.
Well, there is more that can go wrong between the CPU and the GPU of a computer, if the computer is running a powerful GPU.
When applications set up ‘rendering pipelines’ (aka contexts), they are loading data structures as well as register values onto the graphics card and into its graphics memory. Well, if the application, which according to older standards would only have resided in system memory, either crashes or gets forcibly closed using a ‘kill -9’ instruction, then the kernel and the graphics driver can fail to clean up whatever data structures it had set up on the graphics card.
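To illustrate why ‘kill -9’ is the harsh case: that command sends SIGKILL, which a process is never allowed to catch, so no cleanup code inside the application gets a chance to run. The following is only a minimal sketch in Python, and the function name ‘can_install_handler’ is my own hypothetical helper, not part of any driver or of BOINC. It merely asks the O/S whether a handler may be installed for a given signal.

```python
import signal


def can_install_handler(signum):
    """Return True if this process may install a handler for signum."""

    def handler(received_signum, frame):
        # Hypothetical cleanup hook: a real GPU application would release
        # its contexts and buffers here before exiting.
        pass

    try:
        previous = signal.signal(signum, handler)
    except (OSError, ValueError):
        # OSError: the O/S refuses handlers for this signal (SIGKILL, SIGSTOP).
        # ValueError: bad signal number, or not called from the main thread.
        return False

    # Restore the earlier disposition so that this probe has no side effects.
    if previous is None:
        previous = signal.SIG_DFL
    signal.signal(signum, previous)
    return True


if __name__ == "__main__":
    print("SIGTERM catchable:", can_install_handler(signal.SIGTERM))
    print("SIGKILL catchable:", can_install_handler(signal.SIGKILL))
```

An application can therefore tidy up after an ordinary ‘kill’ (SIGTERM), but after a ‘kill -9’, whatever was left on the graphics card is entirely up to the kernel and the driver.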
The ideal behavior would be that, if an application crashes, the kernel cleans up not only whatever resources it was using in system memory and within the O/S, but also those in graphics memory. And for all I know, the programmers of the open-source drivers under Linux may have made this a top priority. But apparently, nVidia did not.
And so a scenario which can take place is that the user needs to kill a hung application that was making heavy use of the graphics card, and that afterward, the state of the graphics card is corrupted, so that for example, ‘OpenCL’ kernels will no longer run on it correctly.
This can be particularly unfortunate for Linux users, because under Linux, it’s often the convention to shut down services and applications just by sending them a ‘kill’ command.
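For what it’s worth, a plain ‘kill’ sends SIGTERM, which the application can still catch, and it’s only ‘kill -9’ (SIGKILL) that cannot be caught. So a gentler pattern is to send SIGTERM first, wait a while, and only escalate if the process refuses to exit. What follows is a minimal sketch in Python. The helper name ‘stop_gracefully’ and the 30-second default are my own assumptions, and nothing here guarantees that the graphics card is left in a clean state.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc, timeout=30.0):
    """Ask a child process to exit; force it only if it ignores the request."""
    proc.terminate()  # On POSIX this sends SIGTERM, which the child may handle.
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        # Last resort, equivalent to 'kill -9': the child gets no chance
        # to release anything it had set up.
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # Stand-in for a long-running application: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(stop_gracefully(child, timeout=10.0))
```

The point of the waiting period is to give the application time to release what it set up on the graphics card, if its programmers wrote it to do so.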
Just last night this seemed to happen to me, and the only way I was able to get the ‘BOINC Work Units’ that were supposed to use OpenCL to run again was actually to reboot the computer.
I had already noticed this under Windows 7, but it was a bit more surprising to find this happening under Linux. And yet, it seems logical. The arrangements which GPU-aware apps make on the graphics card are complex, and I see little way to guarantee that the kernel will always be able to clean them up, even if the application crashes badly.
(Edit : )
Using Windows 7, I had already learned that it’s usually a mistake (which I made last night) to restart my ‘BOINC Client’ when there are GPU Work Units loaded. Well, under Linux, this still seems to be a mistake.
Separately, the question could be asked whether the ideal is achievable, that the kernel and graphics driver will clean up graphics memory that was used by applications which shut down improperly. And I would think yes, if the applications are using a specific, constrained API such as OpenGL, which does get used for graphics. But as soon as the API becomes ‘OpenCL’ or ‘CUDA’, I don’t see that this ideal will be achievable.