Continuing to Troubleshoot my Graphics Chip

The computer named ‘Phoenix’ has now been running for 6 days and 19 hours. It is approaching the point, where recently it used to crash. I have suspected that my graphics chip might be the problem, and that more specifically, this might have happened, because the graphics chip is only supposed to take 128MB of shared RAM according to the BIOS, but according to which it has been allocated 256MB, by the Linux software. It is an outdated graphics chip, which I have written about here.

Under Linux we have numerous tools that give us information about our machines, including the ‘‘ here:

phoenix_ram_2

I have already written that I cannot take the 70MB of shared RAM as an exact figure, of how much VRAM the chip is taking, which is actually system memory in this case, because there could be other processes using shared memory in some way. Yet, this is still an approximate estimate, and, since the amount being indicated is 70MB, the total amount of shared memory taken by my graphics chip, cannot be greater than 71MB.

This basically rules out one hypothesis for what might have been happening. Under normal use, the chip will not need more than 128MB.

Now, the fact that I have installed a new case fan, may or may not protect me from future crashes, since my other main hypothesis was, ‘Something could have been overheating on the motherboard.’

(Edit 02/14/2017 : )

And this is what my shared memory looked like today, 8 days and 23 hours after the latest reboot:

phoenix_ram_3

( 59+ MB – Stable )

Dirk

 

I question the amount of VRAM on Phoenix.

I am still contemplating, why the server-box I name ‘‘ was crashing, and my attention keeps coming back to the graphics chip. Before this computer was resurrected, it was running in 32-bit mode, as ‘‘. At that time, it only had 2GB of RAM. But now it runs in 64-bit mode, with 4GB of RAM.

When I boot, the BIOS message still tells me that it has 128MB of shared memory, for the graphics chip. But strangely enough, the piece of text I pasted into this posting, reads that the graphics driver has set aside 256MB of VRAM, near the top of the 4GB of physical addresses. I did not know that the kernel can override a BIOS setting in this way, let us say just because processing has been switched to 64-bit mode.

One mishap which could naively go wrong, is that the legacy driver, unaware of the specifics of this motherboard, could be allocating 256MB of shared memory, but that physically, the hardware cannot share past the address ‘‘. That is, the address ‘‘ may have become forbidden territory for the graphics card. It is however uncommon, that the programmers of kernel-space modules, would make such a simple mistake.

This is a 64-bit system, which only accepts up to 4GB of RAM, thus only possessing 32-bit physical addresses, to go with its 64-bit virtual addresses.

According to this screen-shot:

phoenix_ram_1

I only have 3.74GB of RAM available to the system, instead of 4GB. The reason for this, is the fact that 256MB have in fact been reserved for the graphics chip. By itself this would seem to suggest, that the allocation has succeeded.

Also, the fact that 49.26MB of shared memory was momentarily being indicated, is not too telling, because several types of processes could be using shared memory for some purpose. This feature does not only exist, for user-space processes to make texture images available to the graphics card.

Continue reading I question the amount of VRAM on Phoenix.

NoMachine NX

When people connect to their VPN, this could simply allow them to access shared files. But alternatively, this could also mean that they wish to create a virtual session, on the remote desktop of one of their servers. The latter exists under the terms VNC, RDP, XRDP, and several others.

On my main Linux server named ‘Phoenix’, I have the XRDP service installed, which is the Linux equivalent of RDP. But one main drawback of this method, of remotely accessing a desktop, is the fact that XRDP does not allow file-sharing, specifically in the version of this protocol that runs out-of-the-box from the package manager. I have read that certain custom-compiled versions support this, but do recall that this service is a mess to custom-compile, and to set up in such a way that it runs reliably. So I stick to the packaged version for now, and do not obtain file-sharing.

There exists a closed-source application named , which we could use to bridge this gap. But while their paid software subscriptions are very expensive (from my perspective), their Free software version has some big disadvantages.

First of all, even their Free version can be run in client or in server mode. I think that this is terrific. But in server mode – which affords access to the local machine desktop from elsewhere – there is no built-in support for SSH protocol. There is only the unencrypted NX protocol, for which their service listens.

Secondly, not every computer is strong enough to run in server mode. On the computer ‘Phoenix’ I have a fragile X-server, and this service has actually crashed my X-server. Not only that, but allowing this service to run on reboot, consistently prevents my X-server from starting. It gets its hooks into the session so early on boot, that the X-server crashes, before the user is even asked for a graphical log-in.

On the plus side, there are ways of solving both problems.

Continue reading NoMachine NX

Another possible hypothesis, for why my server-box sometimes crashes.

I have written before, that my Linux computer ‘Phoenix’, which acts both as my server and a workstation, sometimes crashes. I have another possible explanation for why.

The graphics chip on this machine is only a , capable of OpenGL 2.1.2 using proprietary (legacy) drivers. It only has 128MB of shared memory with my motherboard.

Under Windows 10, this chipset is no longer supported at all.

I may simply be pushing this old GPU too hard.

My display is a 1600×1200 monitor, and much of the graphics memory is simply being taken up by that fact. Also, I have many forms of desktop compositing switched on. And at the time of the last crash, I had numerous applications open at the same time, which use hardware 2D acceleration as part of their canvas. And I was copying and pasting between them.

I am hoping that this is easing the burden on my equally-dated CPU.

But then the triggering factor may simply be an eventual error in the GPU.

The fact that the Timeout Detection and Recovery (‘TDR’) does not kick in to save the session, may be due to the possibility that the TDR only works, in specific situations, such as OpenGL, 3D rendering windows. If the GPU crash happens as part of the compositing, it may take out the X-server, and therefore my whole system.

The only workaround I may have, is to avoid using this box as a workstation. When I avoid doing that, it has been known to run for 60 days straight, without crashing…

Dirk

(Edit 01/28/2017 : )

I use a widget on my desktops, which is named ‘‘, and I find that it gives me a good intuitive grasp of what is happening on my Linux computers.


 

phoenix_temperatures_1


 

This widget has as a disadvantage, that when extensions have been installed to display temperatures, sometimes we do not know which temperature-sensors stand for which temperature. This is due to the fact that Linux developers have to design their software, without any knowledge of the specific hardware it is going to run on. Inversely, the makers of proprietary drivers know exactly which machine those are going to run on, and can therefore identify what each of them stands for.

This also means, sometimes we have temperature readings in ‘‘, which may just be spurious, and which may just constantly display one meaningless number, in which case we reduce our selection of indicated temperatures to ones we can identify.

In the context of answering my own question, another detail which becomes relevant, is the fact that this tower computer has a failed case-fan, which is accurately being indicated as the ‘‘ entry, running at 46 RPM at the moment of the screen-shot. I know that this case-fan is in fact stalled, from past occasions when I opened up the tower.

Continue reading Another possible hypothesis, for why my server-box sometimes crashes.