My computer Plato is having a technical issue.

One of my main computers, named ‘Plato’ and running Debian / Stretch, has experienced a major technical problem. When I got home this afternoon, I found that it was not running. And when I pushed the power button, it did not turn on.

The first idea which would pop into people’s heads is, ‘The power-supply burned out.’ If the only task which lies ahead really were to replace the power-supply, I’d have it easy. But this is a tower-computer from the year 2011, with a Sabertooth X58 motherboard (‘MB’).

  • The correct power-supplies for this old MB may have become hard to find.
  • Even if I had a replacement power-supply, it would be very cumbersome to install, because the harnesses of the present one loop behind too many recessed compartments within the case.

The only thing I’ve done so far is to perform a diagnostic test. I disconnected all the connectors between the power-supply and the MB, and retried the power button. My reasoning was that modern power-supplies will refuse to turn on if they sense a short-circuit between their load and ground. Thus, if the power-supply had been able to start with the MB disconnected, I’d know that the fault was in the MB, and I’d also know that there’s no point in replacing the power-supply. But thankfully, the power-supply did not power up like that either. So I reconnected the power-supply to the MB.

So as it stands, I don’t know the best way to proceed, and I am without the use of that trusty computer for now.

(Update 2/7/2019, 14h15 : )

One reason this apparent loss is unfortunate is that, being my only Debian / Stretch computer, it was also the only one which had “SageMath” installed and working. So my available Computer Algebra Systems are reduced to “Maxima” and “Yacas” for now.

(Update 2/9/2019, 18h50 : )

Actually, I’ve learned that my so-called diagnostic test was pointless. These days, the power button does not have a direct connection to the power-supply. Instead, the power button is connected to the MB, which in turn signals the power-supply to turn on. Therefore, with the MB disconnected from the power-supply, there was no way for the power-supply even to receive the signal to turn on.

A personal friend of mine has lent me a power-supply tester, so that I’ll be able to test the power-supply properly next. And, hoping that it is just the power-supply which is faulty, I’ll look into replacing it.

(As of 2/7/2019, 14h15 … )


Another possible hypothesis, for why my server-box sometimes crashes.

I have written before that my Linux computer ‘Phoenix’, which acts both as my server and as a workstation, sometimes crashes. I have another possible explanation for why.

The graphics chip on this machine is only capable of OpenGL 2.1.2, using proprietary (legacy) drivers. It only has 128MB of memory, which it shares with my motherboard.

Under Windows 10, this chipset is no longer supported at all.

I may simply be pushing this old GPU too hard.

My display is a 1600×1200 monitor, and much of the graphics memory is taken up simply by that fact. Also, I have many forms of desktop compositing switched on. And at the time of the last crash, I had numerous applications open at the same time which use hardware 2D acceleration as part of their canvas, and I was copying and pasting between them.
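To get a feel for how display resolution and compositing translate into memory use, here is a rough back-of-the-envelope sketch. It assumes 32-bit colour and one off-screen buffer per composited window, neither of which was measured on ‘Phoenix’, and the ten-window session is purely hypothetical.

```python
# Rough estimate of the video memory consumed by pixel buffers.
# Assumptions (not measured on the real machine): 32 bits per pixel,
# and one off-screen buffer per composited window.

BYTES_PER_PIXEL = 4  # assumed 32-bit colour


def buffer_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return the size in bytes of one uncompressed pixel buffer."""
    return width * height * bytes_per_pixel


def total_mib(window_sizes, screen=(1600, 1200)):
    """Sum the screen buffer plus one buffer per window, in MiB."""
    total = buffer_bytes(*screen)
    total += sum(buffer_bytes(w, h) for (w, h) in window_sizes)
    return total / (1024 * 1024)


if __name__ == "__main__":
    print(f"Screen buffer alone: {buffer_bytes(1600, 1200)} bytes")
    # Hypothetical session: ten maximised windows under a compositor.
    windows = [(1600, 1200)] * 10
    print(f"With ten full-size windows: {total_mib(windows):.1f} MiB")
```

Under these assumptions a single 1600×1200 buffer is about 7.3 MiB, while the screen plus ten full-size window buffers comes to roughly 80 MiB of the 128MB available. Real drivers also keep textures, pixmaps and caches in that memory, so this only sketches the order of magnitude.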

I am hoping that this hardware acceleration eases the burden on my equally-dated CPU.

But then the triggering factor may simply be an eventual error in the GPU.

The fact that Timeout Detection and Recovery (‘TDR’) does not kick in to save the session may be due to the possibility that TDR only works in specific situations, such as in windows that perform OpenGL 3D rendering. If the GPU crash happens as part of the compositing, it may take out the X-server, and therefore my whole system.

The only workaround I may have is to avoid using this box as a workstation. When I avoid doing that, it has been known to run for 60 days straight without crashing…

Dirk

(Edit 01/28/2017 : )

I use a widget on my desktops, and I find that it gives me a good intuitive grasp of what is happening on my Linux computers.


 

(Screenshot: phoenix_temperatures_1)

This widget has the disadvantage that, when extensions have been installed to display temperatures, sometimes we do not know which temperature-sensor stands for which temperature. This is due to the fact that Linux developers have to design their software without any knowledge of the specific hardware it is going to run on. Conversely, the makers of proprietary drivers know exactly which machine their drivers are going to run on, and can therefore identify what each sensor stands for.

This also means that sometimes we have temperature readings in the widget which may just be spurious, and which may constantly display one meaningless number. In that case, we reduce our selection of indicated temperatures to the ones we can identify.

In the context of answering my own question, another detail which becomes relevant is the fact that this tower computer has a failed case-fan, which is accurately being indicated by one of the widget’s entries as running at 46 RPM at the moment of the screen-shot. I know that this case-fan is in fact stalled, from past occasions when I opened up the tower.
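For readers who want to check such a fan reading without a widget, the following is a minimal sketch. It assumes that the kernel exposes the sensors through the hwmon sysfs interface under /sys/class/hwmon, as files named fan<N>_input that hold a speed in RPM. The 100 RPM cut-off and the function names are hypothetical choices made for illustration, and the sketch does not solve the problem of knowing which physical fan a given file corresponds to.

```python
import glob
import os

# Assumption: fan sensors appear as /sys/class/hwmon/hwmon*/fan*_input,
# each file holding one integer speed in RPM.
HWMON_ROOT = "/sys/class/hwmon"
STALL_THRESHOLD_RPM = 100  # hypothetical cut-off, chosen for illustration


def read_fan_speeds(root=HWMON_ROOT):
    """Return {path: rpm} for every readable fan*_input file under root."""
    speeds = {}
    pattern = os.path.join(root, "hwmon*", "fan*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                speeds[path] = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable, or not reporting a number
    return speeds


def stalled_fans(speeds, threshold=STALL_THRESHOLD_RPM):
    """Return the paths whose reported speed is below the threshold."""
    return [path for path, rpm in speeds.items() if rpm < threshold]


if __name__ == "__main__":
    for path in stalled_fans(read_fan_speeds()):
        print("possibly stalled:", path)
```

A reading of 46 RPM, like the one in the screen-shot, would fall below this cut-off and be listed as possibly stalled.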


Plausible does not mean Assumed

This time I noticed that the cursor on the screen stopped moving, then that mouse-input was not being interpreted, and then that the screen just filled with an image which was a diagonally-scrambled version of the normal screen content. I could make hypothetical guesses as to why crashes like this one happen on the machine I name ‘Phoenix’, which was manufactured in 2008:

  • It could be that the old GPU is no longer reliable at the hardware level, and that it may now suffer from random crashes, which also crash the X-server. The “Timeout Detection and Recovery” (‘TDR’) feature, which I have seen the nVidia driver execute properly in past situations, may just not kick in.
  • When I reinstalled, replacing the old 32-bit O/S with the current 64-bit O/S, I also replaced the 2GB of RAM with completely new modules totalling 4GB, and the new RAM is also faster, rated at 800MHz instead of the earlier 600MHz. Both sets of DDR RAM modules ran with dual-channel capability. The motherboard may detect the higher speed of the new RAM modules and start using it, as the motherboard itself may have the stated capability of running at 800MHz. Yet at 800MHz, this motherboard may not be stable.
  • There could be some sort of kernel issue…

What I do find a bit more specific is the fact that there seem to be no corresponding log entries, suggesting that although an X-server crash eventually takes place, this may not be the root cause. Also, the fact that the mouse became unresponsive for a few seconds before the screen-content collapsed seems to suggest the same thing…
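One way to look for such entries can be sketched as follows. This assumes that the log in question is the X server’s own log at /var/log/Xorg.0.log, and that error lines carry the “(EE)” marker at the start of the line, optionally after a timestamp. I did not record which log was actually checked, so both the path and the helper name are assumptions.

```python
import re

# Assumption: the X server logs to this path and marks errors with
# "(EE)" at the start of a line, optionally after a "[ 12.345]" timestamp.
XORG_LOG = "/var/log/Xorg.0.log"
ERROR_RE = re.compile(r"^\s*(\[\s*\d+\.\d+\]\s*)?\(EE\)")


def error_lines(lines):
    """Return the lines that start with the X server's error marker."""
    return [line.rstrip("\n") for line in lines if ERROR_RE.match(line)]


if __name__ == "__main__":
    try:
        with open(XORG_LOG, errors="replace") as handle:
            found = error_lines(handle)
    except OSError as exc:
        print("could not read log:", exc)
    else:
        print(f"{len(found)} error line(s)")
        for line in found:
            print(line)
```

The pattern is anchored to the start of the line so that the legend in the log header, which mentions “(EE)” while explaining the markers, is not counted as an error. After a reboot, the log of the previous session is normally kept as Xorg.0.log.old, so that is the file to pass in when investigating a crash after the fact.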

But the most important fact for me to observe is that simply being able to suggest plausible reasons for the crash is not the same thing as having diagnosed the crashes. Honestly, I do not know at present why this type of crash happens.

One of the observations about this machine which had impressed me in the past was that I had pushed 3D rendering beyond the limits of the old GPU, thereby crashing this graphics chip, but that the desktop manager I had in place was able to restart the GPU and to resume the session. This required no action from me, and only displayed a well-behaved message to the effect that the GPU needed to be rebooted. This is called “Timeout Detection and Recovery” (‘TDR’), and it does the same thing under Linux that it does under Windows, and depends on stable graphics drivers.

The fact that I do possess ‘TDR’ on this machine suggests that a simple failure of the graphics chip should not take out my session.

Addendum:

According to my latest inquiry, this Motherboard is ‘only’ running at 66MHz. Therefore, the maximum speed of the newer RAM Module should not be an issue after all.

(Image: ram_phoenix_1)

Dirk
