Pixel C Crash Yesterday Night

Yesterday evening, my new Pixel C tablet did something ominous for the first time. Its screen just went dark, and then displayed the logo which it shows during a restart. The restart then completed successfully.

Some people mistakenly call this behavior a reboot. If we were to call it that, then it would need to be called a Hard Boot, as opposed to a Soft Boot, which happens when the user tells the tablet to reboot from the software side. But a true Hard Boot happens when the user forces one with the power button, and that would give it an explanation.

In reality, what the tablet did was a spontaneous reset. This type of event is also a File System Event, as the File System was never unmounted. Hence, the tablet also needed to repair its file system when it booted anew.

But, there are certain safety-factors built into how any serious O/S works, and built into how any file system works. So in most cases, the repair to the file system succeeds.

The fact that this has happened to a brand-new tablet causes me to question how (un)stable it might really be. I’ve only had this tablet for a few short months now.

One feature of this event which is even less reassuring is that, after the reset, nothing displayed in the user interface betrays the fact that the reset happened. In theory, this could be happening every night as I sleep, even while the tablet is charging, and by the next morning I would see no sign of it.

It just happens to have taken place once now, while I was sitting in front of it.
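
Since the interface shows no trace of a reset, one way to catch a silent one would be to compare the computed boot time against the last time the device was checked. What follows is only a minimal Python sketch, assuming that a Linux-style /proc/uptime is readable. On an Android tablet, this would presumably have to be read through adb or a terminal app, which is an assumption and was not tested here. The function names are hypothetical.

```python
def parse_uptime(text):
    """Return the seconds since boot, given the contents of /proc/uptime."""
    # The file holds two numbers; the first is the uptime in seconds.
    return float(text.split()[0])


def read_uptime(path="/proc/uptime"):
    """Read the uptime from the given path (assumed to be a Linux procfs file)."""
    with open(path) as handle:
        return parse_uptime(handle.read())


def rebooted_since(last_check_epoch, uptime_seconds, now_epoch):
    """Return True if the system booted after the previous check."""
    boot_epoch = now_epoch - uptime_seconds
    return boot_epoch > last_check_epoch
```

Storing `time.time()` at each check and later passing it in as `last_check_epoch`, together with a fresh `read_uptime()` value, would reveal a restart that happened unobserved overnight.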

Dirk

(Edit : )

I should add that this tablet is running the May 5 patch of Android 7.1.2.

 

New Case-Fan Installed

In previous postings, I wrote about crashes which the computer I name ‘Phoenix’ was suffering from. I also wrote that one possible cause could have been the failed case-fan, which may have allowed something on the motherboard to overheat.

Just today, this box suffered another, similar crash. This time, I opened up the case and replaced the 92mm case-fan. Therefore, the reader might expect some optimism on my part that this server-box will not crash again. But in reality, I have two reasons for which my optimism is limited:

  1. If an overheated chip has already caused crashes, it has some tendency to suffer from a memory-effect, failing again whenever it gets even slightly warm. Therefore, if the first crash happened for that reason, this machine could now have a penchant for crashing, even though the initial cause has been removed.
  2. The cause may not have been an overheated chip, but rather, a pure software-problem with the legacy graphics driver (nVidia). On such a big display, the graphics driver may have been suffering from some sort of resource leak (aka memory leak), and during boot-up, the BIOS reports that the graphics chip only possesses 128MB of shared RAM! Thus, the problem could be cumulative and result from regular copying-and-pasting, with many HW-accelerated drawing surfaces and many compositing effects enabled. Once we have an unstable graphics driver (and this one has received several updates recently), having a stable one could be a luxury we cannot easily recover.

My site was down from roughly 19h00 until 20h00, and I apologize to my readers for any inconvenience.

Dirk

BTW: I have an additional reason to doubt that these crashes are due to an overheated graphics chip. During the actual reboot, the graphics chip should get especially hot, all the more so if the case-fan is not turning.

I can see that if this chip did overheat, the TDR would not be able to reboot it.

But the crashes never seem to occur directly after the reboot. I generally seem to obtain about 6 days of smooth computing before another crash happens.

Also, it should not be a VRAM leak, because this is a pre-GPU-type graphics chip. With the old graphics chips, which had at most several pixel and several vertex pipelines, VRAM consumption was more or less static, while with more modern GPUs, some amount of VRAM-creep is at least plausible.

 


root@Phoenix:/home/dirk# lspci | grep vga
root@Phoenix:/home/dirk# lspci | grep VGA
00:0d.0 VGA compatible controller: NVIDIA Corporation C61 [GeForce 6150SE nForce 430] (rev a2)
root@Phoenix:/home/dirk# lspci -v -s 00:0d.0
00:0d.0 VGA compatible controller: NVIDIA Corporation C61 [GeForce 6150SE nForce 430] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company Device 2a61
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 21
        Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at fc000000 (64-bit, non-prefetchable) [size=16M]
        [virtual] Expansion ROM at f4000000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Kernel driver in use: nvidia

root@Phoenix:/home/dirk#
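
The memory regions in the listing above can be pulled out with a short script. This is a minimal Python sketch, assuming the 'Memory at ... [size=...]' line format shown in that output; the function name and the dictionary keys are hypothetical. Note that these sizes describe address ranges which the device exposes on the bus, and they need not equal the amount of shared RAM that the BIOS reports.

```python
import re

# Matches lines such as:
#   Memory at e0000000 (64-bit, prefetchable) [size=256M]
REGION = re.compile(
    r"Memory at ([0-9a-f]+) "
    r"\((\d+)-bit, (non-prefetchable|prefetchable)\) "
    r"\[size=(\d+)([KMG])\]"
)

# lspci prints sizes with binary suffixes.
UNIT = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def memory_regions(lspci_text):
    """Return one dict per 'Memory at' line found in 'lspci -v' output."""
    regions = []
    for match in REGION.finditer(lspci_text):
        address, bits, kind, number, unit = match.groups()
        regions.append({
            "address": int(address, 16),
            "bits": int(bits),
            "prefetchable": kind == "prefetchable",
            "size": int(number) * UNIT[unit],
        })
    return regions
```

Fed the output of 'lspci -v -s 00:0d.0', this yields three regions; the Expansion ROM line is deliberately not matched.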


 

More Than An Hour Of Down-Time Today

I had installed some very questionable software on the computer which acts as my server, and which is named ‘Phoenix’.

This resulted in an initial crash, but also required me to execute several intentional crashes afterward, just to be sure of what was causing this behavior.

Today was the first day this questionable software was installed.

As a result, my site was effectively down from 10h45 until 12h05.

I sincerely apologize to my readers for this. And no, there is no real reason to assume that this crash shared a cause with certain past, infrequent crashes. This crash came within 12 hours of my installing something.

As far as I can tell, everything works again – minus the questionable software.

Dirk

 

Plausible does not mean Assumed

I could make hypothetical guesses as to why crashes like this one happen on the machine I name ‘Phoenix’, which was manufactured in 2008. This time I noticed that the cursor on the screen stopped moving, then that mouse-input was not being interpreted, and then that the screen just filled with a diagonally-scrambled version of the normal screen content:

  • It could be that the old GPU is no longer reliable at the hardware level, and that it may now suffer from random crashes, which also crash the X-server. The Timeout Detection and Recovery (‘TDR’) feature, which I have seen the nVidia driver execute properly in past situations, may just not kick in.
  • When I reinstalled, replacing the old 32-bit O/S with the current 64-bit O/S, I also replaced the 2GB of RAM with completely new, 4GB of RAM, and the maximum speed of the new RAM is also higher, at 800MHz instead of the earlier 600MHz. Both sets of DDR RAM modules ran with dual-channel capability. The motherboard may detect the higher speed of the new RAM modules and start using it, as the motherboard itself may have the stated capability of running at 800MHz. Yet, at 800MHz, the way this motherboard works may not be stable.
  • There could be some sort of kernel issue…

What I do find a bit more specific, is the fact that there seem to be no log entries for the , suggesting that although an X-server crash eventually takes place, this may not be the root cause. Also, the fact that the mouse has become unresponsive for a few seconds, before screen-content collapses, seems to suggest the same thing…
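
One way to look for X-server entries after such a crash is to scan the server's own log for error and warning markers. This is a minimal Python sketch, assuming the log is the usual /var/log/Xorg.0.log (an assumption, since the path varies between setups) and that it uses the standard '(EE)' and '(WW)' markers, with or without a leading timestamp. The function name is hypothetical.

```python
import re

# An optional "[   12.345]" timestamp, followed by the marker itself.
MARKER = re.compile(r"^(?:\[\s*\d+\.\d+\]\s*)?\((EE|WW)\)")


def xorg_problems(log_text):
    """Return (errors, warnings) as two lists of log lines."""
    errors, warnings = [], []
    for line in log_text.splitlines():
        # The legend near the top of the log names every marker,
        # so a line that mentions both is skipped.
        if "(EE)" in line and "(WW)" in line:
            continue
        match = MARKER.match(line)
        if not match:
            continue
        if match.group(1) == "EE":
            errors.append(line.strip())
        else:
            warnings.append(line.strip())
    return errors, warnings
```

If both lists come back empty for the session that crashed, that would be consistent with the X-server not being the root cause, although a hard freeze can also cut the log short before anything is written.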

But the most important fact for me to observe is that simply being able to suggest plausible reasons for the crashes is not the same thing as having diagnosed them. Honestly, I do not know at present why this type of crash happens.

One of the observations about this machine which had impressed me in the past was that I had pushed 3D rendering beyond the limits of the old GPU, thereby crashing this graphics chip, but that the desktop manager I had in place was able to restart the GPU and resume the session without requiring any action from me, while displaying a well-behaved message to the effect that the GPU needed to be rebooted. This is called Timeout Detection and Recovery (‘TDR’). It does the same thing under Linux that it does under Windows, and it depends on stable graphics drivers.

The fact that I do possess ‘TDR’ on this machine suggests that a simple failure of the graphics chip should not take out my session.

Addendum:

According to my latest inquiry, this Motherboard is ‘only’ running at 66MHz. Therefore, the maximum speed of the newer RAM Module should not be an issue after all.

[Image: ram_phoenix_1]

Dirk
