[Linux-disciples] System Logs

Adam Rosi-Kessel adam at rosi-kessel.org
Thu Mar 2 09:30:06 EST 2006


Jason Smith wrote:
>>> It looks like perhaps you had an errant process that got out of control and
>>> took over all system resources. This could also conceivably be a symptom of
>>> deeper hardware problems.
> Why does that happen? I guess my question is this... what are the
> differentials used to diagnose these kinds of problems? I also found
> the log entry above and could identify that it is probably where things went
> awry, but I lack the framework to diagnose the problem....

You're about to hit the steep part of the learning curve. Getting basically
functional with Linux is really not much harder than reaching the same point
with Microsoft Windows or any other operating system, with the added bonus
that you generally don't need to worry about common viruses or trojans and
your system can run stably for years so long as you don't change anything
(while an MS installation seems to break down of its own accord after a few
months).

Getting past this point in the curve is kind of like going through medical
school or law school--until you have the complete picture, you're going to
be groping around a bit. (I do think it's a bit easier and quicker than law
school or medical school, though, so long as you are somewhat technically
inclined).

Mostly you just need to ask questions on this list and other specialized
Linux community support lists until you develop an intuition about how to
investigate.

In your case, the first question is going to be whether the problem is
reproducible. If it doesn't happen again, it's going to be awfully hard to
figure out exactly what went wrong, and probably not worth the effort.

If it is reproducible, we'd want to figure out if this is a hardware or
software problem. The only reason I would consider hardware is that it's
pretty rare for a stable kernel to go walkies like yours did, and hardware
failure is one of the few things a kernel just can't deal with.

Although your problem doesn't seem symptomatic of disk failure, I would
always check disks first, since they are by far the most likely thing to fail.

You should be running SMART on your disks if they support it, and any modern
hard drive should support SMART.  Install smartmontools if it is not
already installed. Take a look at what it says for each drive:

smartctl -a /dev/hda

(assuming hda is an actual drive).

Try running a short test and check the results afterward:

smartctl -t short /dev/hda

See the smartctl manpage for lots of details.
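
To make the SMART check above a little more concrete, here is a rough sketch
that asks each drive for its overall health verdict using 'smartctl -H'. The
function names (smart_verdict, check_drives) and the device names are made up
for illustration, and the parsing assumes the ATA-style "overall-health" line
that smartctl normally prints; other drive types may word it differently, in
which case this reports UNKNOWN.

```shell
# Classify the output of `smartctl -H <device>` read from stdin.
# Prints PASSED, FAILED, or UNKNOWN (hypothetical helper, not part of smartmontools).
smart_verdict() {
    # ATA drives normally report a line such as:
    #   SMART overall-health self-assessment test result: PASSED
    result=$(sed -n 's/^SMART overall-health self-assessment test result: *//p' | head -n 1)
    case "$result" in
        PASSED) echo PASSED ;;
        "")     echo UNKNOWN ;;   # line not found: unsupported drive or different wording
        *)      echo FAILED ;;    # anything other than PASSED is treated as a failure
    esac
}

# Print a one-line verdict for each device given as an argument (run as root).
check_drives() {
    for dev in "$@"; do
        printf '%s: ' "$dev"
        smartctl -H "$dev" | smart_verdict
    done
}

# Example (device names are assumptions, adjust for your system):
#   check_drives /dev/hda /dev/hdb
# Once a short self-test has finished, its log can be read with:
#   smartctl -l selftest /dev/hda
```

Anything other than PASSED is worth following up with the full 'smartctl -a'
output.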

Assuming your drives are okay, you can also run memtest on your system to
see if it can find any problems with memory. You should run memtest with
your system in 'single user' mode, because it will degrade system
performance substantially. (To go to single user mode, 'telinit 1').

Assuming all the hardware checks out okay, you may have just hit a bug in
one of the packages you are running, in this case possibly ZOPE/Plone. Make
sure your system is updated to the latest versions of all packages in your
distribution. Typically anything that causes a crash like this will be
experienced by a bunch of people and fixed quite quickly.

Figuring out which package is responsible can be tricky. You can always try
the slow process-of-elimination approach where you just disable a package
you think is responsible and see if you get error-free operation. This might
not be possible on a production machine, and it will not be practical if the
problem only occurs every few weeks.

Actually, another thought is maybe you don't have enough RAM and swap space
(or disk space). Linux gets troubled when it runs out of resources. Make
sure your swap space is at least twice your RAM, and for a server system
running ZOPE/Plone, Apache, etc., you should have at least 256M RAM and 512M
swap space. 'free -m' will tell you what you have.
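
If you want to check the swap rule of thumb above without reading the numbers
by eye, something like the following sketch should work. It assumes the usual
'free -m' layout with "Mem:" and "Swap:" rows whose second column is the
total in megabytes; check_swap is just a name I picked for illustration.

```shell
# Read `free -m` output on stdin and report total RAM, total swap, and
# whether swap is at least twice RAM. Returns 0 if so, 1 if swap is
# low, 2 if the input could not be parsed.
check_swap() {
    awk '
        $1 == "Mem:"  { ram  = $2 }   # second column is the total, in MB
        $1 == "Swap:" { swap = $2 }
        END {
            if (ram == "" || swap == "") {
                print "could not parse input"
                exit 2
            }
            status = (swap >= 2 * ram) ? "ok" : "low"
            printf "ram=%dM swap=%dM swap_status=%s\n", ram, swap, status
            exit (status == "ok" ? 0 : 1)
        }
    '
}

# Example:
#   free -m | check_swap
```

For disk space, 'df -h' gives the equivalent overview per filesystem.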

Those are some starting thoughts, anyway.

Adam
