[OCLUG-Tech] Finding a Linux Kernel Panic

I've been having problems with some remote, headless Linux servers for
which I'm partially responsible.  Given the behaviour of the systems at the
time of the issue (unresponsive, nothing in the logs), I'm inclined to
believe these are kernel panics, but I can't say for sure.

Most of my experience at this depth comes from FreeBSD, which leaves a dump
in /var/crash on boot after a kernel panic (via the 'savecore' binary).
Though this same binary does seem to exist for Linux, it's not in the SuSE
Linux distribution by default (SLES 10.x), making me believe it's not
standard.

So, how does one go about proving these are kernel panics?  I realize going
back in time isn't likely to happen, but if things keep up, we'll see
another panic within the next month.  How do I go about grabbing a copy of
the core dump and/or stack trace?  Is there a more graceful way to handle
kernel panics than what's default (say, kernel.panic = 5)?

(The most obvious route, attaching a serial console, is sadly not a viable
option in this case.  The reasons why are long and dry, so I'll spare you all
the boring details.)

How do you folks handle kernel panics on headless, production systems?