
Re: [OCLUG-Tech] Stress-testing a troublesome Windows 7 system with a Kubuntu LiveCD

  • Subject: Re: [OCLUG-Tech] Stress-testing a troublesome Windows 7 system with a Kubuntu LiveCD
  • From:   <porpen [ at ] gmail [ dot ] com>
  • Date: Sat, 28 Jul 2012 19:14:30 -0400
Hi Bruce,

On 28 July 2012 14:55, Bruce Miller <subscribe [ at ] brmiller [ dot ] ca> wrote:
> The obvious symptom on the Windows 7 machine is a BSOD (Blue Screen of Death) almost every 20 minutes. This is so frequent as to make the machine unusable.
>
> The first exception code on the BSOD is a memory location of <multiple zeros>1. Her tech support person (owner of a well-known local computer store and fellow photographic expert) says that the single digit in the exception code suggests a hardware failure, probably on the motherboard.
>
> I have been running Kubuntu from the liveCD for over eighteen hours which does not support the defective motherboard hypothesis. But my mind has gone blank on how to stress-test the hardware without writing to any hard disks. Google has not been a friend; on this problem, I have found it

First, I'd like to point out that a failing hard drive can produce
"memory problems" if the swap^H^H^H^Hpagefile resides on bad blocks...
and Windows doesn't report this.  Try disabling "virtual memory"
altogether and see if it still misbehaves.

Secondly, check that drive cable.  What you have described can be
caused by a bad cable, and too many of those SATA data cables do not
have retainer clips.  I only use cables that have retainer
clips due to a bad experience.  Reseating and/or replacing SATA cables
has "fixed" mysterious slowness and crashes on some Windows systems
I've been asked to help resuscitate.

If those two tricks don't change anything, I suggest pulling the
hard drive from that system and testing it on a "known to be good"
Linux workstation, ideally using an eSATA drive enclosure of some
kind.  Please avoid using a USB hard drive cradle, as they hide some
of the I/O events SATA and IDE normally use.  Once all set up, here's a
short list of things that can help "save the bacon" on a Windows
system.  Some of them can be run simultaneously.

1: Run memtest from the LiveCD boot.  If the Kubuntu disc doesn't have
it, try a Knoppix LiveCD.

2: The backup tool I use on drives that I suspect have errors is this tool:
http://packages.debian.org/search?keywords=gddrescue
sudo ddrescue /dev/??? ./somefilename.image ./somefilename.log
If the error count is zero, then the drive _appears_ to have zero
read errors.  This doesn't test for write errors.

If either #1 or #2 reports a failure, at least one cause has been found.
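
For step 2, one way to double-check the error count afterwards is to
total up what ddrescue recorded in its log file.  This is only a rough
sketch, assuming the log layout described in the GNU ddrescue manual
(comment lines starting with '#', one status line, then "pos size
status" lines with hex values, where '+' marks rescued areas and '-'
marks bad ones).  summarize_mapfile is a made-up helper name, not part
of ddrescue.

```shell
#!/bin/sh
# summarize_mapfile is a hypothetical helper, not part of ddrescue.
# Assumed log layout: lines starting with '#' are comments, the first
# remaining line is the current-position status line, and every line
# after that is "pos size status" with pos and size in hex.
summarize_mapfile() {
    awk '
        # Convert a hex string such as 0x1FE00 to a number.
        function hex2dec(s,    i, n) {
            s = tolower(s)
            sub(/^0x/, "", s)
            n = 0
            for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
            return n
        }
        /^#/ || NF == 0 { next }                    # skip comments and blank lines
        !seen_status    { seen_status = 1; next }   # skip the status line
        { total[$3] += hex2dec($2) }                # add block size to its status bucket
        END {
            printf "rescued_bytes=%.0f\n", total["+"]
            printf "bad_bytes=%.0f\n", total["-"]
            printf "unfinished_bytes=%.0f\n", total["?"] + total["*"] + total["/"]
        }
    ' "$1"
}

# Usage (after the copy run has finished or been interrupted):
#   summarize_mapfile ./somefilename.log
```

A non-zero bad_bytes means ddrescue gave up on at least one area of the
drive; a non-zero unfinished_bytes means the run has not covered the
whole drive yet.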

3: hdparm -iI /dev/??? will report a bunch of info about the drive.
For the hardware-reported error counters, smartctl -a /dev/??? (from
smartmontools) is the tool to use.

4: If there are buffer and/or I/O errors between the mainboard and the
hard drive electronics, these will show up in dmesg while ddrescue
is running.

5: Once you have a full snapshot of that drive with ddrescue, you can
examine either the drive or the image file with
http://packages.debian.org/search?keywords=testdisk
It is read only, but it is aggressive.  It'll even pull out files that
have long since been deleted from the filesystem, but never actually
overwritten.
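
For step 4, the kernel log gets noisy, so it can help to filter it down
to the lines that look like drive or cable trouble.  A minimal sketch
follows; the patterns are my assumption based on commonly seen libata
and block-layer messages, the wording varies between kernel versions,
and disk_errors is a made-up helper name.

```shell
#!/bin/sh
# disk_errors is a hypothetical helper: it reads kernel log text on stdin
# and keeps only lines that look like drive or link errors.
# The patterns are assumptions; a clean result only means nothing matched.
disk_errors() {
    grep -E -i 'I/O error|ata[0-9]+(\.[0-9]+)?: (exception|failed command|hard resetting link)'
}

# Usage, in a second terminal while ddrescue is running:
#   dmesg | disk_errors
```

Lines naming an ataN port together with link resets tend to point at
the cable or controller side, while I/O errors on specific sectors
point at the drive itself, though neither is conclusive on its own.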

A while ago, I helped out someone who had a similar system, including
the symptoms.  The copy run (step 2) started reporting errors less
than 5 minutes into the run.  In the end, the drive did fail, but I
got as good a copy as one can get without opening up the hard drive
and reassembling it into a working drive (i.e. cleanroom work).  That
particular run of step 2 took over 14 days.  It took that long
because, at the end of the run, the drive was power cycling every 2
minutes or so.  I did get a faithful copy of over 98% of the drive,
however, and when I checked to see what was missing or damaged, the
pagefile, hibernation file, Windows registry and many other key system
files were corrupted.  This is all after Windows chkdsk and scandisk
reported nothing.

I hope this helps.

Cheers!
-Phil