[NTLUG:Discuss] writing to a USB drive locks up server..
Richard
ntlug at rain4us.net
Wed Aug 20 23:56:44 CDT 2008
While I've seen my dirvish banks on reiserfs-formatted drives become
corrupted and lock up a server, I had never seen it with ext2/3
drives. I THOUGHT I had just run across a case of ext2/3 file system
corruption causing a server hang, but now I'm beginning to wonder.
The dirvish restore to the new server hardware went smoothly (mostly
without hiccups -- there were a few drivers I had to compile for the new
SCSI card and NICs) and I was looking forward to smooth sailing.
Unfortunately that hasn't been the case. I have been having issues on
this new hardware. Attempting heavy write access to a USB drive
containing one of my banks causes the server to lock up. The file
system contained errors, and since I had a backup copy of the vaults in
that bank I decided to back up my vault configs and reformat the drive
fresh. I had previously reformatted reiserfs filesystems to 'fix'
corruption that caused lockups and I was surprised that I seemed to have
the same issue with ext2/3. When I kicked off a reformat on the dirvish
bank drive, the server wrote about 147 of its inode allocations and
then the server just paused. At first the server was still pingable,
but that quickly deteriorated.
The Num Lock key still responded but the console was unresponsive. Using
the Magic SysRq commands allowed me to Sync, Unmount and reBoot the server
mostly gracefully, but now I am wondering what technical situations could
lead to a server hanging on USB disk access.
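For reference, the Sync / Unmount / reBoot sequence can also be triggered from a script rather than the keyboard. The following is only a rough sketch, assuming Python 3, root access, and a kernel that exposes /proc/sysrq-trigger (the 2.4.20 kernel described here may not). The function name, the `dry_run` flag and the `delay` parameter are made up for illustration. Note that the "Unmount" key actually remounts filesystems read-only.

```python
import time

# Magic SysRq keys used in the sequence described above.
SYSRQ_ACTIONS = {
    "s": "sync",
    "u": "remount read-only",
    "b": "reboot",
}


def sysrq_sequence(keys="sub", trigger="/proc/sysrq-trigger",
                   dry_run=True, delay=2.0):
    """Send SysRq keys in order and return the actions they stand for.

    With dry_run=True (the default) nothing is written, so the
    sequence can be inspected without rebooting the machine.
    """
    performed = []
    for key in keys:
        if key not in SYSRQ_ACTIONS:
            raise ValueError("unsupported SysRq key: %r" % key)
        if not dry_run:
            # Each key is written on its own; the kernel acts on it at once.
            with open(trigger, "w") as fh:
                fh.write(key)
            # Give sync/remount a moment to finish before the next key.
            time.sleep(delay)
        performed.append(SYSRQ_ACTIONS[key])
    return performed
```

Calling `sysrq_sequence()` as-is only reports what would be done; passing `dry_run=False` as root really does reboot the box, so treat it with the same care as the keyboard combination.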
The Slackware 9.1 w/2.4.20 kernel that is running was stable on the old
hardware (yes, I know... that was the OLD hardware). I fear that a kernel
upgrade will be necessary on this new hardware, but I'm hoping someone
else on the list has seen a problem similar to this one and can offer
suggestions. I am not looking forward to dealing with getting an
upgraded kernel patched, compatible and ready to run
Dead-Gateway-Detection (DGD), MPPE, and UML processes only to find that
the problem is hardware related, BIOS setting related or some other such
cause.
Troubleshooting steps taken so far include removing the add-on USB PCI
card (to solve the NMI errors that were being seen in the log files) and
disabling SMP in the kernel (so that processes on the server would quit
going into uninterruptible sleep, state D).
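One way to watch for processes stuck in uninterruptible sleep is to read the state field from /proc/<pid>/stat. This is a minimal sketch rather than a tested tool for this server: it assumes Python 3 (which a stock Slackware 9.1 install would not have), and the function names are invented for illustration.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    left, right = line.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_text), comm, state


def find_blocked(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, name, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited or entry was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Running `find_blocked()` in a loop during a heavy write to the USB drive would show which processes pile up in state D just before a hang, assuming the machine stays responsive long enough to report it.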
There were no syslog/debug/messages log entries during this latest
'lockup during a mke2fs' incident.
What are my next troubleshooting steps?
Can one surgically upgrade just usb drivers and utilities?
--
Richard