Opened 17 years ago

Closed 17 years ago

#46 closed defect (fixed)

Fix the RAID

Reported by: price
Owned by: andersk
Priority: blocker
Milestone: Alpha
Component: other
Version:
Keywords:
Cc:

Description

Our RAID is still mysteriously, horrendously slow in certain circumstances. It starts spewing thousands of copies of the same puzzling error, and some I/O hangs for minutes.

The vendor blames the errors on the cable connecting the RAID to black-mesa. Tonight we swapped out the cable and one of the two connectors on its ends; nothing improved.

Tonight we also tried booting a 2.6.24 kernel. The error messages recurred, but we could not reproduce the delay. If we fix #41, we can upgrade the kernel; alternatively, we could consider backporting the new driver to our present etch (2.6.18-something) kernel.

We should also go back to the vendor, tell them that replacing the cable didn't help, and ask them to actually help debug the problem.

Change History (11)

comment:1 Changed 17 years ago by quentin

To be clear, it seems that 2.6.24 just recovers from the errors better instead of spewing them continuously. The absence of that continuous error spew is what improves I/O performance.

We can't actually upgrade the Xen kernel to 2.6.24; #41 refers to guest kernels, not host kernels.

I'm contacting the vendor now.

--Quentin

comment:2 Changed 17 years ago by tabbott

So, what were the results of swapping out components?

comment:3 Changed 17 years ago by tabbott

  • Owner changed from sipb-xen to quentin

comment:4 Changed 17 years ago by quentin

  • Status changed from new to assigned

Nothing fixed it, and a few swaps seemed to make it even worse. We are awaiting a reply to our support ticket (they told us to call them if it was an emergency).

--Quentin

comment:5 Changed 17 years ago by price

The current status from Quentin is that we need to try a new driver the vendor has supplied. This apparently involves recompiling the kernel.

comment:6 Changed 17 years ago by anonymous

  • Owner changed from quentin to andersk
  • Status changed from accepted to assigned

comment:7 Changed 17 years ago by broder

  • Milestone set to Alpha

comment:8 Changed 17 years ago by broder

  • Milestone set to Alpha

comment:9 Changed 17 years ago by price

  • Priority changed from critical to blocker

comment:10 Changed 17 years ago by price

The new driver was in the kernel booted Saturday night, and all seems to be well so far.

I'll wait a few days before closing this, but if someone else is confident the upgrade fixed it, they should feel free to close it.

comment:11 Changed 17 years ago by price

  • Resolution set to fixed
  • Status changed from assigned to closed

Closing, as it seems to be fixed.
