Yesterday we installed SLES 8 SP3 with full automatic update on a Dell 2650. Then we moved our two PERC4/DC (LSI MegaRAID 320/2) controllers and our Oracle databases to the 2650 and began our stress test load scripts. Since yesterday at 5 pm US/Central there have been over 2000 stress test loads with ZERO errors. So it looks as if the Dell 6650 itself is the cause of all our problems. It does not appear that the megaraid driver or the cards are at fault, unless there is some problem with booting from them. We're not 100% certain of this diagnosis, but the evidence is compelling.
We ordered some 2650s from Dell, and in all likelihood we'll grab a couple to rebuild our production data warehouse environment until the 6650 issue can be resolved. I will also update the list under a new thread if and when there are new developments.
Possibly Working Configuration (built on "spare" equipment):
  Dell 2650, 2-way 2.8GHz Xeon, 4GB RAM
  Built-in PERC3-DI (Adaptec RAID), 2 PERC4/DC (LSI MegaRAID 320/2) (3.28 A3-1.05 BIOS from Dell)
  Boot disk: 35GB RAID 0 on the PERC3 (we didn't have a spare drive to make a mirror)
  Additional Storage: PV220 SCSI 320 with backplane split and connected to the two PERC4 cards.
  SuSE SLES 8 SP3, 2.4.21-190-smp, megaraid 2.00.8 driver, autoupdate run as of 3/8/04 1200 hrs US/Central, Oracle 9.2.0.3, ASYNC IO ON, LVM OFF, acpi=oldboot on grub boot line.
Definitely BROKEN Configuration:
  Dell 6650, 4-way 2.0GHz Xeon, 4GB RAM (A14 BIOS)
  2 PERC4/DC (LSI MegaRAID 320/2) (3.28 A3-1.05 BIOS from Dell)
  Boot disk: internal split backplane connected to first PERC4/Ch0, mirrored 72GB drive.
  Additional Storage: other side of the backplane, 146GB mirrored, connected to second PERC4/Ch0; PV220 SCSI 320 with split backplane, each side connected to Ch1 on each PERC4.
  SuSE SLES 8 SP3, 2.4.21-190-smp, megaraid 2.00.8 driver (later hand compiled 2.10.1), autoupdate run as of 3/6/04, Oracle 9.2.0.3, ASYNC IO ON, LVM OFF, acpi=oldboot on grub boot line.
System kernel parameters and other ulimits identical (/etc/sysconfig/oracle, /etc/security/limits.conf). Orarun.rpm was run off the SP3 CD.
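For anyone wanting to verify "identical" mechanically rather than by eye, here is a minimal sketch in Python. It assumes copies of the two files have been pulled onto one machine; the function names and the sample values are hypothetical and are not taken from either server.

```python
def normalize(text):
    """Return the meaningful lines of a config file as a set.

    Comments (anything after '#') and blank lines are dropped and runs of
    whitespace are collapsed. Caveat: a '#' inside a quoted value would be
    cut off too, which is acceptable for a rough comparison.
    """
    lines = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.add(" ".join(line.split()))
    return lines


def config_diff(text_a, text_b):
    """Return (lines only in A, lines only in B), each sorted."""
    a, b = normalize(text_a), normalize(text_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # Hypothetical sample contents, NOT the real files from either box.
    box_2650 = "oracle soft nofile 65536\noracle hard nofile 65536  # raised\n"
    box_6650 = "# limits\noracle soft nofile 65536\noracle hard nofile 4096\n"
    only_a, only_b = config_diff(box_2650, box_6650)
    print("only on 2650:", only_a)
    print("only on 6650:", only_b)
```

Two empty lists mean the files agree apart from comments and spacing. The same comparison works for /etc/sysconfig/oracle and /etc/security/limits.conf since it treats each line as opaque text.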
This just in from the DBA... We just got several "ORA-12599: TNS:cryptographic checksum mismatch" and "ORA-03113: end-of-file on communication channel" messages. Both are a precursor to our system hangs. The errors occurred AFTER our archive log drive filled up, meaning the stress test database filled up our archive log destination and then hung. Strange that these are the same errors we see on our 6650 when the archive log destination is mostly empty. I wonder if the archive processes are unable to keep up with the database during heavy data loads and this is causing these errors? Hmmm. Seems strange, as we have 9 archive processes running on our production database!
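Since a full archive log destination muddied this test run, a small watchdog during stress runs could help rule that cause in or out. A minimal sketch in Python follows, assuming the destination is an ordinary mounted filesystem; the 90% threshold and the placeholder path are assumptions, not values from our setup.

```python
import shutil


def fraction_used(total_bytes, used_bytes):
    """Fraction of the filesystem in use, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def is_nearly_full(total_bytes, used_bytes, threshold=0.90):
    """True once usage reaches the (assumed) warning threshold."""
    return fraction_used(total_bytes, used_bytes) >= threshold


def check_destination(path, threshold=0.90):
    """Return (nearly_full, free_bytes) for the filesystem holding path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_nearly_full(usage.total, usage.used, threshold), usage.free


if __name__ == "__main__":
    # "/" is only a placeholder; substitute the real archive log destination.
    full, free = check_destination("/")
    print("nearly full" if full else "ok", "- free bytes:", free)
```

Run from cron or in a loop alongside the loader, this would show whether the ORA errors start before or after the destination fills.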
We killed our stress loader, shut down the Oracle database and restarted it, then re-ran the stress loads. The errors were still occurring. We shut down the db and stress loads, then restarted them again, and the errors are gone for now. We did not reboot the system.
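To compare runs like these with numbers rather than impressions, the ORA codes in the loader output can be tallied per run. A minimal sketch in Python, assuming the loader writes one message per line to a text log; the sample lines are made up in the style of the messages quoted above.

```python
import re
from collections import Counter

# Oracle error codes look like ORA- followed by five digits.
ORA_CODE = re.compile(r"ORA-\d{5}")


def tally_ora_errors(lines):
    """Count every ORA-nnnnn code appearing in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(ORA_CODE.findall(line))
    return counts


if __name__ == "__main__":
    # Hypothetical sample log lines, not real output from our loader.
    sample = [
        "load 451: ORA-12599: TNS:cryptographic checksum mismatch",
        "load 452: ok",
        "load 453: ORA-03113: end-of-file on communication channel",
        "load 454: ORA-12599: TNS:cryptographic checksum mismatch",
    ]
    for code, n in tally_ora_errors(sample).most_common():
        print(code, n)
```

Passing an open file object instead of a list works the same way, since the function only iterates over lines.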
I thought we had this problem isolated to 6650 hardware, but perhaps not. We're re-stress testing our 2650 by running standby recovery to current (about 50GB worth), running our stress test loads, and moving a truckload of old archive logs from one disk to another. So far, no errors to report.
Still almost zero feedback on our Oracle TAR!
Thanks for sticking with the story! Andy
> -----Original Message-----
> From: McAllister, Andrew [mailto:McAllisterA@(protected)]
> Sent: Friday, March 05, 2004 10:10 PM
> To: Miquel Colom
> Cc: Michael Hasenstein; suse-oracle@(protected); Sightler, Tom
> Subject: RE: [suse-oracle] [OT] Proliant DL740 hangs with sles8 (k_smp 2.4.21-190)
>
> Today we upgraded to the most recent Dell BIOS A14 and
> verified that the Megaraid bios was up-to-date as well. Then
> we installed the latest megaraid driver from the LSI site,
> version 2.10.1. Re-ran our stress tests, and 450 loads in, we
> are still getting the Oracle errors. Our next step is to
> remove the 6650 and replace it with a 2650. We've never had
> an error on the stress test 2650.
>
> We've done a lot of research on the Dell sites and elsewhere.
> There are several people having trouble with 6650s, usually
> with Linux, some with Windows. But I suppose the windows
> people think a system hang is normal (sorry, couldn't
> resist). We've already been told by Dell that Suse isn't
> supported. But my company just bought over 60 servers from
> Dell, so I'm going to bump this up the Dell food chain.
>
> Andy
>
snip