> -----Original Message-----
> From: Michael Hasenstein [mailto:mha@(protected)]
[snip]
> Miquel Colom wrote:
[snip]
> > 1-Hangs reported due to FC cards. SOLVED with firmware upgrade.

We don't have FC cards, and our PERC4/DCs are at firmware 3.28/1.05, etc.
> > 2-Hangs suspected to be due to using framebuffer. DISABLING fmb not
> > tested.

We never had a problem when using the console.
> > 3-Hangs due to asynch io. SOLVED by applying a RHEL 3.0 (on SLES8, is that
> > correct)?.
>
> Not RH specific, it is an Oracle bug with the stub libraries.
> BUG 3016968 - ASYNCIO FUNCTIONALITY IS NOT WORKING

We haven't applied this patch yet, but we are researching it now.
> > 4-Hangs due to bad reiserfs filesystem. SOLVED with reiserfs fsck.
>
> Correct? Someone reported this.

We'll be running fsck at the next reboot.
> > 5-Finally, Andrew is experiencing hangs due to high load. There is here no
> > FC card, but there is async io and reiserfs. Also there is a note that
> > taking out the broadcom cards contributes to a better uptime. This can be
> > a driver problem or an IO-APIC issue. Hangs not reproducible on test
> > system, only in production (sigh).

We can't use ext2: our database files are all 2 GB, and some of our nightly data loads arrive as files larger than 2 GB (some as large as 6 GB).
> >
> > Do I miss something?
>
> I'd be interested to see this with ext2, I'd like to know if reiserfs is
> involved, directly or indirectly (triggering a bug somewhere else)
> doesn't matter.
[snip]
We sent our open Oracle TAR number to Michael off-list.
As of right now, our production system is in the last stages before meltdown, and we're limping along until after business hours. One of the two listeners dies every couple of hours (a symptom we thought had gone away when we installed the pro100 card). Now a user is also reporting corrupt database block errors. We'll run the fsck at the next reboot.
Any other ideas about what to look for? If it hangs between 8 and 5 US/Central, we'll have to bring it back up immediately; if it hangs after 5, we may have an hour or so to tweak or check settings. I'd be happy to run any non-destructive test or settings check while the machine is on its last legs. Obviously, if it is a true hang, we'll have to power-cycle it.
Another interesting development: we set up a test 2650 with SLES 8 SP3 and kernel 2.4.21-190-smp. We also built a stress-test database and a set of data files, with load scripts that continuously load data and check for errors.
On our standby 6650, this test database produces "ORA-12599: TNS:cryptographic checksum mismatch" and "ORA-03113: end-of-file on communication channel" after about 15 minutes. These Oracle errors are the first sign of impending doom; on our production box we normally get them after 24 hours of running. On production, the rate of these ORA- errors eventually climbs, and the box then shows other symptoms: first hung pipes, then hung file systems, then complete hangs.
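Since these two errors show up well before the box actually hangs, it might be worth counting them automatically during the stress test. Below is a minimal Python sketch, assuming you can feed it text lines from wherever the errors surface (for example the load scripts' output). The function names and the alert threshold are hypothetical and not part of any Oracle tooling.

```python
import re
from collections import Counter
from typing import Iterable

# Error codes treated here as early warnings of a hang (from the symptoms above).
WATCHED_CODES = ("ORA-12599", "ORA-03113")

# Matches any Oracle error code of the form "ORA-" plus five digits.
ORA_PATTERN = re.compile(r"\bORA-\d{5}\b")


def count_watched_errors(lines: Iterable[str]) -> Counter:
    """Count occurrences of each watched ORA- code in the given lines."""
    counts: Counter = Counter({code: 0 for code in WATCHED_CODES})
    for line in lines:
        for code in ORA_PATTERN.findall(line):
            if code in WATCHED_CODES:  # ignore unrelated ORA- codes
                counts[code] += 1
    return counts


def should_alert(counts: Counter, threshold: int) -> bool:
    """Hypothetical rule: alert once the watched errors reach `threshold` in total."""
    return sum(counts[code] for code in WATCHED_CODES) >= threshold


if __name__ == "__main__":
    sample = [
        "ORA-12599: TNS:cryptographic checksum mismatch",
        "ORA-03113: end-of-file on communication channel",
        "ORA-00600: internal error code",
        "ORA-03113: end-of-file on communication channel",
    ]
    counts = count_watched_errors(sample)
    print(dict(counts), should_alert(counts, threshold=3))
```

Running it over successive windows of output (say, each hour) would show whether the error rate is climbing, which is the pattern that has preceded the production hangs.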
On the test 2650 we went through 2700 loads last night without any problem. The test 2650 has ASYNC IO turned OFF!
Differences between the 6650 and the 2650: the number and speed of CPUs, possibly the chipset, and the RAID controller (the 6650s have PERC4/DC with the megaraid2 driver; the 2650 has the internal Adaptec RAID with the aacraid driver). Async IO was also off on the 2650. Both use reiserfs and LVM.
We've just disabled async IO on the standby 6650 and are restarting the stress test there. We'll report back with results.
We also just enabled async IO on the 2650, relinked everything, and are restarting the stress test to see whether we get errors. If we do, we know it is async IO. If not, could it be something in the kernel or the megaraid drivers?
Tom reported problems with max open files. Here is ours:

oracle@(protected):/proc/sys/fs> cat file-nr
5676 1495 131072

That should be no problem for us here.
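For anyone checking the same thing: on 2.4 kernels the three fields of /proc/sys/fs/file-nr are the number of allocated file handles, the number of allocated but unused handles, and the maximum. So the handles actually in use here are 5676 - 1495 = 4181, roughly 3% of the 131072 limit. A small Python sketch of that arithmetic follows; the class and helper names are hypothetical.

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    """The three fields of /proc/sys/fs/file-nr (2.4-kernel meaning)."""
    allocated: int  # file handles the kernel has allocated
    free: int       # allocated but currently unused handles
    maximum: int    # system-wide limit (same value as file-max)

    @property
    def in_use(self) -> int:
        # Handles actually in use = allocated minus allocated-but-unused.
        return self.allocated - self.free

    @property
    def usage_fraction(self) -> float:
        # Share of the system-wide limit currently in use.
        return self.in_use / self.maximum


def parse_file_nr(text: str) -> FileNr:
    """Parse the whitespace-separated contents of file-nr."""
    allocated, free, maximum = (int(field) for field in text.split())
    return FileNr(allocated, free, maximum)


def read_file_nr(path: str = "/proc/sys/fs/file-nr") -> FileNr:
    """Read and parse file-nr from a live Linux system."""
    with open(path) as handle:
        return parse_file_nr(handle.read())


if __name__ == "__main__":
    # The values from the cat output above.
    sample = parse_file_nr("5676 1495 131072\n")
    print(sample.in_use, f"{sample.usage_fraction:.1%}")
```

On newer kernels the second field is reported differently, so the "in use" calculation should be rechecked against that kernel's documentation before relying on it.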
So I think we are narrowing the problem down to megaraid, the chipset/hardware, or something in the kernel that is aggravated by our hardware.
Andy
-- To unsubscribe, email: suse-oracle-unsubscribe@(protected) For additional commands, email: suse-oracle-help@(protected) Please see http://www.suse.com/oracle/ before posting