We think we have solved the problems we have been having with our Dell
6650s, Oracle 22.214.171.124 and SuSE SLES 8. See previous threads "Re:
[suse-oracle] Migration SLES7 to SLSTD8 (poor Performance)" and "RE:
[suse-oracle] [OT] Proliant DL740 hangs with sles8 (k_smp 2.4.21-190)".
Oracle and eventually SuSE hangs after several days of heavy load. Our
data warehouse runs Oracle 126.96.36.199 on SuSE SLES 8 SP3. Each night we
refresh over 10 gig of data using Oracle's SQL*Loader (sqlldr). During
the day, the database services multiple connections per second from
several web applications. After running for several days, we experienceORA-3113
communications errors and our oracle listener (tnslsnr) often
dies. After approximately 4-5 days the OS becomes unstable and one or
more file systems will eventually hang such that processes accessing the
file system will hang and cannot be killed even with "kill -9". Bug
messages show up in the /var/log/messages describing a problem in
pipe.c, but this is not coincident with a system hang. Other messages
are absent. Some programs that use pipes like rudimentary shell scripts
also hang, but can be "released" by cat-ing the files in
/proc/<pid>/fs/* to /dev/null. These problems manifest on a variety of
different SLES8 kernels, hardware (not Dell specific), RAID drivers, and
filesystems, that is, there is nothing specific to any SLES8 release or
hardware configuration. The only consistency is the use of Oracle,
SQL*Loader, ASYNC IO and other heavy disk IO.
Reproducing the problem:
Load 10+ tables simultaneously into Oracle using sqlldr. Repeat
continuously for 500-1000+ iterations. For "best results" the database
must be configured with ASYNC IO ON, archive logging ON, and logs should
be large (50 meg+) each. Wait for 20+ gigs of logs to accumulate. While
still loading data, "rm" 20+ gig of log files. Using "rm" produces the
most consistent results. Continue loading tables and repeat until the
sqlldr logs show ORA-3113
errors or other TNS errors or loader processes
simply hang. Eventually, all loader processes will simply hang and other
symptoms mentioned above will appear. We can reproduce the problem 100%
of the time using this method.
There are currently several "solutions" for this problem.
1) Disable async IO. This only delays the problem and performance goes
into the toilet. Async IO appears to improve the performance of DML
operations like table drops and truncates by an order of magnitude (at
least). Without async IO our data load scripts will often hang for 20+
minutes on a simple truncate table command (under heavy load).
2) Move to RedHat AS 3.0 running 2.4.21-x kernels. This completely fixed
the problem for us. Unfortunately it also caused a SIGNIFICANT
performance hit. We found that RedHat AS 3 (as of 04/05/04) slowed our
data loads down by at LEAST a factor of 4 times. Yes, you read that
right, SuSE is 4 times faster than RedHat for the kind of large
simultaneous data loading we are doing. Standard query performance does
not appear to be significantly different.
3) Best solution so far... Take the arch/i386/kernel/semaphore.c file
from RedHat AS 3 and replace all i386 based semaphore.c files in the
SuSE SLES 8 kernel source. Recompile and use the new SuSE kernel. This
appears to have completely fixed our problem AND kept performance at
normal SuSE levels. We have logged a support request with SuSE to get
this into an official SuSE kernel. Oracle has also finally assigned a
bug number for this issue (3552278).
After extensive research and some assistance from an extremely helpful
source at RedHat. The problem appears to be related to semaphores in the
kernel. The key to this issue was the pipe.c bugs in the messages file.
Apparently semaphore counters in the kernel are incremented multiple
times during some heavy IO operations, particularly with ASYNC IO
enabled. The counters are incremented above their maximum value of 1.
This causes problems when inodes related to those semaphores are freed
and re-used from the slab cache. When the inode is later used for a pipe
operation the invalid semaphore counter information causes corruption
and the pipe.c bug in the messages file. This problem is more apparent
in Oracle environments with ASYNC IO, because ASYNC IO increments the
semaphore counters (apparently more than non-asyc io). Imagine several
aio threads acting independently each trying to get a semaphore lock,
counters will quickly get out of whack using the buggy code. Without
async io, there is usually only one thread trying to get a semaphore
lock; thus, semaphore counters are incremented in an orderly way. The
Oracle TNS Listener uses pipes extensively and so exhibits problems when
it re-uses broken inode/semaphore/pipes from the cache. Other processes
that also use pipes like shell scripts will also be affected when a
corrupt inode/semaphore combination is re-used. Eventually, under heavy
load it appears that the kernel will eventually starve for working
inodes/pipes/semaphores or a critical kernel operation will fail because
it is using an invalid inode combination, thus leading to system hangs.
RedHat has updated the ../arch/i386/kernel/semaphore.c file and
incorporated a fix for this situation in both RedHat AS 2.1 and a 3.0
(the fixes are different, but equally effective). This fix has
apparently NOT MADE ITS WAY BACK into the linux core kernel even in
2.6.4 (I checked).
We took the semaphore.c file from RedHat AS 3.0 (released under GPL of
course) and replaced ALL of the appropriate semaphore.c files in the
SuSE SLES 8 2.4.21-190 kernel source, then recompiled the kernel
(cloneconfig mode), modules, etc. and installed it. The resulting
"custom" kernel has run for several days with thousands of sqlldr
iterations and many rounds of "rm"-ing archive logs, without a single
Oracle error. The previous stock SuSE kernel never completed more than
400 loads without an error.
SuSE has been sent a copy of all the information we have on this issue
through multiple channels. We hope to see this change included in the
SLES 8 kernel and a new supported kernel released as soon as possible!
I would like to thank the suse-oracle community and particularly Tom
Sightler for your support and ideas. All of your ideas were helpful, but
in the end Tom gave the most useful piece of advice.
For those of you want specific details of what was happening in
semaphore.c, please e-mail me OFF the list.
University of Missouri
To unsubscribe, email: suse-oracle-unsubscribe@(protected)
For additional commands, email: suse-oracle-help@(protected)
Please see http://www.suse.com/oracle/ before posting