Home
All Oracle Error Codes
Oracle DBA Forum

Frequent Oracle Errors

TNS:could not resolve the connect identifier specified
Backtrace message unwound by exceptions
invalid identifier
PL/SQL compilation error
internal error
missing expression
table or view does not exist
end-of-file on communication channel
TNS:listener unknown in connect descriptor
insufficient privileges
PL/SQL: numeric or value error string
TNS:protocol adapter error
ORACLE not available
target host or object does not exist
invalid number
unable to allocate string bytes of shared memory
resource busy and acquire with NOWAIT specified
error occurred at recursive SQL level string
ORACLE initialization or shutdown in progress
archiver error. Connect internal only, until freed
snapshot too old
unable to extend temp segment by string in tablespace
Credential retrieval failed
missing or invalid option
invalid username/password; logon denied
unable to create INITIAL extent for segment
out of process memory when trying to allocate string bytes
shared memory realm does not exist
cannot insert NULL
TNS:unable to connect to destination
remote database not found ora-02019
exception encountered: core dump
inconsistent datatypes
no data found
TNS:operation timed out
PL/SQL: could not find program
existing state of packages has been discarded
maximum number of processes exceeded
error signaled in parallel query server
ORACLE instance terminated. Disconnection forced
TNS:packet writer failure
see ORA-12699
missing right parenthesis
name is already used by an existing object
cannot identify/lock data file
invalid file operation
quoted string not properly terminated

[suse-oracle] 9.2.0.3 and SLES 8 SP3 system hangs (solved, long)

McAllister, Andrew

2004-04-05

Replies:
We think we have solved the problems we have been having with our Dell
6650s, Oracle 9.2.0.3 and SuSE SLES 8. See previous threads "Re:
[suse-oracle] Migration SLES7 to SLSTD8 (poor Performance)" and "RE:
[suse-oracle] [OT] Proliant DL740 hangs with sles8 (k_smp 2.4.21-190)".

The problem:
Oracle and eventually SuSE hangs after several days of heavy load. Our
data warehouse runs Oracle 9.2.0.3 on SuSE SLES 8 SP3. Each night we
refresh over 10 gig of data using Oracle's SQL*Loader (sqlldr). During
the day, the database services multiple connections per second from
several web applications. After running for several days, we experience
ORA-3113 communications errors and our oracle listener (tnslsnr) often
dies. After approximately 4-5 days the OS becomes unstable and one or
more file systems will eventually hang such that processes accessing the
file system will hang and cannot be killed even with "kill -9". Bug
messages show up in the /var/log/messages describing a problem in
pipe.c, but this is not coincident with a system hang. Other messages
are absent. Some programs that use pipes like rudimentary shell scripts
also hang, but can be "released" by cat-ing the files in
/proc/<pid>/fs/* to /dev/null. These problems manifest on a variety of
different SLES8 kernels, hardware (not Dell specific), RAID drivers, and
filesystems, that is, there is nothing specific to any SLES8 release or
hardware configuration. The only consistency is the use of Oracle,
SQL*Loader, ASYNC IO and other heavy disk IO.

Reproducing the problem:
Load 10+ tables simultaneously into Oracle using sqlldr. Repeat
continuously for 500-1000+ iterations. For "best results" the database
must be configured with ASYNC IO ON, archive logging ON, and logs should
be large (50 meg+) each. Wait for 20+ gigs of logs to accumulate. While
still loading data, "rm" 20+ gig of log files. Using "rm" produces the
most consistent results. Continue loading tables and repeat until the
sqlldr logs show ORA-3113 errors or other TNS errors or loader processes
simply hang. Eventually, all loader processes will simply hang and other
symptoms mentioned above will appear. We can reproduce the problem 100%
of the time using this method.

Solution:
There are currently several "solutions" for this problem.
1) Disable async IO. This only delays the problem and performance goes
into the toilet. Async IO appears to improve the performance of DML
operations like table drops and truncates by an order of magnitude (at
least). Without async IO our data load scripts will often hang for 20+
minutes on a simple truncate table command (under heavy load).

2) Move to RedHat AS 3.0 running 2.4.21-x kernels. This completely fixed
the problem for us. Unfortunately it also caused a SIGNIFICANT
performance hit. We found that RedHat AS 3 (as of 04/05/04) slowed our
data loads down by at LEAST a factor of 4 times. Yes, you read that
right, SuSE is 4 times faster than RedHat for the kind of large
simultaneous data loading we are doing. Standard query performance does
not appear to be significantly different.

3) Best solution so far... Take the arch/i386/kernel/semaphore.c file
from RedHat AS 3 and replace all i386 based semaphore.c files in the
SuSE SLES 8 kernel source. Recompile and use the new SuSE kernel. This
appears to have completely fixed our problem AND kept performance at
normal SuSE levels. We have logged a support request with SuSE to get
this into an official SuSE kernel. Oracle has also finally assigned a
bug number for this issue (3552278).

More detail...
After extensive research and some assistance from an extremely helpful
source at RedHat. The problem appears to be related to semaphores in the
kernel. The key to this issue was the pipe.c bugs in the messages file.
Apparently semaphore counters in the kernel are incremented multiple
times during some heavy IO operations, particularly with ASYNC IO
enabled. The counters are incremented above their maximum value of 1.
This causes problems when inodes related to those semaphores are freed
and re-used from the slab cache. When the inode is later used for a pipe
operation the invalid semaphore counter information causes corruption
and the pipe.c bug in the messages file. This problem is more apparent
in Oracle environments with ASYNC IO, because ASYNC IO increments the
semaphore counters (apparently more than non-asyc io). Imagine several
aio threads acting independently each trying to get a semaphore lock,
counters will quickly get out of whack using the buggy code. Without
async io, there is usually only one thread trying to get a semaphore
lock; thus, semaphore counters are incremented in an orderly way. The
Oracle TNS Listener uses pipes extensively and so exhibits problems when
it re-uses broken inode/semaphore/pipes from the cache. Other processes
that also use pipes like shell scripts will also be affected when a
corrupt inode/semaphore combination is re-used. Eventually, under heavy
load it appears that the kernel will eventually starve for working
inodes/pipes/semaphores or a critical kernel operation will fail because
it is using an invalid inode combination, thus leading to system hangs.

RedHat has updated the ../arch/i386/kernel/semaphore.c file and
incorporated a fix for this situation in both RedHat AS 2.1 and a 3.0
(the fixes are different, but equally effective). This fix has
apparently NOT MADE ITS WAY BACK into the linux core kernel even in
2.6.4 (I checked).

We took the semaphore.c file from RedHat AS 3.0 (released under GPL of
course) and replaced ALL of the appropriate semaphore.c files in the
SuSE SLES 8 2.4.21-190 kernel source, then recompiled the kernel
(cloneconfig mode), modules, etc. and installed it. The resulting
"custom" kernel has run for several days with thousands of sqlldr
iterations and many rounds of "rm"-ing archive logs, without a single
Oracle error. The previous stock SuSE kernel never completed more than
400 loads without an error.

SuSE has been sent a copy of all the information we have on this issue
through multiple channels. We hope to see this change included in the
SLES 8 kernel and a new supported kernel released as soon as possible!

I would like to thank the suse-oracle community and particularly Tom
Sightler for your support and ideas. All of your ideas were helpful, but
in the end Tom gave the most useful piece of advice.

For those of you want specific details of what was happening in
semaphore.c, please e-mail me OFF the list.

Andrew McAllister
University of Missouri

--
To unsubscribe, email: suse-oracle-unsubscribe@(protected)
For additional commands, email: suse-oracle-help@(protected)
Please see http://www.suse.com/oracle/ before posting