Thursday, 10 November 2011

Avoid ORA-31603 on Sqldeveloper on DDL tabs

I recently have an issue of giving privileges to a Database support user that user for application team to do production support. I have grant Select any dictionary and Select Catalog role to the user, but still they still not able to view the SQL tabs(DDL scripts) on Sqldeveloper from other users and giving ORA-31603.
I tested in UAT database all are ok without any issue to view the DDL tab. After comparing what is the different between UAT and PROD sqldeveloper, found out that when I grant "Select Catalog Role" it is not the default Role of that user. After adding "Select Catalog Role" as a default role to that support user, it works perfectly now.
Not sure if this only happen to my environment. We use Sqldeveloper-Version 2.1.1.64.

Thursday, 6 October 2011

Oracle Software Investment Guide

For those needs more detail information on how oracle licensing on their products. Please refer to this links from oracle  : http://www.oracle.com/us/corporate/pricing/sig-070616.pdf. Many people ask about Standby/ Dataguard licensing if let say we have Oracle DB EE license on primary site. You may refer to that links to give some guide to it.

Tuesday, 20 September 2011

Oracle Database Support annoucement on the Itanium Processor

To anyone haven't heard about this news from Oracle. Oracle has an update about what will be supported and not supported on Itanium processor. For my case, we actually just upgraded RAC running on HP UX Itanium from 10.2.0.4 to 11.2.0.1. It is quiet a "big" news for us since most of the Databases are running on HP UX Itanium. See below notes :

"Oracle Database 11g Release 2 is currently shipping on HP-UX Itanium, latest available patchset level is 11.2.0.2. We will continue to ship further 11g release 2 patchsets, PSUs and CPUs on HP-UX Itanium, just as we do on other platforms. The next release of Oracle Database, currently called 12g, will not be shipped on HP-UX Itanium. Customers can either stay on an Oracle Database 11g Release, as per Oracle's Lifetime Support Policy, or move to another platform when they want to upgrade to the next major Oracle release.

Oracle Database 11g Release 2 (including all the 11.2 patchsets)will continue to be supported on HP-UX 11.31. Oracle will certify Database 11.2 on future versions of the HP-UX operating system that are generally available within the Extended Support period for 11g Release 2 as documented in the Oracle Lifetime Support Policy.  Oracle does not require explicit certification on future versions of the Itanium processor provided HP supports the versions of the HP-UX operating system that Oracle Database 11g Release 2 is certified on.
..."

For more details please check this article on Oracle support site : 1307745.1

Friday, 26 August 2011

How to recover Oracle TDE wallet Password which almost "Lost"

I recently have an issue with Oracle TDE wallet password. As part of our quarterly Security patch, we are planning to patch one of the database which using this Wallet for the encryption. Means I will need to shutdown and startup the database. The problem is when we restart the database then we need to open the wallet so the tables that encrypted can be visible for the trusted connection. Just guess what ? nobody sure about what is the wallet password in production.( The previous DBA has left the company, also with the password - "Very good").

Back to the issue, now I have to make sure I can open the wallet with the correct password before I can shutdown this database. This issue as good as "This Database should be alive all the time", once it is down without the wallet password it will become useless. I actually raise SR about this issue on how to test the wallet password without affecting our production database operation. As in the documentation written, there is no way to recover the password if the password is lost. But they never mention on how to test it if somehow nobody remember the password.Anyhow Oracle support still telling me there is no way to test the password.

Well I almost give up. But I just do my own testing, I just copy the wallet file from production to my testing environment. Let say my wallet in production are located here(sqlnet.ora) :
ENCRYPTION_WALLET_LOCATION=
     (SOURCE=(METHOD=FILE)(METHOD_DATA=
             (DIRECTORY=/u01/app/oracle/admin/DBPROD/wallet)))

I just copy "ewallet.p12" to my testing environment directory :
ENCRYPTION_WALLET_LOCATION=
    (SOURCE=(METHOD=FILE)(METHOD_DATA=
            (DIRECTORY=D:\Oracle\product\10.2.0\admin\DB10G\wallet)))

The first thing is we need to test  to open that wallet with any password that we think is correct. Once we can open the wallet with the correct password, we can test if the wallet is valid to open the data in production. I import a table from production that having encryption on one of the columns. Well this tricks works for me and I managed to patch my database in production and open the wallet safely.
Of course if the wallet password can't be find, my suggestion is make sure that database keep alive :)
Let's keep the wallet password safely now.

Thursday, 14 July 2011

Some Note on Firewall with Oracle RAC

If anyone implementing Firewall filtering for SQL*Net Proxy, please make sure that packet redirection is using fix port not random, because this will make the Listener redirection failed by assigning random port number.
For details please see this note on Oracle Support (361284.1).
On one of our system, this issue suddenly come up when some application that calling tnsnames to use the loadbalancing intermittently failed with ORA-12541. Initially I thought it is a connection issue, but further I do a trace from client side and found below :

....
[000001 05-APR-2011 14:55:15:106] nspsend: 01 09 00 00 06 00 00 00 |........|
[000001 05-APR-2011 14:55:15:106] nspsend: 00 00 28 44 45 53 43 52 |..(DESCR|
[000001 05-APR-2011 14:55:15:106] nspsend: 49 50 54 49 4F 4E 3D 28 |IPTION=(|
[000001 05-APR-2011 14:55:15:106] nspsend: 41 44 44 52 45 53 53 3D |ADDRESS=|
[000001 05-APR-2011 14:55:15:106] nspsend: 28 50 52 4F 54 4F 43 4F |(PROTOCO|
[000001 05-APR-2011 14:55:15:106] nspsend: 4C 3D 54 43 50 29 28 48 |L=TCP)(H|
[000001 05-APR-2011 14:55:15:106] nspsend: 4F 53 54 3D 6D 6E 65 74 |OST=mnet|
[000001 05-APR-2011 14:55:15:106] nspsend: 64 62 32 2D 76 69 70 29 |db2-vip)|
[000001 05-APR-2011 14:55:15:106] nspsend: 28 50 4F 52 54 3D 31 35 |(PORT=15|
[000001 05-APR-2011 14:55:15:106] nspsend: 32 31 29 29 28 4C 4F 41 |21))(LOA|
[000001 05-APR-2011 14:55:15:106] nspsend: 44 5F 42 41 4C 41 4E 43 |D_BALANC|
[000001 05-APR-2011 14:55:15:106] nspsend: 45 3D 79 65 73 29 28 43 |E=yes)(C|
[000001 05-APR-2011 14:55:15:106] nspsend: 4F 4E 4E 45 43 54 5F 44 |ONNECT_D|
[000001 05-APR-2011 14:55:15:106] nspsend: 41 54 41 3D 28 53 45 52 |ATA=(SER|
[000001 05-APR-2011 14:55:15:106] nspsend: 56 45 52 3D 44 45 44 49 |VER=DEDI|
[000001 05-APR-2011 14:55:15:106] nspsend: 43 41 54 45 44 29 28 53 |CATED)(S|
[000001 05-APR-2011 14:55:15:106] nspsend: 45 52 56 49 43 45 5F 4E |ERVICE_N|
[000001 05-APR-2011 14:55:15:106] nspsend: 41 4D 45 3D 43 48 4E 4E |AME=CHNN|
[000001 05-APR-2011 14:55:15:106] nspsend: 45 4C 29 28 46 41 49 4C |EL)(FAIL|
[000001 05-APR-2011 14:55:15:106] nspsend: 4F 56 45 52 5F 4D 4F 44 |OVER_MOD|
[000001 05-APR-2011 14:55:15:106] nspsend: 45 3D 28 54 59 50 45 3D |E=(TYPE=| ----> Here the client has sent connect packet to the listener on mnetdb2-vip
[000001 05-APR-2011 14:55:15:106] nspsend: 53 45 4C 45 43 54 29 28 |SELECT)(|
[000001 05-APR-2011 14:55:15:106] nspsend: 4D 45 54 48 4F 44 3D 42 |METHOD=B|
[000001 05-APR-2011 14:55:15:106] nspsend: 41 53 49 43 29 28 52 45 |ASIC)(RE|
[000001 05-APR-2011 14:55:15:106] nspsend: 54 52 49 45 53 3D 31 38 |TRIES=18|
[000001 05-APR-2011 14:55:15:106] nspsend: 30 29 28 44 45 4C 41 59 |0)(DELAY|
[000001 05-APR-2011 14:55:15:106] nspsend: 3D 35 29 29 28 43 49 44 |=5))(CID|
[000001 05-APR-2011 14:55:15:106] nspsend: 3D 28 50 52 4F 47 52 41 |=(PROGRA|
[000001 05-APR-2011 14:55:15:106] nspsend: 4D 3D 73 71 6C 70 6C 75 |M=sqlplu|
[000001 05-APR-2011 14:55:15:106] nspsend: 73 29 28 48 4F 53 54 3D |s)(HOST=|
[000001 05-APR-2011 14:55:15:106] nspsend: 6D 6E 65 74 61 70 70 33 |mnetapp3|
[000001 05-APR-2011 14:55:15:106] nspsend: 29 28 55 53 45 52 3D 6F |)(USER=o|
[000001 05-APR-2011 14:55:15:106] nspsend: 72 61 63 6C 65 29 29 29 |racle)))|

.....
[000001 05-APR-2011 14:55:15:110] nscall: connecting...
[000001 05-APR-2011 14:55:15:110] nsc2addr: entry
[000001 05-APR-2011 14:55:15:110] nsc2addr: (ADDRESS=(PROTOCOL=TCP)(HOST=mnetdb1-vip)(PORT=33949))
[000001 05-APR-2011 14:55:15:110] nttbnd2addr: entry
[000001 05-APR-2011 14:55:15:110] snlinGetAddrInfo: entry
[000001 05-APR-2011 14:55:15:110] snlinGetAddrInfo: Invalid IP address string mnetdb1-vip ----> Here the transport was unable to open to the PORT 33949 on mnetdb1-vip
[000001 05-APR-2011 14:55:15:110] snlinFreeAddrInfo: entry

At that point we notice it assign to port number 33949 which wrong port number. We only use port 1521.
The challenge now is to ask the network engineer to change the firewall rule. In our case the firewall use by many other subsytem. Well this issue never been an issue for all other application that using Weblogic.

Wednesday, 29 June 2011

Bug 9572787 Long waits for ‘enq: AM – disk offline’ following cell or disk crash (can cause further instance crashes)

If anyone is implementing Oracle 11g R2 (11.2.0.2) with extended cluster might get affected by this bugs.
In my case, this bug comes out when we tested to offline the whole second Disk array on the Failgroup 2 which we using ASM normal redundancy.

Until now we still waiting for the merge patch for this issue. Hope they will release the patch soon. For more details about this bugs please check on Oracle support portal.

Update 8 July 2011:
Oracle released the merge patch. The test result is much better, the instances was still rebooted. Only until we set the hidden parameter ,recommend by Oracle support then the instances survived. The parameter is
_lm_rcvr_hang_kill=FALSE

 Do not use this parameter unless Oracle support told us to do so.

Tuesday, 28 June 2011

ORA-12514 during Switchover using DG broker with multiple IP segment

Our new system 2 node RAC Oracle 11.2.0.2 are setup with 1 instance of Standby database. Yesterday we tested to switchover to standby database using 'dgmgrl'. The switchover was ok , only that my Primary Node (Node1) was not automatically restarted instead throwing ORA-12514 - TNS:listener does not currently know of service requested in connect descriptor. 
I checked all my listener for DG broker have been setup with static listener, which usually this is the cause of this error. Everything looks properly setup until I checked each instance Broker configuration "StaticConnectIdentifier"(New parameter in 11g) property :
dgmgrl > show instance verbose "PRD1"
...
StaticConnectIdentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=172.22.2.29)(PORT=1532))(CONNECT_DATA=(SERVICE_NAME=PRD_DGMGRL)(INSTANCE_NAME=PRD1)(SERVER=DEDICATED)))';

dgmgrl > show instance verbose "PRD2"
StaticConnectIdentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=172.22.2.30)(PORT=1532))(CONNECT_DATA=(SERVICE_NAME=PRD_DGMGRL)(INSTANCE_NAME=PRD2)(SERVER=DEDICATED)))';

Well, we configure all the static listener on 172.22.12.7 (VIP1 - Segment 12) and 172.22.12.9 (VIP2 - Segment 12). I remember that IP segment 172.22.2.xx was the initial VIP segment during the installation. 

Then I just change it to the correct IP and port number.
dgmgrl> edit instance 'PRD1' set property StaticConnectIdentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=172.22.12.7)(PORT=1523))(CONNECT_DATA=(SERVICE_NAME=PRD_DGMGRL)(INSTANCE_NAME=PRD1)(SERVER=DEDICATED)))';
dgmgrl> edit instance 'PRD2' set property StaticConnectIdentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=172.22.12.9)(PORT=1523))(CONNECT_DATA=(SERVICE_NAME=PRD_DGMGRL)(INSTANCE_NAME=PRD2)(SERVER=DEDICATED)))';

After this changes my switchover was running smoothly without any error. Problem solved!

Thursday, 23 June 2011

Running Catalog/Catproc.sql on Primary DB with Dataguard, will this invalid my Physical standby?

A few days back, when I checked one of my newly created DB was having many invalid objects. I checked dba_registry and giving me some of the component are invalid. After a several time running utlrp.sql still didn't help to compile all the SYS invalid objects. There are a few objects of Advanced Queuing objects that keep invalid even after manual compile. I raise an SR to oracle to check if this objects are corrupted. 
Then Oracle support ask me to clean up those AQ objects and run Catalog/Catproc.sql on my Primary DB. I just wondering if this will make my physical standby DB become invalid. Since this is Pre-Production environment and many people are testing on it now, then I try to google and check in oracle support if any article about this. I give up and open a new SR about this issue. They confirmed it is fine to rerun catalog/catproc.sql in my environment, everything will be replicated to the standby site. 

I tested and it run's well without any issue.Of course, in my case I need to turn off cluster parameter temporary before running it and startup the db in upgrade mode(startup upgrade).  Now all my missing corrupted objects are back and no more invalid component in the dba_registry. I check my dataguard for the whole day looks good.

Oracle 11gR2 RAC stuck on rootcrs.pl -unlock when applying PSU

About 2 weeks ago I did run this patch PSU 12311357 on my UAT environment which having about the same setup with my Pre -production , 2 node RAC running on 11.2.0.2. Everything was went fine on UAT. Then the next day I applied this patch on Production Environment include single instance Dataguard.

First patch on Dataguard site was running very smooth, then continue with the Pre-Production on 1st node which I got stuck when doing "rootcrs.pl -unlock", I wait for about 1 hour and still not moving. Well there must be something wrong with this process. I checked all my CRS stack are gone and shutdown properly. On which file it still trying hard to unlock then ?
What I did that time is, I abort that session (CTRL+C). Then I try to run again the unlock command. Still it stucks there. 
But this time I keep moving on to apply the patch since I don't have anymore time to open an SR, due to a critical fix that we need from this patch(I believe many people don't agree with me on my decision here). Well the patch was running fine without any issue. It also happen to 2nd node. 
I open SR about this issue and turns out that it was hang due to many log files under GRID_HOME to unlock. 
Here is the error message when that unlock command hang :
rootcrs_prd05istl1swmadb.log

....
2011-05-29 18:40:45: ###### Begin DIE Stack Trace ######
2011-05-29 18:40:45: Package File Line Calling
2011-05-29 18:40:45: --------------- -------------------- ---- ----------
2011-05-29 18:40:45: 1: main rootcrs.pl 325 crsconfig_lib::dietrap
2011-05-29 18:40:45: 2: crsconfig_lib s_crsconfig_lib.pm 350 main::__ANON__
2011-05-29 18:40:45: 3: crsconfig_lib s_crsconfig_lib.pm 350 (eval)
2011-05-29 18:40:45: 4: File::Find Find.pm 886 crsconfig_lib::reset_perms
2011-05-29 18:40:45: 5: File::Find Find.pm 700 File::Find::_find_dir
2011-05-29 18:40:45: 6: File::Find Find.pm 1229 File::Find::_find_opt
2011-05-29 18:40:45: 7: crsconfig_lib s_crsconfig_lib.pm 332 File::Find::finddepth
2011-05-29 18:40:45: 8: crsconfig_lib crsconfig_lib.pm 11198 crsconfig_lib::s_reset_crshome1
2011-05-29 18:40:45: 9: crsconfig_lib crsconfig_lib.pm 11209 crsconfig_lib::unlockCRSHomefordeinstall
2011-05-29 18:40:45: 10: main rootcrs.pl 382 crsconfig_lib::unlockCRSHome
2011-05-29 18:40:45: ####### End DIE Stack Trace #######
I still hold this issue, since so far I checked there is no way to just simulate running this unlock command without affecting the GRID stack. I will update again this article after I apply my next patch to see if this is related to my previous post "Oracle 11g R2 ASM Audit log generate a lot of files"

Update:
It is confirm due to a lot of file on Audit log cause the issue. After removing the files, I applied the next patch everything went very smooth. The rootcrs.pl -unlock actually doing permission change on files listed under GRID_HOME.

Oracle 11g R2 ASM Audit log generate a lot of files

I recently noticed, in my 2 Node RAC 11gR2(11.2.0.2) my disk usage on /u01 increase rapidly when no process are running. This mount point contains GRID_HOME and ORACLE_HOME on each node.
I manage to find which directory that causing this. It directed me to GRID_HOME under /u01/app/11.2/grid/rdbms/audit/ which generate a lot of files (+asm1_ora*.aud) every minutes and when I count just guess how many files are there ? around 200K files and keep generated.
 I remember only enable SYS auditing on database level which write directly to the OS syslog. Then I checked on my ASM instance if there is Auditing enabled and guess what? All audit parameter on all my ASM instance are disabled('FALSE' value).
 Let me see if anything written in documentation about this.
 http://oracle.su/docs/11g/network.112/e10574/auditing.htm#CEGJADDC

Activities That Are Always Audited for All Platforms

Oracle Database always audits certain database-related operations and writes them to the operating system audit files. It includes the actions of any user who is logged in with the SYSDBA or SYSOPER privilege. This is called mandatory auditing. Even if you have enabled the database audit trail (that is, setting the AUDIT_TRAIL parameter to DB), Oracle Database still writes mandatory records to operating system files.
By default, the operating system files are in the $ORACLE_HOME/admin/$ORACLE_SID/adump directory on UNIX systems. On Windows systems, Oracle Database writes this information to the Windows Event Viewer. You can change the location of this directory by setting the AUDIT_FILE_DEST initialization parameter, which is described in "Specifying a Directory for the Operating System Audit Trail".
Mandatory auditing includes the following operations:
  • Database startup. An audit record is generated that lists the operating system user starting the instance, the user terminal identifier, and the date and time stamp. This data is stored in the operating system audit trail because the database audit trail is not available until after the startup has successfully completed.
  • SYSDBA and SYSOPER logins. Oracle Database records all SYSDBA and SYSOPER connections.
  • Database shutdown. An audit record is generated that lists the operating system user shutting down the instance, the user terminal identifier, and the date and time stamp.

But in my case, it is generated too aggresive and I did trace level 12 which come up with nothing else than user SYS or SYSASM login information on those files. I open SR with oracle support about this, to check if I hit a bug. Came back with a workarround to put a cron job to delete the files periodically as per needed. For me I don't need the audit files under ASM instances, so I put a cron job to delete every hour.
Well maybe this only happen in my environment, but hope this will help someone that hit the same issue like me.
My environment :
AIX 6.1
Oracle RAC 11gR2(11.2.0.2) PSU2 - 2 Node (non shared home)
ASM
Extended Cluster