Friday, 20 May 2016

WebSphere JVM startup issue

Troubleshoot : WebSphere JVM hangs during start-up



Issue: When WebSphere JVM is initiated to start, it hangs. no logs are written in SystemOut.log or SystemErr.log.


Environment details:
WebSphere 7.0.0.39 is installed on the Solaris.


Steps tried to resolve the issue:
  1. Kill the Process using the command kill -3 < JVM process id> so that the dump are created and we can check for some information.
  2. Below error messages were in the dump:
Found one Java-level deadlock:                                         
  ===                                                                  
  "Outbound Read Selector.1" daemon prio=3 tid=0x0000000103903800      
    nid=0x33 waiting for monitor entry [0xffffffff6c5fe000]            
    java.lang.Thread.State: BLOCKED (on object monitor)                
  at java.util.logging.LogManager.initializeGlobalHandlers             
     (LogManager.java:1333)                                            
  - waiting to lock <0xfffffffe11551600>                               
    (a com.ibm.ws.bootstrap.WsLogManager)                              
  at java.util.logging.LogManager.access$1500(LogManager.java:128)     
  at java.util.logging.LogManager$RootLogger.accessCheckedHandlers     
     (LogManager.java:1432)                                            
  at java.util.logging.Logger.getHandlers(Logger.java:1260)            
  - locked <0xfffffffe1159bcb0>                                        
    (a java.util.logging.LogManager$RootLogger)                        
  at java.util.logging.Logger.log(Logger.java:512)                     
  at com.ibm.ws.logging.WsLogger.deliverOrBuffer(WsLogger.java:286)    
  at com.ibm.ws.logging.WsLogger.log(WsLogger.java:264)                
  at com.ibm.ws.logging.WsLogger.logp(WsLogger.java:466)               
  at com.ibm.nws.ejs.ras.TraceComponent.debug(TraceComponent.java:132) 
  at com.ibm.nws.ejs.ras.Tr.debug(Tr.java:120)                         
  at com.ibm.ws.tcp.channel.impl.ChannelSelector.run                   
  at java.lang.Thread.run(Thread.java:682)                             
  ===                                                                  
  "P=697699:O=0:CT" prio=3 tid=0x000000010011b800                      
   nid=0x2 waiting for monitor entry [0xffffffff7baf9000]              
   java.lang.Thread.State: BLOCKED (on object monitor)                 
  at java.util.logging.Logger.getHandlers(Logger.java:1260)            
  - waiting to lock <0xfffffffe1159bcb0>                               
    (a java.util.logging.LogManager$RootLogger)                        
  at java.util.logging.Logger.log(Logger.java:512)                     
  at com.ibm.ws.logging.WsLogger.deliverOrBuffer(WsLogger.java:286)    
  at com.ibm.ws.logging.WsLogger.log(WsLogger.java:264)                
  at com.ibm.ws.logging.WsLogger.logp(WsLogger.java:499)               
  at com.ibm.ws.logging.WsLoggerConfigurator.logMessage                
     (WsLoggerConfigurator.java:569)                                   
  at com.ibm.ws.logging.WsLoggerConfigurator                           
     .getCurrentLoggingConfiguration(WsLoggerConfigurator.java:314)    
  at com.ibm.ws.logging.WsLoggerConfigurator                           
     .getLoggerConfiguration(WsLoggerConfigurator.java:123)            
  at com.ibm.ws.logging.WsLoggerConfigurator                           
     .configureLogger(WsLoggerConfigurator.java:375)                   
  at com.ibm.ws.logging.WsLoggerFactoryImpl.createWsLogger             
     (WsLoggerFactoryImpl.java:80)                                     
  at com.ibm.ws.bootstrap.RASDelegator.createWsLogger                  
     (RASDelegator.java:74)                                            
  at com.ibm.ws.bootstrap.WsLogManager.getLogger(WsLogManager.java:220)
  - locked <0xfffffffe11551600> (a com.ibm.ws.bootstrap.WsLogManager)
 
Check whether trace level is set.
It was observed that trace was set to *=finest and hence the JVM did not start up completely.
Removed the trace level and changed to *=info and started the JVM successfully.

We all know that logging is important as it provides valuable information when it comes to troubleshooting. But at the same time we need to understand that it comes with extra cost when it comes to performance. It is resource expensive. It is advisable that when we use logging/tracing we should use it on specific object. Excessive logging can over kill the performance.                        
                                                                       
Hence We should be having the trace spec set for specific object
e.g.     com.ibm.ejs.ras.*=all

Note: *=finest is not a good practice. Same is applicable for *=all.

Thursday, 19 May 2016

Websphere Node agent Synchronization Issue

TroubleShooting: Synchronization problem with Node agent



Issue: Node status shows not synchronized in the  administrative console


Error message:
[18/05/16 14:56:23:059 EST] 000000a3 NodeSyncTask  A   ADMS0036E: The configuration synchronization failed.
[18/05/16 14:56:24:088 EST] 000000a4 NodeSync      E   ADMS0012E: The system is unable to create an administrative client connection: com.ibm.websphere.management.exception.AdminException: Admin client connection to deployment manager is unavailable; nodeagent has not discovered the dmgr
 at com.ibm.ws.management.sync.NodeSync.getAdminClient(NodeSync.java:532)
 at com.ibm.ws.management.sync.NodeSync.getCellRepositoryEpoch(NodeSync.java:407)
 at com.ibm.ws.management.sync.NodeSyncTask.doSync(NodeSyncTask.java:248)
 at com.ibm.ws.management.sync.NodeSyncTask.run(NodeSyncTask.java:157)
 at java.lang.Thread.run(Thread.java:790)
.
[18/05/16 14:56:24:088 EST] 000000a4 NodeSync      E   ADMS0005E: The system is unable to generate synchronization request: com.ibm.websphere.management.exception.AdminException: Admin client connection to deployment manager is unavailable; nodeagent has not discovered the dmgr
 at com.ibm.ws.management.sync.NodeSync.getAdminClient(NodeSync.java:532)
 at com.ibm.ws.management.sync.NodeSync.getCellRepositoryEpoch(NodeSync.java:407)
 at com.ibm.ws.management.sync.NodeSyncTask.doSync(NodeSyncTask.java:248)
 at com.ibm.ws.management.sync.NodeSyncTask.run(NodeSyncTask.java:157)
 at java.lang.Thread.run(Thread.java:790)


Environment details: WAS 8.5.5.9 on Windows,  with both DMGR and Node agent created on the same system.


Steps tried to resolve Synchronization issue
1.      Stopped the Node agent and did a manually sync using the command syncNode.bat. Sync was successful. Started the node agent but on the console, still it showed the status as not synchronized.

2.      Tried to check the NODE_DISCOVERY_ADDRESS port for the node agent whether it is listening using the command:

 netstat -anbo | findstr "NODE_DISCOVERY_ADDRESS port number "

3.      NODE_DISCOVERY_ADDRESS port number was not listening though the node agent was running.

4.      Rebooted the Windows server and it fixed the issue as there would be some windows policy set and applied and requires a reboot.
Now Node agent status shows synchronized.