Ensure CacheFactory.ensureCluster() calls are only made when starting a pattern #49

brianoliver · 2013-04-18T13:06:28Z

We need to make sure that #48 does not occur in other patterns. From what I can tell, it looks like the Processing Pattern is the only other place where this possibly happens.

To redeploy a cluster member correctly, you need to detach the member from the cluster programmatically by calling CacheFactory.shutdown() when the application is undeployed. CacheFactory.shutdown() then calls the stop() methods of the distributed services that run on the leaving member.

Because the stop() method is run by a service thread, no reentrant service calls should be invoked inside the stop method to avoid deadlocks.

THE PROBLEM
CommandExecutor.stop() calls CacheFactory.ensureCluster(), which is a service call within a service call (thus, a reentrant call):
public void stop() {
  if (Logger.isEnabled(Logger.DEBUG)) Logger.log(Logger.DEBUG, "Stopping CommandExecutor for %s", contextIdentifier);

  //stop immediately
  setState(State.Stopped);

  //this CommandExecutor must not be available any further to other threads
  CommandExecutorManager.removeCommandExecutor(this.getContextIdentifier());

  //unregister JMX mbean for the CommandExecutor
  Registry registry = CacheFactory.ensureCluster().getManagement(); // THIS IS THE SERVICE CALL
  if (registry != null) {
      if (Logger.isEnabled(Logger.DEBUG)) Logger.log(Logger.DEBUG, "Unregistering JMX management extensions for CommandExecutor %s", contextIdentifier);
      registry.unregister(getMBeanName());
  }

  if (Logger.isEnabled(Logger.DEBUG)) Logger.log(Logger.DEBUG, "Stopped CommandExecutor for %s", contextIdentifier);
}
If the distributed service used to support the Command Pattern is configured with a single thread (as it is by default), this call produces a deadlock, with a thread dump like this:
Thread[DistributedCache:DistributedCacheForCommandPattern|SERVICE_STOPPING,5,Cluster]
  com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:424)
  com.oracle.coherence.patterns.command.internal.CommandExecutor.stop(CommandExecutor.java:671)
        ...
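
To illustrate why a single service thread makes this call hang, here is a minimal sketch that uses only the JDK. It is a stand-in model, not Coherence code: a single-threaded ExecutorService plays the role of the distributed service, and the class and method names (ReentrantServiceCallDemo, reentrantCallBlocks) are hypothetical. The real ensureCluster() call would block indefinitely, whereas the sketch uses a timeout so that it terminates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Stand-in model only: a single-threaded executor plays the role of the
// single-threaded distributed service; no Coherence classes are used.
class ReentrantServiceCallDemo {

    /** Returns true if a task blocked while calling back into its own single-threaded executor. */
    static boolean reentrantCallBlocks(long timeoutMillis) throws Exception {
        ExecutorService service = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> stop = service.submit(() -> {
                // Hypothetical "stop()" running on the service thread: it queues a
                // second request on the same service and waits for the answer.
                Future<String> inner = service.submit(() -> "cluster");
                try {
                    inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return false; // the inner request was served, so nothing blocked
                } catch (TimeoutException e) {
                    return true;  // the only worker thread is busy waiting on itself
                }
            });
            return stop.get();
        } finally {
            service.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reentrant call blocked: " + reentrantCallBlocks(200));
    }
}
```

The inner request can only run once the outer task returns, and the outer task only returns once the inner request completes. Without the timeout, neither would make progress.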
DIAGNOSTIC (AND POTENTIAL SOLUTION)
I've changed the CommandExecutor.stop() method to use a non-blocking call to obtain the Cluster:
Registry registry = CacheFactory.getCluster() != null ? CacheFactory.getCluster().getManagement() : null;
Because CacheFactory.getCluster() is not a blocking service call, the deadlock is avoided.
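
The same null-guarded, non-blocking lookup can be sketched with JDK classes only. This is an illustrative stand-in rather than the actual change: ClusterStub, CLUSTER, and managementOrNull are hypothetical names that merely mimic the shape of getCluster().getManagement(). The sketch reads the reference once into a local variable before the null check, instead of calling the accessor twice.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Stand-in model only: ClusterStub and its registry string are hypothetical
// and merely mimic the shape of the getCluster().getManagement() lookup.
class NonBlockingLookupDemo {

    static final class ClusterStub {
        String getManagement() { return "registry"; }
    }

    // Published once at startup; reading it never queues work on the service.
    static final AtomicReference<ClusterStub> CLUSTER = new AtomicReference<>();

    /** Non-blocking lookup: read the reference once, then null-check it. */
    static String managementOrNull() {
        ClusterStub cluster = CLUSTER.get();
        return cluster != null ? cluster.getManagement() : null;
    }

    /** Runs the lookup on a single service thread and returns its result. */
    static String lookupOnServiceThread() throws Exception {
        ExecutorService service = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = NonBlockingLookupDemo::managementOrNull;
            return service.submit(task).get(1, TimeUnit.SECONDS);
        } finally {
            service.shutdownNow();
        }
    }
}
```

Because the lookup only reads an already-published reference, it completes on the service thread whether or not the stand-in cluster has been set. In the unset case it simply returns null, and the caller skips the unregister step.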

brianoliver · 2013-04-18T13:06:28Z

Reported by guillermo.garcia-ochoa

brianoliver · 2016-06-30T17:24:02Z

This issue was imported from JIRA COHINC-49

…, replacing with CacheFactory.getCluster() Issue #159: Introduced ability to provide a ConfigurableCacheFactory when creating a ProcessingSession Issue #160: Ensure consistent use of ClassLoaders based on calling context Issue #161: Ensure Processing Pattern is initialized using the Cache Configuration LifecycleEvents Issue #162: Introduce Shared ExecutorService for internal background tasks Issue #163: Resolves fail-over/fail-back of Grid-based Tasks

brianoliver added Type: Defect Priority: Major Module: Processing Pattern labels Jun 30, 2016

brianoliver added this to the 11-FUTURE milestone Jun 30, 2016

brianoliver self-assigned this Jun 30, 2016
