
Ensure CacheFactory.ensureCluster() calls are only made when starting a pattern #49

Open
brianoliver opened this issue Apr 18, 2013 · 2 comments


@brianoliver
Contributor

We need to make sure that #48 does not occur in other patterns. From what I can tell, the Processing Pattern is the only other place where this might happen.

To redeploy a cluster member correctly, you need to detach the member from the cluster by programmatically calling CacheFactory.shutdown() when the application is undeployed. CacheFactory.shutdown() then calls the stop() methods of the distributed services that run on the leaving member.

Because stop() is run by a service thread, it must not make reentrant service calls; otherwise it can deadlock.

THE PROBLEM
CommandExecutor.stop() calls CacheFactory.ensureCluster(), which is a service call made from within a service call (that is, a reentrant call):

public void stop() {
  if (Logger.isEnabled(Logger.DEBUG)) Logger.log(Logger.DEBUG, "Stopping CommandExecutor for %s", contextIdentifier); 

  // stop immediately
  setState(State.Stopped);

  // this CommandExecutor must not be available any further to other threads
  CommandExecutorManager.removeCommandExecutor(this.getContextIdentifier());

  // unregister JMX mbean for the CommandExecutor
  Registry registry = CacheFactory.ensureCluster().getManagement(); // THIS IS THE SERVICE CALL
  if (registry != null) {
      if (Logger.isEnabled(Logger.DEBUG)) Logger.log(Logger.DEBUG, "Unregistering JMX management extensions for CommandExecutor %s", contextIdentifier);  
      registry.unregister(getMBeanName());
  }

  if (Logger.isEnabled(Logger.DEBUG)) Logger.log(Logger.DEBUG, "Stopped CommandExecutor for %s", contextIdentifier);  
}

If the distributed service used to support the Command Pattern is configured with a single thread (as it is by default), this call produces a deadlock with a thread dump like this:

Thread[DistributedCache:DistributedCacheForCommandPattern|SERVICE_STOPPING,5,Cluster]
  com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:424)
  com.oracle.coherence.patterns.command.internal.CommandExecutor.stop(CommandExecutor.java:671)
        ...
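The hang can be reproduced outside Coherence with a plain single-threaded executor. This is a minimal sketch, assuming the executor is a reasonable stand-in for a single-threaded service: the outer task plays the role of stop() running on the service thread, and the inner task plays the role of a blocking service call such as ensureCluster(). The class and method names are hypothetical and nothing here uses the Coherence API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReentrantDeadlockDemo {
    // Hypothetical stand-in for a service configured with a single worker thread.
    static String runReentrant() throws Exception {
        ExecutorService service = Executors.newSingleThreadExecutor();
        try {
            // The outer task plays the role of stop() executing on the service thread.
            Future<String> outer = service.submit(() -> {
                // A blocking call that must itself be run by the same service thread,
                // analogous to CacheFactory.ensureCluster() inside stop().
                Future<String> inner = service.submit(() -> "inner done");
                try {
                    // The only worker is busy running this task, so inner can never start.
                    // The timeout exists purely so the demo can report the hang.
                    return inner.get(200, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    return "deadlock";
                }
            });
            return outer.get();
        } finally {
            service.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runReentrant()); // prints "deadlock"
    }
}
```

Without the timeout, the outer task would wait forever, which corresponds to the SERVICE_STOPPING thread stuck in ensureCluster() above.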

DIAGNOSTIC (AND POTENTIAL SOLUTION)
I've changed CommandExecutor.stop() to obtain the Cluster through a non-blocking call:

Cluster cluster = CacheFactory.getCluster();
Registry registry = cluster != null ? cluster.getManagement() : null;

Because CacheFactory.getCluster() is not a blocking service call, the deadlock is avoided.
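The shape of the fix can be sketched in plain Java. This is a hedged illustration, not Coherence code: ensureCluster() and getCluster() below are hypothetical analogues (the first does its work on the single service thread and waits for it; the second is a plain read that may return null), and an AtomicReference<String> stands in for the cluster handle. stop() reads the handle once, null-checks it, and never waits on the service thread, so it completes even when it runs on that thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class NonBlockingStopDemo {
    // Hypothetical stand-in for the cluster handle (null until "ensured").
    private final AtomicReference<String> cluster = new AtomicReference<>();
    // Hypothetical single-threaded "service".
    private final ExecutorService service = Executors.newSingleThreadExecutor();

    // Analogue of ensureCluster(): work runs on the service thread and the caller blocks.
    String ensureCluster() throws Exception {
        return service.submit(() -> {
            cluster.compareAndSet(null, "cluster");
            return cluster.get();
        }).get();
    }

    // Analogue of getCluster(): a plain read that never waits on the service thread.
    String getCluster() {
        return cluster.get();
    }

    // Runs on the service thread, so it only uses the non-blocking accessor.
    String stop() {
        String c = getCluster(); // read once, then null-check
        return c != null ? "unregistered from " + c : "no cluster";
    }

    static String demo() throws Exception {
        NonBlockingStopDemo d = new NonBlockingStopDemo();
        try {
            d.ensureCluster(); // called from the caller's thread, not the service thread
            return d.service.submit(d::stop).get(1, TimeUnit.SECONDS);
        } finally {
            d.service.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "unregistered from cluster"
    }
}
```

Reading the handle into a local once, rather than calling the accessor twice, also avoids the value changing between the null check and its use.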

@brianoliver
Contributor Author

Reported by guillermo.garcia-ochoa

@brianoliver
Contributor Author

This issue was imported from JIRA COHINC-49

brianoliver pushed a commit that referenced this issue Feb 21, 2017
…, replacing with CacheFactory.getCluster()

Issue #159: Introduced ability to provide a ConfigurableCacheFactory when creating a ProcessingSession
Issue #160: Ensure consistent use of ClassLoaders based on calling context
Issue #161: Ensure Processing Pattern is initialized using the Cache Configuration LifecycleEvents
Issue #162: Introduce Shared ExecutorService for internal background tasks
Issue #163: Resolves fail-over/fail-back of Grid-based Tasks