Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

[BUG] Profile API returns responses when there are still asyn requests running #339

Open
kaituo opened this issue Dec 23, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@kaituo
Copy link
Member

kaituo commented Dec 23, 2020

Describe the bug
A lot of channel closed warning for profile API in ES log.

[2020-11-03T17:48:08,373][WARN ][r.suppressed ] [20728416ac03e0c7fe711affc3da20d5] path: /_opendistro/_anomaly_detection/detectors/w3zMe3UBEMQTObK1dLdD/_profile/init_progress,state,error, params: {detectorID=w3zMe3UBEMQTObK1dLdD, type=init_progress,state,error}
java.lang.IllegalStateException: Channel is already closed
    at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.close(RestController.java:511) ~[elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:504) ~[elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) ~[elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:90) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:84) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at com.amazon.opendistroforelasticsearch.ad.transport.GetAnomalyDetectorTransportAction$2.accept(GetAnomalyDetectorTransportAction.java:259) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
    at com.amazon.opendistroforelasticsearch.ad.transport.GetAnomalyDetectorTransportAction$2.accept(GetAnomalyDetectorTransportAction.java:256) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at com.amazon.opendistroforelasticsearch.ad.util.MultiResponsesDelegateActionListener.handleSavedResponses(MultiResponsesDelegateActionListener.java:109) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
    at com.amazon.opendistroforelasticsearch.ad.util.MultiResponsesDelegateActionListener.finish(MultiResponsesDelegateActionListener.java:95) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
    at com.amazon.opendistroforelasticsearch.ad.util.MultiResponsesDelegateActionListener.onResponse(MultiResponsesDelegateActionListener.java:70) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
    at com.amazon.opendistroforelasticsearch.ad.AnomalyDetectorProfileRunner.lambda$onInittedEver$9(AnomalyDetectorProfileRunner.java:467) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:90) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:84) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.sendSearchResponse(AbstractSearchAsyncAction.java:545) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.search.ExpandSearchPhase.run(ExpandSearchPhase.java:117) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:350) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:344) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:231) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$1(FetchSearchPhase.java:119) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:125) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:47) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:95) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:719) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 

To Reproduce
We can see this bug when we run profile all and none of the detectors are started and anomaly result index is not created yet.

http://localhost:9200/_opendistro/_anomaly_detection/detectors/3vEMjHYBXxUIBPx5UAjO/_profile?_all=true&pretty

Expected behavior
We should not see such exceptions.

@kaituo kaituo added the bug Something isn't working label Dec 23, 2020
kaituo added a commit to kaituo/anomaly-detection that referenced this issue Dec 23, 2020
MultiResponsesDelegateActionListener helps send multiple requests asynchronously and return one final response altogether. While waiting for all inflight requests, the method respondImmediately and failImmediately can stop waiting and return immediately. While these two methods are convenient, it is easy to misuse them and cause bugs (see opendistro-for-elasticsearch#339 for example). This PR removes the method respondImmediately and failImmediately and refactor profile runner to avoid using them.

This PR also stops printing out the unknown entity state since it is not useful.

Testing done:
1. Added unit tests to verify the bug fix.
2. Manual tests to run profile calls for single-stream and multi-entity detectors for different phases of the detector lifecycle (disabled, init, running). Verified profile results make sense.
kaituo added a commit that referenced this issue Dec 28, 2020
* Fix the profile API returns prematurely.

MultiResponsesDelegateActionListener helps send multiple requests asynchronously and return one final response altogether. While waiting for all inflight requests, the method respondImmediately and failImmediately can stop waiting and return immediately. While these two methods are convenient, it is easy to misuse them and cause bugs (see #339 for example). This PR removes the method respondImmediately and failImmediately and refactor profile runner to avoid using them.

This PR also stops printing out the unknown entity state since it is not useful.

Testing done:
1. Added unit tests to verify the bug fix.
2. Manual tests to run profile calls for single-stream and multi-entity detectors for different phases of the detector lifecycle (disabled, init, running). Verified profile results make sense.
@kaituo
Copy link
Member Author

kaituo commented Dec 30, 2020

another case:

[2020-12-28T15:52:58,752][ERROR][o.e.r.a.RestResponseListener] [integTest-0] failed to send failure response
java.lang.IllegalStateException: Channel is already closed
at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.close(RestController.java:511) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:504) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.rest.action.RestActionListener.onFailure(RestActionListener.java:58) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:49) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83) [elasticsearch-7.9.1.jar:7.9.1]
at com.amazon.opendistroforelasticsearch.ad.transport.GetAnomalyDetectorTransportAction$2.accept(GetAnomalyDetectorTransportAction.java:259) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at com.amazon.opendistroforelasticsearch.ad.transport.GetAnomalyDetectorTransportAction$2.accept(GetAnomalyDetectorTransportAction.java:256) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.9.1.jar:7.9.1]
at com.amazon.opendistroforelasticsearch.ad.util.MultiResponsesDelegateActionListener.handleSavedResponses(MultiResponsesDelegateActionListener.java:109) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at com.amazon.opendistroforelasticsearch.ad.util.MultiResponsesDelegateActionListener.finish(MultiResponsesDelegateActionListener.java:95) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at com.amazon.opendistroforelasticsearch.ad.util.MultiResponsesDelegateActionListener.onResponse(MultiResponsesDelegateActionListener.java:70) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at com.amazon.opendistroforelasticsearch.ad.AnomalyDetectorProfileRunner.lambda$profileEntityStats$4(AnomalyDetectorProfileRunner.java:260) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.sendSearchResponse(AbstractSearchAsyncAction.java:545) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.ExpandSearchPhase.run(ExpandSearchPhase.java:117) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:350) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:344) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:231) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$1(FetchSearchPhase.java:119) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:125) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:47) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:95) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Suppressed: java.lang.IllegalStateException: Channel is already closed
at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.close(RestController.java:511) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:504) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83) [elasticsearch-7.9.1.jar:7.9.1]
at com.amazon.opendistroforelasticsearch.ad.transport.GetAnomalyDetectorTransportAction$2.accept(GetAnomalyDetectorTransportAction.java:259) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at com.amazon.opendistroforelasticsearch.ad.transport.GetAnomalyDetectorTransportAction$2.accept(GetAnomalyDetectorTransportAction.java:256) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.9.1.jar:7.9.1]
at com.amazon.opendistroforelasticsearch.ad.util.MultiResponsesDelegateActionListener.handleSavedResponses(MultiResponsesDelegateActionListener.java:109) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at com.amazon.opendistroforelasticsearch.ad.util.MultiResponsesDelegateActionListener.finish(MultiResponsesDelegateActionListener.java:95) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at com.amazon.opendistroforelasticsearch.ad.util.MultiResponsesDelegateActionListener.onResponse(MultiResponsesDelegateActionListener.java:70) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at com.amazon.opendistroforelasticsearch.ad.AnomalyDetectorProfileRunner.lambda$profileEntityStats$4(AnomalyDetectorProfileRunner.java:260) [opendistro-anomaly-detection-1.11.0.0.jar:1.11.0.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.sendSearchResponse(AbstractSearchAsyncAction.java:545) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.ExpandSearchPhase.run(ExpandSearchPhase.java:117) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:350) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:344) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:231) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$1(FetchSearchPhase.java:119) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:125) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:47) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:95) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
[2020-12-28T15:53:06,830][INFO ][c.a.o.a.AnomalyDetectorJobRunner] [integTest-0] Start to run AD job K8m8q3YBElg7LzzePLnS

ylwu-amzn pushed a commit that referenced this issue Feb 24, 2021
* Fix the profile API returns prematurely.

MultiResponsesDelegateActionListener helps send multiple requests asynchronously and return one final response altogether. While waiting for all inflight requests, the method respondImmediately and failImmediately can stop waiting and return immediately. While these two methods are convenient, it is easy to misuse them and cause bugs (see #339 for example). This PR removes the method respondImmediately and failImmediately and refactor profile runner to avoid using them.

This PR also stops printing out the unknown entity state since it is not useful.

Testing done:
1. Added unit tests to verify the bug fix.
2. Manual tests to run profile calls for single-stream and multi-entity detectors for different phases of the detector lifecycle (disabled, init, running). Verified profile results make sense.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant