Asterisk CommandTimeout on #join #539

Open
bklang opened this issue Feb 19, 2015 · 4 comments

@bklang
Member

bklang commented Feb 19, 2015

Logs are here: https://gist.github.com/bklang/5c4be487efd02695adde

Note that this is easy enough to reproduce, but not 100% of the time. I've never seen it on the first attempt, and I almost always see it on the 3rd or 4th attempt.

The workflow is this:

  1. Agent logs into the dialer by marking himself "online"
  2. Adhearsion calls the agent
  3. Agent listens to MOH
  4. Dialer begins placing calls
  5. When a Caller answers, MOH is started and he is added to the queue
  6. If/when an Agent is available, we attempt to connect the Caller to him
  7. During the connection process, we:
    • stop MOH
    • play a "beep" to the Agent
    • send a SIP Message containing screen-pop data
  8. We then bridge the two

The Gist contains the Asterisk log at full verbosity (including debug) and the Adhearsion logs at trace level.

The attempt to connect the caller to the agent occurs at [2015-02-19 23:06:41.240].

Agent call: 98817160-9b78-473b-8eea-2b9943c175ff
Queued call: 29fcf752-ac81-4d05-887e-6b2306d71e29

[2015-02-19 23:06:41.240] INFO  ElectricSlide::CallQueue: Connecting #<Agent: extension: 1001, campaign_ref: 2040, state: available, call: 98817160-9b78-473b-8eea-2b9943c175ff> with SIP/billing/0121110517

During the connection callback, we do step 7 above, which you can see in the logs.

The first error occurs at [2015-02-19 23:07:41.603]:

[2015-02-19 23:07:41.603] ERROR Celluloid: ElectricSlide::CallQueue crashed!
Adhearsion::Call::CommandTimeout: #<Punchblock::Command::Join target_call_id="98817160-9b78-473b-8eea-2b9943c175ff", target_mixer_name=nil, component_id=nil, source_uri=nil, domain=nil, transport=nil, timestamp=Thu, 19 Feb 2015 23:06:41 +0200, request_id="53b2af5e-26fe-4888-bce8-9e999bf23d16", call_uri="29fcf752-ac81-4d05-887e-6b2306d71e29", mixer_name=nil, direction=nil, media=nil>
    EMPTY BACKTRACE

Is it significant that the log actually shows several CommandTimeout exceptions? I can't see how, but is it possible we are attempting to execute multiple joins?
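One way to check the "multiple joins" question against the logs is to group the CommandTimeout lines by the `request_id` of the Join command. The sketch below is only an illustration: the `join_timeouts` helper and the regular expression are assumptions written against the single log line quoted above, not part of Adhearsion or Punchblock. If every timeout line carries the same `request_id`, the same command is probably being reported more than once; distinct `request_id` values would suggest that several joins were actually issued.

```ruby
# Matches the CommandTimeout line format quoted in this issue.
# The pattern is an assumption derived from that one line only.
JOIN_TIMEOUT = /CommandTimeout: #<Punchblock::Command::Join target_call_id="(?<target>[^"]+)".*?request_id="(?<request>[^"]+)".*?call_uri="(?<call>[^"]+)"/

# Hypothetical helper: counts timeout lines per [request_id, call_uri, target_call_id].
def join_timeouts(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    match = JOIN_TIMEOUT.match(line)
    next unless match # skip lines that are not Join timeouts
    counts[[match[:request], match[:call], match[:target]]] += 1
  end
end

# The error line from this issue, used as sample input.
sample = [
  'Adhearsion::Call::CommandTimeout: #<Punchblock::Command::Join ' \
  'target_call_id="98817160-9b78-473b-8eea-2b9943c175ff", target_mixer_name=nil, ' \
  'component_id=nil, source_uri=nil, domain=nil, transport=nil, ' \
  'timestamp=Thu, 19 Feb 2015 23:06:41 +0200, ' \
  'request_id="53b2af5e-26fe-4888-bce8-9e999bf23d16", ' \
  'call_uri="29fcf752-ac81-4d05-887e-6b2306d71e29", mixer_name=nil, direction=nil, media=nil>'
]

join_timeouts(sample).each do |(request, call, target), count|
  puts "#{request}: #{call} -> #{target} (#{count} timeout line(s))"
end
```

To run it over the full Adhearsion log, pass the file's lines instead of `sample`, for example `join_timeouts(File.foreach(path))` with `path` pointing at a local copy of the log.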

@benlangfeld
Copy link
Member

If you eliminate all MOH does the issue go away? Do you have a minimal reproduction? It seems this currently requires a complex application.

@bklang
Copy link
Member Author

bklang commented Feb 20, 2015

I think the problem here was concurrency: we were executing a transfer (to stop MOH), playing back the "beep" and sending the message, and executing the #join all at the same time. It may even be just the last two of those. I was able to stop getting CommandTimeout by ensuring that the call controller that plays the beep and sends the message completes before attempting the join.
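The ordering described above can be sketched in plain Ruby. This is a minimal illustration, not the actual application code and not the Adhearsion API: `FakeCall`, `execute`, and `connect` are hypothetical stand-ins, and the command names are labels only. The point is simply that the preparation work runs to completion before the join is issued, rather than racing with it.

```ruby
require 'thread'

# Hypothetical stand-in for a call leg. It records each command in a shared,
# thread-safe event queue so the resulting order can be inspected.
class FakeCall
  attr_reader :id

  def initialize(id, events)
    @id = id
    @events = events
  end

  def execute(name)
    sleep 0.01 # simulate a protocol round trip
    @events << [@id, name]
  end
end

# Hypothetical connect step: beep and screen-pop message first, join only afterwards.
def connect(agent_call, queued_call)
  prep = Thread.new do
    agent_call.execute(:play_beep)
    agent_call.execute(:send_screen_pop)
  end
  prep.join # Thread#join blocks until the preparation work has finished
  queued_call.execute(:join) # issued only once nothing else is in flight
end

events = Queue.new
connect(FakeCall.new("agent-leg", events), FakeCall.new("queued-leg", events))
order = Array.new(events.size) { events.pop }
order.each { |id, name| puts "#{id}: #{name}" }
```

Without the `prep.join` line, the join could be issued while the beep or message is still in progress, which is the overlap suspected of causing the timeout.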

Is this something we can do anything about within Punchblock? It's arguable that my code was buggy, but CommandTimeout was also the wrong failure mode.

@benlangfeld
Member

> Is this something we can do anything about within Punchblock?

Switch to a sane protocol like ARI?

@benlangfeld
Member

See also #529

@benlangfeld benlangfeld modified the milestone: 3.0.0 Dec 20, 2015
@benlangfeld benlangfeld modified the milestones: 3.0.0, 3.1.0 Jan 5, 2016