Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asynchronous conference bridge operation #3928

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

nanangizz
Copy link
Member

@nanangizz nanangizz commented Apr 16, 2024

In #3183, video conference bridge adopts asynchronous ports operations to handle non-uniform lock ordering issues. This PR integrates the same mechanism for audio conference bridge.

Beside avoiding deadlock, this change may improve performance a little bit as audio data processing in the conference bridge will be performed without holding a mutex.

There are some pending tasks (e.g: may also need to integrate this into audio switchboard, update PJMEDIA ports to have their own pool (see issues below)), but perhaps we need to discuss & decide whether the work should be continued.

Potential backward compatibility issues (major):

  1. Pool release immediately after port removal, consider this code:

    pool = pj_pool_create(...);
    pjmedia_tonegen_create(pool, ..., &tonegen_port);
    pjmedia_conf_add_port(conf, ..., tonegen_port, ..., &tonegen_port_id);
    ...
    pjmedia_conf_remove_port(conf, tonegen_port_id);
    pjmedia_port_destroy(tonegen_port);
    pj_pool_release(pool); /* <-- This is now a problem!
                                  It was fine with a synchronous pjmedia_conf_remove_port() */
    

    Note that the PJMEDIA tonegen port does not have its own pool (it allocates its own instances and all internal states using the supplied memory pool), so releasing the pool immediately after removing the port from conference bridge may cause crash as the port removal is now asynchronous and it may be still being accessed by the conference bridge.
    Unfortunately, the real removal operation will not invoke any callback, so application cannot be so sure the safe time for releasing the pool. FYI the actual removal is done by the conference bridge clock which usually is a sound device, so if the clock is running, adding pj_thread_sleep(100) before relasing pool may avoid the crash (note that clock interval is usually 20ms, considering possible audio device burst, 100ms delay is generally sufficient).
    The proper solution is perhaps modifying the PJMEDIA tonegen to have its own pool.

  2. PJMEDIA Stream pattern.
    PJMEDIA Stream exports a PJMEDIA port interface which can be queried using pjmedia_stream_get_port(). Unfortunately the port does not implements pjmedia_port_destroy(), it is destroyed using pjmedia_stream_destroy() instead, which will release the memory pool. Similar to above issue, if pjmedia_stream_destroy() is invoked immediately after pjmedia_conf_remove_port(), it may cause crash. If pjmedia_port_destroy() is implemented (FYI this PR actually implements it for PJMEDIA Stream), it should not crash as the conference bridge will install group lock to the PJMEDIA port and pjmedia_port_destroy() implementation will only be called after group lock's reference counter reaches zero.

  3. Caching pool destroy.
    When all PJMEDIA ports have their own pools, there is still possibility that application uses its pool factory for creating the ports and destroy the pool factory (e.g: invoking pj_caching_pool_destroy()) prematurely before the actual port removal is done. For example, in application shutdown, the conference bridge clock has been stopped so actual port removal can only be done later by pjmedia_conf_destroy()/pjsua_destroy(), but in between, application may destroy its pool factory which may cause crash.
    A possible solution is to delay the pool factory destroy to PJLIB deinitialization via pj_atexit().

@sauwming
Copy link
Member

Since issue 2 is already handled, I assume it's no longer a problem (i.e. no longer causes backward compatibility)?

Which only leaves issue 1. I understand it will be quite a lot to change all existing pjmedia_port(s) to have its own pool, but since it's necessary for this feature, I vote to continue.

And if 1&2 are already resolved, the change in pjsua_app.c is no longer necessary, correct? (i.e. app will not need to change anything).

@nanangizz
Copy link
Member Author

Since issue 2 is already handled, I assume it's no longer a problem (i.e. no longer causes backward compatibility)?

Which only leaves issue 1. I understand it will be quite a lot to change all existing pjmedia_port(s) to have its own pool, but since it's necessary for this feature, I vote to continue.

The backward compatibility issues cannot be completely avoided, e.g:

  • if any custom/proprietary pjmedia ports are used, need to check/update them,
  • applications may need to be adjusted to avoid issue 3 (pool factory premature destroy).

And if 1&2 are already resolved, the change in pjsua_app.c is no longer necessary, correct? (i.e. app will not need to change anything).

Yes for pjsua_app.c, other apps may need adjustment (as mentioned above).

@sauwming
Copy link
Member

Regarding issue 3, I don't think app is allowed to call pj_caching_pool_destroy() before pjmedia_conf_destroy() since the conference creation uses the pool, i.e. pjmedia_conf_create(pool, ...)?
Also, using pj_atexit() doesn't seem possible since pjsip can be restarted (instead of shutdown).

The important thing here seems to be that pjmedia_conf_destroy() must guarantee the completion of all pending async operations, if any.

@nanangizz
Copy link
Member Author

Regarding issue 3, I don't think app is allowed to call pj_caching_pool_destroy() before pjmedia_conf_destroy() since the conference creation uses the pool, i.e. pjmedia_conf_create(pool, ...)? Also, using pj_atexit() doesn't seem possible since pjsip can be restarted (instead of shutdown).

Here is a sample scenario. App using PJSUA creates its own pool factory (instead of using PJSUA's) for some reason, and it uses the pool/factory to create a PJMEDIA port. It has to destroy the pool factory before shutting down PJSUA.

Re: lib (PJSUA?) restart, pj_atexit() will still be called right?

The important thing here seems to be that pjmedia_conf_destroy() must guarantee the completion of all pending async operations, if any.

Yes, pjmedia_conf_destroy() flushes all pending async ops.

@sauwming
Copy link
Member

Here is a sample scenario. App using PJSUA creates its own pool factory (instead of using PJSUA's) for some reason, and it uses the pool/factory to create a PJMEDIA port. It has to destroy the pool factory before shutting down PJSUA.

I see.

Re: lib (PJSUA?) restart, pj_atexit() will still be called right?

Ah, right. It's called upon pj_shutdown().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants