ch4/ofi: refactor gpu pipeline #6891

Open
wants to merge 17 commits into
base: main

Commits on Mar 5, 2024

  1. misc: rename MPIR_gpu_req to MPIR_async_req

    MPIR_gpu_req is a union type for either an MPL_gpu_request or an
    MPIR_Typerep_req, so it is not just for gpu. The type could potentially
    be extended to include other internal async task handles, so we rename
    it to MPIR_async_req.
    
    We also establish the convention of naming the variable async_req.
    hzhou committed Mar 5, 2024
    b2074d1
  2. misc: add MPIR_async_test

    Add an inline wrapper for testing MPIR_async_req.
    
    Modify the order of header inclusion due to the dependency on
    typerep_pre.h.
    hzhou committed Mar 5, 2024
    20464b6
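The first two commits describe a single handle type that wraps either a gpu request or a typerep request, plus an inline test wrapper. A minimal sketch of that pattern follows. Every name prefixed with `sketch_` is hypothetical, and the two inner request structs are simplified stand-ins, not the actual MPL_gpu_request or MPIR_Typerep_req definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MPL_gpu_request and MPIR_Typerep_req. */
typedef struct { int remaining_polls; } sketch_gpu_request;
typedef struct { bool finished; } sketch_typerep_req;

typedef enum {
    SKETCH_ASYNC_NULL,      /* no operation outstanding */
    SKETCH_ASYNC_GPU,       /* backed by a gpu request */
    SKETCH_ASYNC_TYPEREP    /* backed by a typerep request */
} sketch_async_type;

/* One handle type that can carry either kind of request. */
typedef struct {
    sketch_async_type type;
    union {
        sketch_gpu_request gpu;
        sketch_typerep_req typerep;
    } u;
} sketch_async_req;

/* Inline test wrapper: dispatch on the tag and report completion. */
static inline bool sketch_async_test(sketch_async_req *async_req)
{
    switch (async_req->type) {
        case SKETCH_ASYNC_NULL:
            return true;
        case SKETCH_ASYNC_GPU:
            /* Pretend each test call advances the copy by one step. */
            if (async_req->u.gpu.remaining_polls > 0)
                async_req->u.gpu.remaining_polls--;
            return async_req->u.gpu.remaining_polls == 0;
        case SKETCH_ASYNC_TYPEREP:
            return async_req->u.typerep.finished;
    }
    assert(0 && "unknown async request type");
    return false;
}
```

Callers only see `sketch_async_test(&async_req)` and never branch on the underlying request kind, which is what would make adding further async handle kinds cheap.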
  3. e0e64ee
  4. ch4/ofi: refactor pipeline recv async copy

    Refactor the async copy in receive events using MPIR_async facilities.
    hzhou committed Mar 5, 2024
    010f231
  5. ch4/ofi: refactor pipeline send async copy

    Refactor the async copy before sending a chunk.
    hzhou committed Mar 5, 2024
    79f020f
  6. ch4/ofi: remove MPIDI_OFI_gpu_progress_task

    Both gpu_send_task_queue and gpu_recv_task_queue have been ported to
    the MPIR_async facilities, so the progress task is no longer needed.
    hzhou committed Mar 5, 2024
    70469a4
  7. ch4/ofi: refactor pipeline send

    Pipeline send allocates chunk buffers and then spawns async copies. The
    allocation may run out of genq buffers, so it is designed as an async
    task.
    
    The send copies are triggered upon completion of the buffer allocation,
    so the function is renamed to spawn_send_copy and turned into an
    internal static function.
    
    This removes MPIDI_OFI_global.gpu_send_queue.
    hzhou committed Mar 5, 2024
    0ee16a6
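The pipeline send commit describes buffer allocation as an async task that may have to wait for the pool, with the copy spawned once a buffer is obtained. A rough sketch of that retry pattern follows. The pool, the task struct, and all `sketch_` names are hypothetical, and whether the real task handles one or several chunks per poll is an assumption made here.

```c
#include <assert.h>

enum { SKETCH_DONE, SKETCH_NOPROGRESS };

/* Hypothetical fixed-size buffer pool standing in for a genq pool. */
typedef struct {
    int free_buffers;
} sketch_pool;

/* Hypothetical state for one pipelined send. */
typedef struct {
    sketch_pool *pool;
    int chunks_total;     /* chunks the message is split into */
    int chunks_spawned;   /* chunks whose copy has been started */
} sketch_send_task;

/* Returns 1 and takes a buffer if one is free, otherwise returns 0. */
static int sketch_pool_alloc(sketch_pool *pool)
{
    if (pool->free_buffers == 0)
        return 0;
    pool->free_buffers--;
    return 1;
}

/* Placeholder for starting the async copy of one chunk. */
static void sketch_spawn_send_copy(sketch_send_task *task)
{
    assert(task->chunks_spawned < task->chunks_total);
    task->chunks_spawned++;
}

/* Poll function: a progress loop calls it repeatedly until it reports DONE. */
static int sketch_send_alloc_poll(sketch_send_task *task)
{
    while (task->chunks_spawned < task->chunks_total) {
        if (!sketch_pool_alloc(task->pool))
            return SKETCH_NOPROGRESS;   /* pool exhausted: retry on a later poll */
        sketch_spawn_send_copy(task);
    }
    return SKETCH_DONE;
}
```

Because the poll function keeps its position in `chunks_spawned`, a poll that hits an empty pool loses no work; the next poll resumes where the previous one stopped.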
  8. ch4/ofi: refactor pipeline recv

    Pipeline recv allocates chunk buffers and then posts fi_trecv. The
    allocation may run out of genq buffers, and we also control the number
    of outstanding recvs, so it is designed as an async task.
    
    The async recv copy is triggered in the recv event when data arrives.
    
    This removes MPIDI_OFI_global.gpu_recv_queue.
    
    All ofi-layer progress routines for gpu pipelining are now removed.
    hzhou committed Mar 5, 2024
    e297eee
  9. ch4/ofi: move gpu pipeline events into ofi_gpu_pipeline.c

    Consolidate the gpu pipeline code.
    
    MPIDI_OFI_gpu_pipeline_request is now an internal struct in
    ofi_gpu_pipeline.c, renamed to struct chunk_req.
    
    MPIDI_OFI_gpu_pipeline_recv_copy is now an internal function, renamed to
    start_recv_copy.
    hzhou committed Mar 5, 2024
    1756c22
  10. ch4/ofi: move all gpu pipeline code into ofi_gpu_pipeline.c

    Move all gpu pipeline specific code into ofi_gpu_pipeline.c.
    
    Make a new function MPIDI_OFI_gpu_pipeline_recv that fills rreq with
    persistent pipeline_info data. Rename the original
    MPIDI_OFI_gpu_pipeline_recv to the static function start_recv_chunk.
    hzhou committed Mar 5, 2024
    4baf414
  11. ch4/ofi: refactor pipeline_info into a union

    Make the code cleaner by separating the pipeline_info type into a union
    of send and recv.
    hzhou committed Mar 5, 2024
    4ed7909
  12. ch4/ofi: use explicit counters to track gpu pipeline

    Don't mix the usage of cc_ptr; use separate and explicit counters to
    track the progress and completion of chunks.
    hzhou committed Mar 5, 2024
    52b93ad
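The idea of tracking chunk progress with dedicated counters, rather than overloading a shared completion counter, can be sketched as below. The struct layout and the `sketch_` names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-message counters, kept apart from any request-level
 * completion counter. */
typedef struct {
    int n_chunks;      /* total chunks in the message */
    int n_issued;      /* chunks handed to the network or copy engine */
    int n_completed;   /* chunks whose work has fully finished */
} sketch_chunk_counters;

/* Record that one more chunk has been issued. */
static void sketch_chunk_issued(sketch_chunk_counters *c)
{
    assert(c->n_issued < c->n_chunks);
    c->n_issued++;
}

/* Record one chunk completion; returns true when the whole message is done. */
static bool sketch_chunk_completed(sketch_chunk_counters *c)
{
    assert(c->n_completed < c->n_issued);
    c->n_completed++;
    return c->n_completed == c->n_chunks;
}
```

With separate fields, "how many chunks are in flight" is simply `n_issued - n_completed`, and the invariants can be asserted directly.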
  13. ch4/ofi: use internal tag for pipeline chunk match_bits

    Following an approach similar to nonblocking collectives, internal
    pipeline chunks use a separate tag space (MPIDI_OFI_GPU_PIPELINE_SEND)
    and incrementing tags to avoid mismatches with regular messages.
    hzhou committed Mar 5, 2024
    96988a1
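To illustrate why a separate tag space plus incrementing tags avoids mismatches, here is a small sketch. The bit positions, the 20-bit tag width, and all `SKETCH_`/`sketch_` names are assumptions chosen for illustration; they do not reflect the actual match_bits layout or the value of MPIDI_OFI_GPU_PIPELINE_SEND.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PIPELINE_BIT  (UINT64_C(1) << 62)   /* hypothetical protocol bit */
#define SKETCH_TAG_MASK      UINT64_C(0xFFFFF)     /* hypothetical 20-bit tag field */

/* Match bits for a regular user message: the protocol bit stays clear. */
static uint64_t sketch_user_match_bits(uint32_t user_tag)
{
    return (uint64_t) user_tag & SKETCH_TAG_MASK;
}

/* Match bits for an internal pipeline chunk: set the protocol bit and take
 * the next tag from a per-peer counter, which wraps within the tag field. */
static uint64_t sketch_chunk_match_bits(uint32_t *next_chunk_tag)
{
    assert(next_chunk_tag != NULL);
    uint64_t tag = (uint64_t) (*next_chunk_tag) & SKETCH_TAG_MASK;
    *next_chunk_tag = (*next_chunk_tag + 1) & (uint32_t) SKETCH_TAG_MASK;
    return SKETCH_PIPELINE_BIT | tag;
}
```

Even when a user message carries the same numeric tag as a chunk, the protocol bit keeps the two from matching, and the incrementing counter keeps consecutive chunks distinct from each other.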
  14. ch4/ofi: refactor gpu pipeline recv_alloc

    Separate the recv tasks between the initial header and chunks since the
    code paths clearly separate them.
    
    Use a single async item for all chunk recvs rather than unnecessarily
    enqueuing individual chunks since we can track the chunks in the state.
    hzhou committed Mar 5, 2024
    3e741c7
  15. ch4/ofi: include ofi_impl.h in ofi_gpu_pipeline.c

    It is needed to compile under the noinline configuration.
    hzhou committed Mar 5, 2024
    2ff2048
  16. ch4/ofi: move some inline util functions

    Move these utility functions to ofi_impl.h since they are simple and
    non-specific. It also simplifies figuring out which file to include
    especially for .c files.
    hzhou committed Mar 5, 2024
    8e5a2c2
  17. ch4/ofi: remove limit in pipeline recv chunk progress

    Remove the limit on posting gpu pipeline recv chunks. Posting is now
    bounded only by the maximum number of chunks available from
    MPIDI_OFI_global.gpu_pipeline_recv_pool or by libfabric returning
    EAGAIN.
    
    When progressing recv_chunk_alloc, we issue as many chunks as we can
    instead of one at a time.
    
    Refactor the code to have a single exit point.
    hzhou committed Mar 5, 2024
    8aacd18
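The last commit describes a progress step that posts as many recv chunks as possible, stopping only when the pool is empty or the network reports EAGAIN, with a single exit point. A minimal sketch of that control flow follows. The state struct, the fake posting function, and all `sketch_`/`SKETCH_` names are hypothetical; the real code posts through libfabric, which is replaced here by a simple budget counter.

```c
#include <assert.h>

enum { SKETCH_SUCCESS = 0, SKETCH_EAGAIN = 1 };
enum { SKETCH_DONE, SKETCH_NOPROGRESS };

typedef struct {
    int free_buffers;   /* buffers left in the hypothetical recv pool */
    int post_budget;    /* posts the fake network accepts before "EAGAIN" */
    int n_chunks;       /* chunks expected for this message */
    int n_posted;       /* chunks whose recv has been posted */
} sketch_recv_state;

/* Stand-in for posting one tagged recv; fails once the budget is used up. */
static int sketch_post_recv(sketch_recv_state *s)
{
    if (s->post_budget == 0)
        return SKETCH_EAGAIN;
    s->post_budget--;
    return SKETCH_SUCCESS;
}

/* Progress step: post chunks until done, out of buffers, or EAGAIN. */
static int sketch_recv_chunk_poll(sketch_recv_state *s)
{
    int ret = SKETCH_NOPROGRESS;

    while (s->n_posted < s->n_chunks) {
        if (s->free_buffers == 0)
            goto fn_exit;               /* pool exhausted */
        s->free_buffers--;
        if (sketch_post_recv(s) == SKETCH_EAGAIN) {
            s->free_buffers++;          /* return the unused buffer */
            goto fn_exit;
        }
        s->n_posted++;
    }
    assert(s->n_posted == s->n_chunks);
    ret = SKETCH_DONE;

  fn_exit:
    return ret;
}
```

Both stopping conditions leave the state consistent, so the same function can be polled again later without any extra bookkeeping.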