
WeeklyTelcon_20180807

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Jeff Squyres
  • Howard
  • Nathan Hjelm
  • Geoff Paulsen
  • Peter Gottesman (Cisco)
  • Thomas Naughton
  • Todd Kordenbrock
  • Xin Zhao

not there today (I keep this for easy cut-n-paste for future notes)

  • Brian
  • akshay
  • Geoffroy Vallee
  • Matthew Dosanjh
  • Ralph Castain
  • Joshua Ladd
  • Josh Hursey
  • Matias Cabral
  • Howard.
  • Edgar Gabriel
  • Akvenkatesh (nVidia)
  • Howard Pritchard
  • Dan Topa (LANL)
  • David Bernholdt

Agenda/New Business

  • NEW: info.c warning - Jeff thought we'd fixed it, but Ralph saw it on Cray.

  • Nathan is requesting comments on:

    • C11 integration into master. PR 5445
    • Eliminate all of our own atomics in favor of C11 atomics.
    • ACTION: Please review and comment on code.
  • ORTE discussion went well. Geoffroy Vallee wrote up a summary and posted it to devel-core on Jul 24th.

    • ACTION: Everyone please read and reply to devel-core with your thoughts.
  • GitHub suggestion on email filtering
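
For the C11 atomics item above, here is a minimal sketch of what code built directly on C11 `<stdatomic.h>` can look like. It is only an illustration for reviewers: the `demo_*` names are hypothetical and are not taken from PR 5445 or from the Open MPI code base.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference counter; not an Open MPI type. */
typedef struct {
    atomic_int refs;
} demo_refcount_t;

/* Initialize the counter before it is shared between threads. */
static void demo_refcount_init(demo_refcount_t *rc, int initial)
{
    atomic_init(&rc->refs, initial);
}

/* Add one reference and return the new count. */
static int demo_refcount_retain(demo_refcount_t *rc)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add_explicit(&rc->refs, 1, memory_order_relaxed) + 1;
}

/* Drop one reference and return the new count. */
static int demo_refcount_release(demo_refcount_t *rc)
{
    /* acq_rel ordering so the thread that sees 0 also sees prior writes. */
    return atomic_fetch_sub_explicit(&rc->refs, 1, memory_order_acq_rel) - 1;
}

/* Try to move a flag from 0 to 1; returns true only for the one winner. */
static bool demo_try_claim(atomic_int *flag)
{
    int expected = 0;
    return atomic_compare_exchange_strong(flag, &expected, 1);
}
```

A caller would initialize a `demo_refcount_t` once with `demo_refcount_init`, then call `demo_refcount_retain` and `demo_refcount_release` from any thread. `demo_try_claim` shows the compare-and-swap pattern that hand-written atomics usually wrap.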

Minutes

Review v2.x Milestones v2.1.4

  • v2.1.4 - put out RC1 for v2.1.4
  • Always used to have a src RPM as part of RC.
  • Jeff had some problems using the Python script to upload 2.1.4 tarballs built on AWS to S3.
  • Typo fix for PMIx (MB prefix), but not upgrading because 2.1.4 is the end of the 2.x stream.
  • Peter filed Issue 5520
    • Thread-multiple warnings when exiting on an error. Doesn't block.
  • Aug 10th is release date.
    • Test the RC and send feedback.

Review v3.0.x Milestones v3.0.3

  • Schedule:
  • PR 5437 - George reviewed.
  • PR 5484 - want it in RC1, but Gilles is on vacation. Nathan can test.
  • Want RC1 next week.
  • v3.0.3 - targeting Sept 1st (will start RCs when 2.1 wraps up).
    • Anticipate RC1 after the Aug 10th release of v2.1.4.
    • Good progress on reviews.

Review v3.1.x Milestones v3.1.0

  • v3.1.2 release process starts after the Sept 1st release of v3.0.3
  • Lots of PRs multiple 5485
  • UCX segfault
  • 5083 - we just need some update. Xin Zhao will update issue.

v4.0.0

  • Schedule: branch: July 18. release: Sept 17
    • Date for first RC - Aug 13 (after sunset of 2.1.4)
  • CUDA support:
    • Does NVIDIA want openib included by default when --with-cuda is given?
      • Yes, because at this moment UCX is not on par, but still want to migrate to UCX CUDA.
      • Warning message will mention deprecated openib vs UCX
      • Has this work been done?
  • NEWS - Deprecate MPIR message for NEWS - Ralph can help with this.
  • PR 5497 - ROMIO, wait for Gilles to review.
  • PR 5472 - joint effort of 4 commits - Jeff to review
  • PR 5504 - Please ensure bug fixes only, and separate commits to allow us to consider them separately.
  • Geoff and Howard will build test suites with v3.1.x and run with master/v4.0 to see if anything breaks.

PMIx

  • ORTE/PRTE - Geoffroy Vallee sent out a document with a summary to devel-core. Everyone please read and reply.
    • Want to make sure that there are very good alternatives to whatever ORTE is turning into that will use PMIx.
    • Replacing framework and calling PMIx directly is a really good idea.
      • Will mess up if there is no native support for PMIx.
    • In the Open MPI v5.0.x timeframe.

New topics

  • From last week:
    • MTT License discussion - MTT needs to be de-GPL-ified.
      • Everyone go try the Python version. All the GPL code is in the Perl modules (using Python works around that).
      • Ralph started a PR, and it is now in limbo. Need to get this done by the end of 2018.
    • Main concern is that the Python is in a repo with no GPL code.
      • Could delete Perl altogether, but may need to just move Perl to a different repo for a period of time, until everyone can move off of Perl.
    • Has Cisco found an alternative to Perl funclets?
      • Python ini execution is different from Perl's.
    • Cisco has one Perl ini for each branch, and under that 20-30 MPI installs.
      • Probably will go with a template and stamp it out 20-30 times.

Review Master Pull Requests

  • PR for setting VERSION on master. Have we broken any VERSIONs?
  • Hope to have better Cisco MTT in a week or two

    • Peter is going through the results and found a few failures, some of which have been posted.
      • One-sided - Nathan's looking at it.
      • Some more coming.
    • osc_pt2pt will exclude itself in an MT run.
      • One of Cisco's MTT runs uses an env setting to turn every MPI_Init into MPI_Init_thread (even though it is a single-threaded run).
        • Now that osc_pt2pt is ineligible, many tests fail.
        • On master, this will fix itself 'soon'.
        • This work is a BLOCKER for v4.0, so that we'll have vader and something for osc_pt2pt.
        • Probably an issue on v3.x also.
      • Did this for release branches; Nathan's not sure if it's on master. - v4.0.x has RMA-capable vader. Once
  • OSHMEM v1.4 - cleanup work

    • How do we look for test coverage of this? Right now just basic API tests.
  • Next Face to Face?

    • When? Double dip with MPI Forum in early December. Oct, Nov, 1st week of Dec (Dec 3).
    • Where? San Jose - Cisco yes and maybe depending on date
      Albuquerque - Sandia (believe it's okay, but need to verify)
    • ACTION: Geoff will Doodle this

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon,
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
