WeeklyTelcon_20171219

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Brian
  • Howard Pritchard
  • Josh Hursey
  • Ralph
  • Todd Kordenbrock
  • Nathan Hjelm
  • Thomas Naughton

Agenda

  • Jan 9th:
    • Decided last week to push the face-to-face meeting date to late February or March.
    • Discuss abandoning openib btl.
      • Want Chelsio and NVIDIA to be part of the discussion.
    • Test infrastructure
      • Some reliability issues with various Jenkins and MTT setups.
      • Need to figure out how to deal with this in a larger context.
      • Not sure what to do if someone else's Jenkins fails your PR.

Review v2.0.x Milestones v2.0.4

  • Nothing New, nothing forcing a new release.

Review v2.x Milestones v2.1.2

  • A few PRs coming through.
  • Bugfix only mode.
  • LaunchMON / Allinea attach mode - Issue 3660. This mechanism is part of MPIR.
    • Ralph fixed the debugger problem in PR4630.
    • Howard wants Jeff to review this change.
  • Need PR4399 only in v2.x branch.
    • Need to have "nice" commit message, good enough
    • Now can fix arm tests.
  • Schedule: January release.

Review v3.0.x Milestones v3.0

  • Schedule: Get v3.0.1 out by end of the week.
  • Duplicate issues: Mpool init hang AND current blocker: hang on ARM in v3.0.x
    • Only hangs in debug builds. Bad, but not a ship-stopper.
    • Doesn't happen in optimized mode.
    • Issue 4563 - not seen on the small ARM boxes here; Jenkins uses --disable-builtin-atomics.
      • When we disable atomics on PowerPC, the compiler thinks we have cmp-set128.
      • On ARM, it uses the old-school lock-based LIFO and FIFO.
    • Fix being worked on in PR3988 - bug in the PGI compiler.
  • Issue 4509 madvise hook
    • Jeff and Howard will discuss.
    • Now that we hook madvise, we need to be more careful.
    • Nathan hopes his PR 4576 on master will reduce the occurrences to 0, but needs the user to verify.
      • May have to invalidate a LARGE region, even though it's mostly valid, just because glibc invalidated a small part of it.
    • Tested PR 4576 in master last week.
      • Still need to merge into v2.x, v3.0.x and v3.1.x.
  • Do we need to pull PR 4628 into v3.0.x?
    • Broken in v3.0.0 and later, but it only affects launch performance; it is not a hang.
    • decided NOT to block v3.0.1 for this, and fix this in v3.0.2
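
The cost noted under Issue 4509 (dropping a large, mostly valid region because a small part of it was invalidated) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyRegistrationCache` and its methods are hypothetical names, and it is not Open MPI's rcache code. It assumes a cache that tracks whole registrations and can only drop a registration entirely, never split it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    start: int   # first byte covered by the registration
    length: int  # number of bytes covered

    @property
    def end(self) -> int:
        return self.start + self.length


class ToyRegistrationCache:
    """Hypothetical cache that can only drop whole registrations."""

    def __init__(self):
        self._regs = []

    def register(self, start, length):
        reg = Registration(start, length)
        self._regs.append(reg)
        return reg

    def invalidate(self, start, length):
        """Drop every registration overlapping [start, start + length)."""
        end = start + length
        dropped = [r for r in self._regs if r.start < end and start < r.end]
        self._regs = [r for r in self._regs if r not in dropped]
        return dropped

    def registered_bytes(self):
        return sum(r.length for r in self._regs)


cache = ToyRegistrationCache()
cache.register(0, 1 << 30)       # one large 1 GiB registration
cache.register(1 << 31, 4096)    # an unrelated small registration

# A single 4 KiB page inside the large region is invalidated
# (for example, after a hooked madvise call on that page).
dropped = cache.invalidate(4096, 4096)
```

In this model the whole 1 GiB registration is dropped even though only 4 KiB of it was touched, while the unrelated registration survives.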

Review v3.1.x Milestones v3.1

  • SCHEDULE: Would like to get it out in late January.
  • For v3.1.x blockers, please ensure they have both the "target_x" label and the "blocker" label.
  • v3.1.0 still has Blocker Issue 4509
    • Hope it was fixed in PR 4576 in master tonight, to merge in later.
    • Assuming this was fixed (customer didn't reproduce yet)
  • Dist Graph Create / Tree Create is still segfaulting - but others can't reproduce.
    • Happens sporadically.
    • Issue 4303
    • maybe turn it off by default?
    • A component in topo creates graphs when you create a communicator.
    • If you get a reproducer, then update ticket and hand to George.
      • Ralph will try to see if he can give George access.

Review Master Master Pull Requests

  • rcache GRDM is hitting an assert in Finalize (refcount on object).
    • Nathan will look at.

MTT / Jenkins Testing Dev

  • Seems to be a memory leak in the OMPI Jenkins
    • Working on a solution
    • Working around it by turning off pipeline builds.


This week Discussion Points.

  • Brian sent an email earlier this week about the NEWS file.
    • Either we make merging painful for developers, or we create a rather large amount of work for release managers.
    • Can automate via Pull Request that ends up in the merge.
    • block NEWS: whatever you want NEWS to be.
    • With metadata, using Pull Requests, then can change that NEWS block after the fact.
    • Would happen at make dist time. Public API calls.
  • WebEx Schedule: WebEx next Tuesday, Dec 19 (unless 0% chance of getting v3.0.1 out)
    • Cancel Dec 26
    • Cancel Jan 2nd
  • Jeff will create new WebEx URL for 2018.
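
The NEWS idea discussed above (a NEWS block carried in each Pull Request and collected at release time) could look roughly like the sketch below. Everything in it is an assumption for illustration: the single-line `NEWS:` marker format, the function names, and the sample PR numbers and descriptions are hypothetical, and no real GitHub API call is made.

```python
import re

# Assumed convention: one line in the PR description starting with "NEWS:".
NEWS_LINE = re.compile(r"^NEWS:\s*(.+)$", re.MULTILINE)


def extract_news(pr_body):
    """Return the NEWS text from a PR description, or None if absent."""
    match = NEWS_LINE.search(pr_body)
    return match.group(1).strip() if match else None


def build_news_section(pull_requests):
    """Collect NEWS entries from (number, body) pairs into a bullet list."""
    lines = []
    for number, body in pull_requests:
        entry = extract_news(body)
        if entry:
            lines.append(f"- {entry} (PR {number})")
    return "\n".join(lines)


# Made-up sample data; real descriptions would come from the merged PRs.
sample_prs = [
    (101, "Fix the launcher.\n\nNEWS: Fix a hang at startup in debug builds.\n"),
    (102, "Internal cleanup only, nothing user-visible.\n"),
]

news_text = build_news_section(sample_prs)
```

Because the entry is read from the PR description each time the section is built, editing the description after the merge changes the generated text on the next run, which matches the "change that NEWS block after the fact" point.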

Oldest PR

Oldest Issue

Next face-to-face meeting

  • See the email on the list.
    • Decided last week to push the date to late February or March.

Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2017 WeeklyTelcon-2017