
EnTK Roadmap, 2020

Hyungro Lee edited this page Jul 7, 2020 · 5 revisions

Agenda

  • Known issues
    • open tickets
  • Features/enhancements/performance
    • RabbitMQ overhead
  • User support
    • Deployment
    • error handling
    • analytics
  • Testing

Milestones

TODO - Issues

  • 45 open tickets
    • 16 bugs
      • 1 month:
        • failure to submit job on comet_ssh #419
        • more than 68 CPUs cannot be utilized on Stampede2 #416
      • 3 months:
        • pytest never finishes when running locally #436
        • Key tag not in schema #429 (Ioannis)
      • 6 months:
        • delay between stages #439
        • EnTK forks which breaks the RU ZMQ layer #428
        • unexpected KeyboardInterrupt messages #426
        • deadlock during rp.umgr creation #421; EnTK deadlock on non-trivial pipeline counts (Replica Exchange) #410
        • the _ character is not allowed in Pipeline, Stage, and Task names; data sharing fails #392
        • do not pollute user dir #380
      • parked issues #333, #321, #255
    • 9 features/enhancements
      • 1 month
        • purge radical.ensemblemd from pypi #430
        • missing type check on RMQ port (and probably other parameters) #362
      • 3 months
        • AppManager non-blocking run and callbacks #405
      • 6 months
        • configuring RabbitMQ as a docker image #414
        • move heartbeat into its own thread (task manager) #327
      • parked #320, #269, #243, #212
    • 6 documentation tickets

Test

  • Travis CI, unit tests, and test coverage

Use Cases (in no particular order)

  • HPC Workflow (Princeton, PSU)
  • Extasy (Rice)
  • NAMD-EnTK (ASU)
  • COVID-19 (ANL)
  • DeepDriveMD (ANL)
  • Harel (Cornell)
  • INSPIRE (UCL)
  • FACTS (Rutgers)

Platform Support

  • Lassen at LLNL
  • SuperMUC at LRZ
  • Longhorn at TACC
  • Theta at ANL
  • and so on

Performance

  • data staging
  • scaling to many concurrent pipelines (e.g., 5k)