[20220323] NAS Roadmap 2022

This document tracks the current status and work items of NAS.

Current version: v2.6 (v2.7 pending release).

Planned Architecture Diagram

Milestones

The first end-to-end example: DARTS algo on DARTS search space: https://github.com/microsoft/nni/pull/4509 (blocked by one-shot refactoring)
Showcase of a slightly more complex case, demonstrating SOTA results.

Breakdowns

Unfinished work items until the first milestone:

One-shot refactor merges back to master (strategy refactor, value-choice support): pending PR https://github.com/microsoft/nni/pull/4602
DARTS search space in search space hub: PR not ready https://github.com/microsoft/nni/pull/4524
More unforeseen bug fixes and feature improvements may be needed.

Until the second milestone:

Need discussion to finalize a concrete "complex case".
More unforeseen bug fixes and enhancements may be needed.

Backlog

Constructing model space
- Space hub
  - APIs for retrieving searched results directly
  - Test already implemented spaces (we will have a space hub reproducibility list)
    - Make sure they are runnable
    - Load checkpoint of searched architecture and evaluate
    - Reproduce re-training
    - Runnable with built-in algos
    - Reproduce result with at least one algo
      - Pending work item: integrating training service of Microsoft internal clusters
    - (if a benchmark search space) test with benchmark
  - Incorporating spaces featuring NLP and speech tasks.
- Mutation primitives
  - More APIs, e.g., Permute, ValueRange
  - A higher-level API to unify the usages. For example, oneof() to unify XXXChoice
  - Primitive mutators are too messy and need a refactor
- User experience
  - Use value-choice on base model's arguments
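
The oneof() idea from the mutation primitives items can be illustrated with a minimal sketch. Everything here is hypothetical: `oneof`, `ValueChoiceSketch`, and `LayerChoiceSketch` are stand-in names invented for illustration, not NNI APIs, and the dispatch rule (callables become a layer choice, plain values become a value choice) is only one assumption about how such a unified entry point might behave.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class ValueChoiceSketch:
    """Hypothetical stand-in for a choice among plain values."""
    candidates: List[Any]
    label: str


@dataclass
class LayerChoiceSketch:
    """Hypothetical stand-in for a choice among layer factories."""
    candidates: List[Callable[..., Any]]
    label: str


def oneof(candidates: Sequence[Any], label: str):
    """Hypothetical unified entry point: infer the choice kind from the candidates."""
    items = list(candidates)
    if not items:
        raise ValueError("oneof() needs at least one candidate")
    if all(callable(c) for c in items):
        # Assumption: callables are treated as layer factories.
        return LayerChoiceSketch(items, label)
    if any(callable(c) for c in items):
        raise TypeError("cannot mix callables and plain values in one choice")
    return ValueChoiceSketch(items, label)


# Plain values resolve to a value choice, callables to a layer choice.
kernel = oneof([3, 5, 7], label="kernel_size")
op = oneof([max, min], label="op")
```

The point of the sketch is the single call site: users would write oneof(...) and let the library pick the underlying primitive, instead of choosing among several XXXChoice classes themselves.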
Evaluator
- More built-in evaluators for cases like self-supervision, object detection (depending on space hub)
- More fine-grained control for logs, checkpoints, visualizations for Lightning-based evaluators (details yet to be discovered)
Strategy
- Comprehensive support for mutation primitives in one-shot algos (pending: Repeat, Cell) (issues: https://github.com/microsoft/nni/issues/4294 https://github.com/microsoft/nni/issues/4671)
- SOTA NAS algorithms
  - Performance predictor
  - Early-stop
  - Advanced one-shots like OFA
- Support more parameter types other than choice
- Chain two-stage algorithms (e.g., SPOS)
- Non-ad-hoc support for latency filters (or other filters)
- Bug fixes
  - Budget is not in sync with experiment budget https://github.com/microsoft/nni/issues/4421
  - RL is known to have robustness issues in its multi-threading implementation https://github.com/microsoft/nni/issues/4421
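
As one possible shape for the non-ad-hoc filter support listed above, the sketch below treats a filter as a plain predicate over a sampled architecture and composes any number of them around a sample stream. All names (`latency_filter`, `filtered`, `toy_latency`) and the dictionary representation of an architecture are assumptions made for illustration; none of them are NNI APIs, and the latency estimator is a toy formula rather than a real predictor.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: an architecture sample is a flat mapping of decision labels to values.
Architecture = Dict[str, int]
Filter = Callable[[Architecture], bool]


def latency_filter(estimate_ms: Callable[[Architecture], float], max_ms: float) -> Filter:
    """Build a predicate that accepts architectures within a latency budget."""
    def accept(arch: Architecture) -> bool:
        return estimate_ms(arch) <= max_ms
    return accept


def filtered(samples: Iterable[Architecture], filters: List[Filter]) -> Iterator[Architecture]:
    """Yield only the samples that pass every filter."""
    for arch in samples:
        if all(f(arch) for f in filters):
            yield arch


def toy_latency(arch: Architecture) -> float:
    """Toy estimator for illustration only: latency grows with depth * width."""
    return 0.5 * arch["depth"] * arch["width"]


candidates = [
    {"depth": 2, "width": 4},
    {"depth": 8, "width": 16},
    {"depth": 4, "width": 4},
]
kept = list(filtered(candidates, [latency_filter(toy_latency, max_ms=10.0)]))
```

Because a filter is just a callable, other constraints (parameter count, FLOPs) could be added to the same list without changing the sampling code, which is the sense in which this is not ad hoc.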
Engine
- Refactor model-IR converter into "base" execution engine.
Experiment
- Interface refactor (unify with HPO experiment)
- Bug fixes
  - Process won't stop without start()
  - Sometimes fails with OS Error 9
  - Sometimes needs Ctrl+C twice to kill the experiment
- Export model enhancements
  - During run: https://github.com/microsoft/nni/issues/4257
- Visualization
  - Friendlier tips when the model can't be visualized
  - Visualizing one-shot experiments
Others
- Serializer
  - Known issue with inheritance
  - Known issue with __new__

Features that I consider unimportant at the current stage are not listed in the backlog, for example cross-graph optimization, TensorFlow support, and unifying mutation APIs between one-shot and multi-trial.

This wiki is a journal that tracks the development of NNI. It's not guaranteed to be up-to-date. Read the NNI documentation for the latest information: https://nni.readthedocs.io/en/latest/
