
[20200514] NNI Modules

Yuge Zhang edited this page Mar 23, 2022 · 2 revisions

This page lists all parts of NNI, and documents the role of each part.

Last update: 2020-05-14 (v1.5)

Module List

At a coarse grain, NNI has the following modules:

  • NNI manager, which serves as the bus among the other components
  • Python SDK, which provides the API to end users; notably, the dispatcher is the underlying process of the tuner/assessor/advisor
  • Training services, which provide a uniform interface on top of various platforms (e.g. SSH-connected servers, OpenPAI)
  • Web UI
  • Built-in algorithms
  • The launcher, nnictl
  • Other utilities
  • Installation/deployment scripts and CI pipelines
  • Examples and documentation

In the HPO scenario, a user starts NNI by running the nnictl command, which loads all user files, launches NNI manager and the web UI separately, and exits once both are running. NNI manager then reads the configuration fed to it by nnictl, determines which tuner to use, and launches the dispatcher with that tuner. At the same time, a training service is selected and initialized according to the user configuration. Because a training service is a TypeScript class running in the same process as NNI manager, its initialization is invoked implicitly. Initialization is now finished, and NNI manager starts working in an event-driven manner: when the dispatcher requests a trial, NNI manager submits a job to the training service; when the training service detects a new metric, NNI manager sends it to the dispatcher. The web UI is served by a backend web server that waits for user requests, and it communicates with NNI manager through NNI manager's RESTful API.
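The event-driven relay described above can be illustrated with a toy, single-process simulation. This is only a sketch: `ToyDispatcher`, `ToyTrainingService`, `run_manager`, and all of their methods are hypothetical names invented for illustration, and the real components are separate processes and TypeScript classes rather than Python objects.

```python
from collections import deque


class ToyTrainingService:
    """Hypothetical stand-in for a training service."""

    def __init__(self):
        self.pending_metrics = deque()

    def submit_job(self, trial_id, params):
        # A real service would launch the trial on some platform.
        # Here the "metric" is computed immediately as x squared.
        self.pending_metrics.append((trial_id, params["x"] ** 2))


class ToyDispatcher:
    """Hypothetical stand-in for the dispatcher process wrapping a tuner."""

    def __init__(self, candidates):
        self.candidates = deque(candidates)
        self.results = {}

    def request_trials(self):
        # Yield (trial_id, params) pairs that the tuner wants to run.
        trial_id = 0
        while self.candidates:
            yield trial_id, {"x": self.candidates.popleft()}
            trial_id += 1

    def on_metric(self, trial_id, value):
        self.results[trial_id] = value


def run_manager(dispatcher, service):
    """Relay events: trial requests go to the service, metrics go back."""
    for trial_id, params in dispatcher.request_trials():
        service.submit_job(trial_id, params)
    while service.pending_metrics:
        trial_id, value = service.pending_metrics.popleft()
        dispatcher.on_metric(trial_id, value)
    return dispatcher.results


results = run_manager(ToyDispatcher([1, 2, 3]), ToyTrainingService())
print(results)
```

The point of the sketch is that the manager holds no tuning logic of its own; it only forwards events between the two sides.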

NAS and model compression are generally not distributed at the moment, so they are standalone Python libraries and do not use NNI manager. However, some algorithms (e.g. SPOS) may work in an HPO-like way, and from the NNI framework's perspective they ARE HPO advisors. (An advisor is a highly customized tuner and/or assessor.)

(NOT SURE) NAS algorithms can optionally save architecture information to a pre-set path, and the NAS UI shows that data on the web.

(MISSING) I'm not familiar with feature engineering.

NNI Manager

NNI manager is a TypeScript module. The following source files make up NNI manager:

  • src/nni_manager/core/
  • src/nni_manager/common/ (TypeScript interface definition)
  • src/nni_manager/rest_server/
  • src/nni_manager/main.ts (entry point and initialization)

NNI manager can be divided into the following parts:

  • The main business logic (core/nnimanager.ts)
  • A sqlite database for persistence (core/nniDataStore.ts and core/sqlDatabase.ts)
  • IPC interface for dispatcher (core/ipcInterface.ts)
    • This interface runs on anonymous pipes, fd 3 and fd 4
    • (MISSING) link to interface doc
  • IPC/RPC interface for nnictl and web UI (rest_server/)
    • This is an HTTP server based on Express
    • Yes, nnictl and the web UI share the same interface
    • (MISSING) link to interface doc
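As a rough illustration of message passing over an anonymous pipe, the sketch below sends one length-prefixed message through `os.pipe()` within a single process. The framing (a 4-byte big-endian length followed by the payload) and the JSON payload are assumptions made up for this example; they are not NNI's actual wire format, which is not documented on this page. In the real setup the two ends would be inherited by the dispatcher process as fd 3 and fd 4.

```python
import os
import struct


def send_message(fd, payload):
    # Hypothetical framing: 4-byte big-endian length, then the payload bytes.
    os.write(fd, struct.pack(">I", len(payload)) + payload)


def read_exact(fd, n):
    # os.read may return fewer bytes than requested, so loop until done.
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("pipe closed before the message was complete")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(fd):
    (length,) = struct.unpack(">I", read_exact(fd, 4))
    return read_exact(fd, length)


# One anonymous pipe, used here within a single process for demonstration.
read_fd, write_fd = os.pipe()
send_message(write_fd, b'{"event": "request_trial"}')
message = recv_message(read_fd)
os.close(read_fd)
os.close(write_fd)
print(message)
```

Length-prefixed framing is one common way to delimit messages on a byte stream; any scheme that lets the reader know where a message ends would serve the same purpose.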

Training Services

Each training service is a TypeScript class. Their common interface is defined in src/nni_manager/common/trainingService.ts, and each service is implemented in a folder inside src/nni_manager/training_service.

Training services differ greatly in their details, so read each service's own documentation. Nothing further is covered here.

(NOTE) Strangely some training services (e.g. k8s) did NOT inherit the common interface. I think it's bad.

Web UI

The web UI code is located in src/webui. This is a somewhat standalone module: it is launched separately and communicates with NNI manager only through the RESTful API.

The NNI web UI uses the React framework and Fabric UI components.

Python SDK

The SDK source directory has become messy. We need to refactor the hierarchy.

The directory contains source files for HPO dispatcher, HPO trial API, NAS API, Model Compression API, and built-in algorithms.

The major source files of the dispatcher are:

  • src/sdk/pynni/nni/__main__.py (entry of dispatcher process)
  • src/sdk/pynni/nni/msg_dispatcher_base.py
  • src/sdk/pynni/nni/msg_dispatcher.py
  • src/sdk/pynni/nni/tuner.py (event-driven tuner API built on dispatcher)
  • src/sdk/pynni/nni/assessor.py (event-driven assessor API built on dispatcher)
  • src/sdk/pynni/nni/protocol.py (IPC interface with NNI manager)
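To show what "event-driven tuner API" means in practice, here is a minimal sketch of a tuner whose methods are called back as events arrive. The method names follow the custom-tuner interface of `nni.tuner.Tuner` as I understand it (`update_search_space`, `generate_parameters`, `receive_trial_result`), and the search space uses the `_type`/`_value` layout; treat both as assumptions and check them against the NNI version in use. The `Tuner` base class below is a local stand-in so the sketch runs without NNI installed, and `RandomChoiceTuner` is a hypothetical example, not a built-in algorithm.

```python
import random


class Tuner:
    """Local stand-in for the SDK base class (assumption, see lead-in)."""

    def update_search_space(self, search_space):
        raise NotImplementedError

    def generate_parameters(self, parameter_id, **kwargs):
        raise NotImplementedError

    def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
        raise NotImplementedError


class RandomChoiceTuner(Tuner):
    """Hypothetical tuner that samples each 'choice' parameter at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.search_space = {}
        self.history = {}

    def update_search_space(self, search_space):
        # Called when the search space is set or changed.
        self.search_space = search_space

    def generate_parameters(self, parameter_id, **kwargs):
        # Called when a new trial is requested.
        return {
            name: self.rng.choice(spec["_value"])
            for name, spec in self.search_space.items()
        }

    def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
        # Called when a trial reports its final metric.
        self.history[parameter_id] = (parameters, value)


tuner = RandomChoiceTuner()
tuner.update_search_space({"lr": {"_type": "choice", "_value": [0.1, 0.01]}})
params = tuner.generate_parameters(0)
tuner.receive_trial_result(0, params, 0.9)
print(tuner.history)
```

The tuner never drives the loop itself; the dispatcher decides when each method is called.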

Source files for the trial API are:

  • src/sdk/pynni/nni/trial.py (basic trial API)
  • src/sdk/pynni/nni/smartparam.py (non-strict API shared by most HPO tuners)
    • It exists so that the annotation module (see the Utilities section of this page) can parse these API calls in user code to generate the search space
  • src/sdk/pynni/nni/platform/local.py (IPC interface with all training services, despite its name)

NAS files are located in src/sdk/pynni/nni/nas, and model compression files are located in src/sdk/pynni/nni/compression.

Finally, the NNI Python SDK shares a uniform log format, which is set up in src/sdk/pynni/nni/common.py. However, each module has its own function that invokes the log initializer; use your IDE or grep to find them.
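The pattern of one shared initializer called from several modules might look like the sketch below, written with the standard `logging` module. The function name `init_logger` and the format string are hypothetical; they are not copied from common.py.

```python
import logging

# Hypothetical format string, not NNI's actual one.
LOG_FORMAT = "[%(asctime)s] %(levelname)s (%(name)s) %(message)s"


def init_logger(name, level=logging.INFO):
    """Return a logger that uses the shared format (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        # Attach the handler only once, even if called repeatedly.
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


# Each module calls the shared initializer with its own logger name.
dispatcher_log = init_logger("dispatcher")
trial_log = init_logger("trial")
dispatcher_log.info("tuner loaded")
trial_log.info("trial started")
```

Keeping the format in one place means every module's output lines up, even though each module triggers the setup on its own.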

Built-in Algorithms

NNI has HPO, NAS, and model compression algorithms built in.

HPO algorithms can be found in the src/sdk/pynni/nni/ directory. They are named ***_tuner, ***_assessor, or ***_advisor.

NAS algorithms can be found in src/sdk/pynni/nni/nas/<FRAMEWORK>/.

Model compression algorithms can be found in the src/sdk/pynni/nni/compression/<FRAMEWORK>/ folder. PyTorch algorithms are placed in pruners.py and quantizers.py. Deprecated TensorFlow v1.x algorithms are in builtin_pruners.py and builtin_quantizers.py.

(MISSING) Feature engineering

NNICTL

TODO

Utilities

TODO

  • Trial keeper
  • GPU metrics collector
  • Annotation