
[CodeCamp2023-325] Find the proper learning rate #1318

Open
yhna940 wants to merge 42 commits into base: main

Conversation

@yhna940 (Contributor) commented Aug 23, 2023

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry; just open the pull request and ask the maintainers for help.

Motivation

The primary aim of this pull request is to introduce a tuning methodology for automatically determining the optimal learning rate for model training. Hyperparameter tuning, especially finding the optimal learning rate, is crucial for effective model training. The optimal learning rate serves as a starting point that can significantly reduce the time and resources required for broader hyperparameter space exploration. Given the inherently expensive nature of these experiments, adopting a black-box optimization formulation, where the input is a set of hyperparameters and the output is model performance, is a strategic choice.

Modification

In this PR, we've integrated a tuning concept that relies on black-box optimization strategies, such as evolutionary algorithms and Bayesian optimization, to discover the best learning rates. Recognizing the intricate nature of these strategies, instead of implementing them from scratch, we've incorporated external libraries like Nevergrad (developed by Meta), ensuring robustness and efficiency in the search process.
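As a rough illustration of the kind of black-box search such a library provides, the sketch below uses Nevergrad's ask/tell interface directly; `train_and_evaluate` is a hypothetical objective that runs a short trial and returns a loss, and the bounds and budget are placeholders rather than values from this PR.

```python
import nevergrad as ng

# Search the learning rate on a log scale within assumed bounds.
param = ng.p.Log(lower=1e-5, upper=1e-1)
optimizer = ng.optimizers.NGOpt(parametrization=param, budget=16)

for _ in range(optimizer.budget):
    candidate = optimizer.ask()                    # next learning rate to try
    loss = train_and_evaluate(lr=candidate.value)  # hypothetical short trial
    optimizer.tell(candidate, loss)                # report the observed score

best_lr = optimizer.provide_recommendation().value
```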

Structure & Roles

Tuner

The Tuner serves as the main orchestrator for the hyperparameter tuning process.

  • Responsibilities:
    • Injects hyperparameters into the runner configuration.
    • Initiates the training/evaluation process with the given set of hyperparameters.
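A condensed sketch of the trial loop such a Tuner could drive is shown below; it is illustrative only, and `searcher`, `report_hook`, `inject`, and `report_score` are stand-ins for the components described in this PR, not confirmed method names.

```python
import copy

from mmengine.runner import Runner


def tune(runner_cfg: dict, searcher, report_hook, inject, num_trials: int):
    for _ in range(num_trials):
        # Ask the searcher for the next hyperparameters, e.g.
        # {'optim_wrapper.optimizer.lr': 3e-4}.
        hparam = searcher.suggest()
        trial_cfg = copy.deepcopy(runner_cfg)
        for key, value in hparam.items():
            inject(trial_cfg, key, value)    # write the dotted key into the config
        runner = Runner.from_cfg(trial_cfg)  # short training/evaluation run
        runner.train()
        score = report_hook.report_score()   # scalar gathered by the report hook
        searcher.record(hparam, score)       # feed the result back to the searcher
```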

Report Hook

This component acts as an intermediary, gathering results from the training and evaluation phases and formatting them for further analysis.

  • Responsibilities:
    • Monitors the training process up to a specified number of tuning iterations or epochs.
    • Extracts key performance metrics and scores from the Runner's outputs.
    • Reports these results in a standardized format, making them ready for analysis and further decision-making.
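A minimal sketch of what such a hook could track is shown below; the attribute names mirror those discussed in the review threads further down (monitor, report_op, max_scoreboard_len), but the class name and actual implementation in this PR may differ.

```python
from statistics import mean


class ReportingHook:
    """Collects the monitored metric during training and reports a scalar."""

    def __init__(self, monitor: str, report_op: str = 'latest',
                 max_scoreboard_len: int = 1024):
        assert report_op in ('latest', 'mean')
        self.monitor = monitor
        self.report_op = report_op
        self.max_scoreboard_len = max_scoreboard_len
        self.scoreboard: list = []  # rolling history of monitored values

    def append_score(self, score: float) -> None:
        self.scoreboard.append(score)
        if len(self.scoreboard) > self.max_scoreboard_len:
            self.scoreboard.pop(0)

    def report_score(self) -> float:
        # 'latest' returns the most recent value; 'mean' averages the board.
        if self.report_op == 'latest':
            return self.scoreboard[-1]
        return mean(self.scoreboard)
```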

Searcher

The Searcher operates in the realm of the hyperparameter space. Using historical data and sophisticated optimization techniques, it suggests the next set of hyperparameters to be evaluated.

  • Responsibilities:
    • Analyzes the history of hyperparameters and their corresponding performance metrics.
    • Suggests a suitable candidate point in the hyperparameter space for the next round of training/evaluation.
    • Can integrate with external optimization libraries/tools such as Hyperopt, Scikit-optimize, or Microsoft's CFO to make informed recommendations.
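Conceptually, a searcher only needs a record/suggest pair; the sketch below shows one possible interface (names are illustrative and may not match the PR's base class exactly).

```python
from typing import Dict


class Searcher:
    """Suggests hyperparameters and records observed scores."""

    def __init__(self, rule: str, hparam_spec: Dict[str, dict]):
        assert rule in ('greater', 'less')  # whether higher or lower scores are better
        self.rule = rule
        self.hparam_spec = hparam_spec      # e.g. {'optim_wrapper.optimizer.lr': {...}}

    def record(self, hparam: Dict[str, float], score: float) -> None:
        """Store an observed (hyperparameter, score) pair."""
        raise NotImplementedError

    def suggest(self) -> Dict[str, float]:
        """Propose the next hyperparameters to evaluate."""
        raise NotImplementedError
```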

BC-breaking (Optional)

Does the modification introduce changes that break the backward compatibility of downstream repos?
If so, please describe how it breaks compatibility and how downstream projects should modify their code to stay compatible with this PR.

Use cases (Optional)

torchrun --nproc_per_node 2 examples/tune/find_lr.py --launcher pytorch

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, such as docstrings or example tutorials.

References

  1. mim grid search: https://github.com/open-mmlab/mim/blob/main/mim/commands/gridsearch.py
  2. pytorch lightning tuning: https://lightning.ai/docs/pytorch/stable/_modules/lightning/pytorch/tuner/tuning.html#Tuner
  3. ray tune searcher: https://docs.ray.io/en/latest/tune/api/suggestion.html

TODO

  • unit tests
  • docstring

@CLAassistant commented Aug 23, 2023

CLA assistant check: all committers have signed the CLA.

@zhouzaida linked an issue on Aug 24, 2023 that may be closed by this pull request.
@HAOCHENYE (Collaborator) commented:

Thank you for your contribution. Your PR message and docstring helped me understand your design. Before we delve into implementation details, let's consider the relationship between the Runner and the Tuner. Could we have the Runner manage the ReportHook and Searcher, or perhaps make the Tuner an attribute of the Runner? This would give users a friendlier experience, enabling automatic hyperparameter discovery in end-to-end training with the Runner.

@yhna940 (Contributor, Author) commented Aug 31, 2023

> Thank you for your contribution. Your PR message and docstring helped me understand your design. Before we delve into implementation details, let's consider the relationship between the Runner and the Tuner. Could we have the Runner manage the ReportHook and Searcher, or perhaps make the Tuner an attribute of the Runner? This would give users a friendlier experience, enabling automatic hyperparameter discovery in end-to-end training with the Runner.

@HAOCHENYE Thank you for your feedback. Your suggestion resonated with me, and I believe you've raised a valid point.

In line with your suggestion, I've added a class method to the Runner class named from_tuning. The overarching idea is to position the Tuner as an auxiliary tool when instantiating the Runner, thus confining the lifespan of the Tuner and enabling the Runner to orchestrate the tuning process.

For example, users can now employ the following streamlined approach:

runner = Runner.from_tuning(
    runner_cfg=runner_cfg,
    hparam_spec={
        'optim_wrapper.optimizer.lr': {
            'type': 'continuous',
            'lower': 1e-5,
            'upper': 1e-3
        }
    },
    monitor='loss',
    rule='less',
    num_trials=16,
    tuning_epoch=2,
    searcher_cfg=dict(type='NevergradSearcher'),
)
runner.train()

This design not only enhances code readability but also provides users with a seamless experience of automatic hyperparameter discovery integrated with end-to-end training using the Runner.

I appreciate your valuable insights and would love to hear any further thoughts or feedback you might have.

@yhna940 marked this pull request as ready for review on September 1, 2023 04:29
@HAOCHENYE (Collaborator) left a comment:

Thank you for your contribution! I believe the current solution is reasonable. However, I'm curious whether it's possible to integrate the call to from_tuning into Runner.train and have the Runner control whether or not to use the Tuner through a parameter. What do you think are the potential risks associated with this approach?

Comment on lines 21 to 22:

    report_op (str, optional): The operation to report the score.
        Options are 'latest', 'mean'. Defaults to 'latest'.

Collaborator: We need to describe the meaning of 'latest' and 'mean' here.

Author: In line with your suggestion, I have added comments explaining the meaning and role of report_op, including the latest and mean options.

Comment on lines 23 to 24:

    max_scoreboard_len (int, optional):
        The maximum length of the scoreboard.

Collaborator: The scoreboard is a new concept for users. We need to introduce it here.

Author: To clarify the newly introduced concept of the scoreboard, I have added comments in the relevant section explaining its purpose and usage.

mmengine/tune/_report_hook.py (outdated review thread, resolved)

tag, _ = runner.log_processor.get_log_after_iter(
runner, batch_idx, 'train')
score = tag.get(self.monitor, None)
Collaborator (suggested change):

    - score = tag.get(self.monitor, None)
    + score = tag.get(self.monitor)

Collaborator: I suggest adding a prefix to monitor, like train/loss and val/accuracy. We would then only check the monitor at the specific phase (train or validation) indicated by its prefix, and raise an error immediately if it is not defined in tag. Reporting an error to users immediately is better than raising one after the whole tuning round.

Collaborator: We also need to check that the monitored value is a number.

Author: Following your advice, I have enhanced the monitoring process by adding phase prefixes to the monitor. Moreover, I've added logic to verify that monitored values are numerical, to prevent potential errors.

broadcast_object_list(scores_to_broadcast, src=0)
score = scores_to_broadcast[0]
if is_main_process():
self._searcher.record(hparam, score)
Collaborator: Currently, the loss is not synchronized across ranks in mmengine (which is convenient for watching the loss of different ranks), so searchers on different ranks could receive different scores and suggest different hparams. Maybe we should apply a reduce-mean op to the score.

Author: I have modified the process to communicate the score through a reduce-mean operation, in line with your observation, to improve consistency across ranks.

Comment on lines +245 to +248
if is_main_process():
hparams_to_broadcast = [self._searcher.suggest()]
else:
hparams_to_broadcast = [None] # type: ignore
Collaborator: If I'm not mistaken, the reason for only calling suggest within the main process is that the searcher's results may be random for the same scores. Maybe we should highlight this point (randomness) in a comment?

Author: You were absolutely right about the randomness of the suggest method. To make this clearer, I have added comments noting that the method is executed only in the main process because of the inherent randomness of its outcomes.

for k in keys[:-1]:
if isinstance(cfg, list):
idx = int(k)
if idx >= len(cfg) or idx < 0:
Collaborator: A negative index is allowed for a list. Is it necessary to raise an error here?

Author: To support negative indices, I have removed the assertion that previously restricted them, allowing more flexibility in list operations.

self._logger.info(f'Best hyperparameters obtained: {best_hparam}')
self._logger.info(f'Best score obtained: {best_score}')
self._logger.info('Tuning completed.')
return dict(hparam=best_hparam, score=best_score)
Collaborator: In the Runner, several global variables are defined, including Visualizer, Logger, and DefaultScope (which inherits from ManagerMixin). We should clean these variables up after each trial. Furthermore, it might be better to run the tuning process in a temporary directory and delete it once tuning is complete.

Author: Taking your cues, I have added a mechanism to clear the global variables after each trial, referencing the testing function for the implementation. Additionally, the tuning process now runs in a temporary directory that is deleted on completion, keeping the workspace clean.

@yhna940 (Contributor, Author) commented Sep 13, 2023

> Thank you for your contribution! I believe the current solution is reasonable. However, I'm curious whether it's possible to integrate the call to from_tuning into Runner.train and have the Runner control whether or not to use the Tuner through a parameter. What do you think are the potential risks associated with this approach?

Hello, @HAOCHENYE

Thank you very much for your thoughtful suggestion. I appreciate the proactive approach to potentially integrating the from_tuning process directly within the Runner.train method and controlling the utilization of the tuner through a parameter. This could streamline the process, making the tuning more seamless.

However, I think there is a potential risk in combining the two methods. Firstly, by the time tuning is invoked within the train method, the runner instance has already been instantiated. This means we would have co-existing instances: the caller runner and the callee runner inside the tuner. Both instances maintain their own models, optimizers, and data loaders, which could increase memory usage considerably.

Furthermore, replacing the caller runner's attributes with the tuned values after tuning could introduce considerable complexity. We might have to define intricate logic to replace the attributes safely without adverse side effects. For instance, tuning the learning rate would require altering the optimizer's state dict; modifying the number of data samples would require rebuilding the data loader, among other potential changes. Pre-defining rules for attribute replacement is challenging given the many scenarios and combinations that would need to be accounted for.

If we do want to integrate tuning within runner.train, one approach might be lazy initialization of the runner: instantiation would be deferred until the train method is invoked, allowing tuning to complete and the hyperparameters to be decided before the runner instance is created. This could mitigate the concerns above. However, it would entail a significant change to how the runner operates, which seems too extensive and difficult a task to undertake in this PR.

I am eager to hear whether you consider these concerns valid, and any other feedback on this matter.

@HAOCHENYE (Collaborator) commented:

I'm very sorry for the delayed response. I think your considerations are very reasonable. MMEngine introduced FlexibleRunner in v0.8.0, which can fully lazily initialize various components, and it should be able to address the situation where the Tuner and the Runner hold two models simultaneously during the training phase. However, this is somewhat unrelated; let's continue to focus on Runner for this PR.

Currently, during initialization, Runner instantiates components like the model, visualizer, and logger. If you want to find the best learning rate during the train, you can still do it similarly to the current approach. You can build a new Runner during train, find the best parameters, and then inject them into the relevant components of the original Runner, rather than directly using the current Runner to search for the best learning rate. However, even with this approach, it doesn't resolve the issue of having two models simultaneously when searching for the learning rate.

For me, both searching for the learning rate during the training phase and searching for it through the from_tuning interface are acceptable. You can implement it according to your preference 😄 .

@yhna940 (Contributor, Author) commented Sep 24, 2023

> I'm very sorry for the delayed response. I think your considerations are very reasonable. MMEngine introduced FlexibleRunner in v0.8.0, which can fully lazily initialize various components, and it should be able to address the situation where the Tuner and the Runner hold two models simultaneously during the training phase. However, this is somewhat unrelated; let's continue to focus on Runner for this PR.
>
> Currently, during initialization, Runner instantiates components like the model, visualizer, and logger. If you want to find the best learning rate during the train, you can still do it similarly to the current approach. You can build a new Runner during train, find the best parameters, and then inject them into the relevant components of the original Runner, rather than directly using the current Runner to search for the best learning rate. However, even with this approach, it doesn't resolve the issue of having two models simultaneously when searching for the learning rate.
>
> For me, both searching for the learning rate during the training phase and searching for it through the from_tuning interface are acceptable. You can implement it according to your preference 😄 .

Hello @HAOCHENYE,

First and foremost, I'd like to express my gratitude for your comprehensive feedback on the proposal. Your insights and the clarity with which you've approached the problem have been immensely beneficial.

Thank you for laying out both options: integrating tuning within the runner.train method and using the from_tuning interface. After weighing the two alternatives, I find myself gravitating towards the latter, the from_tuning method, for several reasons:

  1. Separation of Concerns: Leveraging the from_tuning method inherently segregates the tuning phase from the training phase. This explicit demarcation ensures that each phase has its specific focus, leading to more structured and understandable code.
  2. Avoidance of Co-existing Attributes: As you rightly pointed out, having the caller runner and the callee runner simultaneously poses memory and complexity concerns. With the from_tuning approach, this co-existence is avoided, leading to a more memory-efficient and streamlined workflow.

I sincerely hope my approach aligns well with the vision of MMEngine. I'd appreciate further comments, reviews, or feedback on this direction or any other aspect of the PR to refine and improve it.

Thank you once again for your guidance.

Successfully merging this pull request may close the following issue: [Feature] find the proper learning rate

3 participants