
Issues introduced by refactoring and other miscellaneous #14

Open
molereddy opened this issue Mar 16, 2024 · 9 comments

Comments


molereddy commented Mar 16, 2024

I wanted to flag some issues that came up after merging with the latest code.

  1. natsort needs to be added to requirements.txt (it became a dependency after the refactor).
  2. For the forgetting command in the README, it should be just python forget.py --config-name=forget.yaml, or that command plus parameters like split, model, and lr.

molereddy commented Mar 16, 2024

Also, in several parts of CustomTrainerForgetting's evaluate(), self.accelerator is called without being wrapped in an if self.is_deepspeed_enabled: check.

I don't understand deepspeed usage and haven't been able to get deepspeed running on my machine, so this might not be an actual issue.
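
A minimal sketch of the guard pattern being suggested, assuming the standard HF Trainer attributes (is_deepspeed_enabled and accelerator both exist on Trainer); what exactly belongs inside the guard depends on the repo's code:

    # hypothetical guard: only touch accelerator/deepspeed-specific state
    # when deepspeed is actually enabled
    if self.is_deepspeed_enabled:
        prepared_model = self.accelerator.prepare(self.model)
    else:
        prepared_model = self.model  # plain single-/multi-GPU path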


molereddy commented Mar 16, 2024

Another issue introduced during the refactor: in dataloader.py, the case of world_size == 1 is not handled, so aggregated_eval_logs remains empty when the world size is 1. This leads to an empty log dump and then an error while extracting forget quality:

  File "/tofu/dataloader.py", line 297, in evaluate
    forget_quality = get_forget_quality(aggregated_eval_logs, retain_result)
  File "/tofu/utils.py", line 144, in get_forget_quality
    unlearn_forget_result = unlearn_result['eval_log_forget.json']
KeyError: 'eval_log_forget.json'

Things worked before this line was commented out:

    aggregated_eval_logs = interleave_eval_result_dict(aggregated_eval_logs, forget_rate, large_bsz=eval_cfg.batch_size, num_processes=world_size)

The world_size == 1 case can be handled in an else block, as sketched below.
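
A minimal sketch of that structure, reusing the names from the snippet above (the else body here is hypothetical; a concrete version appears in a later comment):

    if world_size > 1:
        # multi-process: merge the per-process logs by interleaving
        aggregated_eval_logs = interleave_eval_result_dict(
            aggregated_eval_logs, forget_rate,
            large_bsz=eval_cfg.batch_size, num_processes=world_size)
    else:
        # world_size == 1: each task's log can be used as-is
        aggregated_eval_logs[f'{eval_task}.json'] = eval_logs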

molereddy changed the title from "Minor fixes" to "Issues introduced by refactoring and other miscellaneous" on Mar 16, 2024

molereddy commented Mar 18, 2024

In forget.py, local_rank is introduced on line 44 inside an if os.environ.get('LOCAL_RANK') is not None: block.

It is then referenced outside any such block on line 182: if local_rank == 0:
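
A minimal sketch of one possible fix (assumed, not the repo's actual code): give local_rank a default before the conditional, so the later check can never raise NameError:

    import os

    # default for non-distributed runs; overridden when LOCAL_RANK is set
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    if os.environ.get("LOCAL_RANK") is not None:
        ...  # distributed-only setup from the original block (line 44)

    if local_rank == 0:  # the check at line 182 now always sees a defined name
        ...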


zhilif commented Mar 18, 2024

@molereddy Thank you so much for the reports! We are working on the fix now. I have some comments here:

  1. python forget.py --config-name=forget.yaml just tells hydra to use the forget.yaml config. You can use forget.yaml together with extra args that override its default values (see the override example below).
  2. deepspeed is just what we use for sharding; frankly, we have not tried other sharding frameworks. What type of sharding do you usually use? We could add support for it.
  3. We have mostly abandoned the evaluate() method in dataloader.py, since we found that the HF trainer executes evaluation much more slowly than a direct call. We haven't found the root cause yet; for now we just call python evaluate_util.py.

Let us know if you have other questions!
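
For illustration, hydra overrides use key=value syntax after the config name; the keys follow the split/model/lr parameters mentioned above, but the exact names and values here (e.g. forget10, 1e-5) are assumptions, not necessarily the repo's config names:

    python forget.py --config-name=forget.yaml split=forget10 lr=1e-5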

molereddy commented

I only know of deepspeed as well. It's just that it hasn't been straightforward to get deepspeed set up on the machines I have available. I'm putting that issue on the back burner for now; I will report back with any errors on that front.

molereddy commented

Does the recent addition of an index to the dataset object change anything about the evaluation results, or is it just to make debugging easier?


zhilif commented Mar 29, 2024

Does the recent addition of an index to the dataset object change anything about the evaluation results, or is it just to make debugging easier?

It should make everything easier. Previously the interleaving logic only worked for a very specific batch size and number of GPUs, and we had a bug which caused some data duplication. The error shouldn't change anything from a statistical point of view, but it may alter the numbers. I would stay away from interleaving from now on.
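
A minimal sketch (hypothetical, not the repo's code) of why a per-example index helps: a distributed sampler pads ranks to equal length by repeating examples, and carrying the index through lets the gathered results be deduplicated without any interleaving logic:

    from torch.utils.data import Dataset

    class IndexedDataset(Dataset):
        """Wraps a dataset so each item also reports its own index."""
        def __init__(self, base):
            self.base = base
        def __len__(self):
            return len(self.base)
        def __getitem__(self, idx):
            item = self.base[idx]  # assumed to be a dict of fields
            return {**item, "index": idx}

    # toy gathered output: rank padding duplicated the example with index 0
    gathered_results = [{"index": 0, "loss": 1.2},
                        {"index": 1, "loss": 0.9},
                        {"index": 0, "loss": 1.2}]
    seen, deduped = set(), []
    for row in gathered_results:
        if row["index"] not in seen:
            seen.add(row["index"])
            deduped.append(row)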


molereddy commented Apr 24, 2024

I understand it is currently deprecated for being slow, but updating the evaluate() method would be very helpful.

  1. We have mostly abandoned the evaluate() method in dataloader.py, since we found that the HF trainer executes evaluation much more slowly than a direct call. We haven't found the root cause yet; for now we just call python evaluate_util.py.

The evaluate() in dataloader.py is quite convenient: you get results for each epoch directly alongside training, instead of saving a checkpoint per epoch and evaluating them all after training. This is helpful when we run many experiments and don't have the space to store all the checkpoints before evaluating.

I might have missed it if this evaluation functionality already exists.

I think this fix would take care of the world_size == 1 bug in dataloader's evaluate():

                else:
                    # world_size == 1: load each task's log directly and aggregate it
                    eval_log_path = os.path.join(curr_save_dir, f"{eval_task}.json")
                    with open(eval_log_path) as f:
                        eval_logs = json.load(f)
                    aggregated_eval_logs[f'{eval_task}.json'] = eval_logs
                    # rewrite the same file, pretty-printed
                    with open(eval_log_path, "w") as f:
                        json.dump(eval_logs, f, indent=4)

molereddy commented

Also, a reminder to fix the local_rank bug in forget.py mentioned in the earlier comment: #14 (comment)
