
ExpertGate Nan loss using multi-head classifier #1639

Open
appledora opened this issue Apr 19, 2024 · 4 comments
appledora commented Apr 19, 2024

In my current experiments, I am trying to set up different datasets (each with a different number of classes) as separate tasks/experiences. While trying to train with ExpertGate, I am not sure if I am doing the data processing correctly, as I am getting NaN loss repeatedly.

Here's my data-processing code:

train_dataset_list, test_dataset_list = [], []
for idx, (task_name, num_classes) in enumerate(zip(task_list, classes_per_task)):
    print("task: ", task_name)
    # get_dataloaders returns loaders plus the raw train/test datasets;
    # only the datasets are needed for benchmark construction
    _, _, _, train_dataset, test_dataset = get_dataloaders(task_name, 0.8, 16)
    train_avalanche_dataset = _make_taskaware_classification_dataset(train_dataset)
    test_avalanche_dataset = _make_taskaware_classification_dataset(test_dataset)
    train_dataset_list.append(train_avalanche_dataset)
    test_dataset_list.append(test_avalanche_dataset)

ncbm = nc_benchmark(
    train_dataset_list,
    test_dataset_list,
    n_experiences=100,
    task_labels=True,
    shuffle=False,
    class_ids_from_zero_in_each_exp=True,
    one_dataset_per_exp=True,
    train_transform=None,
    eval_transform=None,
)
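As a quick sanity check (a sketch, assuming the benchmark above builds without errors), iterating the train stream and printing the per-experience task labels and class IDs makes it easy to confirm that class_ids_from_zero_in_each_exp behaved as expected:

# Sanity-check sketch: confirm each experience carries the expected task
# label and a class set starting at 0, since
# class_ids_from_zero_in_each_exp=True remaps labels per experience.
for experience in ncbm.train_stream:
    print(
        f"Experience {experience.current_experience}: "
        f"task={experience.task_label}, "
        f"classes={sorted(experience.classes_in_this_experience)}"
    )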

This is how I am initializing the model and strategy:

model = ExpertGate(shape=(3, 224, 224), device=device)
model.expert.classifier[6] = MultiHeadClassifier(4096)
cl_strategy = ExpertGateStrategy(
    model=model,
    optimizer=optimizer,
    train_mb_size=256,
    eval_mb_size=128,
    train_epochs=2,
    ae_lr=1e-3,
    device=device,
)
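One thing that may be worth double-checking here, as an assumption rather than a confirmed diagnosis: Avalanche's MultiHeadClassifier dispatches to a per-task head at forward time, so it expects to be called as classifier(x, task_labels), whereas a slot inside an nn.Sequential is called with x alone. A minimal sketch of the expected call:

import torch
from avalanche.models import MultiHeadClassifier

# Minimal sketch of how MultiHeadClassifier is normally invoked: it
# selects (and lazily grows) one linear head per task label, so the
# task labels must reach its forward() call.
clf = MultiHeadClassifier(in_features=4096)
x = torch.randn(8, 4096)
task_labels = torch.zeros(8, dtype=torch.long)  # pretend all samples are task 0
out = clf(x, task_labels)
print(out.shape)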

Here's a sample (ongoing) training log:

Device:  cuda:0
Starting experiment...
Start training on experience  0
-- >> Start of training phase << --

TRAINING NEW AUTOENCODER
-- >> Start of training phase << --
15106it [09:10, 27.44it/s]                          
Epoch 0 ended.
	Loss_Epoch/train_phase/train_stream/Task000 = 135.1136
100%|██████████| 5241/5241 [00:55<00:00, 95.15it/s] 
Epoch 1 ended.
	Loss_Epoch/train_phase/train_stream/Task000 = 124.1420
-- >> End of training phase << --
FINISHED TRAINING NEW AUTOENCODER


SELECTING EXPERT
FINISHED EXPERT SELECTION


TRAINING EXPERT
100%|██████████| 21/21 [00:34<00:00,  1.65s/it]
Epoch 0 ended.
	Loss_Epoch/train_phase/train_stream/Task000 = nan
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.0130
100%|██████████| 21/21 [00:34<00:00,  1.64s/it]
Epoch 1 ended.
	Loss_Epoch/train_phase/train_stream/Task000 = nan
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.0036
-- >> End of training phase << --
-- >> Start of eval phase << --
-- Starting eval on experience 0 (Task 0) from test stream --
100%|██████████| 11/11 [00:09<00:00,  1.21it/s]
> Eval on experience 0 (Task 0) from test stream ended.
	Loss_Exp/eval_phase/test_stream/Task000/Exp000 = nan
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp000 = 0.0061
-- >> End of eval phase << --
	Loss_Stream/eval_phase/test_stream/Task000 = nan
	Top1_Acc_Stream/eval_phase/test_stream/Task000 = 0.0061
End training on experience  0
Training time: 191.16707921028137
Computing accuracy on the test set
Start training on experience  1
-- >> Start of training phase << --

TRAINING NEW AUTOENCODER
-- >> Start of training phase << --
 30%|███       | 3756/12490 [01:01<02:16, 64.00it/s]

Would love to hear any pointers on this.
In general, what is the best way to set up dataloaders for my particular setting?
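A generic first step for chasing NaN losses, independent of ExpertGate, is to turn on autograd anomaly detection and verify the remapped targets per experience. A rough sketch, assuming the `ncbm` benchmark above and an Avalanche version where datasets expose a targets attribute:

import torch

# Anomaly detection makes backward() raise on the first op that produces
# a NaN, pointing at the culprit (slow; enable only while debugging).
torch.autograd.set_detect_anomaly(True)

# With a per-task multi-head classifier, targets must lie in
# [0, n_classes_of_that_task); out-of-range labels push cross-entropy
# to nan/inf. Check the remapped label range per experience:
for experience in ncbm.train_stream:
    targets = torch.as_tensor(list(experience.dataset.targets))
    print(
        f"exp {experience.current_experience}: "
        f"labels in [{targets.min().item()}, {targets.max().item()}]"
    )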

appledora (Author) commented:

By the way, I initially wanted to ask on your Slack, but the invite link is no longer working.


niniack commented Apr 20, 2024

Heya, happy to try to help you out with this. I wrote this strategy for Avalanche a while ago, and I'd like to make sure it's usable for you. Hopefully we can squash any bugs if there is something wrong in the implementation. I might be a bit slow to respond, so please bear with me.

It seems like you're using a custom dataset; were you able to reproduce this issue with a non-custom dataset? If so, could you please share a minimal example? I want to figure out whether this is a root issue with the implementation or whether it is something specific to your use case. In your current setup, what optimizer and learning rate are you using?

appledora changed the title from "EpertGate Nan loss using multi-head classifier" to "ExpertGate Nan loss using multi-head classifier" on Apr 21, 2024

appledora commented Apr 21, 2024

Hey @niniack, thanks for your contributions.
I haven't tried the default datasets, as they don't fit my current objective of using each dataset as a separate task. The custom get_dataloader function I have used here simply returns PyTorch DataLoader objects and dataset instances.
Here's the base Dataset class I am using:

from PIL import Image
import torch
from torch.utils.data import Dataset

class ImageDataset(Dataset):
    def __init__(self):
        self.data_path = ""
        self.data_name = ""
        self.num_classes = 0
        self.train_transform = None
        self.train_csv_path = ""
        self.image_paths = []
        self.labels = []

    def get_num_classes(self):
        return self.num_classes

    def __getitem__(self, index):
        img_path = self.image_paths[index]
        label = self.labels[index]
        img = Image.open(img_path).convert("RGB")
        if self.train_transform:
            img = self.train_transform(img)
        return img, label

    def __len__(self):
        return len(self.image_paths)

    @property
    def label_dict(self):
        return {i: self.class_map[i] for i in range(self.num_classes)}

    def __repr__(self):
        return f"ImageDataset({self.data_name}) with {len(self)} instances"


def get_dataloader(dataset, batch_size=16):
    # 80/20 train/validation split
    split_size = int(0.8 * len(dataset))
    train_dataset, val_dataset = torch.utils.data.random_split(
        dataset, [split_size, len(dataset) - split_size]
    )

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, pin_memory=True
    )
    val_loader = torch.utils.data.DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False, drop_last=True, pin_memory=True
    )

    return train_loader, val_loader, dataset.get_num_classes()
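One assumption worth verifying (hypothetical, not a confirmed diagnosis): Avalanche's classification wrappers look for an integer targets sequence on the dataset, while ImageDataset keeps its labels in self.labels. Exposing them explicitly, e.g. via a small subclass (ImageDatasetWithTargets is an illustrative name), would rule out label misdetection:

class ImageDatasetWithTargets(ImageDataset):
    # Hypothetical helper: expose labels under the `targets` name that
    # Avalanche's dataset wrappers look for, as plain ints so that
    # nc_benchmark's class remapping behaves predictably.
    @property
    def targets(self):
        return [int(lbl) for lbl in self.labels]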


appledora commented Apr 21, 2024

This is the optimizer I am using. I picked the hyperparameters from the continual-learning-baseline repo:

model = ExpertGate(shape=(3, 224, 224), device=device)
optimizer = SGD(
    model.expert.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0005
)
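As a side note, and purely as an assumption to test: SGD with lr=0.1 on a large pretrained expert can diverge and produce NaN losses on its own. A conservative variant can help isolate that from the data pipeline; the lower lr and the clipping threshold below are illustrative choices, not recommendations from the ExpertGate paper:

from torch.optim import SGD

# Sketch: a more conservative setup to test whether the NaNs come from
# divergence rather than the data pipeline (values are illustrative).
optimizer = SGD(
    model.expert.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005
)

# Clamp gradients via tensor hooks so no change to the Avalanche
# training loop is needed; each hook rewrites the parameter's gradient.
for p in model.expert.parameters():
    if p.requires_grad:
        p.register_hook(lambda g: g.clamp(min=-1.0, max=1.0))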
