Using the MLflow logger produces inconsistent metric plots #19874

Open
gboeer opened this issue May 16, 2024 · 2 comments
Labels: bug (Something isn't working), needs triage (Waiting to be triaged by maintainers)

Comments


gboeer commented May 16, 2024

Bug description

When using the MLFlowLogger I have noticed that in some cases the plots produced in the Model metrics overview section of the MLflow web app are messed up.
The plots for the same metrics, when viewed in the detail view, are displayed correctly, and therefore look very different from the plots in the overview tab.

I am not absolutely sure, but I think this may have to do with how the step parameter is propagated to MLflow, or with how the global_step is calculated.
In my current experiment I use a large training set and a smaller validation set, and I have set the Trainer to log_every_n_steps=20.
For the training steps this seems to work fine (the plots all look good), but I suspect that during validation this logging interval is larger than the total number of batches in one validation epoch. Even so, I still wonder why the plots in the detailed view of the validation metrics all look fine, while only the plots in the Model metrics overview are messed up.
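
For reference, here is a minimal sketch of the kind of setup described above (experiment name, tracking URI and max_epochs are placeholder values; only log_every_n_steps=20 matches my actual run):

from lightning.pytorch import Trainer
from lightning.pytorch.loggers import MLFlowLogger

# Placeholder logger configuration; only log_every_n_steps=20 reflects the real experiment.
mlf_logger = MLFlowLogger(
    experiment_name="my_experiment",
    tracking_uri="http://localhost:5000",
)

trainer = Trainer(
    logger=mlf_logger,
    log_every_n_steps=20,  # larger than the number of batches in one validation epoch
    max_epochs=10,
)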

During the validation step I tried the regular Lightning self.log, as well as self.logger.log_metrics, self.logger.experiment.log_metric, and the direct MLflow API mlflow.log_metric, all of which lead to similar results (though not exactly the same plots).

def validation_step(self, batch, batch_idx):
    inputs, labels, _ = batch
    outputs = self.model(inputs)
    loss = self.val_criterion(outputs, labels)
    _, predictions = torch.max(outputs, 1)
    val_accuracy = torch.sum(predictions == labels.data).double() / labels.size(0)

    self.log("val_accuracy", val_accuracy)
    self.logger.log_metrics({"logger_val_accuracy": val_accuracy}, step=self.global_step)
    self.logger.experiment.log_metric(key="logger_experiment_val_accuracy", value=val_accuracy, step=self.global_step, run_id=self.logger.run_id)
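
For context, a stripped-down sketch of the module setup the snippet above assumes; the actual model and loss in my experiment are different:

import torch
from torch import nn
from lightning.pytorch import LightningModule

class LitClassifier(LightningModule):
    def __init__(self, num_features: int, num_classes: int):
        super().__init__()
        # Placeholders for the real model and validation criterion.
        self.model = nn.Linear(num_features, num_classes)
        self.val_criterion = nn.CrossEntropyLoss()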

See the following images which illustrate the plots for each of those calls:

[Screenshots: Model metrics overview plots for val_accuracy, logger_val_accuracy, and logger_experiment_val_accuracy]

For comparison, the plot from the detailed metric view of the same experiment:

[Screenshot: detailed metric view from the same experiment]

I would like to point out that I don't see this behavior in other experiments, usually ones with smaller datasets where I also used a smaller log_every_n_steps, and so far I have not been able to reproduce this issue with those smaller setups.

Edit: Another side note: I also use the same metric val_accuracy (the one I log with the plain self.log()) as the monitor for ModelCheckpoint, which also works as expected. So internally the metric is calculated and handled correctly, and the detailed metric plot also reflects this. Only the overview pane for all metrics shows this strange behavior for some reason.
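
For completeness, the checkpoint callback is wired up roughly like this (a sketch; the actual arguments may differ):

from lightning.pytorch.callbacks import ModelCheckpoint

# Monitors the same "val_accuracy" metric logged via self.log() in validation_step.
checkpoint_callback = ModelCheckpoint(
    monitor="val_accuracy",
    mode="max",
    save_top_k=1,
)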

What version are you seeing the problem on?

v2.2

How to reproduce the bug

No response

Error messages and logs

No response

Environment

Current environment: not provided

More info

No response

gboeer added the bug and needs triage labels on May 16, 2024

Antoine101 commented May 22, 2024

Hi @gboeer

I am "happy" to see I am not the only one having issues logging with MLFlow.

I am fine-tuning a pretrained transformer model on roughly 2000 images, so not an insane amount of data.

Here is what I am seeing:
[Screenshot: MLflow Model metrics overview during training]

As you can see, metrics such as validation_accuracy, although recorded with on_step=False, on_epoch=True, only ever show me the value of the last epoch. I would like to see an actual graph with all my previous epochs; here it is just a single scalar.

Also, I tell my trainer to log every 50 steps, but in my epoch-vs-step plot I only see points at the following steps: 49, 199, 349, 499, ... not every 50.

Here is my logger:

logger = MLFlowLogger(
    experiment_name=config['logger']['experiment_name'],
    tracking_uri=config['logger']['tracking_uri'],
    log_model=config['logger']['log_model']
)

Passed to my trainer:

trainer = Trainer(
    accelerator=config['accelerator'],
    devices=config['devices'],
    max_epochs=config['max_epochs'],
    logger=logger,
    log_every_n_steps=50,
    callbacks=[early_stopping, lr_monitor, checkpoint, progress_bar],
)

My metrics are logged in the following way in the training_step and validation_step functions:

def training_step(self, batch, batch_idx): 
    index, audio_name, targets, inputs = batch
    logits = self.model(inputs) 
    loss = self.loss(logits, targets)
    predictions = torch.argmax(logits, dim=1)
    self.train_accuracy.update(predictions, targets)
    self.log("training_loss", loss, on_step=True, on_epoch=True, batch_size=self.hparams.batch_size, prog_bar=True)
    self.log("training_accuracy", self.train_accuracy, on_step=False, on_epoch=True, batch_size=self.hparams.batch_size)
    self.log("training_gpu_allocation", torch.cuda.memory_allocated(), on_step=True, on_epoch=False)        
    return {"inputs":inputs, "targets":targets, "predictions":predictions, "loss":loss}

        
def validation_step(self, batch, batch_idx):
    index, audio_name, targets, inputs = batch
    logits = self.model(inputs)
    loss = self.loss(logits, targets)
    predictions = torch.argmax(logits, dim=1)
    self.validation_accuracy(predictions, targets)
    self.validation_precision(predictions, targets)
    self.validation_recall(predictions, targets)
    self.validation_f1_score(predictions, targets)
    self.validation_confmat.update(predictions, targets)
    self.log("validation_loss", loss, on_step=True, on_epoch=True, batch_size=self.hparams.batch_size, prog_bar=True)
    self.log("validation_accuracy", self.validation_accuracy, on_step=False, on_epoch=True, batch_size=self.hparams.batch_size)
    self.log("validation_precision", self.validation_precision, on_step=False, on_epoch=True, batch_size=self.hparams.batch_size)
    self.log("validation_recall", self.validation_recall, on_step=False, on_epoch=True, batch_size=self.hparams.batch_size)
    self.log("validation_f1_score", self.validation_f1_score, on_step=False, on_epoch=True, batch_size=self.hparams.batch_size)

I guess it's a problem on the Lightning side, but I'm not 100% sure.

I hope we'll get support soon. I serve my ML models with MLflow and it works fine, so I don't want to go back to TensorBoard just for my DL models.

EDIT: My bad, it seems to do that only while the training is still running. When the training is finished, the plots display correctly.
[Screenshot: MLflow metrics overview after training finished, with plots displaying correctly]

But still, I thought we were supposed to be able to follow the evolution of the metrics as training progresses, and in this case that isn't really possible.


gboeer commented May 23, 2024

@Antoine101
Interesting that your plots change after the training is finished. For me, they stay the same, though. I tried opening the app in a private window to check for caching issues, but it didn't change anything.

I guess what you observed about the step size may just have to do with zero-indexing.
