[v2 QUESTION]: SpectralFNN functionality #829

Open
Rhys-McAlister opened this issue Apr 23, 2024 · 5 comments
Labels
question Further information is requested

@Rhys-McAlister

How does the SpectralFFN predictor model work? I'm not sure how to pass the correct dimensions to this module:

UserWarning: Using a target size (torch.Size([64, 1801])) that is different to the input size (torch.Size([64, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(preds, targets, reduction="none")

I can see that there is an n_targets parameter, but I can't access it without editing the file.
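
The shape mismatch in the warning can be reproduced in isolation. The following is a minimal sketch, assuming NumPy as a stand-in for PyTorch (the broadcasting rules are the same for these shapes); the array names are illustrative and not taken from the library. It shows how a single prediction column is silently stretched across all 1801 target columns:

```python
import numpy as np

# Shapes taken from the warning: a batch of 64 and 1801 spectral bins.
preds = np.zeros((64, 1))       # model configured with a single output
targets = np.ones((64, 1801))   # one target value per spectral bin

# The element-wise squared error broadcasts the single prediction column
# across every target column instead of raising an error.
squared_error = (preds - targets) ** 2
print(squared_error.shape)  # (64, 1801)
```

The loss is computed without failing, but every bin is compared against the same predicted value, which is why the warning says the result is likely incorrect.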

@Rhys-McAlister Rhys-McAlister added the question Further information is requested label Apr 23, 2024
@kevingreenman kevingreenman added this to the v2.0.1 milestone Apr 23, 2024
@am2145
Contributor

am2145 commented Apr 23, 2024

We don't have full support for spectral predictions at the moment, but you can set the n_tasks parameter when initializing the FFN, which should get the dimensions correct. For example: ffn = nn.SpectralFFN(n_tasks=1801). To match the v1 workflow, the target (input) and predicted spectra are both expected to be sum-normalized. The SpectralFFN predictor handles the normalization of the predicted spectra. The input spectra are not normalized by default at the moment, so you'd have to do this manually. We do plan to handle this for the user in future updates, but if you have the spectral absorbances in a dataframe, df_input[target_columns] = df_input[target_columns].div(df_input[target_columns].sum(axis=1), axis=0) should do the job.

I hope this helps, and let me know if you encounter any further issues.
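
For reference, the manual normalization described above can be tried on a toy dataframe. This is a minimal sketch, assuming hypothetical column names (smiles, bin_0, bin_1, bin_2) and made-up absorbance values; only the div/sum expression comes from the comment itself:

```python
import pandas as pd

# Hypothetical toy data: two spectra with three absorbance bins each.
df_input = pd.DataFrame(
    {
        "smiles": ["CCO", "CC"],
        "bin_0": [1.0, 2.0],
        "bin_1": [1.0, 2.0],
        "bin_2": [2.0, 4.0],
    }
)
target_columns = ["bin_0", "bin_1", "bin_2"]

# Divide each row by its own sum so that every spectrum sums to 1.
row_sums = df_input[target_columns].sum(axis=1)
df_input[target_columns] = df_input[target_columns].div(row_sums, axis=0)

print(df_input[target_columns].sum(axis=1).tolist())  # [1.0, 1.0]
```

With the targets normalized this way, the predictor would then be created with n_tasks equal to the number of target columns, as in the nn.SpectralFFN(n_tasks=1801) example above.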

@Rhys-McAlister
Author

Hello, I've followed your steps but am still getting a NaN training loss immediately. Is there any information I can provide to help troubleshoot this?

@am2145
Contributor

am2145 commented Apr 25, 2024

Hi Rhys,

When you create the train, validation, and test datasets, can you check whether a further scaler is being applied? It would look like scaler = train_dset.normalize_targets() in the code. Since the workflow is to normalize the spectrum of each species individually, I would disable further scaling on the dataset. That dataset-wide normalization is performed by default for tasks like regression, but I could see it causing numerical issues here.

Additionally, we could double-check the metric to see whether it's a potential issue. From your first post, I'm assuming that you are using MSE. Is this correct?

If you are still encountering NaN training losses after this, it would be helpful to have a look at the input data, if you are able to share a small example or some representative data that is similar to your actual training set.
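
To illustrate why a dataset-wide scaler conflicts with per-spectrum normalization, here is a minimal sketch. It assumes a plain column-wise standardization written in NumPy, which is only an approximation of what a target scaler might do and not the library's actual code; the spectra are made up:

```python
import numpy as np

# Hypothetical sum-normalized spectra: each row sums to 1.
spectra = np.array(
    [
        [0.2, 0.3, 0.5],
        [0.1, 0.5, 0.4],
        [0.4, 0.2, 0.4],
    ]
)

# Column-wise standardization across the dataset (illustrative assumption).
scaled = (spectra - spectra.mean(axis=0)) / spectra.std(axis=0)

print((scaled < 0).any())   # True: negative "intensities" appear
print(scaled.sum(axis=1))   # rows no longer sum to 1
```

After scaling, the targets are no longer valid normalized spectra, so anything downstream that expects positive values summing to 1 may misbehave.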

@Rhys-McAlister
Author

Hi, after removing the scalers I was still getting NaNs, so I added a small constant (around 1e-3) to every row, and that seems to fix the NaN issue for now.

@am2145
Contributor

am2145 commented Apr 29, 2024

Glad that it's working now. For our information going forward with the full spectral implementation in v2, were there any negative or zero values in the input data? v1 filtered these out similarly to what you did here, so that may be what was causing the issue.
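
One way zero or negative values can produce NaNs is through the sum-normalization step itself. The cause in this thread was not confirmed, so the following is only a sketch under that assumption: the helper clean_spectra and its floor value of 1e-8 are hypothetical and are not part of v1 or v2. It replaces non-positive entries with a small floor before normalizing:

```python
import numpy as np


def clean_spectra(spectra, floor=1e-8):
    """Replace non-positive entries with a small floor, then sum-normalize rows."""
    spectra = np.asarray(spectra, dtype=float)
    cleaned = np.where(spectra <= 0, floor, spectra)
    return cleaned / cleaned.sum(axis=1, keepdims=True)


# Made-up data: an all-zero spectrum and one containing a negative value.
raw = np.array([[0.0, 0.0, 0.0], [1.0, -0.5, 3.0]])

# Naive normalization of the all-zero row computes 0/0, which is NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    naive = raw / raw.sum(axis=1, keepdims=True)
print(np.isnan(naive[0]).all())  # True

cleaned = clean_spectra(raw)
print(np.isfinite(cleaned).all())  # True
```

Flooring only the offending entries changes the data less than adding a constant to every value, though either approach keeps all targets strictly positive.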
