GSoC2024 Solution to DL Starter Problem #244

Open
wants to merge 6 commits into
base: main
from

@lucasmg18 lucasmg18 commented Mar 13, 2024

GSoC2024 Solution to DL Starter Problem

Here I present the solution I implemented for this DL Starter Problem (GSoC2024).

Solution Description

First, I implemented a function that linearly interpolates the measurements, which gave a good first approximation. Then I implemented polynomial interpolation and explored which polynomial degree works best. This method takes a wider range of the measurements into account and improved the solution.

Subsequently, I searched for the best ML models to predict these measurements. I started with a Support Vector Regression (SVR) model, which outperformed the previous methods after briefly tuning the parameters. This approach used more detailed information from the data, achieving better fits and capturing complex relationships in the data more accurately. Finally, I experimented with Deep Neural Networks, but overfitting, high computational demands and a poor approximation to the measurements made this approach less feasible and less appropriate for the project's scope.

Results

The following graphs show 3 sample galaxies for each interpolation method. Each plot compares the original measurements with the predicted ones and also shows the interpolation function obtained, to illustrate how the method fits the measurements.

Linear Interpolation

[Figure: linear interpolation results for 3 sample galaxies]

We can clearly see how the interpolated points at the common wavelengths are obtained directly from the linear interpolation. This is a very simple way to obtain the measurements, although more advanced interpolation methods can offer a more reliable solution based on a wider range of information from the measurements we have.
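As a minimal sketch of this step (not the code from this PR), linear interpolation onto a common wavelength grid can be written with `numpy.interp`. The arrays `wavelengths`, `flux` and `common_grid` are hypothetical values invented for illustration; the real data layout of the starter problem is not shown here.

```python
import numpy as np

# Hypothetical measurements for one galaxy (illustrative values only).
# np.interp expects the sample positions in increasing order.
wavelengths = np.array([0.4, 0.6, 0.9, 1.5, 2.2])
flux = np.array([1.0, 1.4, 2.0, 1.7, 1.1])

# Hypothetical common wavelengths at which every galaxy should be evaluated.
common_grid = np.array([0.5, 0.75, 1.2, 2.0])

# Each output value lies on the straight line joining the two
# measurements that bracket the requested wavelength.
flux_on_grid = np.interp(common_grid, wavelengths, flux)
print(flux_on_grid)
```

Note that `numpy.interp` clamps to the first or last measurement outside the measured range, so common wavelengths beyond a galaxy's coverage would need separate handling.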

Polynomial Interpolation

[Figure: polynomial interpolation results for 3 sample galaxies]

The results of polynomial interpolation appear to draw more on the information in the measurements, and on the potential relationships between them, than linear interpolation does. The fitted curve follows both the general trend and the individual data points closely.
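One possible way to explore the polynomial degree is to score each candidate by leave-one-out error and keep the best one. The sketch below assumes this selection strategy and uses synthetic data; the helper `loo_error`, the degree range and the data are all hypothetical and are not taken from this PR.

```python
import numpy as np

def loo_error(x, y, degree):
    """Mean squared leave-one-out error of a polynomial fit of the given degree."""
    errors = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i          # hold out point i
        coeffs = np.polyfit(x[mask], y[mask], degree)
        errors.append((np.polyval(coeffs, x[i]) - y[i]) ** 2)
    return float(np.mean(errors))

# Synthetic, illustrative data: a quadratic trend with small noise.
rng = np.random.default_rng(0)
x = np.linspace(0.4, 2.2, 12)
y = 1.0 + 2.0 * x - 0.8 * x**2 + rng.normal(0.0, 0.05, x.size)

# Score a small range of degrees and keep the one with the lowest error.
degrees = range(1, 6)
scores = {d: loo_error(x, y, d) for d in degrees}
best_degree = min(scores, key=scores.get)

# Refit on all points with the selected degree.
best_coeffs = np.polyfit(x, y, best_degree)
print(best_degree, scores[best_degree])
```

Scoring on held-out points penalises degrees that are too low (underfitting) as well as degrees that chase the noise (overfitting).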

SVR Interpolation

[Figure: SVR interpolation results for 3 sample galaxies]

Interpolating with a Support Vector Regression (SVR) model gives significantly improved results, which indicates that this approach makes effective use of a broader range of information. Unlike the simpler interpolation methods, SVR can capture complex relationships within the data by learning a detailed curve that better represents the actual wavelength functions. The total computation time with this model is affordable, at less than about 5 minutes.
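A minimal sketch of SVR used as an interpolator, assuming scikit-learn is available. The RBF kernel, the values of `C`, `epsilon` and `gamma`, the input scaling step and the data are assumptions chosen for illustration; the parameters actually used in this PR are not stated above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical measurements for one galaxy (illustrative values only).
wavelengths = np.array([0.4, 0.6, 0.9, 1.2, 1.5, 1.8, 2.2])
flux = np.array([1.0, 1.4, 2.0, 1.9, 1.7, 1.4, 1.1])

# Scale the input, then fit an RBF-kernel SVR (assumed hyperparameters).
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale"),
)
model.fit(wavelengths.reshape(-1, 1), flux)  # scikit-learn expects 2-D inputs

# Evaluate the learned curve at hypothetical common wavelengths.
common_grid = np.array([0.5, 0.75, 1.0, 2.0])
flux_on_grid = model.predict(common_grid.reshape(-1, 1))
print(flux_on_grid)
```

Unlike linear interpolation, the fitted curve is not forced through every measurement: `epsilon` sets a tolerance band around the curve and `C` controls how strongly points outside that band are penalised.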

DNN Interpolation

[Figure: DNN interpolation results for 3 sample galaxies]

With a deep neural network model, the fit is less accurate. With extensive training, the model tends to overfit and ends up resembling linear interpolation, while insufficient training leads to outputs that do not align well with the expected values. Moreover, the training demands are high for the scope of this project, which imposes practical constraints in terms of time and computational resources. This suggests that although deep neural networks offer powerful modeling capabilities, they may not be the most efficient or effective choice for projects with limited resources or for data with these specific underlying patterns.

Final Conclusion

Upon reviewing the outcomes of the interpolation methods, the Support Vector Regression (SVR) model stands out as the most promising. It appears to capture a broader range of information from the measurements and fits them with greater adaptability and precision than the other techniques. Polynomial interpolation required carefully balancing the degree to avoid overfitting or underfitting, and its results are limited by the properties of polynomials. The deep neural network model faced challenges with overfitting and high computational demands. The SVR model, by contrast, captures the complex relationships within the data while keeping the training cost affordable.

Future Improvements

With more time, the parameters of the ML methods could be fine-tuned using cross-validation on the data we have, to improve the results. I could also investigate other ML models that have given good results on similar problems in the past. Finally, another way to approach this problem would be to use pretrained interpolation models, or to search for similar data on which to train the ML models and try to improve their results.
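The cross-validated tuning mentioned above could look roughly like the following sketch, assuming scikit-learn's `GridSearchCV` and an SVR model. The parameter grid, the number of folds and the synthetic data are assumptions made for illustration, not values from this PR.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic, illustrative data: a smooth curve with small noise.
rng = np.random.default_rng(0)
x = np.linspace(0.4, 2.2, 30)
y = np.sin(2.0 * x) + rng.normal(0.0, 0.05, x.size)

pipeline = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])

# Hypothetical search grid over the main SVR hyperparameters.
param_grid = {
    "svr__C": [0.1, 1.0, 10.0, 100.0],
    "svr__epsilon": [0.01, 0.05, 0.1],
    "svr__gamma": ["scale", 0.1, 1.0],
}

# Shuffle the folds because x is sorted; unshuffled folds would each
# hold out one contiguous wavelength range.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv,
                      scoring="neg_mean_squared_error")
search.fit(x.reshape(-1, 1), y)

best_params = search.best_params_
print(best_params, -search.best_score_)
```

After the search, `search.best_estimator_` is the pipeline refitted on all the data with the selected parameters, so it can be used directly for prediction.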

Issue number: #243

@bsipocz bsipocz added the gsoc-2024-dl-toy-problem Starter problem for DL project, GSoC 2024 label Mar 13, 2024
Added More Interpolation Methods to the GSoC24 DL Starter Problem

3 participants