GSoC2024 Solution to DL Starter Problem #244

lucasmg18 · 2024-03-13T05:10:17Z

GSoC2024 Solution to DL Starter Problem

Here I present the solution I implemented for this DL Starter Problem (GSoC2024).

Solution Description

First, I implemented a function that linearly interpolates the measurements. Its results were a good first approximation. Then I implemented polynomial interpolation and explored which polynomial degree worked best. This method draws on a wider range of information from the measurements than linear interpolation does, and it improved the solution.

Next, I looked for ML models suited to predicting these measurements. I started with a Support Vector Regression (SVR) model, which outperformed the previous methods after brief parameter tuning. This approach drew more detailed information from the data, fitting it better and capturing complex relationships more accurately. Finally, I experimented with deep neural networks, but overfitting, high computational demands, and a poor approximation to the measurements made them less feasible and less appropriate for the project's scope.

Results

The following graphs show three sample galaxies for each interpolation method. Each plot compares the original measurements with the predicted ones and also shows the interpolation function obtained, to illustrate how the method fits the measurements.

Linear Interpolation

We can clearly see that the interpolated points at the common wavelengths come directly from the linear interpolation. This is a very simple way to obtain the measurements, although more advanced interpolation methods can offer a more reliable solution by drawing on a wider range of information from the measurements we have.
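As a minimal sketch of this step, assuming each galaxy is a set of (wavelength, value) pairs that must be resampled onto a shared grid, `numpy.interp` does the straight-line interpolation. The arrays `wavelengths`, `values`, and `common_grid` are hypothetical placeholders, not data from the starter problem.

```python
import numpy as np

# Hypothetical measurements for one galaxy (illustrative numbers only).
wavelengths = np.array([0.4, 0.6, 0.9, 1.6, 3.6])
values = np.array([1.0, 1.8, 2.4, 2.0, 1.2])

# Hypothetical common wavelengths shared by all galaxies.
common_grid = np.array([0.5, 0.75, 1.25, 2.6])

# np.interp requires increasing x values and connects neighbouring
# measurements with straight lines.
interpolated = np.interp(common_grid, wavelengths, values)
print(interpolated)
```

Each interpolated point depends only on its two nearest measurements, which is why this method uses so little of the available information.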

Polynomial Interpolation

The results obtained with polynomial interpolation appear to reflect the information in the measurements, and the potential relationships between them, better than linear interpolation does. This method offers a well-adjusted interpolation that follows both the general trend and the specific data points closely.
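One way to explore the polynomial degree, sketched here under the assumption that a few measurements can be held out for validation, is to fit each candidate degree with `numpy.polyfit` and compare the error on the held-out points. The helper `pick_degree` and the toy data are hypothetical and are not taken from the submitted notebook.

```python
import numpy as np

def pick_degree(x, y, degrees, n_holdout=3, seed=0):
    """Return the degree with the lowest held-out mean squared error."""
    rng = np.random.default_rng(seed)
    holdout = rng.choice(len(x), size=n_holdout, replace=False)
    train = np.ones(len(x), dtype=bool)
    train[holdout] = False

    errors = {}
    for degree in degrees:
        # Fit on the training points only, then score on the held-out ones.
        coeffs = np.polyfit(x[train], y[train], deg=degree)
        predicted = np.polyval(coeffs, x[~train])
        errors[degree] = float(np.mean((predicted - y[~train]) ** 2))

    best = min(errors, key=errors.get)
    return best, errors

# Noise-free toy data that follows an exact quadratic.
x = np.linspace(0.0, 1.0, 12)
y = 1.0 + 2.0 * x - 3.0 * x ** 2

best, errors = pick_degree(x, y, degrees=(1, 2, 3))
print(best, errors)
```

A degree that is too low underfits the trend, while a degree that is too high chases individual points, so scoring on held-out measurements is one simple guard against both.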

SVR Interpolation

Interpolating with a Support Vector Regression (SVR) model gives significantly better results, which indicates that this approach makes effective use of a broader range of information. Unlike the simpler interpolation methods, SVR can capture complex relationships within the data by learning a detailed curve that better represents the underlying functions of wavelength. The total computation time with this model is affordable, at roughly less than 5 minutes.
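A minimal sketch of SVR used as an interpolator, assuming scikit-learn is available, is shown below. The RBF kernel, the `C`, `epsilon`, and `gamma` values, the input scaling, and the data arrays are all illustrative assumptions, not the settings used in this solution.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical measurements for one galaxy (illustrative numbers only).
wavelengths = np.array([0.4, 0.5, 0.6, 0.9, 1.2, 1.6, 2.2, 3.6, 4.5])
values = np.array([1.0, 1.4, 1.8, 2.4, 2.3, 2.0, 1.7, 1.2, 1.0])

# Scale the single input feature, then fit an RBF-kernel SVR.
# The hyperparameters are assumptions chosen only for illustration.
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma="scale"),
)
model.fit(wavelengths.reshape(-1, 1), values)

# Predict at hypothetical common wavelengths.
common_grid = np.array([0.75, 1.0, 2.0, 3.0])
predicted = model.predict(common_grid.reshape(-1, 1))
print(predicted)
```

Because the fitted curve depends on all support vectors rather than on two neighbouring points, the prediction at each common wavelength can reflect the overall shape of the measurements.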

DNN Interpolation

With a deep neural network model, the results seem to fit less accurately. With extensive training the model tends to overfit and ends up resembling linear interpolation, while insufficient training leads to outcomes that do not align well with the expected values. Moreover, the training demands are high for the scope of this project, which imposes practical constraints on time and computational resources. This suggests that, although deep neural networks offer powerful modeling capabilities, they may not be the most efficient or effective choice for projects with limited resources or for data with these specific underlying patterns.
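The trade-off between too much and too little training is often handled with early stopping, which this solution does not claim to use. The sketch below assumes scikit-learn's `MLPRegressor` as a stand-in network, and the architecture, the synthetic data, and every hyperparameter are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic noisy curve standing in for real measurements.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, size=80)).reshape(-1, 1)
y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(0.0, 0.1, size=80)

# early_stopping=True holds out validation_fraction of the training data
# and stops once the validation score has not improved for
# n_iter_no_change consecutive epochs.
net = MLPRegressor(
    hidden_layer_sizes=(32, 32),
    early_stopping=True,
    validation_fraction=0.2,
    n_iter_no_change=25,
    max_iter=2000,
    random_state=0,
)
net.fit(x, y)

grid = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
predicted = net.predict(grid)
print(net.n_iter_, predicted)
```

Whether this would have helped here is untested; with only a handful of measurements per galaxy, the validation split itself may be too small to be reliable.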

Final Conclusion

After reviewing the outcomes of the various interpolation methods, the Support Vector Regression (SVR) model stands out as the most promising. It appears to use a broader range of information from the measurements and shows better adaptability and precision in its fit than the other techniques. Polynomial interpolation required a careful choice of degree to avoid overfitting or underfitting, and its results were limited by the properties of polynomials. The deep neural network model struggled with overfitting and high computational demands. The SVR model, by contrast, captures the complex relationships within the data while keeping the training cost affordable.

Future Improvements

With more time, the parameters of the ML methods could be fine-tuned using cross-validation on the data we have. I could also investigate other ML models that have given good results on similar problems. Finally, another approach would be to use pretrained models designed for interpolation, or to search for similar data on which to train the ML models and try to improve their results.
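The cross-validated tuning mentioned above could look roughly like the following sketch, assuming scikit-learn's `GridSearchCV` over an SVR pipeline. The parameter grid, the number of folds, and the synthetic data are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the real measurements.
rng = np.random.default_rng(0)
x = np.linspace(0.4, 4.5, 30).reshape(-1, 1)
y = np.exp(-((x[:, 0] - 1.2) ** 2)) + rng.normal(0.0, 0.02, size=30)

pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# make_pipeline names the SVR step "svr", hence the "svr__" prefix.
param_grid = {
    "svr__C": [1.0, 10.0, 100.0],
    "svr__epsilon": [0.01, 0.1],
    "svr__gamma": ["scale", 0.5],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)
search.fit(x, y)

best_params = search.best_params_
print(best_params)
```

Shuffled folds are used here because the toy data is ordered by wavelength; unshuffled folds would force each model to extrapolate beyond the range it was trained on.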

Issue number: #243

This is the solution I implemented for the DL Starter Problem (GSoC2024)

Added More Interpolation Methods to the GSoC24 DL Starter Problem

xoubish and others added 5 commits March 12, 2024 11:14

GSOC exercise for ML/DL added

bdc3b39

Update README.md

bd9f030

changed .ipynb to .md

64ff4be

Update README.md

3fcad76

GSoC2024 Solution to DL Starter Problem

c21e572


bsipocz added the gsoc-2024-dl-toy-problem Starter problem for DL project, GSoC 2024 label Mar 13, 2024

DL Starter Problem More Interpolation Methods

564d3f1

