Proposal for New Supervised Learning Data Simulation Classes in C++ for MLPACK Library #3559
Comments
Nice, I think this could make for more compelling examples than "generate random uniform data"! 👍
It's worth pointing out that mlpack already has a number of distribution-like classes:
So, certainly some additional infrastructure is necessary to generate labeled synthetic datasets, but I do think that whatever we write should be "aware" of the distribution code and make use of it when possible in the implementation (and add new distributions as needed).
A minor pedantic thought is that after #3269, pretty much everything in mlpack is directly in the
At least personally I wouldn't worry about Open Question (2) too much; I think if we provide something relatively barebones at first, it will get immediately used in the documentation, and that's probably good enough for now.
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍 |
Sounds like an expansion for distributions is in order to handle multi-point generation.
With respect to
On the note of namespaces, maybe this should go under
Possibly, it would be great to keep things unified, but if it doesn't make sense (or if the amount of work for adapting older distributions is not feasible), in my view it's okay to keep them different.
It uses
I really think a flat namespace is fine, since there aren't really going to be any naming conflicts, but |
@rcurtin I am trying to find a good beginner's issue. Do you think this feature request could be implemented by a beginner as a way to learn about mlpack?
Some of this can be a great way to jump into the codebase.
There is an active PR, #3647, for the regression case.
What is the desired addition or change?
This RFC proposes the addition of new supervised learning data simulation classes to the MLPACK library in C++. The objective is to extend the library's capabilities by introducing classes specifically designed for generating synthetic datasets suitable for testing linear regression and logistic regression models. These simulation classes will offer flexibility in configuring key parameters, including:
What is the motivation for this feature?
The ability to generate synthetic datasets tailored for supervised learning scenarios is essential for robust model testing. Linear regression and logistic regression are fundamental techniques in this domain, and having dedicated simulation classes will enhance MLPACK's utility for researchers and practitioners.
If applicable, describe how this feature would be implemented.
Two distinct simulation classes, one for linear regression and one for logistic regression, will be implemented in C++ and integrated into the MLPACK library. Users will be able to instantiate these classes and set specific parameters to generate synthetic datasets for testing their models.
Example Usage
Linear Regression Simulation:
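As a rough illustration of the idea, here is a minimal sketch of a linear regression data generator. The class name `LinearRegressionSimulator`, its constructor arguments, and the `Generate()` method are all hypothetical and are not an existing or agreed mlpack API. The sketch uses only the C++ standard library and stores one point per row in nested vectors; an actual mlpack implementation would presumably work with Armadillo matrices and reuse the existing distribution classes.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical class for illustration only; NOT part of mlpack.
// Generates y = intercept + w^T x + eps, with x ~ N(0, I) and
// eps ~ N(0, noiseStdDev^2).
class LinearRegressionSimulator
{
 public:
  LinearRegressionSimulator(std::vector<double> weights,
                            const double intercept,
                            const double noiseStdDev,
                            const unsigned int seed = 42) :
      weights(std::move(weights)),
      intercept(intercept),
      noiseStdDev(noiseStdDev),
      rng(seed)
  {
    assert(this->noiseStdDev >= 0.0);
  }

  // Fills X (one feature vector per row) and y (one response per row).
  void Generate(const std::size_t numPoints,
                std::vector<std::vector<double>>& X,
                std::vector<double>& y)
  {
    std::normal_distribution<double> feature(0.0, 1.0);
    // std::normal_distribution needs a positive standard deviation, so a
    // placeholder of 1.0 is used (and never sampled) when noise is disabled.
    std::normal_distribution<double> noise(
        0.0, noiseStdDev > 0.0 ? noiseStdDev : 1.0);

    X.assign(numPoints, std::vector<double>(weights.size()));
    y.assign(numPoints, 0.0);
    for (std::size_t i = 0; i < numPoints; ++i)
    {
      double response = intercept;
      for (std::size_t j = 0; j < weights.size(); ++j)
      {
        X[i][j] = feature(rng);
        response += weights[j] * X[i][j];
      }
      if (noiseStdDev > 0.0)
        response += noise(rng);
      y[i] = response;
    }
  }

 private:
  std::vector<double> weights;
  double intercept;
  double noiseStdDev;
  std::mt19937 rng;
};
```

Under these assumptions, a caller would construct `LinearRegressionSimulator sim({2.0, -1.0}, 0.5, 0.1);` and then call `sim.Generate(100, X, y);` to obtain 100 labeled points. Fixing the seed keeps the generated dataset reproducible, which matters for documentation examples and tests.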
Logistic Regression Simulation:
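For the classification case, a comparable sketch draws each binary label from a Bernoulli distribution whose success probability is the logistic function of a linear score. As before, the class name `LogisticRegressionSimulator` and its interface are hypothetical and are not an existing mlpack API; only the C++ standard library is used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical class for illustration only; NOT part of mlpack.
// Generates labels y ~ Bernoulli(sigmoid(intercept + w^T x)), x ~ N(0, I).
class LogisticRegressionSimulator
{
 public:
  LogisticRegressionSimulator(std::vector<double> weights,
                              const double intercept,
                              const unsigned int seed = 42) :
      weights(std::move(weights)),
      intercept(intercept),
      rng(seed)
  {
    assert(!this->weights.empty());
  }

  // Fills X (one feature vector per row) and y (labels in {0, 1}).
  void Generate(const std::size_t numPoints,
                std::vector<std::vector<double>>& X,
                std::vector<int>& y)
  {
    std::normal_distribution<double> feature(0.0, 1.0);

    X.assign(numPoints, std::vector<double>(weights.size()));
    y.assign(numPoints, 0);
    for (std::size_t i = 0; i < numPoints; ++i)
    {
      double score = intercept;
      for (std::size_t j = 0; j < weights.size(); ++j)
      {
        X[i][j] = feature(rng);
        score += weights[j] * X[i][j];
      }
      // Logistic (sigmoid) link maps the score to a probability in [0, 1].
      const double p = 1.0 / (1.0 + std::exp(-score));
      std::bernoulli_distribution label(p);
      y[i] = label(rng) ? 1 : 0;
    }
  }

 private:
  std::vector<double> weights;
  double intercept;
  std::mt19937 rng;
};
```

With this sketch, `LogisticRegressionSimulator sim({1.5, -0.5}, 0.0);` followed by `sim.Generate(100, X, y);` would produce 100 points with 0/1 labels. Larger weight magnitudes make the classes more separable, so they act as a rough knob for problem difficulty.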
Open Questions
Sample data generators
make_regression()