Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checks for relative input shapes inside bindings. #2820

Closed
47 tasks
NippunSharma opened this issue Jan 29, 2021 · 27 comments
Closed
47 tasks

Add checks for relative input shapes inside bindings. #2820

NippunSharma opened this issue Jan 29, 2021 · 27 comments

Comments

@NippunSharma
Copy link
Contributor

NippunSharma commented Jan 29, 2021

What is the desired addition or change?

A parameter check should be added to all bindings that checks for the correct relative input shapes. For example the training data and training labels must have the same number of samples, testing data and training data must have the same number of features etc.

What is the motivation for this feature?

Refer to #2818 for more information and how to reproduce the bug.

If applicable, describe how this feature would be implemented.

  • As discussed with @rcurtin the checks should be implemented inside src/mlpack/core/util/param_check.hpp and then we should call these checks from the bindings.

  • You can take a look at how RequireOnlyOnePassed is used the _main.cpp files for each method in the mlpackMain() function and is implemented in src/mlpack/core/util/param_check_impl.hpp. These checks would also be implemented and used similarly.

  • We need to check each binding (listed below) separately and add these checks if they are not already there. For example in Linear SVM same number of columns between training data and labels are already checked here.

  • adaboost

  • decision_stump

  • decision_tree

  • hoeffding_tree

  • linear_svm

  • logistic_regression

  • nbc

  • perceptron

  • random_forest

  • softmax_regression

  • bayesian_linear_regression

  • lars

  • linear_regression

  • dbscan

  • gmm_train

  • gmm_generate

  • gmm_probability

  • kmeans

  • mean_shift

  • approx_kfn

  • emst

  • fastmks

  • lsh

  • knn

  • kfn

  • krann

  • preprocess_split

  • preprocess_binarize

  • preprocess_describe

  • preprocess_scale

  • preprocess_one_hot_encoding

  • image_converter

  • cf

  • det

  • hmm_train

  • hmm_loglik

  • hmm_viterbi

  • hmm_generate

  • kde

  • nmf

  • kernel_pca

  • lmnn

  • local_coordinate_coding

  • nca

  • pca

  • radical

  • sparse_coding

Note that not all of the bindings written above will require these checks because for some they would be there already. So it would be better to check this before changing any code (please comment below after checking so that I can update the description to indicate where the checks are not required).

Additional information?

If you can think of a better way to implement it then you can comment below and we can discuss it.

Update: Also replace the occurences where we are comparing sizes of matrices with utilities created in #2370.

@NippunSharma
Copy link
Contributor Author

I have changed the description above. Please check it again if you have already started working on this. Sorry for any inconvenience caused.

@nklskyoy
Copy link
Contributor

Hey, I would check

  • adaboost

  • decision_stump

  • decision_tree

  • hoeffding_tree

if these points are still free. I consider it a good starting point to dive into the project. Would take me about a week.

@NippunSharma
Copy link
Contributor Author

Hi @onikolskyy , sure feel free. Remember that you will also have to implement the checking functions inside param_check.hpp and then add to the bindings (if we are not already checking for the same conditions in some way or another). You can post any questions you have over here.

@shubhamabhang77
Copy link

Is there anyone working on this issue??
I would like to work this to get started contributing.
We have add write declaration in param_check.hpp and add the definition in param_check_impl.hpp, right?

@nklskyoy
Copy link
Contributor

@shubhamabhang77 mate, I am working on the adaboost bug #2818 right now , would you mind taking one of the other points?
I will report my status later today/tomorrow

@nklskyoy
Copy link
Contributor

nklskyoy commented Feb 21, 2021

@NippunSharma I have an issue with setting up a dev environment for the bindings.
I have uploaded a question, perhaps there is a quick answer? #2843

@shubhamabhang77
Copy link

shubhamabhang77 commented Feb 22, 2021

yeah yeah sure. I will work on other points. You work on your points.

@nklskyoy
Copy link
Contributor

Hey guys,
I would be happy to make my first commit with a fix of #2818.
However, I get a permission denied for my github acc when I try to push a new branch with the commit.
Do I need some extra rights or does the contribution pipeline work differently?

@NippunSharma
Copy link
Contributor Author

You cannot push directly to the mlpack master, you will not have the access. You will have to create a fork of this repo and then push to your fork and then create a pull request. You can look up on the net, the process is very easy and will not take much time.

@rcurtin
Copy link
Member

rcurtin commented Mar 7, 2021

Two updates relevant to this issue:

@mlpack-bot
Copy link

mlpack-bot bot commented Apr 21, 2021

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot mlpack-bot bot added the s: stale label Apr 21, 2021
@NippunSharma
Copy link
Contributor Author

Keep open.

@PranavReddyP16
Copy link

Hey, @rcurtin @NippunSharma I assumed that this hasn't been fixed since the last comment is from a month ago so I thought I'd take a look at it. I was thinking of replacing all the occurrences of size and dimensionality checks using the functions added in #2370. I was taking a look at logistic regression and there was an instance of code that seemed similar to the function but can't be replaced directly by it - Logistic Regression Code instance. Would it be better to rewrite it slightly or keep it as it is?
Thanks!

@NippunSharma
Copy link
Contributor Author

Hey @PranavReddyP16, thanks for showing interest in this issue. As you can read from the description we only need to include checks where they are not present. There is no need to replace the "if" conditions and other occurrences, at least for this issue. For better understanding, you can take a look at #2818, which is the actual issue we are facing. Let me know if you have any other doubts 👍

@PranavReddyP16
Copy link

Two updates relevant to this issue:

Hey, @NippunSharma thanks for the clarification. I was a little confused because of the second point of this comment from Ryan and I thought we had to replace if statements and add checks where needed. I'll get to work on adding check functions if they're not present. Just to confirm, I can use the functions implemented in #2370 in the (method)_main.cpp files in the mlpackMain() function am I right? Thanks again!

@NippunSharma
Copy link
Contributor Author

Ya, you are right. I am sorry for the confusion here. I forgot to update the issue description, I guess that is why it slipped my mind. I have added that now, thanks for reminding 👍. Just to clarify again, you will have to replace all other occurrences along with the checks. Yes you have to use functions implemented in #2370.You can create a PR and we can discuss any other doubts you have over there 👍

@PranavReddyP16
Copy link

Cool, thanks a lot for clearing that up. I'll get started on it.

@NippunSharma
Copy link
Contributor Author

NippunSharma commented Jun 20, 2021

@ayushi19031, go ahead. I think @onikolskyy @PranavReddyP16 were working on this but it has been quite some time, best to ask them first if they are fine with it👍

@nklskyoy
Copy link
Contributor

HI @ayushi19031, I haven't worked on this since my last contribution. Please feel free to continue

@VedangAsgaonkar
Copy link

Is this issue still open?

@rcurtin
Copy link
Member

rcurtin commented Jan 13, 2022

@VedangAsgaonkar yes, as you can see, the issue is not closed, so it is still open. 😄

Please feel free to take a look at some of the PRs that have already been done for other bindings (you can look through the list of linked PRs), then pick a binding and open a PR for it. I'd recommend smaller PRs, one per binding, which will make it easier to get them merged. I think the description that @NippunSharma left above is sufficient to get started. 👍

@eshaanagarwal
Copy link
Contributor

Hi ! I would like to take up PCA first and then move on to preprocess_split and other bindings.

@AdarshSantoria
Copy link
Contributor

I would like to work on this issue.

@rcurtin
Copy link
Member

rcurtin commented Dec 23, 2022

@AdarshSantoria feel free, if you open a PR it will be reviewed when possible. 👍

@lumi232
Copy link

lumi232 commented Feb 27, 2024

Hi, I have replaced all the if checks in the Linear Regression method with the utility introduced in #2370. I have made a PR (#3640) looking forward to it being merged. I also plan to do this with other methods after this one is merged successfully.

@sukjingitsit
Copy link

sukjingitsit commented Feb 27, 2024

Hello, I have only recently started with open source contributions, and I identified this as a good first issue for me to tackle. I would like to address this issue as specifically for PCA, NCA, Kernel-PCA and Perceptrons as a starting point. I am currently going through the codebase of mlpack to get an understanding of the code and dealing with the issue and will update once I start working on the coding part (expected in 1-2 days). So far, as mentioned in the above discussion, I believe using methods from size_checks.hpp will be the way forward.

@mlpack-bot mlpack-bot bot closed this as completed Mar 18, 2024
ViswanathBalla22 added a commit to ViswanathBalla22/mlpack that referenced this issue Mar 18, 2024
This pull request aims to solve some tasks in issue mlpack#2820 with help of mlpack#2370 as discussed in the comments.

Tasks done :
Added size checks using CheckSameSizes and CheckSameDimensionality utilities in logistic regression
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

14 participants