
Cannot get the number of selected classifiers by a DES algorithm #130

Open
zahidcseku opened this issue Jan 12, 2019 · 8 comments

@zahidcseku

Hello good people
I have been trying to get the number of classifiers selected by a DES algorithm, but I could not figure out how this can be achieved. I think it would be a nice and useful feature to have.

Cheers
Zahid

@Menelau
Collaborator

Menelau commented Jan 13, 2019

Hello,

I agree this would be a useful feature in the library. The information is computed in the select method (which is called inside the classify_with_ds method), but in the current version it is never exposed to the user.

We need to define a way to return the selected classifiers to the user while still maintaining the library's standards. I will think about how to make this easily accessible. It should be a good feature to add for the v0.4 release.

@luizgh luizgh added this to the v0.4 milestone Feb 20, 2019
@luizgh
Collaborator

luizgh commented Feb 20, 2019

@Menelau - here are some ideas for this issue

As you mentioned:

  • This information is available in the "classify_with_ds" and "predict_proba_with_ds" functions of both BaseDES and BaseDCS
  • These functions are called in the "predict" and "predict_proba" methods of BaseDS

Alternatives:

  1. Add an argument in "predict" and "predict_proba": "return_selected_classifiers", which would cause the methods to return a tuple "predictions, selected_classifiers", where "selected_classifiers" is a boolean mask (n_examples x n_classifiers) that indicates, for each example, which classifiers were used for its prediction.

  2. Save the selected classifiers in an instance variable that can later be accessed. Example:
    pred = knop.predict(x)
    selected_classifiers = knop.selected_classifiers

Note that both solutions have some problems: in (1), the functions "predict" and "predict_proba" would return either 1 value (the normal case) or 2 (when "return_selected_classifiers=True"). Option (2), on the other hand, may be misused: in general it is not good to store this kind of "temporary" value as an instance variable. Consider this example:

pred = knop.predict(x)
some_other_func()
selected_classifiers = knop.selected_classifiers

If "some_other_func" uses knop.predict again (e.g. with different x), then "selected_classifiers" would have the incorrect value.

I think option (1) is still preferable: the only case that the function would return two values is when the user is actually asking for the value to be returned.
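For illustration, a minimal sketch of what option (1) could look like from the user's side, using KNOP as an example (the "return_selected_classifiers" argument is the proposal here and does not exist yet; the rest is the usual DESlib workflow):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from deslib.des.knop import KNOP

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_dsel, y_train, y_dsel = train_test_split(X, y, test_size=0.5, random_state=42)

pool = BaggingClassifier(n_estimators=10, random_state=42).fit(X_train, y_train)
knop = KNOP(pool_classifiers=pool).fit(X_dsel, y_dsel)

# Proposed (not yet existing) usage -- opt in to the extra return value:
#   pred, selected_classifiers = knop.predict(X_dsel, return_selected_classifiers=True)
# selected_classifiers would be a boolean mask of shape (n_examples, n_classifiers),
# so selected_classifiers.sum(axis=1) would give the number of classifiers used per example.
pred = knop.predict(X_dsel)  # current behaviour: only the predictions are returned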

@Menelau
Collaborator

Menelau commented Feb 20, 2019

Well, I also think option 1 is better. Do you know if there is any other estimator in scikit-learn that can return more than one value?

@luizgh
Collaborator

luizgh commented Feb 20, 2019

I did a search for "return_" in the sklearn code base, and it seems that this strategy is used in a lot of cases. For instance, the KNN method "kneighbors" has a "return_distance" argument that changes what is returned (just the indices, or also the distances). https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html#sklearn.neighbors.NearestNeighbors.kneighbors

That being said, I think we should implement option 1. I will take care of this issue.
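For reference, the kneighbors pattern looks like this (this part is the actual scikit-learn API):

import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[0.0], [1.0], [2.0], [3.0]])
nn = NearestNeighbors(n_neighbors=2).fit(X)

# With return_distance=False only the neighbor indices are returned;
# with return_distance=True (the default) a tuple (distances, indices) is returned.
indices = nn.kneighbors([[1.1]], return_distance=False)
distances, indices = nn.kneighbors([[1.1]], return_distance=True)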

@luizgh luizgh self-assigned this Feb 20, 2019
@Menelau
Collaborator

Menelau commented Feb 20, 2019

Great! One thing to think about is the case where the DS mechanism is not used to classify a certain example (either because all classifiers agree or because it is classified directly by the KNN method).

So in these cases maybe we should have a special marker indicating that the DS mechanism was not used for that example.

@luizgh
Collaborator

luizgh commented Feb 20, 2019

That complicates things a little bit. Some ideas:

  1. Return "used_ds" and "selected_classifiers", where "used_ds" is a boolean vector of length n_examples indicating, for each example, whether DS was used.
  2. Return only one "selected_classifiers" variable, but as a list of lists. In this case, for the examples that did not use DS we can return an empty list.

Both options may be misused: for (1), if someone computes the average number of selected classifiers without taking "used_ds" into consideration, the value will be incorrect. The same holds for (2) if the user does not properly disregard the examples with an empty list.

Another way is to return "used_ds" as a list of indices, with "selected_classifiers" as an array of shape [n_selected x n_classifiers].
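A toy sketch of these representations, for 4 examples and 3 classifiers (illustration only, not library code):

import numpy as np

# (1) boolean mask plus a per-example "used_ds" flag
selected_classifiers = np.array([[1, 0, 1],
                                 [0, 0, 0],   # DS was not used for this example
                                 [1, 1, 1],
                                 [0, 1, 0]], dtype=bool)
used_ds = np.array([True, False, True, True])

# average number of selected classifiers, counting only examples where DS was used
avg = selected_classifiers[used_ds].sum(axis=1).mean()

# (2) a list of lists of classifier indices; an empty list means DS was not used
selected_as_lists = [[0, 2], [], [0, 1, 2], [1]]

# (3) "used_ds" as indices plus a compact mask of shape (n_selected, n_classifiers)
used_ds_idx = np.array([0, 2, 3])
compact_mask = selected_classifiers[used_ds_idx]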

@maffei2443

May I suggest adding a "debug" or "experimental" mode which would allow DESlib models to store data that is not critical for production but could greatly help researchers, such as the history of selected_classifiers or similar information?

I understand that this is not a standardized solution, but it could at least make it easier to obtain this kind of information. Of course, it must not interfere with normal functioning (beyond some performance penalty and extra memory use).
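Something along these lines, for example (a toy sketch with hypothetical names; in DESlib the real selection step happens inside classify_with_ds):

import numpy as np

# Toy sketch of the idea -- "debug" and "selected_classifiers_" are hypothetical names.
class ToyDS:
    def __init__(self, n_classifiers, debug=False):
        self.n_classifiers = n_classifiers
        self.debug = debug

    def predict(self, X):
        # stand-in for the real dynamic selection step
        selected = np.random.rand(len(X), self.n_classifiers) > 0.5
        if self.debug:
            # keep the non-critical diagnostics around only when debug mode is on
            self.selected_classifiers_ = selected
        return np.zeros(len(X), dtype=int)

ds = ToyDS(n_classifiers=5, debug=True)
ds.predict(np.zeros((3, 2)))
print(ds.selected_classifiers_.sum(axis=1))  # classifiers selected per example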

@Menelau
Collaborator

Menelau commented Oct 30, 2021

@maffei2443 Hello,

I think having this functionality as a "debug" mode would be the best way of solving this issue for now, as we haven't figured out a way of adding it while respecting the other constraints/design patterns from scikit-learn.

Would you be interested in working on adding this functionality? Unfortunately I'm quite busy until the end of the year, with little time to dedicate to coding, so I can't guarantee that I could add it myself in a short period of time.
