Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for LogisticRegression and SoftmaxRegression #3564

Merged
merged 45 commits into from
Dec 21, 2023

Conversation

rcurtin
Copy link
Member

@rcurtin rcurtin commented Nov 24, 2023

Similar to #3560, I added documentation for the logistic regression and softmax regression classifiers. Compiled documentation as HTML can be seen here:

A number of changes were necessary to make all of these things work (as usual...):

  • Added MatType template parameter to SoftmaxRegression and used throughout.
  • Modified LogisticRegression and SoftmaxRegression to hold weights that have the same element type as MatType.
  • Added consistent overloads of constructors and Train(), which all can optionally take instantiated ensmallen optimizers and callbacks.
  • Added tests for all variants of constructors and Train().
  • Added Reset() functionality, which is needed because training is by default incremental.
  • Added reverse-compatible serialization implementations.

All examples were compiled and tested by hand... I would like to build an automated system to do this at some point, but I'll save that for another day.

@shrit
Copy link
Member

shrit commented Dec 4, 2023

any reason why the tests are failing ? the docs looks good, I need to go also through the code modification since it seems that there has been some modification and some functions have templates now. However, the failing tests seems to be unrelated

@shrit
Copy link
Member

shrit commented Dec 4, 2023

some of the links are not working especially related to the examples but I would say this should be resolved before merging right ?

@rcurtin
Copy link
Member Author

rcurtin commented Dec 8, 2023

any reason why the tests are failing ? the docs looks good, I need to go also through the code modification since it seems that there has been some modification and some functions have templates now. However, the failing tests seems to be unrelated

Yeah, it took me a little while to figure it out but I think the failing tests were Armadillo versioning issues. It should pass now... let's see...

some of the links are not working especially related to the examples but I would say this should be resolved before merging right ?

Yeah, links to other Markdown pages aren't working yet. I'll fix that once I put together some framework that puts all the documentation pages together; for now, it's ok that those links don't work. However if you find a link that is inside the page (like to a section or something) and that doesn't work, I should definitely fix that before merge.

@rcurtin
Copy link
Member Author

rcurtin commented Dec 14, 2023

This one is ready for review too---the builds are passing now (except for the random failure on the Windows build).

@shrit
Copy link
Member

shrit commented Dec 14, 2023

Okay will look into this one as well tomorrow

Copy link
Member

@shrit shrit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, it is amazing, I love these two pages, they are straightforward to the point, have no repetition and are easy to read. I discovered a lot of stuff I did not know at all.

It took me some time to go over it mainly because I wanted to click on each link to be sure that I did not miss anything.

Here are a couple of things that I have noticed:

  1. The following links are not working in the two pages (the link to one another), I think they should be working before merging as we discussed above unless if I am missing something
    Screenshot 2023-12-15 at 12-59-31 https __www ratml org
    Screenshot 2023-12-15 at 12-52-32 https __www ratml org
    Screenshot 2023-12-15 at 12-51-12 https __www ratml org
    Screenshot 2023-12-15 at 12-50-58 https __www ratml org
    Screenshot 2023-12-15 at 12-59-42 https __www ratml org

  2. I think the types here mean default type because we can use float and other types e,.g., arma::fmat. I was confused when I saw it because it seems as if it refers to the only type which is the case.
    Maybe change type to default type or something similar? I know this should be done on all the pages 😒 but it could be worth it.

Screenshot 2023-12-15 at 13-16-38 https __www ratml org

  1. I see we have added tests, and I am a bit surprised they are passing with the fact that we are declaring numbers without specifying the type in some cases. I know the compiler on my machine is translating this to double when I was testing dbscan and knn. I would say would be nice to cast it or define variables to be safe, but you know better than me.

This could be also the reason we are having a nan here in the failing tests, but it is hard to deduct,

D:\a\1\s\src\mlpack\tests\main_tests\radical_test.cpp(30): FAILED:
due to unexpected exception with message:
  sort(): detected NaN

I do not know if Radical test are using logistic regression somewhere

@rcurtin
Copy link
Member Author

rcurtin commented Dec 18, 2023

The following links are not working in the two pages (the link to one another), I think they should be working before merging as we discussed above unless if I am missing something

I'd actually rather wait to fix these links; I don't know quite how things should be organized overall quite yet, so I was planning to figure that out (hopefully this week), then fix all the links in the individual Markdown files all at once. Do you think that sounds reasonable?

I think the types here mean default type because we can use float and other types e,.g., arma::fmat. I was confused when I saw it because it seems as if it refers to the only type which is the case. Maybe change type to default type or something similar? I know this should be done on all the pages 😒 but it could be worth it.

I spent a long time thinking about it... I wanted to try and keep the documentation simple and use default types where possible. I could do default type, but I would have to denote in the template parameter section what all the types change to. Another idea would be to note something like this:

 * When using a custom `MatType`, the training parameters will change type:
 
 <table goes here>
 
 * The classification parameters will change type too:
  
  <table goes here>
  
  * `lr.Parameters()` will now return a `MatType&`...

That's just a basic idea, but it keeps the different type information constrained to the section where we talk about different MatTypes. What do you think?

I also thought about a drop-down selector box that could change types, but that gets really ugly and would mean that the Markdown wasn't easily readable anymore.

I see we have added tests, and I am a bit surprised they are passing with the fact that we are declaring numbers without specifying the type in some cases. I know the compiler on my machine is translating this to double when I was testing dbscan and knn. I would say would be nice to cast it or define variables to be safe, but you know better than me.

Thanks for pointing this out... I saw the comments you left and I'll address them. As far as I know, the implicit casting of a double literal to a float isn't an issue, like if we did:

float x = 3.0;

and I think that's done several times in the tests. I won't worry about those instances. I read this on that point.

This could be also the reason we are having a nan here in the failing tests, but it is hard to deduct,

D:\a\1\s\src\mlpack\tests\main_tests\radical_test.cpp(30): FAILED:
due to unexpected exception with message:
 sort(): detected NaN

I do not know if Radical test are using logistic regression somewhere

RADICAL doesn't use logistic regression, so I think the failures are unrelated. I have actually been chasing that failure for a very long time and I've never been able to reproduce it. I wonder if there is a subtle memory error caused by another test, that then happens to result in the RADICAL test failing... but I am not sure.

@rcurtin
Copy link
Member Author

rcurtin commented Dec 18, 2023

@shrit thank you so much for the comprehensive review! It is really helpful. I think I have responded to everything; let me know if not, and let me know what you think of the comments. I am pretty tired right now so I may have overlooked something (I probably should have just gone to bed, but oh well, too late now 😄)

Copy link
Member

@shrit shrit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@shrit
Copy link
Member

shrit commented Dec 18, 2023

Regarding the solution for default type, I do not have one, I want something really simple to understand without the need to add more tables or anything.
Maybe if we keep having an example for f32 in the end that would explain that these parameters are not hard coded and are templatized.
Let us keep them this way, maybe we will figure something out in the future for everything

Also I do not want us to mention MatType in the doc, it could scare people a bit, I would go for hiding it completely, if someone is looking for f32, they will find it in the doc, eventually they will understand it themselves at some point

Copy link

@mlpack-bot mlpack-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second approval provided automatically after 24 hours. 👍

@rcurtin
Copy link
Member Author

rcurtin commented Dec 19, 2023

Regarding the solution for default type, I do not have one, I want something really simple to understand without the need to add more tables or anything.

I opened #3584 so that the idea doesn't get lost. I have some ideas, but it may be a little while until I return to it.

@rcurtin
Copy link
Member Author

rcurtin commented Dec 21, 2023

The Python tests will be fixed in #3587, so I will go ahead and merge this despite the failing builds. (The Go failure appears to be random.)

@rcurtin rcurtin merged commit d678052 into mlpack:master Dec 21, 2023
15 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants