-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revamp Markdown bindings for overhauled website #3699
Conversation
… and HTML sidebars.
…arkdown documentation.
@rcurtin It's really nice to see this come together. A few comments and suggestions based on the current state at https://www.ratml.org/misc/mlpack_new/ Main landing page:
Quickstart page:
Documentation page:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went a bit faster on the markdown part, but spent more time on the others
Technically, I agree with @conradsnicta on the following points, as I have felt the same thing:
C++ should not be the centre, as in some cases could be pushing people away, even though I agree that in my opinion, the ultimate objective of mlpack is to focus on places where other library fails to deliver, especially when it comes to embedded system C/C++
In the short description under the mlpack name in the top centre, suggest omitting "C++"; maybe change the description to "extensive library of flexible machine learning algorithms"; the idea is to avoid being pigeonholed into C++-only land right off the bat, as mlpack has bindings to other languages
I am not sure about this, yes it does make sense and we should add it, but I do not know where, and talking about Python Immediately without C++ can give the impression to people that we are a Python library, which should never be the case because we can not compete with other libraries when it comes to Python.
Suggest to add a third bullet point, explicitly stating that there are bindings to Python etc
Totally agree on this one, it does not make too much sense to cite the older one unless it was for a specific reason.
For the citation section, suggest using only one paper (the latest JOSS article from 2023), and remove the older ones; they're distracting and redundant
I think we should move this to a different place, and Yes I agree on removing Weka and Shogun, I never heard of mlpy or used it. We should replace these with TensorFlow or TensorFlow Lite, Pytorch, or any other modern maintained framework.
Also, we should mention some benchmarks regarding the binary footprint.
Suggest to remove or simplify the Benchmarks section; the table is too dense and has too many details; at the very least reduce the table down to 3 or 4 rows maximum; suggest to also remove the Shogun and WEKA columns -- these projects are pretty much dead and including them is just distracting
This is exactly what I thought about the quick start page as well, it should be named installation.
Suggest to change the "Download mlpack" button to link to the "Quickstart" page, instead of directly pointing to the .tar.gz archive; also remove mlpack version details, in order to reduce the maintenance burden; more on that below
Quickstart page:
Suggest to rename this section to "Installation" or "Download"
Suggest to have a separate section with "Latest Stable Release", linking to mlpack-4.3.0.tar.gz. This is instead of stating "mlpack is available in most package managers, or the source can be downloaded directly"
Suggest to add a note along these lines: "Pre-built mlpack packages are provided for various operating systems; these packages may not be the latest version; if you're encountering problems, use the official stable release provided here"
A couple of things that I had in mind:
-
I would remove the FAQ page because everything on that page has been already said before, and it does not add any value to the library, only maintenance burden.
-
If It is up to me I would remove the entire contributor section from the community, it is not well-updated, and there are so many names, that I doubt anyone would come to see what this is about.
-
I would simplify the page community, there is so much information, that it is hard to find the needed one
3.1) I would remove the word Developer from in front of the names, it feels too much academic, even all of us have left academia long time ago 😄
-
We should remove the tutorial button from the front page because it is leading to some very old thing. We can bring it back with modern tutorials.
-
We should add a page that talks about the Ecosystem of mlpack, (for example adding link to ensmallen, bandicoot, armadillo, and other libraries, that are using mlpack. Not only this, we can add old and current users, and what are they doing with the library, something like use cases, which is more relevant to people and attractive as well.
-
I really like Pytorch website, I think we should get some inspiration from there, regarding Edge dedicated page, and adding a Blog page, since we are going to add these in the future.
-
I really like the About page in Pytorch as well, including the installation
-
Finally, we do not need a link to the Home page with the home button, everyone is clicking on the logo, it is the easiest and the fastest way
@@ -92,7 +92,7 @@ BINDING_SEE_ALSO("@random_forest", "#random_forest"); | |||
BINDING_SEE_ALSO("Naive Bayes classifier on Wikipedia", | |||
"https://en.wikipedia.org/wiki/Naive_Bayes_classifier"); | |||
BINDING_SEE_ALSO("NaiveBayesClassifier C++ class documentation", | |||
"@src/mlpack/methods/naive_bayes/naive_bayes_classifier.cpp"); | |||
"@src/mlpack/methods/naive_bayes/naive_bayes_classifier.hpp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was really well hidden
// I wonder how long this full text PDF will remain available... | ||
"https://www.microsoft.com/en-us/research/uploads/prod/2006/01/" | ||
"Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it was up to me I would remove it from the binding for two reasons.
- This is an unnecessary piece of code that does not do anything and we have to maintain it in the longer term, and in some places host the paper.
- We have really good documentation right now, this is where users and developers are intended to land and not on the main.cpp of the binding. This means that this will never be read or used.
If it was up to me I would remove all the papers from here and mention them directly in the docs, the smartest way actually is to put a link in the bindings to our documentation instead of one to the paper. There, in the doc, we can take care of the paper and the details of the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this does get read and used, these SEE_ALSO()
links get compiled directly into the "see also" section for the binding documentation. For instance, these links right here end up in the Bayesian linear regression documentation for each binding language: http://ratml.org/misc/mlpack_new/doc/user/bindings/python.html#see-also-1
As far as maintenance goes, the CI job that builds and tests documentation should tell us if the URLs ever go out of date.
Actually I will mark this as a draft for now so that I can handle all the changes, and then mark it as ready. |
I was traveling this past week, so it took me a while to get back to, but I much appreciate the comments @shrit and @conradsnicta. The example website has been updated with all the changes below. I also made some significant changes to the sidebar---instead of having different sidebars for binding documentation and for C++ documentation, now the same sidebar is used for both. I think that it's a lot more consistent and easy to navigate now. Some responses to each point and changes that I made. (Sorry that it got really long...)
Yes, I agree with this. I thought about it for a little while---I want to keep it clear that we are header-only, but if you remove the C++ then the term doesn't make sense... so I instead added a second line to the subtitle:
Yeah, maybe less visually interesting but certainly easier to read. Done.
Yeah, fair point, removed.
Well, I agree with both of those things, but I don't have any other benchmark numbers handy and want to avoid diving into getting benchmarking set up again... I also want to have some kind of proof point for "fast" on the homepage, even with the clear problems that have been pointed out and even though the numbers are almost old enough that they could get a driver's license in the US. Anyway, I want to update these for sure, but I'd prefer to leave the cleanup for a follow-up and come back to it.
Ah, thanks! I had meant to do that but apparently forgot to actually make the change. Done now.
Yes, good point, that will help prevent confusion between the "quickstarts" which are documentation and this page. I went with "Download".
Thanks, I did my best to refactor the download page to mention these.
All of the questions are obvious to us who know the project already, yeah, but a lot of people may be unfamiliar with these questions. I do get a decent number of emails from people who do not appear to have much expertise and would benefit from these pages. It doesn't hurt to keep the page there; I think none of the answers have been updated in a very long time anyway. ensmallen, Armadillo, and Bandicoot have similar pages.
The only thing that kept me from doing this is that some contributors from very long ago never had their contributions linked with a Github account. But I guess that maybe they are called out in COPYRIGHT.txt or something like this. Anyway, I removed it.
With the contributors list gone, it's a lot shorter now. Really the thing that most people will be interested in is the chat channels and maybe video call; the rest of the stuff is not particularly important but does need to go somewhere. That contribution guide is really helpful too---I point lots of people towards it and it saves me time writing the same thing over and over again. :)
Yeah, I think the NSF wanted it written like that many years ago. I dropped the "Developer" part, even though I do still dip my toes in academia from time to time. :)
Where do you see the link to the tutorials? Maybe I am looking at the wrong page? I don't see one on the new homepage. I thought I had removed all links to the old tutorials.
Not a bad idea, but I would prefer to save it for later, or if you want to do it, feel free.
Definitely we will need more direct links to the application notes and tutorials we will write, but I'm hesitant about adding a blog. Unless you have a team of people dedicated to writing blog content, it will always go out of date and then users who come look at the blog say "huh, it hasn't been updated in 3 years, I guess the project must have no momentum". I finally removed the old GSoC blog we used to use for this reason.
Actually the current version of the mlpack homepage has an installation selector kind of like PyTorch does, but I have found it hard to maintain and a little bit complicated. So I just went with something simple. If someone wanted to replace it with something better or more elaborate in the future, I'm not opposed, but I would prefer not to (I enjoy writing C++ not HTML/JS :)).
I see what you mean, but it doesn't hurt anything to leave it there. I personally prefer the explicit callout to the home page. |
further comments in mlpack/mlpack.org#61 (comment) |
@shrit - may need to approve this PR as well, since it's part of mlpack/mlpack.org#61 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
The documentation build fails, but only because the new website is not yet deployed. Otherwise all the links seem ok. |
This is a "paired PR" along with #61.
In recent months, I have been writing Markdown documentation (like this) for each individual C++ class that mlpack provides. This is easy to compile into HTML, and there is even a script to do it, in
scripts/build-docs.sh
. However, this handwritten Markdown documentation did not integrate with the automatically-built Markdown documentation for our bindings.I spent a few weeks trying to figure out how to put it all together, and I was able to do it, although it required a significant refactoring of how the Markdown bindings work. Here is a working version of the new mlpack website (also with changes from the PR referenced above):
https://www.ratml.org/misc/mlpack_new/
and, the documentation for Python is relatively unchanged, but now fits in with the themeing and formatting of the rest of the documentation:
https://www.ratml.org/misc/mlpack_new/doc/user/bindings/python.html
There are a number of important changes to the functionality here that I'll list below:
Instead of generating one gigantic
mlpack.md
file that contains documentation for all languages, thegenerate_markdown
program now takes two parameters: the first specifies the language to print documentation for, and the second specifies whether to generate the documentation as a Markdown file, or the sidebar for the documentation as HTML. This means that all the crazy sed/awk crap that the website build script had to do to splitmlpack.md
into one file for each language is no longer needed (phew!).The generated Markdown documentation is actually committed to the repository, in
doc/user/bindings/
. This allows someone to browse the documentation without needing to domake markdown
---in fact, they can browse it directly on Github (although Github's Markdown reader doesn't render everything perfectly). It also allows website building to be done without actually compiling mlpack, which is significant time savings: building the website from scratch before would take several hours, because we had to build each version of the library, and then build the Markdown bindings, etc., etc.To ensure that the Markdown bindings that are committed in the repository don't go out of date, I added
scripts/check-markdown-docs.sh
, and I'll set up a CI job on Jenkins that just runs it. The job will fail if the bindings have been updated in a way that the generated Markdown changes, but those changes were not committed todoc/user/bindings/
.build-docs.sh
now supports custom sidebars for documentation pages. I wanted the bindings to display a different sidebar---if the user is browsing Python documentation, the sidebar should not be full of links to C++-specific documentation. Example Python page, example regular documentation pageAll of the URLs in the
SEE_ALSO()
s for each binding are now checked by thebuild-docs.sh
script. In fact, all links in the generated Markdown bindings are now ensured to be valid.HISTORY.md
is now also compiled intoHISTORY.html
byscripts/build-docs.sh
, so that it can be referenced from the website. I had to make a handful of changes there so it would render reasonably.I know the diff is large... any feedback is much appreciated. 😄