Revamp Markdown bindings for overhauled website #3699

rcurtin · 2024-05-02T22:29:42Z

This is a "paired PR" along with #61.

In recent months, I have been writing Markdown documentation (like this) for each individual C++ class that mlpack provides. This is easy to compile into HTML, and there is even a script to do it, in scripts/build-docs.sh. However, this handwritten Markdown documentation did not integrate with the automatically-built Markdown documentation for our bindings.

I spent a few weeks trying to figure out how to put it all together, and I was able to do it, although it required a significant refactoring of how the Markdown bindings work. Here is a working version of the new mlpack website (also with changes from the PR referenced above):

https://www.ratml.org/misc/mlpack_new/

and, the documentation for Python is relatively unchanged, but now fits in with the themeing and formatting of the rest of the documentation:

https://www.ratml.org/misc/mlpack_new/doc/user/bindings/python.html

There are a number of important changes to the functionality here that I'll list below:

Instead of generating one gigantic mlpack.md file that contains documentation for all languages, the generate_markdown program now takes two parameters: the first specifies the language to print documentation for, and the second specifies whether to generate the documentation as a Markdown file, or the sidebar for the documentation as HTML. This means that all the crazy sed/awk crap that the website build script had to do to split mlpack.md into one file for each language is no longer needed (phew!).
The generated Markdown documentation is actually committed to the repository, in doc/user/bindings/. This allows someone to browse the documentation without needing to do make markdown---in fact, they can browse it directly on Github (although Github's Markdown reader doesn't render everything perfectly). It also allows website building to be done without actually compiling mlpack, which is significant time savings: building the website from scratch before would take several hours, because we had to build each version of the library, and then build the Markdown bindings, etc., etc.
To ensure that the Markdown bindings that are committed in the repository don't go out of date, I added scripts/check-markdown-docs.sh, and I'll set up a CI job on Jenkins that just runs it. The job will fail if the bindings have been updated in a way that the generated Markdown changes, but those changes were not committed to doc/user/bindings/.
build-docs.sh now supports custom sidebars for documentation pages. I wanted the bindings to display a different sidebar---if the user is browsing Python documentation, the sidebar should not be full of links to C++-specific documentation. Example Python page, example regular documentation page
All of the URLs in the SEE_ALSO()s for each binding are now checked by the build-docs.sh script. In fact, all links in the generated Markdown bindings are now ensured to be valid.
HISTORY.md is now also compiled into HISTORY.html by scripts/build-docs.sh, so that it can be referenced from the website. I had to make a handful of changes there so it would render reasonably.

I know the diff is large... any feedback is much appreciated. 😄

… and HTML sidebars.

…arkdown documentation.

conradsnicta · 2024-05-03T02:21:05Z

@rcurtin It's really nice to see this come together. A few comments and suggestions based on the current state at https://www.ratml.org/misc/mlpack_new/

Main landing page:

In the short description under the mlpack name in top center, suggest to omit "C++"; maybe change the description to "extensive library of flexible machine learning algorithms"; the idea is to avoid being pigeonholed into C++-only land right off the bat, as mlpack has bindings to other languages
Suggest to change the two blurbs under the logo (separated by a vertical line) into a simple bullet point list without the vertical divider
Suggest to add a third bullet point, explicitly stating that there are bindings to Python etc
For the citation section, suggest to use only one paper (the latest JOSS article from 2023), and remove the older ones; they're distracting and redundant
Suggest to remove or simplify the Benchmarks section; the table is too dense and has too many details; at the very least reduce the table down to 3 or 4 rows maximum; suggest to also remove the Shogun and WEKA columns -- these projects are pretty much dead and including them is just distracting
Suggest to change the "Download mlpack" button to link to the "Quickstart" page, instead of directly pointing to the .tar.gz archive; also remove mlpack version details, in order to reduce the maintenance burden; more on that below

Quickstart page:

Suggest to rename this section to "Installation" or "Download"
Suggest to have a separate section with "Latest Stable Release", linking to mlpack-4.3.0.tar.gz. This is instead of stating "mlpack is available in most package managers, or the source can be downloaded directly"
Suggest to add a note along these lines: "Pre-built mlpack packages are provided for various operating systems; these packages may not be the latest version; if you're encountering problems, use the official stable release provided here"

Documentation page:

Several sections appear empty, so assuming these are under construction: "Clustering algorithms", "Geometric algorithms", "Preprocessing utilities"

shrit

I went a bit faster on the markdown part, but spent more time on the others
Technically, I agree with @conradsnicta on the following points, as I have felt the same thing:

C++ should not be the centre, as in some cases could be pushing people away, even though I agree that in my opinion, the ultimate objective of mlpack is to focus on places where other library fails to deliver, especially when it comes to embedded system C/C++

    In the short description under the mlpack name in the top centre, suggest omitting "C++"; maybe change the description to "extensive library of flexible machine learning algorithms"; the idea is to avoid being pigeonholed into C++-only land right off the bat, as mlpack has bindings to other languages

I am not sure about this, yes it does make sense and we should add it, but I do not know where, and talking about Python Immediately without C++ can give the impression to people that we are a Python library, which should never be the case because we can not compete with other libraries when it comes to Python.

    Suggest to add a third bullet point, explicitly stating that there are bindings to Python etc

Totally agree on this one, it does not make too much sense to cite the older one unless it was for a specific reason.

    For the citation section, suggest using only one paper (the latest JOSS article from 2023), and remove the older ones; they're distracting and redundant

I think we should move this to a different place, and Yes I agree on removing Weka and Shogun, I never heard of mlpy or used it. We should replace these with TensorFlow or TensorFlow Lite, Pytorch, or any other modern maintained framework.

Also, we should mention some benchmarks regarding the binary footprint.

    Suggest to remove or simplify the Benchmarks section; the table is too dense and has too many details; at the very least reduce the table down to 3 or 4 rows maximum; suggest to also remove the Shogun and WEKA columns -- these projects are pretty much dead and including them is just distracting

This is exactly what I thought about the quick start page as well, it should be named installation.

    Suggest to change the "Download mlpack" button to link to the "Quickstart" page, instead of directly pointing to the .tar.gz archive; also remove mlpack version details, in order to reduce the maintenance burden; more on that below

Quickstart page:

    Suggest to rename this section to "Installation" or "Download"
    Suggest to have a separate section with "Latest Stable Release", linking to mlpack-4.3.0.tar.gz. This is instead of stating "mlpack is available in most package managers, or the source can be downloaded directly"
    Suggest to add a note along these lines: "Pre-built mlpack packages are provided for various operating systems; these packages may not be the latest version; if you're encountering problems, use the official stable release provided here"

A couple of things that I had in mind:

I would remove the FAQ page because everything on that page has been already said before, and it does not add any value to the library, only maintenance burden.
If It is up to me I would remove the entire contributor section from the community, it is not well-updated, and there are so many names, that I doubt anyone would come to see what this is about.
I would simplify the page community, there is so much information, that it is hard to find the needed one

3.1) I would remove the word Developer from in front of the names, it feels too much academic, even all of us have left academia long time ago 😄

We should remove the tutorial button from the front page because it is leading to some very old thing. We can bring it back with modern tutorials.
We should add a page that talks about the Ecosystem of mlpack, (for example adding link to ensmallen, bandicoot, armadillo, and other libraries, that are using mlpack. Not only this, we can add old and current users, and what are they doing with the library, something like use cases, which is more relevant to people and attractive as well.
I really like Pytorch website, I think we should get some inspiration from there, regarding Edge dedicated page, and adding a Blog page, since we are going to add these in the future.
I really like the About page in Pytorch as well, including the installation
Finally, we do not need a link to the Home page with the home button, everyone is clicking on the logo, it is the easiest and the fastest way

shrit · 2024-05-03T10:47:56Z

src/mlpack/methods/naive_bayes/nbc_main.cpp

@@ -92,7 +92,7 @@ BINDING_SEE_ALSO("@random_forest", "#random_forest");
 BINDING_SEE_ALSO("Naive Bayes classifier on Wikipedia",
    "https://en.wikipedia.org/wiki/Naive_Bayes_classifier");
 BINDING_SEE_ALSO("NaiveBayesClassifier C++ class documentation",
-    "@src/mlpack/methods/naive_bayes/naive_bayes_classifier.cpp");
+    "@src/mlpack/methods/naive_bayes/naive_bayes_classifier.hpp");


this was really well hidden

shrit · 2024-05-03T10:53:33Z

src/mlpack/methods/bayesian_linear_regression/bayesian_linear_regression_main.cpp

+    // I wonder how long this full text PDF will remain available...
+    "https://www.microsoft.com/en-us/research/uploads/prod/2006/01/"
+    "Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf");


if it was up to me I would remove it from the binding for two reasons.

This is an unnecessary piece of code that does not do anything and we have to maintain it in the longer term, and in some places host the paper.

We have really good documentation right now, this is where users and developers are intended to land and not on the main.cpp of the binding. This means that this will never be read or used.

If it was up to me I would remove all the papers from here and mention them directly in the docs, the smartest way actually is to put a link in the bindings to our documentation instead of one to the paper. There, in the doc, we can take care of the paper and the details of the method.

Actually this does get read and used, these SEE_ALSO() links get compiled directly into the "see also" section for the binding documentation. For instance, these links right here end up in the Bayesian linear regression documentation for each binding language: http://ratml.org/misc/mlpack_new/doc/user/bindings/python.html#see-also-1

As far as maintenance goes, the CI job that builds and tests documentation should tell us if the URLs ever go out of date.

rcurtin · 2024-05-03T12:41:25Z

Actually I will mark this as a draft for now so that I can handle all the changes, and then mark it as ready.

…ocumentation pages.

rcurtin · 2024-05-13T14:19:15Z

I was traveling this past week, so it took me a while to get back to, but I much appreciate the comments @shrit and @conradsnicta. The example website has been updated with all the changes below.

I also made some significant changes to the sidebar---instead of having different sidebars for binding documentation and for C++ documentation, now the same sidebar is used for both. I think that it's a lot more consistent and easy to navigate now.

Some responses to each point and changes that I made. (Sorry that it got really long...)

In the short description under the mlpack name in top center, suggest to omit "C++"; maybe change the description to "extensive library of flexible machine learning algorithms"; the idea is to avoid being pigeonholed into C++-only land right off the bat, as mlpack has bindings to other languages

C++ should not be the centre, as in some cases could be pushing people away, even though I agree that in my opinion, the ultimate objective of mlpack is to focus on places where other library fails to deliver, especially when it comes to embedded system C/C++

Yes, I agree with this. I thought about it for a little while---I want to keep it clear that we are header-only, but if you remove the C++ then the term doesn't make sense... so I instead added a second line to the subtitle: with bindings to Python, Julia, R, Go, and the command-line. Hopefully that should cover the bases here.

Suggest to change the two blurbs under the logo (separated by a vertical line) into a simple bullet point list without the vertical divider

Yeah, maybe less visually interesting but certainly easier to read. Done.

For the citation section, suggest to use only one paper (the latest JOSS article from 2023), and remove the older ones; they're distracting and redundant

Yeah, fair point, removed.

Suggest to remove or simplify the Benchmarks section; the table is too dense and has too many details; at the very least reduce the table down to 3 or 4 rows maximum; suggest to also remove the Shogun and WEKA columns -- these projects are pretty much dead and including them is just distracting

I think we should move this to a different place, and Yes I agree on removing Weka and Shogun, I never heard of mlpy or used it. We should replace these with TensorFlow or TensorFlow Lite, Pytorch, or any other modern maintained framework.
Also, we should mention some benchmarks regarding the binary footprint.

Well, I agree with both of those things, but I don't have any other benchmark numbers handy and want to avoid diving into getting benchmarking set up again... I also want to have some kind of proof point for "fast" on the homepage, even with the clear problems that have been pointed out and even though the numbers are almost old enough that they could get a driver's license in the US. Anyway, I want to update these for sure, but I'd prefer to leave the cleanup for a follow-up and come back to it.

Suggest to change the "Download mlpack" button to link to the "Quickstart" page, instead of directly pointing to the .tar.gz archive; also remove mlpack version details, in order to reduce the maintenance burden; more on that below

Ah, thanks! I had meant to do that but apparently forgot to actually make the change. Done now.

Quickstart page:
Suggest to rename this section to "Installation" or "Download"

Yes, good point, that will help prevent confusion between the "quickstarts" which are documentation and this page. I went with "Download".

Suggest to have a separate section with "Latest Stable Release", linking to mlpack-4.3.0.tar.gz. This is instead of stating "mlpack is available in most package managers, or the source can be downloaded directly"
Suggest to add a note along these lines: "Pre-built mlpack packages are provided for various operating systems; these packages may not be the latest version; if you're encountering problems, use the official stable release provided here"

Thanks, I did my best to refactor the download page to mention these.

I would remove the FAQ page because everything on that page has been already said before, and it does not add any value to the library, only maintenance burden.

All of the questions are obvious to us who know the project already, yeah, but a lot of people may be unfamiliar with these questions. I do get a decent number of emails from people who do not appear to have much expertise and would benefit from these pages. It doesn't hurt to keep the page there; I think none of the answers have been updated in a very long time anyway. ensmallen, Armadillo, and Bandicoot have similar pages.

If It is up to me I would remove the entire contributor section from the community, it is not well-updated, and there are so many names, that I doubt anyone would come to see what this is about.

The only thing that kept me from doing this is that some contributors from very long ago never had their contributions linked with a Github account. But I guess that maybe they are called out in COPYRIGHT.txt or something like this. Anyway, I removed it.

I would simplify the page community, there is so much information, that it is hard to find the needed one

With the contributors list gone, it's a lot shorter now. Really the thing that most people will be interested in is the chat channels and maybe video call; the rest of the stuff is not particularly important but does need to go somewhere. That contribution guide is really helpful too---I point lots of people towards it and it saves me time writing the same thing over and over again. :)

I would remove the word Developer from in front of the names, it feels too much academic, even all of us have left academia long time ago 😄

Yeah, I think the NSF wanted it written like that many years ago. I dropped the "Developer" part, even though I do still dip my toes in academia from time to time. :)

We should remove the tutorial button from the front page because it is leading to some very old thing. We can bring it back with modern tutorials.

Where do you see the link to the tutorials? Maybe I am looking at the wrong page? I don't see one on the new homepage. I thought I had removed all links to the old tutorials.

We should add a page that talks about the Ecosystem of mlpack, (for example adding link to ensmallen, bandicoot, armadillo, and other libraries, that are using mlpack. Not only this, we can add old and current users, and what are they doing with the library, something like use cases, which is more relevant to people and attractive as well.

Not a bad idea, but I would prefer to save it for later, or if you want to do it, feel free.

I really like Pytorch website, I think we should get some inspiration from there, regarding Edge dedicated page, and adding a Blog page, since we are going to add these in the future.

Definitely we will need more direct links to the application notes and tutorials we will write, but I'm hesitant about adding a blog. Unless you have a team of people dedicated to writing blog content, it will always go out of date and then users who come look at the blog say "huh, it hasn't been updated in 3 years, I guess the project must have no momentum". I finally removed the old GSoC blog we used to use for this reason.

I really like the About page in Pytorch as well, including the installation

Actually the current version of the mlpack homepage has an installation selector kind of like PyTorch does, but I have found it hard to maintain and a little bit complicated. So I just went with something simple. If someone wanted to replace it with something better or more elaborate in the future, I'm not opposed, but I would prefer not to (I enjoy writing C++ not HTML/JS :)).

Finally, we do not need a link to the Home page with the home button, everyone is clicking on the logo, it is the easiest and the fastest way

I see what you mean, but it doesn't hurt anything to leave it there. I personally prefer the explicit callout to the home page.

conradsnicta · 2024-05-14T01:29:40Z

further comments in mlpack/mlpack.org#61 (comment)

conradsnicta · 2024-05-20T07:05:10Z

@shrit - may need to approve this PR as well, since it's part of mlpack/mlpack.org#61

shrit

Looks good to me

rcurtin · 2024-05-26T11:22:45Z

The documentation build fails, but only because the new website is not yet deployed. Otherwise all the links seem ok.

rcurtin added 18 commits April 22, 2024 12:54

Minor changes to build script to allow custom headers and footers.

7d5dfc4

Update quickstarts to use relative links.

445d3a1

Allow custom sidebars for some files.

9c2f591

Revamp Markdown bindings to output individual language-specific files…

84b0fd9

… and HTML sidebars.

Fix links in documentation, and other cleanups.

ada80dd

Some minor additional cleanups.

f0f24cf

Add output of Markdown target directly to the repository.

e99ae26

Remove version number from generated output.

36cd02c

Minor cleanups, and remove version number.

1a71fb1

Fix links to use LINKROOT.

77d3680

Slightly more descriptive output.

07dba17

Add custom sidebars for quickstarts and print page title.

edd924f

Fix sidebar scrolling, and split apart list for data types in bindings.

b27082a

Add a script to check whether anything has changed in the generated M…

d8a1529

…arkdown documentation.

Why can't I seem to get a particular link for this paper that works?

c38561a

Update HISTORY.md so that it is also built into a webpage.

0e94913

Minor fixes for anchors and other issues.

a824295

Merge remote-tracking branch 'origin/master' into website-revamp

2a217de

mlpack-bot bot added s: needs review s: unanswered s: unlabeled labels May 2, 2024

rcurtin mentioned this pull request May 2, 2024

Overhaul website to integrate with Markdown bindings mlpack/mlpack.org#61

Merged

conradsnicta approved these changes May 3, 2024

View reviewed changes

conradsnicta added c: documentation and removed s: unanswered s: unlabeled labels May 3, 2024

conradsnicta requested a review from shrit May 3, 2024 02:23

shrit requested changes May 3, 2024

View reviewed changes

rcurtin marked this pull request as draft May 3, 2024 12:41

rcurtin added 2 commits May 7, 2024 10:27

Revamp sidebar: make sure it is the same across regular and binding d…

7f4b7e4

…ocumentation pages.

Tag binding sidebar and colorize it.

87be400

How did this sneak in here?

879c5f1

rcurtin added 2 commits May 14, 2024 22:59

Move community and GSoC pages away from website and into documentation.

b7cb52e

Merge remote-tracking branch 'origin/master' into website-revamp

1b63fa0

rcurtin marked this pull request as ready for review May 17, 2024 02:14

shrit approved these changes May 20, 2024

View reviewed changes

rcurtin added 4 commits May 24, 2024 07:22

Merge branch 'master' into website-revamp

22ff9cb

Merge remote-tracking branch 'origin/master' into website-revamp

bdebab7

Fix incorrect link.

b4348a0

Fix bad links in new developer documentation.

401ede1

rcurtin merged commit 9107c55 into mlpack:master May 26, 2024
19 of 20 checks passed

rcurtin deleted the website-revamp branch May 26, 2024 11:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revamp Markdown bindings for overhauled website #3699

Revamp Markdown bindings for overhauled website #3699

rcurtin commented May 2, 2024 •

edited

conradsnicta commented May 3, 2024 •

edited

shrit left a comment •

edited

shrit May 3, 2024

shrit May 3, 2024

rcurtin May 3, 2024 •

edited

rcurtin commented May 3, 2024

rcurtin commented May 13, 2024

conradsnicta commented May 14, 2024

conradsnicta commented May 20, 2024

shrit left a comment

rcurtin commented May 26, 2024

Revamp Markdown bindings for overhauled website #3699

Revamp Markdown bindings for overhauled website #3699

Conversation

rcurtin commented May 2, 2024 • edited

conradsnicta commented May 3, 2024 • edited

shrit left a comment • edited

Choose a reason for hiding this comment

shrit May 3, 2024

Choose a reason for hiding this comment

shrit May 3, 2024

Choose a reason for hiding this comment

rcurtin May 3, 2024 • edited

Choose a reason for hiding this comment

rcurtin commented May 3, 2024

rcurtin commented May 13, 2024

conradsnicta commented May 14, 2024

conradsnicta commented May 20, 2024

shrit left a comment

Choose a reason for hiding this comment

rcurtin commented May 26, 2024

rcurtin commented May 2, 2024 •

edited

conradsnicta commented May 3, 2024 •

edited

shrit left a comment •

edited

rcurtin May 3, 2024 •

edited