Issue 7 add sphinx doc #23

Merged
merged 5 commits into main on Mar 11, 2024

Conversation

@kan-fu (Collaborator) commented Jan 26, 2024

Fix #7, Fix #8. I will leave #9 open because I haven't finished all the docstring.

Angela has kindly agreed to review the second commit, which contains a lot of documentation. I have added an additional commit message to each commit to explain what I have done. https://oceannetworkscanada.github.io/api-python-client/

I added some line breaks in the markdown files to get rid of long sentences. The rule is not based on a fixed width, but rather on semantic meaning.

CI (doc.yml, which builds and deploys the docs; there is no test step) is only triggered when pushing to main. I added a manual trigger in case a run is desired before the merge. Currently, the gh-pages deployment workflow run has to be approved by me first, as a protection rule.

For Eli: I have noticed some outdated info in the schema of the response JSON (e.g., some values can be null; allowableRange and suboptions are missing from the discover data products, ...). I will compile a list after I finish all the docstrings.

@Jacob-Stevens-Haas (Collaborator) left a comment

Congrats on migrating the documentation! I'm not entirely sure what's new vs what's just migrated, so let me keep my thoughts general to some of the technology in the PR, based on my experience with pysindy.

  • Have you tried adding the -W flag to the CI's sphinx invocation? sphinx is pretty good at building the docs regardless of RST mistakes, but it's useful to treat those mistakes as errors in CI, rather than just a warning in the GH action log.
  • The doc CI calls Makefile/Make.bat instead of directly calling sphinx .... For instance, I'm not sure how to pass sphinx arguments to make html. RTD itself doesn't call make html, we've removed it, and looking at pandas, apparently they have too. Sphinx itself is pretty cross-platform, so I'm not sure why sphinx-quickstart still autogenerates these files.
  • RTD recommends pinning doc dependency versions. I'm a bit torn on the issue myself, but their rationale is that without pinned versions, build errors that appear in CI are difficult to troubleshoot locally without scanning through the CI logs to find the exact versions of everything that might have caused the error.
  • I have struggled in the past to manage jupyter notebooks in a package repository. None of these seem problematically large or long-running, but for the sake of reading diffs it might be worth saving them as python files and only creating the notebooks using jupytext on doc builds.
  • Regardless, there's the question of how to test the jupyter notebooks. E.g. if you change the call signature of a function, how does CI catch the notebook that demonstrates that function? Here's our infrastructure for doing so. We actually test the .py files that shadow notebooks, because we have to substitute certain globals in order to run the notebooks extra fast, but you can do the same with actual ipynb files using nbconvert and nbformat if you're not concerned with test duration.
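For illustration, a minimal pytest sketch of the nbformat/nbconvert approach mentioned in the last bullet might look like the following (the tests/test_notebooks.py file name and the docs/source path are placeholder assumptions, not files in this PR):

```python
# Hypothetical tests/test_notebooks.py: execute each documentation notebook
# and fail the test if any cell raises. Paths are placeholder assumptions.
from pathlib import Path

import nbformat
import pytest
from nbconvert.preprocessors import ExecutePreprocessor

NOTEBOOK_DIR = Path("docs/source")  # assumed location of the .ipynb files
NOTEBOOKS = sorted(NOTEBOOK_DIR.rglob("*.ipynb"))


@pytest.mark.parametrize("path", NOTEBOOKS, ids=lambda p: p.name)
def test_notebook_executes(path):
    nb = nbformat.read(str(path), as_version=4)
    ep = ExecutePreprocessor(timeout=600, kernel_name="python3")
    # Run the notebook in its own directory so relative paths resolve;
    # a CellExecutionError from any cell fails the test.
    ep.preprocess(nb, {"metadata": {"path": str(path.parent)}})
```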

@kan-fu (Collaborator, Author) commented Jan 27, 2024

Thanks for your suggestions!

  1. Getting rid of the makefile makes perfect sense to me. I actually saw a conversation in the Sphinx issue tracker about moving away from the makefile. It was in 2019; I guess back then the makefile was included by default, and now they provide sphinx-quickstart for generating the makefile if anyone still likes the makefile idea. It is still in their official tutorial, so I am not sure what the Sphinx developers think about it. -W is also a good idea. Other flags like -T, -E, and -d are optional, so I am not using them, for simplicity.

  2. About pinning the versions, this is the same question as for the dependencies in pyproject.toml. I can understand their rationale, but I think the onc library is simple enough to leave all the versions unpinned. If CI errors happen in the future, we can pin the versions then.

  3. For the Jupyter notebooks, I admit the git diff is hard to read on GitHub, but I think there are two reasons why that is not a big issue:

    • Most of our internal users are data scientists. Jupyter notebooks are a better communication tool than Python scripts that need conversion back and forth. They use Jupyter notebooks for conference and workshop presentations, and it is very easy for developers to reuse them here without many modifications. Users can also download the Jupyter notebooks to play with locally (I just added the link in the new commit).
    • Once a Jupyter notebook is merged into main, I anticipate there won't be frequent changes to it. Even if a review is needed, VS Code has a better diff view than GitHub, or the PR author can highlight the diff in the message and reviewers can just review the deployed version (I am thinking of using my personal fork as a QA version of the GitHub Pages). I guess it is really a headache in pysindy because there are too many notebooks from users of different levels. For now, I will be the only one putting Jupyter notebooks in this repo, so things are simpler here.
  4. For testing, myst_nb has an execution mode, so each build (push to main or manual trigger, CI only) will run the notebooks, and the docs will show the most up-to-date output. I set nb_execution_raise_on_error to True in conf.py, so a failed notebook run raises an error instead of a warning (somewhat redundant after adding the -W flag). I guess I shouldn't have said "there is no test step" in CI; the running/testing happens in the build process.
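For context, the execution settings described in point 4 correspond roughly to the following conf.py excerpt (a sketch rather than a verbatim copy of the file in this PR; the extension list is illustrative):

```python
# conf.py excerpt (illustrative): myst-nb notebook execution settings
extensions = [
    "myst_nb",               # render and execute the Jupyter notebooks
    "sphinx.ext.autodoc",    # pull the API reference from docstrings
    "sphinx.ext.napoleon",   # parse numpy-style docstrings (assumed here)
]

# Re-execute the notebooks on every doc build so the rendered output stays current.
nb_execution_mode = "force"

# Turn a failed notebook run into a build error rather than a warning.
nb_execution_raise_on_error = True
```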

@Jacob-Stevens-Haas (Collaborator) commented Jan 27, 2024

Oh fully agree that notebooks are a great communication tool! I was suggesting having both the notebook and the script:

  • notebooks go into the documentation build (and users who want them can download them directly)
  • changes to the notebook's script are easy to see in git diff and to run in pytest locally
  • jupytext in CI keeps the notebook and its script in sync

Not for now though! Just letting you know that if you run into the same issues we've run into at pysindy (notebooks that take too long to run even in CI, but which still need to keep up with breaking API changes, and whose edits to do so run large in git diff), there's a solution out there! And for a lighter version that doesn't use jupytext/scripts, but still has an option to do pytest notebooks so that it doesn't take a full doc build, that's also in the pysindy link. Just options for the future if you want!
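For reference, a minimal sketch of that jupytext pairing via its Python API could look like this (file paths are hypothetical and not part of this PR; the jupytext CLI also offers a --sync mode for paired files):

```python
# Illustrative jupytext round trip (hypothetical paths, not part of this PR).
import jupytext

# Read the script form (py:percent format), which diffs cleanly in git...
nb = jupytext.read("docs/source/onc_library_tutorial.py", fmt="py:percent")

# ...and materialise the notebook form for the documentation build.
jupytext.write(nb, "docs/source/onc_library_tutorial.ipynb")
```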

@kan-fu (Collaborator, Author) commented Jan 29, 2024

Sure, thanks for your suggestions! I will definitely keep that in mind. We have planned a new internal scientific project that will involve a lot of scripts and Jupyter notebooks, so this might be useful when things become complicated.

@kan-fu (Collaborator, Author) commented Feb 1, 2024

I will keep this PR open for a while because it will take some time for Angela to go over all the documentation.

sphinx-quickstart was used to generate the base config (conf.py and Makefile). After that, the extension configs were added to conf.py.
Here are the sources:
1. Tutorial: https://github.com/OceanNetworksCanada/api-tutorial/blob/master/Tutorial_Notebook.ipynb. For the onc library tutorial, I added a vanilla requests example as a comparison.
2. Code Examples: https://wiki.oceannetworks.ca/display/O2A/Sample+Code
3. Contributing: None
4. API Guide: https://wiki.oceannetworks.ca/display/O2A/Discovery+methods
5. API Reference: auto-generated from the docstrings
6. README: https://wiki.oceannetworks.ca/display/O2A/Getting+Started
7. License: I added more boilerplate declarations so that GitHub can detect it as the Apache license.
The information is mainly copied from the OpenAPI page, with the numpy docstring style applied.
All the doctests are disabled. I will add a separate file for them so that they do not become outdated.
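As a generic illustration of the numpy docstring style mentioned above (a hypothetical function, not one taken from the onc library):

```python
def discover_locations(filters=None):
    """Return locations matching the given filters.

    A hypothetical discovery helper, shown only to illustrate the
    numpy docstring sections used throughout the API Reference.

    Parameters
    ----------
    filters : dict, optional
        Query parameters accepted by the locations discovery service.

    Returns
    -------
    list of dict
        One dictionary per matching location, as returned by the API.
    """
    ...
```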
@kan-fu kan-fu merged commit 469a5d7 into main Mar 11, 2024
12 checks passed
@kan-fu kan-fu deleted the issue-7-add-sphinx-doc branch March 11, 2024 23:03
Successfully merging this pull request may close these issues:

  • Update README.md to be more user friendly
  • Improve documentation