Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pandoc strategy #29

Open
eacharles opened this issue Jun 14, 2023 · 5 comments
Open

pandoc strategy #29

eacharles opened this issue Jun 14, 2023 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@eacharles
Copy link
Collaborator

Looks good! The one catch is that we'll need to add something in the documentation about installing pandoc if users wish to build the notebooks locally with sphinx (as pandoc is a haskell library, so we can't bundle it into pip requirements)

Without pandoc in my rail conda environment, running make html from the docs/ directory of this PR results in an interrupted run and:

Notebook error:
PandocMissing in examples/core_examples/FileIO_DataStore.ipynb:
Pandoc wasn't found.
Please check that pandoc is installed:
https://pandoc.org/installing.html

Running conda install pandoc allows make html to run as expected.
Proposed solutions include:

  1. explaining this in the documentation, in a section relevant to devs trying to build notebooks (or documentation in general) locally
  2. we could also include a comment in our Makefile, in the hopes that anyone encountering trouble with make commands will read that
  3. we could include pandoc in our other conda packages in the standard rail install instructions, but I hesitate to add that kind of overhead for such a relatively niche use case

(This should not be an issue on ReadTheDocs, as pandoc should be automatically included. If I recall, there is a toggle in Advanced Settings where you can remove this, but unless we run into issues we can assume we don't have that set.)

@aimalz aimalz added the documentation Improvements or additions to documentation label Jul 14, 2023
@aimalz
Copy link
Collaborator

aimalz commented Sep 20, 2023

Re: pandoc, this is an example of what's normally included under [.dev] in the pyproject.toml. I think we're misusing that in rail and should move the rail* packages in the [.dev] list to the standard dependencies list so that new non-developer users don't need to counterintuitively invoke the [.dev] (or any other) option to install everything they need to run code found in the demos on ReadTheDocs. As it is now, pip install pz-rail is equivalent to pip install pz-rail-base so making these two changes would address the original problem defining this GH issue with no loss of functionality to developers while simultaneously improving the new user onboarding experience.

@eacharles
Copy link
Collaborator Author

The point of not having the rail packages in the [.dev] is to avoid forcing people to install everything in case there are issues with some of the packages, so I don't think we what to move them.

@sschmidt23
Copy link
Collaborator

Yeah, as I think we've discussed before, with several dependencies that can be a pain to install on some systems (e.g. M1 Macs and somoclu, jax, jaxlib, etc...), I think we decided that a minimal install as the base was best, particularly when paired with the command line utility that can attempt to install the sub-packages one-by-one that can successfully install subsets of RAIL in the presence of package failures (whereas trying to install everything in the base will fail to install RAIL if any one sub-package fails) was the way to go.

What needs to be present is instructions stating this, and with an example of the command-line syntax very prominently on the Docs page installation instructions. Also, in addition to [.dev], there is also [.algos] and [.nb] that install all algorithm sub-packages but do not include pytest, pylint, and other dev packages, that might be the recommended way to go in the "general install-all" case. Making sure that the instructions are clear in the Docs is, I think, the real issue here.

@aimalz
Copy link
Collaborator

aimalz commented Sep 21, 2023

To be clear, I'm not disputing that the potential for dependencies to cause installation failures is a real problem we have to worry about because it really has happened. The command-line tool is a comprehensive solution that effectively protects the developer team from anticipated disruptions like those that resulted from these dependency issues in the past, and makes it easier to manage the many repositories of the RAILiverse to boot.

Here, I'm only talking about how the current setup affects non-developer users, namely, it doesn't actually prevent the problem that originally motivated the current setup (which precludes the obvious solution to this GH issue), so that shouldn't be considered a barrier to making the suggested change. Let's go through what happens under the four combinations of when there is and isn't a problem with one of the dependencies and for naive users who run pip install pz-rail and careful users who check ReadTheDocs first and run pip install pz-rail [.dev]:

  1. If there isn't a problem with the dependencies, the careful user will encounter success when running code they copied out of the demos -- this is the outcome we'd like all users to experience!
  2. If there is a problem with the dependencies, the careful user will get error messages saying their installation failed on a specific pz-rail-* package. They won't be able to run code they copied from a demo and will subsequently be sad, but when they report the informative error to us, we are alerted to a real problem. If we can't fix it promptly, we remove the dependency from the pyproject.toml and make a quick release until we can resolve the underlying issue with that dependency.
  3. If there isn't a problem with the dependencies, the naive user gets no error message as an indication that the dependencies weren't installed, but code they copied out of a demo won't work because the module they're importing can't be found in their RAIL installation that seemed to succeed, and they'll be sad. They'll complain to us, and if we don't erroneously investigate other potential causes of the same error message, we'll tell them to follow the instructions they failed to look at the first time.
  4. If there is a problem with the dependencies, the naive user gets no indication that the dependencies weren't installed, but code they copied out of a demo won't work because the module they're importing can't be found in their RAIL installation that seemed to succeed, and they'll be sad -- this is the exact same outcome as in the above case. They'll complain to us, and if we don't erroneously investigate other potential causes of the same error message, we'll tell them to follow the instructions they failed to look at the first time, but of course it still won't work. They'll complain again with the error message indicating which pz-rail-* package is broken, and once we know about the real problem we either fix it quickly or remove the dependency from the pyproject.toml and make a prompt release until we can resolve the underlying issue.

Three of these four cases result in users not being able to successfully use rail, which should not be the default behavior even when there is no problem with the dependencies. We don't prevent the careful user who follows the instructions from being sad if a problem with the dependencies arises. Only one of the three failure cases results in us being alerted to a problem with an actionable solution for us to implement, while in the other two of the three failure cases, the user logically concludes that the problem wasn't in the installation, which is a bug, not a feature, and if there actually is a problem with a dependency, it takes longer for us to detect it. I'm not saying detailed installation instructions aren't useful and even essential, especially in light of the havoc wreaked by the widespread M1 Mac woes, and FWIW I think ours now meet a high standard, thanks to both of you. However, it's not unreasonable for exploratory users to expect pip install pz-rail to work because that's the way every Python package in astronomy (and data science) works, without any bracketed options, and there don't seem to be any advantages to creating a situation where there's a difference between our instructions to exploratory users and this standard model that's familiar and intuitive. So, my question is, in what way would it not be better, for the new users and for the developer team, to eliminate the distinction between "naive" and "careful" users by revising the setup to use the [.dev] option as intended rather than as the default install procedure for exploratory users?

@eacharles
Copy link
Collaborator Author

eacharles commented Sep 21, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

3 participants