Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making FabSim3 installable as a Python package using an existing build backend #250

Open
matt-graham opened this issue Nov 30, 2023 · 7 comments · May be fixed by #252
Open

Making FabSim3 installable as a Python package using an existing build backend #250

matt-graham opened this issue Nov 30, 2023 · 7 comments · May be fixed by #252
Milestone

Comments

@matt-graham
Copy link
Collaborator

Is your feature request related to a problem? Please describe.

Currently FabSim3 uses a bespoke approach for having users install the application via the configure_fabsim.py script, and the fabsim directory is not structured as a regular Python package due to the lack of __init__.py files in the top-level and sub directories.

This creates a few related issues:

  • FabSim3 cannot be installed with existing package managment tools like pip, conda, poetry or pdm. This means it cannot be specified as a dependency for other packages and libraries using standard approaches (for example within a pyproject.toml file, setup.py script, requirements.txt or other tool specific format such as conda environment YAML or lock files). This also makes it difficult to use tools like tox, which are designed to help automating running Python tests in isolated environments, with FabSim3 itself or packages / modules (such as FabSim3 plugins) which themselves depend on FabSim. For example we have hit against precisely this issue when trying to run tests for the FabNESO plugin using tox (First draft of tests on the ensemble_tools UCL/FabNESO#23).
  • If a user follows the recommendation in the installation script to add the root of the FabSim3 repository to PYTHONPATH in .bashrc (or equivalent for other shells), all of the directories within this root directory that are valid Python package / module names (that is docs, fabsim, lists, plugins, tests) will be importable as namespace packages in all Python environments that the user runs that do not otherwise overwrite PYTHONPATH. Polluting the Python import namespace globally like this is not ideal, and given the very generic names of some of these directories could lead to confusion for example if a user imports lists and finds that nothing is defined there.
  • The configure_fabsim.py script reimplements standard functionality available in build backends such as setuptools by manually installing dependencies in a way which makes it difficult to flexibly used FabSim with an environment management system of the users choice (the script forces use of pip to install dependencies and uses the --user argument unless the VIRTUALENV variable is set - this means if installing in a conda environment for example, the packages will still be installed in the user directory rather than isolated to the conda environment).
  • Presumably partly because of the above non-standard approach for making fabsim importable, plugins are recommended to use a verbose approach of importing names from fabsim.base.fab, which shouldn't be necessary if fabsim was installed into the site-packages directory as a regular package.
  • Because of the lack of __init__.py files in fabsim and its subdirectories, any imports from that namespace will be resolved as an (implicit) namespace package which carries a performance penalty compared to imporing a regular package.

Describe the solution you'd like

  • For the fabsim directory to be made in a regular Python package by including __init__.py files in the top-level and all subdirectories.
  • For the current configure_fabsim.py script approach used for installing FabSim3 to be changed to instead use a modern pyproject.toml based approach for declaring package metadata combined with a build backend such as setuptools or poetry.
  • For the fabsim package to register a console script entrypoint for its command line interface rather than having the user add the fabsim/bin directory manually to search path.
  • For FabSim3 plugins to be able to register a fabsim.plugins entrypoint as a (potentially alternative to while maintaining current) approach for allowing FabSim3 to discover available plugins.
  • For FabSim3 user specific configurations files such as each plugins machines_{plugin_name}_user.yml file and the top-level machines_user.yml file to be located in an appropriate platform specific application data directory using something like platformdirs.
  • Make the current automatic setup of SSH keys in

    FabSim3/configure_fabsim.py

    Lines 209 to 221 in 7d80396

    # setup ssh localhost
    if os.path.isfile("{}/.ssh/id_rsa.pub".format(os.path.expanduser("~"))):
    print("local id_rsa key already exists.")
    else:
    os.system('ssh-keygen -q -f ~/.ssh/id_rsa -t rsa -b 4096 -N ""')
    os.system("cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys")
    os.system("chmod og-wx ~/.ssh/authorized_keys")
    os.system("ssh-keyscan -H localhost >> ~/.ssh/known_hosts")
    # use ssh-add instead of ssh-copy-id for MacOSX
    if FS3_env.OS_system == "OSX":
    os.system("ssh-add /Users/{}/.ssh/id_rsa".format(FS3_env.user_name))

    something the user separately explicitly opts in to post install by running a fabsim command, as some users may not want FabSim3 to make changes to these files.

Describe alternatives you've considered

An alternative to making fabsim a regular package, would be to keep it as a namespace package as currently and either use a src layout plus auomatic package discovery or explicitly specify the package directory in a pyproject.toml / setup.cfg / setup.py file. This would then allow FabSim3 plugins to 'register' themselves by making themselves part of the same namespace package and the pkguti.iter_modules() function used to discover all registered modules in a particular namespace (for example fabsim.plugins). The disadvantage of this approach is the aforementioned slight performance disadvantage at import time of namespace versus regular packages, and that this approach will force all plugins which wish to use this approach of registering themselves to be namespace packages as a plugin created as regular package with an __init__.py would make other subpackages in namespace non-importable.

@djgroen
Copy link
Owner

djgroen commented Nov 30, 2023

@matt-graham Thank you very much for raising this issue!

We've looked into pip-ifying FabSim3 with @arabnejad about 3 years ago, and back then we concluded it wasn't a viable option due to a number of low-level configuration steps not being adequately supported in package form, the presence of in-package configuration files such as machines_user.yml, and the integration of code files with simulation configurations. However, from your analysis I tend to conclude that that situation has now changed. :)

Now I am in favour of making these changes, but unfortunately I currently do not possess the technical expertise to make these modifications (Hamid made the initial packaging structure at the time). If you are attending the Hackathon, then could we perhaps discuss how we could make such a revision happen? Most likely writing a support proposal will take me an order of magnitude less time as opposed to learning myself how restructure and package, and incorporating these modifications.

Lastly, automated testing with FabSim3 is certainly possible. Although I am not familiar with tox, we do perform automated tests with Flee and FabFlee (one of the plugins) using pytest. See https://www.github.com/djgroen/flee for details on that :).

@matt-graham
Copy link
Collaborator Author

matt-graham commented Nov 30, 2023

Hi @djgroen. Yep I will be remotely attending some of the hackathon, and it was partly because of this I thought it was worth raising this as an issue as I'd be interested in discussing and exploring how viable this would be to implement, and possibly trying to work on a PR to try to address these points at least partially 🙂. Appreciate this would be a fairly major change and it would need to be done in a way that is as backwards compatible as possible, but equally I think it would help with wider adoption if FabSim3 and plugins could be directly installed using pip (or another package manager).

With regards to automated testing - while I agree that it is possible, at the moment I believe it would require manually installing FabSim3 in the test environment. While certainly not impossible to do on a GitHub Actions test runner, I think it would be non-trivial and I think this probably deters plugin writers from having automated tests set up to run using GitHub Actions workflows (or an alternative CI/CD service). From a quick check through of the plugins currently listed in https://github.com/djgroen/FabSim3/blob/master/fabsim/deploy/plugins.yml, I could only spot a .github/workflows directory in FabFlee and FabNESO. For FabNESO at the moment we only have a workflow set up to run linting checks, though we are working on adding automated tests (but hit against the issues described above). For FabFlee it looks like there is a workflow for CodeQL set up but not one for running tests as far as I can see? I can see there are tests for FabFlee in https://github.com/djgroen/flee/blob/master/tests_mpi/test_fabflee.py which is run automatically by the workflow in https://github.com/djgroen/flee/blob/master/.github/workflows/Pytests.yml but it doesn't look like this tests running the plug-ins tasks directly as far as I can see (or at least I can't spot where FabSim3 is installed).

@djgroen djgroen added this to the FabSim3.5 milestone Dec 1, 2023
@matt-graham
Copy link
Collaborator Author

matt-graham commented Dec 4, 2023

Possible task breakdown

  • Make fabsim directory a regular Python package
  • Change to a src layout
    • Requires moving fabsim directory under a new top-level src directory.
    • Also would requiring updating all internal references to the fabsim path to respect the new structure.
    • If users then added just the new src directory to their PYTHONPATH this would ensure only fabsim will be made importable, rather than the current behaviour of making all subdirectories in root of repository that are valid Python module names importable.
  • Add a pyproject.toml file describing package metadata and allowing installation with pip
    • While there are various build backends that could be used with pyproject.toml, setuptools is probably a good bet in terms of maturity and flexibility.
    • Initially will be an alternative approach for installing fabsim package and its dependencies, with configure_fabsim.py still available for current install approach.
    • Assumption would be most users would install in editable mode .
    • The pyproject.toml would specify package dependencies (either directly in file or using dynamic metadata property referencing requirements.txt file) and potentially optional dependencies such as developer dependencies (Pytest, isort) or feature specific dependencies such as those in qcg_requirements.txt.
    • Could potentially make the dependency installation part of configure_fabsim.py (and recommendation to add repository to PYTHONPATH) switch-offable via a command line argument to script, to facilitate use case of pip installing package in editable mode then using script to do remaining set up.
  • Add console script entrypoint to pyproject.toml for fabsim command line interface
    • Will automatically add fabsim tool to search path when installed in a Python environment.
  • Add fabsim.plugins entrypoint to pyproject.toml FabSim plugins can register
    • Initially an alternative approach for plugin discovery that would allow plugins to be directly pip installed.
    • Plugin discovery would then look for either plugins in plugins directory or any which register the entrypoint.
  • Create fabsim task for post-installation set up
    • Create user specific copy of machines_user.yml.
    • Create localhost execution directory.
    • Set up SSH keys
    • Possibly export FABSIM3_HOME in shell configuration scripts and register autocomplete function?
  • Deprecate configure_fabsim.py script and switch to recommending pip based install.

@djgroen
Copy link
Owner

djgroen commented Dec 4, 2023

@matt-graham This looks really good to me overall. I can't tell whether there are essential bits missing, but for sure I feel able to tackle many of these tasks.

Today I am focused a bit more on EasyVVUQ and the Flee validation, but I will start to tackle this tomorrow. If you like you could already make a branch for this, so that I (we?) hit the ground running tomorrow?

@djgroen
Copy link
Owner

djgroen commented Dec 5, 2023

For "Change to a src layout" there is a slight conceptual issue in that plugins are not imported with this design.

To resolve this for now I will add the plugins folder to the Pythonpath as well for the time being.

....actually, this is resulting in a few other issues. May be worth having a quick chat about.

@djgroen
Copy link
Owner

djgroen commented Dec 5, 2023

Okay, after some deliberation we believe that it is better to first pip-ify the plugins, as moving FabSim3 to a src layout will change the way plugins are imported quite fundamentally.

The plan now is to start pip-ifying FabDummy.

@djgroen
Copy link
Owner

djgroen commented Dec 5, 2023

I've attempted to make a pipable version of FabDummy here: djgroen/FabDummy#3

And I've modified machines.py in FabSim3 so that it can support both old and new style plugin imports in the plugins/ folder. See branch here: #252

Some debugging may well still be required. In addition, I am wondering how to discover plugins that are not placed in the plugins/ folder...

@arindamsaha1507 arindamsaha1507 linked a pull request Dec 7, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants