Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mintpy adjustments to simplify conda-forge recipe #652

Open
4 of 5 tasks
yunjunz opened this issue Aug 30, 2021 · 7 comments
Open
4 of 5 tasks

mintpy adjustments to simplify conda-forge recipe #652

yunjunz opened this issue Aug 30, 2021 · 7 comments

Comments

@yunjunz
Copy link
Member

yunjunz commented Aug 30, 2021

Copy over the to-do list in mintpy to simplify the conda-forge recipe by @jhkennedy from #648:

  • 1. test_smallbaselineApp.py and similar should look for configs relative to itself, not $MINTPY_HOME or mintpy.__file__

  • 2. not printing on import:

    try:
    os.environ['MINTPY_HOME']
    except KeyError:
    print('Using default MintPy Path: %s' % (mintpy_path))
    os.environ['MINTPY_HOME'] = mintpy_path

  • 3. Drop any reliance on manipulating environment variables for installation (PATH, PYTHONPATH especially; MINTPY_HOME as well ideally) by leveraging console_scripts entry_points in setup.py

  • 4. For pyaps3, read CDS key from $HOME directory, in addition to pyaps3/model.cfg file, so that users does not need to go to the interpreter directory for account setup.

  • 5. arg-less main() and a work function. See the comment below for details.

@yunjunz yunjunz changed the title mintpy adjustments for conda-forge mintpy adjustments to simplify conda-forge recipe Aug 30, 2021
@yunjunz
Copy link
Member Author

yunjunz commented Sep 1, 2021

@jhkennedy could you expand more on the 3rd topic?

The code does not rely on the MINTPY_HOME environment variable, I made sure of that a while ago. So it should be trivia to remove from the installation note if needed; I kept it there for convenience purposes only.

@jhkennedy
Copy link
Collaborator

@yunjunz yep! I'll expand soon; I'm just waiting for the conda-forge PR to go through so I can point to specific things in the feedstock.

@jhkennedy
Copy link
Collaborator

jhkennedy commented Sep 7, 2021

So overall, I'm coming from the perspective of "I want my build/installation tool to do as much of the work as possible." This lets the tool handle all the different OS/machine nuances and me to focus just on telling it want I want it to do.

MintPy, nicely, could be built/installed entirely by pip, and could qualify as a noarch python package on conda-forge with the above changes. (meaning it doesn't need OS special handling and therefore only needs a single build of Mintpy instead of a python-version-by-supported-OS matrix of builds.)

To illustrate, I'll go over how the current MintPy v1.3.1 recipe doesn't qualify because it fails these criteria:

  1. No activate scripts

  2. No OS-specific build scripts -- this is tightly coupled to

    • scripts argument in setup.py is not used
    • If console_scripts entry_points are defined in setup.py or setup.cfg, they are also listed in the build section of meta.yaml

    because MintPy expects things to be run on the command line/console

1. No activate scripts

The current activate script does two things.

  1. Exports MINTPY_HOME
  2. Adds the mintpy package directory to PATH so that the scripts, which reside inside MintPy can be called on the command line

Since (1) was fixed in #653 , that only leaves (2), which is done because most of the python modules inside of MintPy are also console scripts, and console scripts need to be available on PATH in order to be discovered/executed. Since MintPy isn't informing pip about them, pip can't handle making them discover-able and executable for MintPy.

Note: to support windows with this current setup, we'd need to also write .bat activate/deactivate scripts, and a windows build.bat script which installs them.

2. No OS-specific build scripts

For linux and mac, MintPy needs this build script which does:

  • A pip install
  • insures all the internal scripts are executable
  • Installs the activate/deactivate scripts
    • which adds those scripts to PATH, as discussed above

So, really, everything except for executing the the pip install command is to handle making available the executable console scripts.

Note: to support windows with this current setup, we'd need to also write a windows build.bat script that does these steps for windows, and handles windows shells and windows bash consoles.

However, pip can do all that work for MintPy! There are two ways to tell pip "make this script available":

  1. Use the old distutils scripts argument in setup.py, or
  2. use console_scripts entry_points in setup.py

(1) is not recommended anymore as it's rather hard to achieve cross-platform compatibility (it just straight copies the files into the "bin" location) and therefore can't be used for noarch packages.

console_scripts entry_points (2) leverage the whole pip toolchain and achieve cross platform compatibility by creating wrapper scripts (.exe files, for example, on Windows) that call the a specified function out of a package module (example below).

In the end, converting all the executable scripts (and the current setup.py scripts) to entrypoints means the MintPy recipe will no longer need the activate/deactivate scripts or the build scripts, and everything can be reduced to a single script: "{{ PYTHON }} -m pip install . -vv" in the build section of the recipe.yaml.


console_scripts entry_points example

Entrypoints are definitely weird, and I think it typically works best to explain them with an example of what all happens with them. I'll use MintPy/remove_hdf5_dataset.py as the example here as it's the shortest executable module in MintPy.

If I wanted to make a console_scripts entry_points for remove_hdf5_dataset.py with the same interface (meaning a remove_hdf5_dataset.py script available on PATH I can call), I'd need to:

1. Define the entrypoint in setup.py

This would look like

from setuptools import find_packages, setup

setup(
   ...
   entry_points={'console_scripts': [
       'remove_hdf5_dataset.py = mintpy.remove_hdf5_dataset:main',
       ]
   },
   ...
)

then, when pip install ... is called, pip will create a wrapper script called remove_hdf5_dataset.py which calls the main function (with no arguments!) in the mintpy.remove_hdf5_dataset module.

In a conda environment, that script will be located at ${CONDA_PREFIX}/bin/remove_hdf5_dataset.py and will have two functional lines:

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(load_entry_point('mintpy', 'console_scripts', 'remove_hdf5_dataset.py')())

which is "set sys.argv[0] to my entrypoint name, load the entrypoint function, and execute it".

2. Slightly modify MintPy/remove_hdf5_dataset.py.

Edit: This is only necessary if you want to support progromatic workflows (in python code/interpreter) with the same interface as in the console. Right now, it looks like all the executable modules in MintPy have a main function that excepts optional parameters and could be registered as an entrypoint. Think of this section as a "nice to have".

To make that work well, it's best to have a main function that takes no arguments and have argparse always look at sys.argv when main is called. That's achieved pretty easily by always having main be an entrypoint function, and a "work" function named the same (or similar) to the entrypoint that takes the same arguments. So that'd mean MintPy/remove_hdf5_dataset.py might instead look like (I did not check this!):

import argparse
import os
import sys

import h5py

from mintpy.utils import writefile

_EXAMPLE = """Example:
  remove_hdf5_dataset.py  ifgramStack.h5  unwrapPhase_phaseClosure
  remove_hdf5_dataset.py  ifgramStack.h5  unwrapPhase_phaseClosure  unwrapPhase_bridging
  remove_hdf5_dataset.py  velocity.h5     velocityStd
"""


def remove_hdf5_dataset(file: str, dset: list) -> str:
    with h5py.File(file, 'r') as f:
        dset_list = list(f.keys())
        
    if any(i not in dset_list for i in dset):
        raise ValueError(f'input dataset does not exist: {dset}\navailable datasets:\n{dset_list}')
        
    file = writefile.remove_hdf5_dataset(file, dset, print_msg=True)
    print('Done.')
    return file


def main():
    parser = argparse.ArgumentParser(description='Remove an existing dataset from HDF5 file',
                                     formatter_class=argparse.RawTextHelpFormatter,
                                     epilog=EXAMPLE)

    parser.add_argument('file', type=str, help='HDF5 file of interest')
    parser.add_argument('dset', type=str, nargs='+', help='dataset to be removed.')
    args = parser.parse_args()
    
    if os.path.splitext(args.file)[1] not in ['.h5', '.he5']:
        raise ValueError(f'input file is not HDF5: {args.file}')
    
    remove_hdf5_dataset(**args)
 

So now you have a main function that can be called via the entrypoint and a "worker" function with the same interface which can be used in the python interpreter.

A note on development workflows

I do not include a

if __name__ == `__main__`:
    main()

in my refactor of MintPy/remove_hdf5_dataset.py because with entrypoints it doesn't matter if the package modules are executable or not -- the wrapper script is the only thing that needs to be executable, and pip handles that.

In my opinion, developers should also use pip (call it package-first development), but install things into their development environment in "editable" mode so that when they change the source, those changes are reflected in the entrypoints. For MintPy that'd look like:

git clone git@github.com:insarlab/MintPy.git
cd MintPy
conda env create -f docs/environment.yml
python -m pip install -e .

And that's it! There is one caveat: if the package structure significantly changes (e.g., added new entrypoints), the python -m pip install -e . command will need to be rerun. In general, developers can make sure everything is up to date/ready when pulling new code or checking out branches by running:

conda env update -f docs/environment.yml
python -m pip install -e .

But those only need to be run if dependencies or package structure changes.

You can keep the current workflow where people add the mintpy directory to their PATH, include the if __name__ ==..., and keep the modules executable, but I wouldn't recommend it, except as a transition to a package-first development workflow.

Final thoughts

Overall, it's a change to development workflows and more for developers to understand/learn so that distributing across platforms is easier. It's definitely not required to do, and many successful packages maintain complicated recipes for many reasons, including it's too big of a change for developers or making the changes is too large of a development burden. Those are definitely valid decision that MintPy could make.

@jhkennedy
Copy link
Collaborator

Oh, and I should say, the full refactor (arg-less main and a work function) isn't needed -- it's a nice to have to support the same workflows both programmatically and in the console.

Since all the minty modules have the main function with an optional argument, simply registering the entrypoints will work just fine!

@yunjunz
Copy link
Member Author

yunjunz commented Sep 8, 2021

Thank you @jhkennedy for the detailed explanation! Registering the entrypoints in setup.py sounds good to me.

Regarding the refactoring, it seems a lot of changes. I will think about this more this weekend. Having a noarch version sounds great to me. My current major concern is that the remove_hdf5_dataset.main() style of function calls are heavily used in mintpy python scripts and notebooks, we need to make sure they won't break.

@jhkennedy
Copy link
Collaborator

jhkennedy commented Sep 8, 2021

@yunjunz Yep! I think the heavier refactoring is just a "think about it for future development" -- I've actually got a PR started that does all the entrypoints and all but these worked^ as-is:

'add_attribute.py = mintpy.add_attribute:main',
'multi_transect.py = mintpy.multi_transect:main',
'save_gbis.py = mintpy.save_gbis:main',
'tropo_pyaps.py = mintpy.tropo_pyaps:main',
'tsview.py = mintpy.tsview:main',

It looks like it won't be a heavy lift to modify those to work either.


^ By worked, I mean I could run <entrypoint> --help in bash -- that's as far as I've taken it.

@yunjunz
Copy link
Member Author

yunjunz commented Sep 8, 2021

Awesome. Please feel free to submit your PR and we could do more testing on the entrypoints setup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants