Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add codegen prototype to _flux #5393

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

vsoch
Copy link
Member

@vsoch vsoch commented Aug 16, 2023

Problem: we cannot autocomplete python modules and need to use getattr to derive a function existence
Solution: prepare json dumps of functions from cffi lib/ffi directly for later use

This is a very basic setup that is run as an additional generation command in the Makefile of _flux to generate a codegen/ directory with *.json files that match the various so names:

$ tree .
.
├── _core.json
├── _hostlist.json
└── _idset.json

This needs a lot of further discussion - namely that we don't do parsing for constants (probably don't need to, and I shouldn't add them to the generated json) and then:

  1. We have several variadic functions that we cannot load a type for ctype 'va_list' has incomplete type an example is here
  2. the _rlist.so doesn't seem to parse, I suspect for the same reasons we can't package it?
Traceback (most recent call last):
  File "codegen.py", line 137, in <module>
    main()
  File "codegen.py", line 132, in main
    return generate_codegen_json(sofiles, args.out)
  File "codegen.py", line 109, in generate_codegen_json
    result = so.codegen()
  File "codegen.py", line 60, in codegen
    module = self.load()
  File "codegen.py", line 52, in load
    module = importlib.import_module(self.module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: while loading _flux._rlist: failed to import ffi, lib from _flux._hostlist
make[4]: *** [Makefile:1107: codegen/_hostlist.json] Error 1
make[4]: Leaving directory '/workspaces/core/src/bindings/python/_flux'
make[3]: *** [Makefile:500: all-recursive] Error 1
make[3]: Leaving directory '/workspaces/core/src/bindings/python'
make[2]: *** [Makefile:500: all-recursive] Error 1
make[2]: Leaving directory '/workspaces/core/src/bindings'
make[1]: *** [Makefile:516: all-recursive] Error 1
make[1]: Leaving directory '/workspaces/core/src'
make: *** [Makefile:595: all-recursive] Error 1

Perhaps we need to somehow load two of the .so at once for that to work?

  1. There are some that just don't parse, e.g., (these are from core)
Issue with getting function flux_local_exec_ex, skipping
Issue with getting function flux_rexec_ex, skipping

I'm attaching json dumps of current output, which include names for everything (even if we cannot parse). We can discuss the issues above, and if we have enough here to tackle code generation. If this is a flaky approach, we shouldn't pursue it. On the other hand, if we can be sure about even a subset, we could add logic for them to the wrapper.py so at least some functions are exposed for discovery, etc. If we decide to pursue it, we would want to discuss and resolve issues above, and then talk about generation strategies for the classes.

_core.json.txt
_idset.json.txt
_hostlist.json.txt

@chu11
Copy link
Member

chu11 commented Aug 16, 2023

apologies as there's probably an advanced component here I don't know (search terms I'm using in Google is non optimal). Do these get installed somewhere and when we run python3 in shell interpreter mode, they get slurped up and tab completion can show stuff?

@vsoch
Copy link
Member Author

vsoch commented Aug 16, 2023

Oh that’s a step 2 - this is only a step 1 to dump out the metadata we need for that. We haven’t discussed a strategy for that yet so if you have ideas please share!

@vsoch vsoch force-pushed the add-python-discovery branch 4 times, most recently from 6dc7e24 to 6446bd1 Compare August 16, 2023 04:14
@vsoch
Copy link
Member Author

vsoch commented Aug 16, 2023

Here are a few examples of generated wrapper script classes:

core.py.txt
hostlist.py.txt
idset.py.txt

all-local: lib-copy-vpath ${fluxpyso_CODEGENJSON} ${fluxpyso_WRAPPERS}

${fluxpyso_CODEGENJSON}:
$(AM_V_GEN)$(PYTHON) codegen.py dump --out $(srcdir)/codegen ${fluxpyso_CODEGENINPUT}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just getting into trying this out, this line breaks with out of source builds (tries to use codegen.py from the build directory). Think there's another one below the same.

Suggested change
$(AM_V_GEN)$(PYTHON) codegen.py dump --out $(srcdir)/codegen ${fluxpyso_CODEGENINPUT}
$(AM_V_GEN)$(PYTHON) $(top_builddir)/src/bindings/python/codegen.py dump --out $(srcdir)/codegen ${fluxpyso_CODEGENINPUT}

$(AM_V_GEN)$(PYTHON) $(top_builddir)/src/bindings/python/codegen.py dump --out $(srcdir)/codegen ${fluxpyso_CODEGENINPUT}

${fluxpyso_WRAPPERS}: ${fluxpyso_CODEGENJSON}
$(AM_V_GEN)$(PYTHON) $(top_builddir)/src/bindings/python/codegen.py generate --out $(top_srcdir)/src/bindings/python/flux/wrapper ${fluxpyso_CODEGENJSON}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crud, sorry, I made a mistake on this, it should be top_srcdir rather than top_builddir.

Copy link
Member Author

@vsoch vsoch Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha yeah was just about to comment! Looks like we made a franken-path:

/home/runner/work/flux-core/flux-core/src/bindings/python/_flux/../../../../src/bindings/python/codegen.py'

@vsoch
Copy link
Member Author

vsoch commented Aug 18, 2023

I'm not sure how to debug this because the build worked in my Docker development environment.

@trws
Copy link
Member

trws commented Aug 18, 2023

This looks like a good first step. Next will be to start doing the more
complete method generation, and the filtering. Take hostlist for example, right
now it's grabbing in free, which we really don't want. Getting the regex
matching will clean up a lot of the stuff because we've already put that on the
Wrapper generation in the code. In the end I think we want it to generate
methods that run generally like this (if they need everything, and we do python3
typing because why not make completion as awesome as possible, there will be
much simpler ones too):

edit: refactoring to remove common code

hostlist_struct = cffi.typeof("struct hostlist *")
hostlist_type = Union["HostlistWrapper", hostlist_struct]
str_like = Union[str, bytes, cffi.typeof("char *")]
int_like = Union[int, cffi.typeof("int")]

def maybe_unwrap_handle(arg):
  if arg is None:
    return ffi.NULL
  return arg.handle if isinstance(arg, WrapperBase) else arg

def translate_pointer(arg):
  if arg is None:
    return ffi.NULL
  if isinstance(arg, str):
    return arg.encode("utf-8", errors="surrogateescape")

def raise_from_errno(errnum):
      raise EnvironmentError(
          errnum,
          "{}: {}".format(
              errno.errorcode[errnum]
              if errnum in errno.errorcode
              else "Errno" + str(errnum),
              os.strerror(errnum),
          ),
      )

@contextmanager
def invalidargs(*args, **kwds):
    try:
        yield true
    except TypeError as err:
      raise InvalidArguments(
        self.name, ffi.getctype(self.function_type), args_in
      ) from err

class HostlistWrapper:
  def hostlist_append(self, hostlist0: hostlist_type, char1: str_like) -> int:
    with invalidargs():
      # generate call, capturing return, translating arguments and adding handle
      # as necessary
      res = lib.hostlist_append(self._handle, maybe_unwrap_handle(hostlist0), translate_pointer(char1))
    # generate error translation to exception, since this returns int using that
    # convention
    if res < 0:
      raise_from_errno(ffi.errno)

    return res

  def hostlist_nth(self, hostlist0: hostlist_type, int1: int_like) -> int:
    with invalidargs():
      # generate call, capturing return, translating arguments and adding handle
      # as necessary
      res = lib.hostlist_append(self._handle, int1)

    # since the return from C is a "char *", translate the string with cffi
    res = ffi.string(result)

    # generate error translation to exception, since this returns a pointer using that
    # convention, with translation we can just check for None
    if res is None:
      raise_from_errno(ffi.errno)

    return res

Note how there are less translations and checks on the call in nth, that's part
of why I like this approach. Normally all that checking has to happen even if
we don't need it, drives me a bit nuts, but if we generate it knowing the
signature we can just skip it. This isn't working code, and there may have to
be extra work to make sure that you get the right ffi object (the current
wrapper actually grabs the ffi from the object that calls the bound method) but
this should be pretty close to what we're currently doing dynamically.

Functions would be similar, but without the automatic handle arg first, since they have no "self" to grab it from. Could generate it all as functions and then call the functions from the class actually if that would make it easier to factor.

@vsoch
Copy link
Member Author

vsoch commented Aug 18, 2023

okay actually this recent change broke it - let me debug locally again. But I guess changing back to what I originally had is still wrong.

@trws
Copy link
Member

trws commented Aug 18, 2023

On the errors, I'm not sure what's up, but when I built it locally it didn't generate the files the first time, only on the second make, not sure why.

@vsoch
Copy link
Member Author

vsoch commented Aug 18, 2023

On the errors, I'm not sure what's up, but when I built it locally it didn't generate the files the first time, only on the second make, not sure why.

It doesn't even build for me with the new changes, so I'm still debugging that again.

@vsoch
Copy link
Member Author

vsoch commented Aug 18, 2023

@trws if it really takes that much Python (still) to get this working, I'm not convinced this is really much of an improvement on what we have. The complexity it adds is maybe not worth it imho. If it were the case of being able to generate simple template files from the discovered classes (to add the wrapper) then it would. But needing to write yet-another-hairball of custom code, but this time requiring generation via codegen/templating, this is a suboptimal approach and any incremental improvements it might add aren't worth it imho.

Pinging @grondo @garlick for second opinion.

Problem: we cannot autocomplete python modules and need to use getattr
to derive a function existence
Solution: prepare json dumps of functions from cffi lib/ffi directly for later use

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
The flux/wrapper directory holds a custom wrapper for each .so
library type, and we can use it (with the base Wrapper as the
parent class) to more quickly provide an interface to
existing functions.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@codecov
Copy link

codecov bot commented Aug 18, 2023

Codecov Report

Merging #5393 (401889d) into master (9d4c1f3) will decrease coverage by 4.78%.
The diff coverage is n/a.

❗ Current head 401889d differs from pull request most recent head 1e1b9ea. Consider uploading reports for the commit 1e1b9ea to get more accurate results

@@            Coverage Diff             @@
##           master    #5393      +/-   ##
==========================================
- Coverage   83.54%   78.76%   -4.78%     
==========================================
  Files         470      440      -30     
  Lines       78567    75615    -2952     
==========================================
- Hits        65635    59558    -6077     
- Misses      12932    16057    +3125     
Files Changed Coverage Δ
src/bindings/python/flux/wrapper/__init__.py 78.28% <ø> (ø)

... and 91 files with indirect coverage changes

@trws
Copy link
Member

trws commented Aug 18, 2023

What do you mean hairball of custom code? The wrappers would be auto-generated based on the type signature, yes the result is a non-trivial amount of python but we would be able to actually look at it rather than the mass of late-bound callables we have to try to inspect now. I mean, it's not simpler than the current approach really, but it should make it a lot easier to see what its doing to be able to look at what's generated.

@vsoch
Copy link
Member Author

vsoch commented Aug 18, 2023

I'm looking at this block: #5393 (comment)

And realizing instead of "the classes get generated cleanly" it's more "the classes are generated with additional custom logic depending on the type" and I think this is a step in the wrong direction because it adds more complexity to the generation. Right now we have something that is working in wrapper.py, albeit not with discoverability, but it doesn't require parsing .so's, retrieving signatures, and then custom writing classes per type. It's a red flag for me.

@trws
Copy link
Member

trws commented Aug 18, 2023

Fair enough I suppose. For what it's worth, it does actually require parsing the sos, retrieving signatures, and custom composing objects to form function wrappers that implement the same logic as the generated code above. That's actually where I took the code from, it's literally what it does right now but moving the extra conditionals based on the signature out of the wrapper and into the generator.

@vsoch
Copy link
Member Author

vsoch commented Aug 18, 2023

A compromise could just be hard coding the predictable base classes and then writing the files as I am now to use them. But would the improvement in usability offset the added complexity? It seems like a single, static file with a wrapper base that is consistent and works is better than a three step generation process that would introduce more opportunities for bugs.

@trws
Copy link
Member

trws commented Aug 18, 2023

Yeah, maybe true. Still wish I could make it easier to explore though. Might be able to make the attributes appear when listed or something, but without generating something there's no way to... waaaaait a minute... what if instead of generating the actual classes, we keep the current wrappers and generate stub files? Then it's literally just generating names and types, but gives us all the type checking and completion and all of that.

@vsoch
Copy link
Member Author

vsoch commented Aug 18, 2023

lol! What in the world is a stub file?! Is that mypi specific?

To step back, if we keep wrapper.py as the base, is there a sane way to create the top level function (as we are doing here) but then call the same underlying logic we have, but with the "find the library name" removed? That's what I was trying to do here (hence calling self.lib.<function_name> although that needs a bit more complexity to it.

@trws
Copy link
Member

trws commented Aug 18, 2023

No it's not, it's how types are provided to mypy, pytypes, pyrite, pylance, the whole universe (almost) of python type checkers and LSPs. That's why it actually might make sense, you get a lot of discoverability (at least with things other than the command repl) without adding anything the interpreter even looks at at runtime. They're files like this one on typeshed that just have signatures and no code.

Now, to your question about whether we could use the current logic... yes, we could find a way to do that. We'd need to, when the class is created or on first call, have it generate a FunctionWrapper instance for each of the functions. They could be simplified slightly since they're being called in a known way, but then each call would basically be self._<memberFunctionWrapper>(args...). There would be little to no performance win (maybe a loss actually), but it would work.

@vsoch
Copy link
Member Author

vsoch commented Aug 18, 2023

Now, to your question about whether we could use the current logic... yes, we could find a way to do that. We'd need to, when the class is created or on first call, have it generate a FunctionWrapper instance for each of the functions. They could be simplified slightly since they're being called in a known way, but then each call would basically be self._(args...).
Could you give an example of that? Right now I have:

def signal_handler_wrapper(self, fluxreactor0, fluxwatcher1, int2, void3):
        """Wrapper to C function signal_handler_wrapper"""
        return getattr(self.lib, "signal_handler_wrapper")(fluxreactor0, fluxwatcher1, int2, void3)

You are saying we would have a private derivative of that with FunctionWrapper?

There would be little to no performance win (maybe a loss actually), but it would work.

Wait then, if it's a loss (or no improvement) then why do it, period?

@trws
Copy link
Member

trws commented Aug 18, 2023

That's why I originally showed the wrapper generated as inline code. If we generate the code directly, then all the dynamic checks the wrapper has to do go away, that's how we would get a performance improvement. If we're using the same wrapper, we don't get that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants