Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept list vec args in genned code by default #251

Open
wants to merge 5 commits into
base: namespace_package_simple
Choose a base branch
from

Conversation

bradley-solliday-skydio
Copy link
Collaborator

By default, I mean if use_numba=False and reshape_vecs=True.

Previously, only ndarrays could be passed as vector args to generated code. However, lists in python are ubiquitous and natural, so it would be good for our generated functions to accept them.

So, this commit allows generated functions to accept lists for matrix arguments when the argument is either a row vector or a column vector.

Nested lists are not accepted to represent matrices because I simply did not consider that until right now. (Perhaps we should add that?)

PythonConfig.use_numba must be False because list arguments have been deprecated in numba (instead you should use numba.typed.List, but if you have to use that you might as well use numpy.ndarray as far as I can tell).

PythonConfig.reshape_vecs must be True because accepting lists requires conversion to an ndarray and reshaping, which is presumably more or less the entire thing meant to be avoided by setting reshape_vecs to False.

This change did involve mucking up a bit the python util.jinja type printing macros, as a sf.Matrix type should be rendered as an numpy.ndarray is it is a return type, reshape_vectors=False, use_numba=True, or is not a row or column vector, and rendered as T.Union[T.Sequence[float], numpy.ndarray] otherwise.

Comment on lines 40 to 45
{%- elif is_sequence(T_or_value) -%}
{%- if is_input -%}
T.Sequence[{{ format_typename(T_or_value[0], name, is_input) }}]
T.Sequence[{{ format_typename(T_or_value[0], name, is_input, available_classes) }}]
{%- else -%}
T.List[float]
{%- endif -%}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has nothing to do with the rest of the PR. I just noticed that it was out of date. If, for some reason, a method of, say, Rot3 returned a list of Rot3s, this change is needed to ensure the return type renders as T.Sequence[Rot3] instead of T.Sequence[sym.Rot3].

"point is expected to have length 2; instead had length {}".format(len(point))
)
point = numpy.array(point).reshape((2, 1))
elif point.shape == (2,):
point = point.reshape((2, 1))
elif point.shape != (2, 1):
raise IndexError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally a bunch of the other methods on this class (and the other geo classes, and probably camera classes as well) are now inconsistent in what types they take, e.g. retract below. We can maybe do this in a follow-up PR, but I think we should make sure these are consistent (either fix in an immediate follow-up or make an issue)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll make it part of this PR because I'd rather not have the potential for things to be in an inconsistent state (though probably as a separate PR to make it a bit easier to understand each change in isolation).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I added a second commit. It just updates the type annotations and removes the extra ValueError being raised. I explain all the changes in the commit message.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot to mention, but I didn't change from_storage to accept ndarrays of all shapes. This was a decision I wasn't too sure about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if @hayk-skydio has opinions on this, or on this PR in general (this is making Python functions accept Sequence[float] or ndarrays of shapes (1, N), (N, 1), or (N,) for vector arguments by default)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually now I'm confused - am I remembering wrong that at least one of these PRs made us accept (1, N)? I thought we decided against accepting row vectors and then I thought I remembered seeing us doing that here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in person - we like accepting flat lists, flat ndarrays, or column-shaped ndarrays for column vector arguements, and 2d ndarrays for everything else. Would be good to have 1) a python helper function we can import from elsewhere to replace the logic in every generated function and 2) a type alias to simplify the declaration, like npt.ArrayLike, but those can happen in follow-ups

@@ -111,10 +116,12 @@
{%- if is_method and "self" not in spec.inputs -%}
@staticmethod
{% endif %}
{# Only accept list as vector inputs if not numba and we reshape vecs into ndarrays #}
{% set accept_list_vec = spec.config.reshape_vectors and not spec.config.use_numba %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably either call this accept_list_vec in both places or sequence_vecs in both places?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Called it sequence_vecs in both places.

len(vec), self.tangent_dim()
)
)
# type: (T.Union[T.Sequence[float], numpy.ndarray], float) -> {{ cls.__name__ }}
return ops.LieGroupOps.retract(self, vec, epsilon)

def local_coordinates(self, b, epsilon=1e-8):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update __mul__ below as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I simply overlooked it.

Comment on lines 241 to 239
elif (
hasattr(other, "__len__")
and hasattr(other, "__getitem__")
and not isinstance(other, str)
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs and hasattr(self, "compose_with_point") I think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Thanks.

Defines the new function load_generated_function which is
meant to be a more user friendly version of `load_generated_package`
(which will be particularly more friendly should we no longer re-export
all sub-modules of generated packages).

Also, explains the arguments of `load_generated_package` in its
doc-string.

As an example of the expected usage of this function, for `co` a
`Codegen` object:
``` python3
generated_paths = co.generate_function()

func = load_generated_function(co.name, generated_paths.function_dir)
```

Tested in `test/symforce_codegen_util_test.py`
@bradley-solliday-skydio bradley-solliday-skydio changed the base branch from always_raise_IndexError to namespace_package_simple November 4, 2022 00:47
@bradley-solliday-skydio bradley-solliday-skydio force-pushed the namespace_package_simple branch 2 times, most recently from c07e7bd to 855d587 Compare November 4, 2022 01:48
Namespace packages are packages whose `__path__` (which tells the python
import system to look for sub-packages and sub-modules) has multiple
directories, meaning it can have portions spread out across multiple
directories.

Previously, `sym` and the other generated packages were not namespace
packages. This caused issues when generated python packages in the `sym`
namespace attempted to access, for example, `sym.Rot3` (which they might
do if the generated function returns a `sym.Rot3`). Since the generated
package itself was named `sym`, but `Rot3` was not defined locally, an
`AttributeError` would be raised.

While an alternative would have been to instead not use the name `sym`
for generated packages (using, say, `sym_gen` instead), for reasons I
didn't fully look into, we still want to generate our code within the
`sym` namespace (generating into `sym.gen` was considered as an option
but to do so would require the changes in this commit regardless).

While currently we haven't needed to generate any functions returning a
`sym` class in a package named `sym`, we intend to soon add a `sym.util`
package which will be used in a lot of places. That can't be done until
this namespace conflict is resolved.

Note, while a python2 compatible namespace package has multiple
`__init__.py` files for the top-level package spread across different
locations, only one of them will be executed. This makes it difficult to
re-export the contents of sub-modules into the top-level namespace.

The normal way to re-export a name is to write
``` python3
from .sub_module import name
```
However, since the sub-modules can be created dynamically, it is
impossible to re-export all names in this manner, as the first
`__init__.py` that is created has no way of knowing what names one might
want to re-export from subsequent modules.

It is possible to put all names one wishes to export in a standard file,
say `_init.py`, then dynamically search for such files and execute their
contents, but we considered the additional complexity to be too large of
a burden (as users would have a harder time understand their generated
code, and this would give future maintainers a hard time).

And so, we decided to simply stop re-exporting any names in the
`__init__.py`'s of generated code (kind of in the style of pep 420
python3 packages).

This makes loading a generated function more difficult if one uses
`codegen_util.load_generated_package`, as now simply importing a
generated package won't give you access to any of the package's
contents. However, this is what `codegen_util.load_generated_function`
is for, so hopefully the user experience shouldn't be too negatively
impacted.

The one exception to the general ban of re-exporting names is the `sym`
package, as we still wish to be able to do
``` python3
import sym

sym.Rot3.identity()
```
However, because all sub-modules we wish to export from the `sym`
package are known at code-gen time, allowing this is not difficult.
This only applies to names in the core `sym` package, and any additional
user generated code in the `sym` package will not be re-rexported in the
top-level namespace.

A user can prevent their package from being generated as a namespace
package by setting the `namespace_package` field of their `PythonConfig`
to `False`. This is useful in our testing as it is the generated code
being tested that is imported, not, for example, the checked in `sym`
package code which is being imported.

As a last note, I believe `pkgutil.extend_path` only checks for portions
of the package on the `sys.path`, and doesn't check for any portions
than can only be found by finders on the `sys.meta_path` (for example,
`symforce` itself is found by a finder on the `sys.meta_path` but not on
the `sys.path` during an editable pip install). I don't expect this
lapse to pose a problem, and addressing it immediately might just make
the `__init__.py`s look more complicated than they need to be, but if
this does become a problem, know that the situation can be partially
addressed trying to find the spec using the finders, and adding the
spec's `submodule_search_locations` if found to the `__path__`.
By default, I mean if `use_numba=False` and `reshape_vecs=True`.

Previously, only ndarrays could be passed as vector args to generated
code. However, lists in python are ubiquitous and natural, so it would
be good for our generated functions to accept them.

So, this commit allows generated functions to accept lists for matrix
arguments when the argument is either a row vector or a column vector.

Nested lists are not accepted to represent matrices because I simply did
not consider that until right now. (Perhaps we should add that?)

`PythonConfig.use_numba` must be `False` because list arguments have
been deprecated in numba (instead you should use `numba.typed.List`, but
if you have to use that you might as well use `numpy.ndarray` as far as
I can tell).

`PythonConfig.reshape_vecs` must be `True` because accepting lists
requires conversion to an ndarray and reshaping, which is presumably
more or less the entire thing meant to be avoided by setting
`reshape_vecs` to `False`.

This change did involve mucking up a bit the python `util.jinja` type
printing macros, as a `sf.Matrix` type should be rendered as an
`numpy.ndarray` is it is a return type, `reshape_vectors=False`,
`use_numba=True`, or is not a row or column vector, and rendered as
`T.Union[T.Sequence[float], numpy.ndarray]` otherwise.
The handwritten methods of the sym classes all already accepted lists
(in addition to ndarrays) because they were backed by the autogenerated
functions, which support both. So, I updated the type annotations
accordingly.

Also, since the auto-generated methods already have adequeate error
messages, I removed the hand-written ones as they didn't cover certain
corner cases well (like 2d row vectors), and in order to make them
complete, they would just be duplicates of what was autogenerated more
or less anyway.

Also, fixed the type signature of the camera cal __init__ methods. This
should have been done previously, but was overlooked.

Also made a small fix to the corresponding jinja template's indenting to
match the style we use elsewhere.

Also changed the geo methods multiplication methods to accomadate
sequences. Since the Sequence abstract base class is located at
`collections.Sequence` in python2.7 and `collections.abc.Sequence` in
python3.8 and the like, I decided to instead identity a sequence by
whether or not it implements `__getitem__` and `__len__`.

Also check that it's not a `str`. Note, it seems that in python2.7 I
should really be checking not that it is a `str`, but rather a
`basestr`, but `basestr`s don't existing in python3.8. I didn't think
this corner case is that important, and if such cases are important,
then we should probably rethink our strategy here.

Also, all when multiplying a geo type by a sequence (so list, tuple, or
something like that) a column 2d ndarray is returned. The idea behind
this is that it's a bit more consistent and predictable (though, I'm
open to other return types).
Did not move size checking logic for numba because
- the checking logic for numba is pretty short
- there were some technical difficulties in doing so that I didn't want
to get bogged down in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants