
Warn users when converting models which can't be optimally accelerated #542

Open
AdamHillier opened this issue Oct 21, 2020 · 1 comment


@AdamHillier (Contributor)

We should work out a good way to raise warnings for models which won't convert in an 'optimal' way -- essentially any egregious violation of our model optimisation guide. It's slightly complicated for a few reasons:

  • We need to decide where to raise warnings (in Larq, or in the LCE converter?).
  • Many of the warnings we could raise are target-dependent, but the flatbuffer files produced by the LCE converter are target-independent.
  • There are different degrees of 'sub-optimal'. In the particular example you raise, when using the LCE optimized kernels on Aarch64, not being able to write bitpacked activations doesn't have a huge measurable latency impact (< 10%).
  • We don't want to spam users with lots of warnings about sub-optimal code, especially as there are legitimate use-cases where a user doesn't care if something will run slowly (for example, the user might be doing research into new BNN architecture designs and using patterns which aren't currently optimised in LCE, such as grouped convolutions, but which could be implemented in the future). A sketch of what such a warning might look like follows below.

Originally posted by @AdamHillier in #541 (comment)
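As a minimal illustration of the kind of converter-side warning being discussed, here is a sketch using Python's standard warnings module. The check, the op attributes, and the message are all invented for this example; none of this is existing LCE code:

import warnings

def warn_if_suboptimal(op):
    # Hypothetical check: grouped convolutions currently have no
    # optimised LCE kernel, so they would fall back to a slower path.
    if op.type == "Conv2D" and getattr(op, "groups", 1) > 1:
        warnings.warn(
            f"Op '{op.name}' uses a grouped convolution, which LCE does "
            "not currently accelerate; it will run via a slow fallback.",
            RuntimeWarning,
        )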

@lkindrat-xmos

> We need to decide where to raise warnings (in Larq, or in the LCE converter?).

My two cents: I would try to decouple Larq from any conversion-related logic, including when to warn. (TensorFlow doesn't give you a warning either when you use an operator that cannot be efficiently converted by TFLite.)

> Many of the warnings we could raise are target-dependent, but the flatbuffer files produced by the LCE converter are target-independent.

How about a target argument to the converter, where one could specify "reference" or "Aarch64" (or others later), with the latter being the default (if desired) for compatibility? To make the flatbuffer target-dependent, a metadata field named target could be used. The runtime could then check the intended target at initialization and abort or warn.
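A rough sketch of how that flow could look, assuming a Keras model model; the target argument, the "target" metadata key, and the runtime check are the proposals under discussion here, not existing LCE or TFLite APIs:

import larq_compute_engine as lce

# Proposed: the converter takes a target and embeds it in the flatbuffer
# metadata under a "target" key (both hypothetical).
flatbuffer_bytes = lce.convert_keras_model(model, target="reference")

# Proposed runtime-side check at interpreter initialization; `metadata`
# is assumed to be a dict-like view of the flatbuffer metadata entries.
def check_target(metadata, runtime_target):
    intended = metadata.get("target", "reference")
    if intended != runtime_target:
        raise RuntimeError(
            f"Model was converted for target '{intended}', but this "
            f"runtime is built for '{runtime_target}'."
        )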

> We don't want to spam users with lots of warnings about sub-optimal code, especially as there are legitimate use-cases where a user doesn't care if something will run slowly (for example, the user might be doing research into new BNN architecture designs and using patterns which aren't currently optimised in LCE, such as grouped convolutions, but which could be implemented in the future).

I think being able to specify a target would already help with this, but depending on how the warnings are logged, you could push the responsibility onto the user to control what's logged:

  • explicit verbosity settings passed to convert (I generally don't like this approach)
  • if you can use the standard logging module (and even in some other cases), the user has the ability to fully control levels and handlers, and you could provide helpers like this:
with lce.converter_log_level('quiet'):
    lce.convert_keras_model(
        model_with_unsupported_op,
        target='reference',
    )
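One way such a helper could be implemented on top of the standard logging module; the helper, the 'quiet' alias, and the logger name are assumptions for this sketch, not existing LCE API:

import contextlib
import logging

_LEVELS = {"quiet": logging.ERROR}  # alias used in the example above

@contextlib.contextmanager
def converter_log_level(level):
    # Assumed logger name; the converter would need to emit its
    # warnings through a logger the helper knows about.
    logger = logging.getLogger("larq_compute_engine.converter")
    previous = logger.level
    logger.setLevel(_LEVELS.get(level, level))
    try:
        yield
    finally:
        logger.setLevel(previous)

A user who wants full control could also bypass the helper and configure that logger directly, e.g. logging.getLogger("larq_compute_engine.converter").setLevel(logging.ERROR), or attach their own handlers.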
