Use `flatten_args` for `_impl_caches` in templates #9457

dlee992 · 2024-02-22T00:46:58Z

This reverts commit cc9ab81.

dlee992 · 2024-02-23T01:04:33Z

Numba compilation still has at least two issues:

1. Sometimes, if the return type of a jitted or overloaded function is a Tuple, Numba forcibly skips storing the function in its cache dict.
2. When different callers call the same overloaded function with the same type signature, the function is still compiled twice, because the callers use different compilation flags (e.g., no_cpython_wrapper=True vs. False) or because the fields in the flags are in a different order. This makes sense in some cases.

This PR addresses only one issue, which is different from the two above: if an overloaded function is compiled during the caller's type inference pass, it should not be recompiled during the caller's native lowering pass. (That is the intent, but I am not fully sure it holds in every case.)
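
To illustrate the second issue above, here is a minimal sketch of a cache whose key includes the compilation flags. `ImplCache` and the string type names are hypothetical stand-ins, not Numba's actual `_impl_cache` code; the sketch only shows that flag ordering can split one logical entry into two, while a genuinely different flag value still deserves its own entry.

```python
class ImplCache:
    """Hypothetical stand-in for an overload implementation cache."""

    def __init__(self, normalize_flags):
        self.normalize_flags = normalize_flags
        self.entries = {}
        self.compile_count = 0

    def get(self, arg_types, flags):
        items = tuple(flags.items())
        if self.normalize_flags:
            # Sort so that the insertion order of the flags no longer matters.
            items = tuple(sorted(items))
        key = (tuple(arg_types), items)
        if key not in self.entries:
            # A cache miss stands in for one compilation.
            self.compile_count += 1
            self.entries[key] = f"impl#{self.compile_count}"
        return self.entries[key]


flags_a = {"no_cpython_wrapper": True, "fastmath": False}
flags_b = {"fastmath": False, "no_cpython_wrapper": True}  # same values, other order

naive = ImplCache(normalize_flags=False)
naive.get(("int64",), flags_a)
naive.get(("int64",), flags_b)  # misses: the key differs only by flag order

ordered = ImplCache(normalize_flags=True)
ordered.get(("int64",), flags_a)
ordered.get(("int64",), flags_b)  # hits the entry created above
# A different flag value is a different compilation and gets its own entry.
ordered.get(("int64",), {"no_cpython_wrapper": False, "fastmath": False})
```

In this sketch the naive cache "compiles" twice for the same types and flag values, while the normalized cache compiles once for those and once more for the call with a different `no_cpython_wrapper` value.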


dlee992 · 2024-02-27T00:12:27Z

An example to show why we need this PR:

from os import environ

environ["NUMBA_CHROME_TRACE"] = "3.json"
environ["NUMBA_DUMP_LLVM"] = "1"

from numba import njit
from numba.core import types
from numba.typed import Dict


@njit
def goo(k, j):
    d0 = Dict.empty(types.int64, types.int64)
    d = Dict.empty(types.int64, value_type=types.int64)
    d1 = Dict.empty(key_type=types.int64, value_type=types.int64)
    d[1] = 1
    d[2] = 2
    d[3] = 3
    _ = d.get(k)
    d.pop(k)
    d.pop(j, None)
    del d[3]

@njit
def joo(k):
    d = Dict.empty(types.int64, value_type=types.int64)
    d[1] = 1
    return d.pop(k, None)

print(goo(1, 2))
print(joo(0))

Without this patch, these modules are compiled:

---------LLVM DUMP <function descriptor 'impl_new_dict.<locals>.imp$3'>---------
-------LLVM DUMP <function descriptor 'typeddict_empty.<locals>.impl$2'>--------
-------LLVM DUMP <function descriptor 'typeddict_empty.<locals>.impl$4'>--------
-------LLVM DUMP <function descriptor 'typeddict_empty.<locals>.impl$5'>--------
----------LLVM DUMP <function descriptor 'ol_hasattr.<locals>.impl$9'>----------
--------LLVM DUMP <function descriptor 'ol_getattr_2.<locals>.impl$10'>---------
--------------LLVM DUMP <function descriptor 'process_return$13'>---------------
----------------LLVM DUMP <function descriptor '_long_impl$14'>-----------------
----------LLVM DUMP <function descriptor 'int_hash.<locals>.impl$12'>-----------
--------LLVM DUMP <function descriptor 'ol_defer_hash.<locals>.impl$11'>--------
--------LLVM DUMP <function descriptor 'hash_overload.<locals>.impl$7'>---------
---------LLVM DUMP <function descriptor 'impl_setitem.<locals>.impl$6'>---------
----------LLVM DUMP <function descriptor 'impl_get.<locals>.impl$16'>-----------
----------LLVM DUMP <function descriptor 'impl_pop.<locals>.impl$17'>-----------
----------LLVM DUMP <function descriptor 'impl_pop.<locals>.impl$18'>-----------
----------LLVM DUMP <function descriptor 'impl_pop.<locals>.impl$20'>-----------
----------LLVM DUMP <function descriptor 'impl_pop.<locals>.impl$21'>-----------
--------LLVM DUMP <function descriptor 'impl_delitem.<locals>.impl$19'>---------
-------LLVM DUMP <function descriptor 'typeddict_empty.<locals>.impl$22'>-------
----------LLVM DUMP <function descriptor 'impl_get.<locals>.impl$23'>-----------
----------LLVM DUMP <function descriptor 'impl_pop.<locals>.impl$24'>-----------
--------------------LLVM DUMP <function descriptor 'goo$1'>---------------------
--------------------LLVM DUMP <function descriptor 'joo$25'>--------------------

With this patch, these modules are compiled:

---------LLVM DUMP <function descriptor 'impl_new_dict.<locals>.imp$3'>---------
-------LLVM DUMP <function descriptor 'typeddict_empty.<locals>.impl$2'>--------
----------LLVM DUMP <function descriptor 'ol_hasattr.<locals>.impl$7'>----------
---------LLVM DUMP <function descriptor 'ol_getattr_2.<locals>.impl$8'>---------
--------------LLVM DUMP <function descriptor 'process_return$11'>---------------
----------------LLVM DUMP <function descriptor '_long_impl$12'>-----------------
----------LLVM DUMP <function descriptor 'int_hash.<locals>.impl$10'>-----------
--------LLVM DUMP <function descriptor 'ol_defer_hash.<locals>.impl$9'>---------
--------LLVM DUMP <function descriptor 'hash_overload.<locals>.impl$5'>---------
---------LLVM DUMP <function descriptor 'impl_setitem.<locals>.impl$4'>---------
----------LLVM DUMP <function descriptor 'impl_get.<locals>.impl$14'>-----------
----------LLVM DUMP <function descriptor 'impl_pop.<locals>.impl$15'>-----------
----------LLVM DUMP <function descriptor 'impl_pop.<locals>.impl$16'>-----------
--------LLVM DUMP <function descriptor 'impl_delitem.<locals>.impl$17'>---------
--------------------LLVM DUMP <function descriptor 'goo$1'>---------------------
--------------------LLVM DUMP <function descriptor 'joo$18'>--------------------

The difference: without this patch, the example compiles typeddict_empty 4 times and impl_pop 5 times. With this patch, it compiles typeddict_empty once and impl_pop twice (once for each type of the default argument, Omitted(default=None) and types.none).
cc @guilhermeleobas
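
For anyone who wants to repeat the comparison, the counts can be tallied from the dump headers instead of by eye. This is a small sketch that assumes the header format shown in the listings above; `count_compilations` is a hypothetical helper, not part of Numba.

```python
import re
from collections import Counter

# Matches header lines like:
# ---LLVM DUMP <function descriptor 'impl_pop.<locals>.impl$17'>---
HEADER = re.compile(r"LLVM DUMP <function descriptor '([^']+)'>")


def count_compilations(dump_text):
    """Count dumped modules per outer function name."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = HEADER.search(line)
        if match:
            qualname = match.group(1)
            # 'impl_pop.<locals>.impl$17' -> 'impl_pop'; 'goo$1' -> 'goo'
            base = qualname.split(".<locals>")[0].split("$")[0]
            counts[base] += 1
    return counts


sample = """\
-------LLVM DUMP <function descriptor 'typeddict_empty.<locals>.impl$2'>--------
-------LLVM DUMP <function descriptor 'typeddict_empty.<locals>.impl$4'>--------
----------LLVM DUMP <function descriptor 'impl_pop.<locals>.impl$17'>-----------
--------------------LLVM DUMP <function descriptor 'goo$1'>---------------------
"""
counts = count_compilations(sample)
```

Feeding the full captured output to `count_compilations` before and after the patch gives the per-function totals directly.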

gmarkall · 2024-02-27T11:33:58Z

/azp run

azure-pipelines · 2024-02-27T11:34:09Z

Azure Pipelines successfully started running 1 pipeline(s).

guilhermeleobas · 2024-02-27T16:17:13Z

numba/cpython/randomimpl.py

@@ -1952,7 +1952,7 @@ def getitem(a, a_i):
        raise TypeError("np.random.choice() first argument should be "
                        "int or array, got %s" % (a,))

-    if size in (None, types.none):
+    if size in (None, types.none) or isinstance(size, types.Omitted):


Not a proper review, but this ought to be cgutils.is_nonelike(size)?

Yes, I also noticed this API for the check; I will refactor.

dlee992 · 2024-02-29T22:22:40Z

I didn't change this file, but CI reports a flake8 error in it. Hmm...

numba/tests/test_unicode.py:2670:27: E226 missing whitespace around arithmetic operator

By the way, the last real failure is about overriding @overload flags in a test. I will try to understand that test case first.

dlee992 · 2024-03-04T22:54:48Z

@sklam I'm not sure whether I chose the right way to fix these. Maybe the real fix should happen where the "funny" things start, rather than where they end. I chose the latter in this PR.

The real issue is the inconsistency between _impl_caches and the underlying compilation of the same function. The former has many issues, while the latter is always "perfect".

sklam

This "review" is really for developers. The PR points out several problems in the overload template code. Also, it would be better if we did not have to handle None/NoneType/Omitted the way this PR does.

sklam · 2024-03-12T15:41:41Z

numba/core/typing/templates.py

+        dict_flags = dict(flags.options) if flags is not None else {}
+        for key, value in self._jit_options.items():
+            if key in dict_flags:
+                dict_flags["key"] = value


Dev note: check why flags are not correct; e.g. no_cpython_wrapper
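
One thing worth noting in the hunk above: `dict_flags["key"] = value` writes to the literal string `"key"`, not to the option named by the loop variable. The sketch below contrasts the two behaviours with plain dicts. The function names are hypothetical, and the page does not confirm whether this is what the later "handle the dict flags correctly" commit changed.

```python
def merge_flags_as_quoted(options, jit_options):
    """Mirrors the quoted hunk, including the literal "key" subscript."""
    merged = dict(options) if options is not None else {}
    for key, value in jit_options.items():
        if key in merged:
            merged["key"] = value  # writes to the literal string "key"
    return merged


def merge_flags(options, jit_options):
    """Same loop, but overriding the option named by the loop variable."""
    merged = dict(options) if options is not None else {}
    for key, value in jit_options.items():
        if key in merged:
            merged[key] = value
    return merged


options = {"fastmath": False, "no_cpython_wrapper": True}
jit_options = {"fastmath": True}

as_quoted = merge_flags_as_quoted(options, jit_options)
fixed = merge_flags(options, jit_options)
```

With the literal subscript, `fastmath` keeps its original value and a stray `"key"` entry appears; with the variable subscript, the override takes effect.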

sklam · 2024-03-12T15:43:03Z

numba/core/typing/templates.py

+        # FIXME: what's the real thing need to do is adding Ommitted to the args
+        # check the pyfunc signature, if exist some kwargs, e.g., default=None,
+        # check whether kws specify them, if not,
+        # please add Omitted(default=None) to args
+        ov_sig = inspect.signature(self._overload_func)
+        if len(args) + len(kws) < len(ov_sig.parameters):
+            for param in list(ov_sig.parameters.values())[len(args):]:
+                name = param.name
+                default = param.default
+                if default != inspect.Parameter.empty and name not in kws:
+                    # add numba type of this param into kws
+                    # it must be an Omitted type with a default value
+                    kws[name] = types.Omitted(default)


Dev note: we should normalize signatures
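
As a rough illustration of what the hunk above does, the following sketch fills in missing keyword arguments from the Python signature. `Omitted` here is a hypothetical stand-in for `types.Omitted`, the string type names are placeholders, and `fill_omitted` is not a Numba function.

```python
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class Omitted:
    """Hypothetical stand-in for numba's types.Omitted."""
    value: object


def fill_omitted(func, args, kws):
    """Return kws extended with Omitted(default) for unspecified defaults."""
    kws = dict(kws)
    params = list(inspect.signature(func).parameters.values())
    # Parameters already covered positionally are skipped.
    for param in params[len(args):]:
        has_default = param.default is not inspect.Parameter.empty
        if has_default and param.name not in kws:
            kws[param.name] = Omitted(param.default)
    return kws


def impl_pop(dct, key, default=None):
    """Hypothetical overload signature, used only for its parameters."""


omitted_call = fill_omitted(impl_pop, ("DictType", "int64"), {})
positional_call = fill_omitted(impl_pop, ("DictType", "int64", "none"), {})
keyword_call = fill_omitted(impl_pop, ("DictType", "int64"), {"default": "none"})
```

Only the first call gains an `Omitted(None)` entry; the other two already specify `default`, so they are left alone. That matches the two remaining `impl_pop` compilations reported earlier in the thread, one for the omitted default and one for an explicit none.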

sklam · 2024-03-12T15:45:26Z

numba/core/typing/templates.py

+        if sig is not None:
+            flatten_args = sig.args
+
+            if cache_key is not None:
+                self._impl_cache[cache_key] = disp, flatten_args


Dev note: needs investigation. Sometimes sig.args differs from the given args.
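
One way to picture why a flattened argument tuple makes a steadier cache key than the call as written: two spellings of the same call bind to the same parameters. The sketch below uses `inspect.signature(...).bind` for the flattening. It is an assumption-laden illustration with hypothetical names (`flatten_call`, `lookup`), not the mechanism Numba uses to produce `sig.args`, and it inserts the raw Python default rather than an Omitted type.

```python
import inspect


def impl_pop(dct, key, default=None):
    """Hypothetical overload signature, used only for its parameters."""


def flatten_call(func, args, kws):
    """Bind args and kws to func's parameters and return them positionally."""
    bound = inspect.signature(func).bind(*args, **kws)
    bound.apply_defaults()  # fill in defaults for unspecified parameters
    return tuple(bound.arguments.values())


impl_cache = {}


def lookup(func, args, kws):
    """Cache keyed on the flattened arguments, not on the call spelling."""
    key = flatten_call(func, args, kws)
    if key not in impl_cache:
        impl_cache[key] = f"impl#{len(impl_cache) + 1}"
    return impl_cache[key]


first = lookup(impl_pop, ("DictType", "int64"), {})
second = lookup(impl_pop, ("DictType",), {"key": "int64", "default": None})
```

Both calls flatten to the same tuple, so the second lookup reuses the entry created by the first.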

dlee992 · 2024-03-28T16:48:18Z

interesting failure:

Traceback (most recent call last):
  File "/home/vsts/work/1/s/numba/tests/test_compiler_flags.py", line 40, in test_fastmath_in_overload
    self.assertEqual(b, "Has fastmath")
AssertionError: 'No fastmath' != 'Has fastmath'
- No fastmath
? ^^
+ Has fastmath
? ^^^

This might show me how to fix the flags more cleanly.

dlee992 · 2024-03-28T19:14:15Z

Added a naive release note and fixed the failing test case.

TODO: add new tests to check that this PR removes the expected unnecessary compilations.

dlee992 · 2024-03-28T21:19:38Z

This failure (numba.numba (Linux py310_np123)) is a CI infrastructure issue, not a code issue.

dlee992 marked this pull request as draft February 22, 2024 02:56

first try

4527b41

dlee992 force-pushed the fix_9440 branch from 494a2f6 to 4527b41 Compare February 22, 2024 17:36

dlee992 added 16 commits February 22, 2024 13:34

add missing Omitted type for kws when _build_impl

57fbdc5

skip the same size of args in parameters

0163e9e

fix np.arange overload to handle Omitted

1e8ae9d

check_is_integer also accepts types.Omitted

2359b34

trim can also be Omitted

4e15f91

num can also be Omitted

708151b

k in numpy_diagflat can be omitted

b1319a4

size in choice can be Omitted

e2bac26

range in histogram can be Omitted

d8ad17c

more fixes

1200b0d

add one more

3c722ad

more more

2f0c3e0

flake8 fix

efded85

more fix

2c069dc

also convert Omitted to its value in cache_key

cc9ab81

Revert "also convert Omitted to its value in cache_key"

75704c5

This reverts commit cc9ab81.

dlee992 added 5 commits February 23, 2024 13:38

don't use flags in cache_key

fab696c

don't change targetconfig

63ee4b2

update the flags.options as a dict with _jit_options, and remove debu…

ade4b57

…g stuff

flags.options could be None

c181b00

flags could be None

348822f

gmarkall added the 2 - In Progress label Feb 27, 2024

guilhermeleobas reviewed Feb 27, 2024


sig could be None, why?

017b630

sklam mentioned this pull request Mar 4, 2024

overload_methoded function shouldn't set no_cpython_wrapper as False #9462

Open

sklam reviewed Mar 12, 2024


sklam added the 4 - Waiting on reviewer label Mar 12, 2024

dlee992 added 2 commits March 28, 2024 13:10

add a release note

1cd11ef

handle the dict flags correctly

b4bdba4

dlee992 marked this pull request as ready for review March 28, 2024 21:18

Merge branch 'main' into fix_9440

222cf5c
