
ak.concatenate fails trying to concatenate too many nested arrays #2881

Open
Superharz opened this issue Dec 8, 2023 · 1 comment
Labels: bug (The problem described is something that must be fixed)

Comments

@Superharz

Version of Awkward Array

2.5.0

Description and code to reproduce

Concatenating too many nested arrays in a single call raises an IndexError.
In my case, "too many" means more than 129 arrays, which is close to the upper end of the int8 range.
As a workaround, one can concatenate the arrays iteratively by calling ak.concatenate several times on smaller groups.
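
The iterative workaround could be sketched roughly as follows. The helper name `concatenate_in_chunks`, its `concat` parameter, and the chunk size of 64 are assumptions made for illustration; they are not part of Awkward Array. The helper only guarantees that no single call receives more than `chunk_size` inputs, and the demo uses plain Python lists so it runs without any third-party package.

```python
def concatenate_in_chunks(arrays, concat, chunk_size=64):
    """Combine `arrays` with `concat`, passing at most `chunk_size` inputs per call.

    `concat` is any callable that takes a list of arrays and returns one array
    (hypothetical parameter, chosen for this sketch).
    """
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    arrays = list(arrays)
    if not arrays:
        raise ValueError("need at least one array")
    # Repeatedly merge groups of at most `chunk_size` arrays until one remains.
    while len(arrays) > 1:
        merged = []
        for start in range(0, len(arrays), chunk_size):
            chunk = arrays[start:start + chunk_size]
            # A leftover group of one needs no concatenation call.
            merged.append(chunk[0] if len(chunk) == 1 else concat(chunk))
        arrays = merged
    return arrays[0]


# Demo with plain lists standing in for arrays.
def concat_lists(parts):
    return [item for part in parts for item in part]


result = concatenate_in_chunks([[1] for _ in range(130)], concat_lists)
assert len(result) == 130
```

With Awkward Array, `concat` would presumably be something like `lambda parts: ak.concatenate(parts, axis=-1)`, keeping every individual call well below the size at which the failure appears.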

Minimal example:
This works fine:

import awkward as ak

a = ak.Array([[1]])
ak.concatenate([a for i in range(128)], axis=-1)

This fails:

a = ak.Array([[1]])
# 130 inputs in one call raise the IndexError shown in the traceback below
ak.concatenate([a for i in range(130)], axis=-1)

Traceback:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
File ~\miniforge3\envs\uni4\Lib\site-packages\awkward\contents\numpyarray.py:349, in NumpyArray._carry(self, carry, allow_lazy)
    348 try:
--> 349     nextdata = self._data[carry.data]
    350 except IndexError as err:

IndexError: index 2352008096080 is out of bounds for axis 0 with size 130

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\contents\numpyarray.py", line 349, in _carry
    nextdata = self._data[carry.data]
               ~~~~~~~~~~^^^^^^^^^^^^
IndexError: index 1867296635536 is out of bounds for axis 0 with size 130

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\_dispatch.py", line 62, in dispatch
    next(gen_or_result)
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\operations\ak_concatenate.py", line 61, in concatenate
    return _impl(arrays, axis, mergebool, highlevel, behavior, attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\operations\ak_concatenate.py", line 330, in _impl
    out = ak._broadcasting.broadcast_and_apply(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\_broadcasting.py", line 1027, in broadcast_and_apply
    out = apply_step(
          ^^^^^^^^^^^
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\_broadcasting.py", line 1005, in apply_step
    return continuation()
           ^^^^^^^^^^^^^^
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\_broadcasting.py", line 974, in continuation
    return broadcast_any_list()
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\_broadcasting.py", line 630, in broadcast_any_list
    outcontent = apply_step(
                 ^^^^^^^^^^^
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\_broadcasting.py", line 987, in apply_step
    result = action(
             ^^^^^^^
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\operations\ak_concatenate.py", line 321, in action
    inner = ak.contents.UnionArray.simplified(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\contents\unionarray.py", line 455, in simplified
    next = contents[0]._carry(index, True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\flori\miniforge3\envs\uni4\Lib\site-packages\awkward\contents\numpyarray.py", line 351, in _carry
    raise ak._errors.index_error(self, carry.data, str(err)) from err
IndexError: cannot slice NumpyArray (of length 130) with array([            0,             1,             2,             3,
                   4,             5,             6,             7,
                   8,             9,            10,            11,
                  12,            13,            14,            15,
                  16,            17,            18,            19,
                  20,            21,            22,            23,
                  24,            25,            26,            27,
                  28,            29,            30,            31,
                  32,            33,            34,            35,
                  36,            37,            38,            39,
                  40,            41,            42,            43,
                  44,            45,            46,            47,
                  48,            49,            50,            51,
                  52,            53,            54,            55,
                  56,            57,            58,            59,
                  60,            61,            62,            63,
                  64,            65,            66,            67,
                  68,            69,            70,            71,
                  72,            73,            74,            75,
                  76,            77,            78,            79,
                  80,            81,            82,            83,
                  84,            85,            86,            87,
                  88,            89,            90,            91,
                  92,            93,            94,            95,
                  96,            97,            98,            99,
                 100,           101,           102,           103,
                 104,           105,           106,           107,
                 108,           109,           110,           111,
                 112,           113,           114,           115,
                 116,           117,           118,           119,
                 120,           121,           122,           123,
                 124,           125,           126,           127,
       1867296635536, 1867296635536], dtype=int64): index 1867296635536 is out of bounds for axis 0 with size 130

This error occurred while calling

    ak.concatenate(
        [<Array [[1]] type='1 * var * int64'>, <Array [[1]] type='1 * var * i...
        axis = -1
    )

@Superharz added the "bug (unverified)" label (The problem described would be a bug, but needs to be triaged) on Dec 8, 2023
@agoose77 added the "bug" label (The problem described is something that must be fixed) and removed the "bug (unverified)" label on Dec 8, 2023
@agoose77
Collaborator

agoose77 commented Dec 8, 2023

This is a reasonable thing to expect ak.concatenate to support. Internally we implement concatenation using unions, and unions have a limitation of 128 contents. However, this restriction should not be something that users need to worry about in ak.concatenate unless they end up with >128 distinct content types.
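
To make the 128 figure concrete, here is a small sketch of the arithmetic, assuming that each union content is identified by a non-negative tag stored as a signed 8-bit integer (which is what the "int8 range" remark in the report suggests). The names `MAX_UNION_CONTENTS` and `fits_in_int8` are made up for this illustration and are not Awkward Array identifiers.

```python
import struct

INT8_MIN = -(2 ** 7)      # -128
INT8_MAX = 2 ** 7 - 1     # 127

# Non-negative tags run from 0 to INT8_MAX inclusive.
MAX_UNION_CONTENTS = INT8_MAX + 1


def fits_in_int8(value):
    """Return True if `value` can be stored in a signed 8-bit integer."""
    try:
        struct.pack("b", value)  # "b" is the signed-char (int8) format code
    except struct.error:
        return False
    return True


assert MAX_UNION_CONTENTS == 128
assert fits_in_int8(127)       # tag of the 128th content
assert not fits_in_int8(128)   # a 129th content would need this tag
```

Under that assumption, the 128-array example stays within the tag range, while the 130-array example needs tags that an int8 cannot represent, which is consistent with the last two entries of the index array in the traceback being garbage values.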

@jpivarski added this to Unprioritized in Finalization on Jan 19, 2024
@jpivarski moved this from Unprioritized to P1 (highest) in Finalization on Jan 20, 2024