
Unable to convert generator returning 1D arrays into a 2D array using fromiter/vstack #9525

Open
OyiboRivers opened this issue Apr 5, 2024 · 12 comments
Labels: feature_request, question

Comments

@OyiboRivers

OyiboRivers commented Apr 5, 2024

Feature Request

I am failing to convert a generator function that returns 1D arrays into a 2D array using numba's fromiter or vstack implementation.

Here is a generator function returning 1D-arrays.

import numpy as np
import numba as nb

@nb.njit
def gen():
    """Generator function returning 1D-arrays."""
    for i in range(3):
        yield np.arange(i, i+3)

The use_gen_fromiter() function raises a TypingError.

@nb.njit
def use_gen_fromiter():
    # return np.fromiter([row for row in gen()], dtype=np.dtype((np.int64, 3)))
    return np.fromiter(gen(), dtype=np.dtype((np.int64, 3)))

print(use_gen_fromiter.py_func())
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
print(use_gen_fromiter())
# TypingError: Use of unsupported NumPy function 'numpy.fromiter' or unsupported use of the function.

I also tried using np.vstack() to stack the arrays obtained from the generator into a 2D array.
However, this also raises a TypingError.

@nb.njit
def use_gen_vstack():
    return np.vstack(list(gen()))

print(use_gen_vstack.py_func())
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
print(use_gen_vstack())
# TypingError: No implementation of function Function(<function vstack at 0x7faa6fe66d40>) found for signature:
# vstack(list(array(int64, 1d, C))<iv=None>)

A possible workaround is to convert the generator into a list and then fill an array.

@nb.njit
def use_gen_arraylist():
    arraylist = list(gen())
    n = len(arraylist)
    k = arraylist[0].shape[0]
    res = np.empty((n, k), dtype=arraylist[0].dtype)
    for i in range(n):
        res[i] = arraylist[i]
    return res

print(use_gen_arraylist())
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

Can the additional step of converting the generator into a list be avoided?

Python: 3.12.0
Numpy: 1.26.4
Numba: 0.59.1
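One way to avoid the intermediate list is to copy each row from the generator directly into a preallocated 2D buffer and double its capacity whenever it fills up. The following is a minimal sketch, not a Numba API. It assumes the row length (3) is known in advance and that rows are int64. `gen_as_2d_no_list` is a hypothetical name, and the `try/except` fallback exists only so the snippet also runs without Numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # plain-Python fallback so the sketch runs without Numba
    def njit(func):
        return func


@njit
def gen():
    """Generator function returning 1D-arrays (same as in the report)."""
    for i in range(3):
        yield np.arange(i, i + 3)


@njit
def gen_as_2d_no_list():
    """Copy rows from gen() into a growing 2D buffer, without building a list."""
    ncols = 3  # assumption: row length is known up front (matches gen above)
    buf = np.empty((2, ncols), dtype=np.int64)  # small initial capacity
    n = 0
    for row in gen():
        if n == buf.shape[0]:
            # Buffer is full: allocate twice the rows and copy the filled part.
            bigger = np.empty((2 * buf.shape[0], ncols), dtype=np.int64)
            bigger[:n] = buf
            buf = bigger
        buf[n] = row
        n += 1
    # Trim the unused capacity; copy() so the result owns its memory.
    return buf[:n].copy()


print(gen_as_2d_no_list())
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
```

Doubling keeps the number of reallocations logarithmic in the number of rows. The price is some temporarily unused capacity, which the final slice-and-copy discards.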

@gmarkall
Member

gmarkall commented Apr 8, 2024

Thanks for the nice report - I think this needs labelling as a feature request to support these functions.

I also tried not using the list in the vstack version:

import numpy as np
import numba as nb

@nb.njit
def gen():
    """Generator function returning 1D-arrays."""
    for i in range(3):
        yield np.arange(i, i+3)

@nb.njit
def use_gen_vstack():
    return np.vstack(gen())

print(use_gen_vstack())

and the error message says:

TypeError: arrays to stack must be passed as a "sequence" type such as list or tuple.

which is a bit odd, given that the list created from the generator isn't accepted, as in your original report.

@OyiboRivers
Author

OyiboRivers commented Apr 8, 2024

@gmarkall, thank you very much for your quick response.
The generator function returns an iterable of 1d-numpy arrays.
The error message for "np.vstack(gen())"
TypeError: arrays to stack must be passed as a "sequence" type such as list or tuple.
seems to be OK as per Python's definition of a Sequence.

from collections.abc import Sequence, Iterable
import numpy as np

def gen():
    for i in range(3):
        yield np.arange(i, i+3)

iterable = gen()
isinstance(iterable, Sequence)
# False
isinstance(iterable, Iterable)
# True

The same error appears if you use pure numpy.

import numpy as np

def gen():
    """Generator function returning 1D-arrays."""
    for i in range(3):
        yield np.arange(i, i+3)

np.vstack(gen())
# TypeError: arrays to stack must be passed as a "sequence" type such as list or tuple.

The overloaded Numba function in arrayobj.py expects a "BaseTuple".

@overload(np.vstack)
def impl_np_vstack(tup):
    if isinstance(tup, types.BaseTuple):
        def impl(tup):
            return _np_vstack(tup)
        return impl

Since the function does not receive a BaseTuple as its argument, it seems to fall back to the original NumPy error message.
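For comparison, the BaseTuple branch does handle a tuple of 1D arrays inside compiled code. The following is a minimal sketch, assuming Numba 0.59 as in the report. `stack_fixed_rows` is a hypothetical name, and the fallback only lets the snippet run without Numba. A tuple's length is part of its Numba type, so a generator or list, whose length is only known at runtime, presumably cannot be turned into one inside njit code.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # plain-Python fallback so the sketch runs without Numba
    def njit(func):
        return func


@njit
def stack_fixed_rows():
    """Stack three 1D arrays passed to np.vstack as a tuple (BaseTuple path)."""
    a = np.arange(0, 3)
    b = np.arange(1, 4)
    c = np.arange(2, 5)
    return np.vstack((a, b, c))


print(stack_fixed_rows())
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
```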

@OyiboRivers
Author

It seems that numpy.fromiter is not supported in the current version of Numba:
TypingError: Use of unsupported NumPy function 'numpy.fromiter' or unsupported use of the function.
I had seen the function used in
#4271
which led me to conclude that it might be supported.

@OyiboRivers
Copy link
Author

OyiboRivers commented Apr 8, 2024

Observation: numpy.asarray fails to generate 2D-array from list of 1D-arrays

In NumPy, converting the generator into a list of 1D arrays lets numpy.asarray return a 2D array, so numpy.vstack is not needed.
The Numba implementation of numpy.asarray, however, raises an error.

import numpy as np
import numba as nb

@nb.njit
def gen():
    """Generator function returning 1D-arrays."""
    for i in range(3):
        yield np.arange(i, i+3)

@nb.njit
def gen_as_list():
    """Convert generator to list of 1D-arrays."""
    return nb.typed.List(gen())

@nb.njit
def gen_as_array_nb():
    """Convert generator to 2D array."""
    return np.asarray(gen_as_list())

def gen_as_array_np():
    """Convert generator to 2D array."""
    return np.asarray(gen_as_list())

print(gen_as_array_np())
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

print(gen_as_array_nb())
# TypingError: No implementation of function Function(<built-in function asarray>) found for signature:
# asarray(ListType[array(int64, 1d, C)])

If you apply the type checks from @overload(np.asarray) in arraymath.py (starting at line 4291), the function type_can_asarray rejects the ListType[array(...)].

import numpy as np
import numba as nb
from numba import types
from numba.np.numpy_support import type_can_asarray

@nb.njit
def gen():
    """Generator function returning 1D-arrays."""
    for i in range(3):
        yield np.arange(i, i+3)

@nb.njit
def gen_as_list():
    """Convert generator to list of 1D-arrays."""
    return nb.typed.List(gen())

def gen_type_check():
    a = gen_as_list()
    # type checking as per @overload(np.asarray) in arraymath.py from line 4291
    if not type_can_asarray(a):
        return 1
    if isinstance(a, types.Array):
        return 2
    elif isinstance(a, (types.Sequence, types.Tuple)):
        return 3
    elif isinstance(a, (types.Number, types.Boolean)):
        return 4
    elif isinstance(a, types.containers.ListType):
        return 5
    elif isinstance(a, types.StringLiteral):
        return 6
    else:
        return 7

print(gen_type_check())
# 1

# def type_can_asarray(arr):
#     ok = (types.Array, types.Sequence, types.Tuple, types.StringLiteral,
#           types.Number, types.Boolean, types.containers.ListType)
#     return isinstance(arr, ok)

print(type_can_asarray(gen_as_list()))
# False

print(isinstance(gen_as_list(), types.containers.ListType))
# False

This behavior seems to be related to:
#6803

@OyiboRivers
Author

Unfortunately, the implementation of numpy.fromiter seems to be problematic.

  1. The output array's shape and type must be determined by the value of the "dtype" argument, not by its type.
@nb.njit
def np_fromiter_impl(iter, dtype):
    arraylist = nb.typed.List(iter)
    size = len(arraylist)
    out = np.empty(size, dtype=dtype)
    for i in range(size):
        out[i] = arraylist[i]
    return out

dtype = np.dtype(('<i8', (3,)))

print(np_fromiter_impl.py_func(gen(), dtype=dtype))
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

print(np_fromiter_impl(gen(), dtype=dtype))
# TypingError: non-precise type pyobject
# During: typing of argument at /tmp/ipykernel_1004317/1459046023.py (1)
  2. You can't specify the advanced (subarray) dtypes that would be needed to describe the shape and type of the output array.
@nb.njit
def make_dtype():
    return np.dtype(('<i8', (3,)))

print(make_dtype.py_func())
# ('<i8', (3,))

print(make_dtype())
# TypingError: No implementation of function Function(<class 'numpy.dtype'>) found for signature:
# dtype(Tuple(Literal[str](<i8), UniTuple(int64 x 1)))
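Point 1 can be seen in plain NumPy. With a subarray dtype, the row shape is stored inside the dtype object, and allocation appends that shape to the requested one. Numba specializes on argument types, so it would presumably need this shape information at compile time rather than at runtime. The following is a minimal NumPy-only sketch:

```python
import numpy as np

dt = np.dtype(("<i8", (3,)))  # subarray dtype from the comment above

# The dtype value itself carries the row shape and the base element type.
print(dt.shape)     # (3,)
print(dt.base)      # int64
print(dt.subdtype)  # (dtype('int64'), (3,))

# Allocating with a subarray dtype appends the subarray shape to the array
# shape and unwraps the dtype to its base type.
out = np.empty(3, dtype=dt)
print(out.shape, out.dtype)  # (3, 3) int64
```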

@gmarkall should I open a separate issue to support advanced data types in the numpy.dtype implementation?

@gmarkall
Member

gmarkall commented Apr 9, 2024

The error message for "np.vstack(gen())"
TypeError: arrays to stack must be passed as a "sequence" type such as list or tuple.
seems to be OK as per Python's definition of a Sequence.

The part of the message I thought was a problem was the implication that a list must be passed - your second example in the issue report appears to be passing a list but that still doesn't work.

I have seen the function being used in #4271
This led me to the conclusion it might be supported.

I think if any of the snippets in that issue did run at some point, they might have fallen back to object mode - I can't get any of the code in that issue to run at present. A variation on those snippets I can get to run is:

from numba import jit
import numpy as np


@jit(forceobj=True)
def func(x):
    return x[0]


@jit(forceobj=True)
def error():
    topo = [chr(ord('a') + i) for i in range(5)]
    types = [(var_name, "int") for var_name in topo]
    sampled = np.zeros(10, dtype=types)
    return np.fromiter([func(x) for x in sampled], dtype=int)


print(error())

but you don't really want to be running in object mode, since its only effect is really to remove a tiny bit of interpreter loop overhead within your function.

@gmarkall
Member

gmarkall commented Apr 9, 2024

@gmarkall should I open a separate issue to support advanced data types in the numpy.dtype implementation?

I think that seems like a good thing to do, to keep the discussions simpler to follow in issues - many thanks!

@OyiboRivers
Author

OyiboRivers commented Apr 9, 2024

@gmarkall no problem, I will open another issue for advanced use of numpy.dtype as feature request #9527

What about the issue in np.asarray where a typed list is not identified as a ListType?

print(isinstance(gen_as_list(), types.containers.ListType))
# False

This seems weird to me. Should I also open an issue on this matter or is this the expected behavior?

@gmarkall
Copy link
Member

gmarkall commented Apr 9, 2024

What you observe above is expected. The types.containers.ListType is a Numba type used for Numba's type system. The object you get returned when creating a typed list is a numba.typed.List:

from numba import njit, typed

@njit
def give_a_list():
    return typed.List([1, 2, 3])

print(isinstance(give_a_list(), typed.List))

prints

$ python listtype.py 
True

@OyiboRivers
Copy link
Author

@gmarkall thank you.

The implementation of numpy.asarray converts a typed list of scalars into a numpy array.
The function fails converting a typed list of 1D-numpy arrays into a 2D-numpy array.
This feature does not seem to be supported. Is that correct?

import numpy as np
from numba import njit, typed

@njit
def give_list():
    return typed.List([1, 2, 3])

@njit
def give_arraylist():
    return typed.List([np.array([1, 2, 3]), np.array([1, 2, 3])])

@njit
def asarray(a):
    return np.asarray(a)

print(asarray(give_list()))
# [1 2 3]

print(asarray(give_arraylist()))
# TypingError: No implementation of function Function(<built-in function asarray>) found for signature:
# asarray(ListType[array(int64, 1d, C)])

@gmarkall
Member

gmarkall commented Apr 9, 2024

This feature does not seem to be supported. Is that correct?

I think that is correct. The docs are not very clear about what it might accept though: https://numba.readthedocs.io/en/stable/reference/numpysupported.html#other-functions

@OyiboRivers
Copy link
Author

OyiboRivers commented Apr 9, 2024

@gmarkall thank you.
If asarray were able to convert a typed list of 1D arrays into a 2D array, np.fromiter could in principle be implemented on top of it. However, I'm not sure about the performance and safety of this approach, since you can't control the dtype of typed.List(iterable); you may have to cast the final array to the desired dtype.
np.fromiter => np.asarray(typed.List(iterable), dtype)

Should I open an issue to support conversion of a typed list of 1D-arrays into a 2D-array using asarray?

Edit:
You can already implement np.fromiter for scalars using reflected lists in combination with asarray.

import numpy as np
from numba import njit, types
from numba.extending import overload
from numba.core.errors import TypingError

@overload(np.fromiter)
def ovl_fromiter(iter, dtype):
    def np_fromiter_impl(iter, dtype):
        return np.asarray(list(iter), dtype)
    # Type check
    if not isinstance(iter, types.IterableType):
        raise TypingError("First argument must be an iterable.")
    else:
        return np_fromiter_impl

@njit
def gen_scalars():
    """Generator function returning scalars."""
    for i in range(3):
        yield i

@njit
def use_fromiter_scalars(dtype):
    return np.fromiter(gen_scalars(), dtype)

print(use_fromiter_scalars(np.int64))
# [0 1 2]

Unfortunately, the same approach using typed lists sometimes works and sometimes fails:

from numba import typed

@overload(np.fromiter)
def ovl_fromiter(iter, dtype):
    def np_fromiter_impl(iter, dtype):
        return np.asarray(typed.List(iter), dtype)
    # Type check
    if not isinstance(iter, types.IterableType):
        raise TypingError("First argument must be an iterable.")
    else:
        return np_fromiter_impl

Edit 2:
I can no longer reproduce the error I got earlier with typed lists; both methods now work on my machine, and I'm not sure what the problem was.
Both methods still fail on generators returning 1D arrays.

Edit 3:
I've just realized that, in Numba, np.asarray surprisingly accepts an iterable of scalars directly, without the intermediate step of converting it to a list. NumPy's asarray does not turn a generator into an array of its values (it wraps the generator object itself), so the behavior deviates.

import numpy as np
from numba import njit, types
from numba.extending import overload
from numba.core.errors import TypingError

@overload(np.fromiter)
def ovl_fromiter(iter, dtype):
    def np_fromiter_impl(iter, dtype):
        return np.asarray(iter, dtype)
    # Type check
    if not isinstance(iter, types.IterableType):
        raise TypingError("First argument must be an iterable.")
    else:
        return np_fromiter_impl

@njit
def gen():
    """Generator function returning scalars."""
    for i in range(3):
        yield i

@njit
def use_fromiter(dtype):
    return np.fromiter(gen(), dtype)

print(use_fromiter(np.int64))
# [0 1 2]

print(np.asarray(gen.py_func()))
# <generator object gen at 0x7f7782e64880>
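The deviation described in Edit 3 can be checked in plain NumPy, with no Numba involved. np.asarray does not iterate a generator; it stores the generator object in a 0-d object array. np.fromiter does drain it. The following is a minimal sketch:

```python
import numpy as np


def gen():
    """Plain-Python generator yielding scalars."""
    for i in range(3):
        yield i


# np.asarray does not consume the generator: it wraps the generator object
# itself in a 0-d array of dtype object.
wrapped = np.asarray(gen())
print(wrapped.ndim, wrapped.dtype)  # 0 object

# np.fromiter is the NumPy function that actually iterates the generator.
values = np.fromiter(gen(), dtype=np.int64)
print(values)  # [0 1 2]
```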

@gmarkall added the question label Apr 9, 2024