bug: postgres.to_pyarrow(ibis.uuid()) errors #8902

Open
1 task done
NickCrews opened this issue Apr 5, 2024 · 0 comments · May be fixed by #8901
Labels
bug Incorrect behavior inside of ibis

NickCrews (Contributor) commented Apr 5, 2024

What happened?

See the xfailing test in #8901

ibis/backends/__init__.py:221: in to_pyarrow
    table = pa.Table.from_batches(reader, schema=arrow_schema)
pyarrow/table.pxi:4104: in pyarrow.lib.Table.from_batches
    ???
pyarrow/ipc.pxi:666: in pyarrow.lib.RecordBatchReader.__next__
    ???
pyarrow/ipc.pxi:700: in pyarrow.lib.RecordBatchReader.read_next_batch
    ???
pyarrow/error.pxi:88: in pyarrow.lib.check_status
    ???
ibis/backends/sql/__init__.py:375: in <genexpr>
    pa.array(map(tuple, batch), type=array_type)
pyarrow/array.pxi:343: in pyarrow.lib.array
    ???
pyarrow/array.pxi:42: in pyarrow.lib._sequence_to_array
    ???
pyarrow/error.pxi:154: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   pyarrow.lib.ArrowTypeError: Expected bytes, got a 'UUID' object

pyarrow/error.pxi:91: ArrowTypeError

In ibis/formats/pyarrow.py, we say that ibis.UUID should map to pa.string. This makes sense, since there is no built-in UUID type in pyarrow.

But then in SQLBackend.to_pyarrow_batches(), we do:

schema = expr.as_table().schema()
array_type = schema.as_struct().to_pyarrow()
arrays = (
    pa.array(map(tuple, batch), type=array_type)
    for batch in self._cursor_batches(
        expr, params=params, limit=limit, chunk_size=chunk_size
    )
)

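Here, each batch is a sequence of row tuples, and the UUID values in those rows are Python standard-library uuid.UUID objects returned by the psycopg2 cursor. The pa.array(..., type=array_type) call then fails, because the UUID field of array_type is pa.string and pyarrow does not accept uuid.UUID objects for a string field.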

I see a few possible paths here:

  • Do the conversion on the backend side, before fetching results, so that the psycopg2 cursor returns actual strings. We already have the ibis->pyarrow type mappings, so we could first roundtrip the other way, e.g. expr.cast(type.to_arrow().to_ibis()). This is probably the easiest option, but it might sometimes lead to the wrong types if some information is lost during the roundtrip.
  • Do the conversion after fetching from the cursor, perhaps by hardcoding the few data types that cause problems.
  • Submit a patch to upstream pyarrow so that it accepts uuid.UUID objects.

What version of ibis are you using?

main

What backend(s) are you using, if any?

postgres

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
NickCrews added the bug (Incorrect behavior inside of ibis) label Apr 5, 2024
NickCrews linked a pull request Apr 11, 2024 that will close this issue
Projects
Status: backlog