Pyarrow engine incorrectly serializes timestamp with Z. #2384

Open
thomasfrederikhoeck opened this issue Apr 4, 2024 · 0 comments
Labels
bug Something isn't working

Comments

Contributor

thomasfrederikhoeck commented Apr 4, 2024

Environment

Delta-rs version: main

Binding: python

Environment:

  • Cloud provider:
  • OS: windows
  • Other:

Bug

What happened:
The pyarrow engine incorrectly serializes timestamp values with a trailing Z, in contrast to timestampNtz values, which are correctly serialized without Z.

image

What you expected to happen:
Both should be serialized without Z.

How to reproduce it:

from datetime import datetime

import pandas as pd
import pyarrow as pa
import pytz
from deltalake import DeltaTable, write_deltalake

tz = "UTC"

def get_data(with_tz):
    # Build a small frame whose "time" column is tz-aware or tz-naive.
    tzinfo = pytz.timezone(tz) if with_tz else None
    dates = pd.date_range(
        datetime(2021, 1, 1, 3, 4, 6, 3, tzinfo=tzinfo),
        datetime(2021, 1, 3, 3, 4, 6, tzinfo=tzinfo),
    )
    return pd.DataFrame({"time": dates, "a": list(range(len(dates)))})

# Table partitioned by a timestamp without time zone (timestampNtz).
schema = pa.schema(
    [
        ("time", pa.timestamp("us")),
        ("a", pa.int64()),
    ]
)
dt = DeltaTable.create(
    "mytable_timestampNtz", schema=schema, partition_by=["time"]
)

write_deltalake("mytable_timestampNtz", get_data(with_tz=False), partition_by="time", mode="append")
print(dt.schema())

# Table partitioned by a timestamp with time zone (timestamp).
schema = pa.schema(
    [
        ("time", pa.timestamp("us", tz)),
        ("a", pa.int64()),
    ]
)
dt = DeltaTable.create(
    "mytable_timestamp", schema=schema, partition_by=["time"]
)

write_deltalake("mytable_timestamp", get_data(with_tz=True), partition_by="time", mode="append")
print(dt.schema())

>Schema([Field(time, PrimitiveType("timestampNtz"), nullable=True), Field(a, PrimitiveType("long"), nullable=True)])
>Schema([Field(time, PrimitiveType("timestamp"), nullable=True), Field(a, PrimitiveType("long"), nullable=True)])
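
To compare how the two tables serialize the time value, the helper below is a rough sketch that pulls the partition value out of file paths and flags a trailing Z. It assumes the value appears in Hive-style `time=<value>` path segments and may be URL-encoded. The names `partition_values` and `has_z_suffix` are hypothetical, and the example paths are made up for illustration, not actual output from the tables above.

```python
from urllib.parse import unquote


def partition_values(paths, column="time"):
    """Return the decoded values of `column` found in Hive-style paths."""
    prefix = f"{column}="
    values = []
    for path in paths:
        for segment in path.split("/"):
            if segment.startswith(prefix):
                # Partition values in paths are typically percent-encoded.
                values.append(unquote(segment[len(prefix):]))
    return values


def has_z_suffix(value):
    """True if the serialized timestamp ends with a 'Z' marker."""
    return value.endswith("Z")


# Hypothetical paths for illustration only (assumption, not real output).
example_paths = [
    "time=2021-01-01%2003%3A04%3A06.000003/part-0.parquet",
    "time=2021-01-01%2003%3A04%3A06.000003Z/part-0.parquet",
]

for value in partition_values(example_paths):
    print(value, "-> trailing Z" if has_z_suffix(value) else "-> no Z")
```

With the real tables, the paths could presumably be taken from `dt.files()` for each table instead of the made-up list.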

More details:
