Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure map_partitions returns Series object if function returns scalar #10739

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Dec 22, 2023

@fjetter fjetter requested a review from phofl December 22, 2023 16:00
@fjetter fjetter changed the title Ensure map_partitions returns Series object if function returns scalar Ensure map_partitions returns Series object if function returns scalar Dec 22, 2023
Copy link
Collaborator

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@fjetter
Copy link
Member Author

fjetter commented Dec 22, 2023

Looks like there are a bunch of genuine typing issues since I set the dtype on the created series explicitly. @phofl if you want to, you can pick this up after the holidays but if not I'll handle it once I'm back

@phofl
Copy link
Collaborator

phofl commented Dec 22, 2023

I'll take a look after the holidays, the dtype issue is most likely because of running this on an empty meta object, that's the situation where I've seen us ending up with NaN there

Copy link
Member Author

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phofl in case you want to have another look

@@ -7282,6 +7287,10 @@ def apply_and_enforce(*args, **kwargs):
func = kwargs.pop("_func")
meta = kwargs.pop("_meta")
df = func(*args, **kwargs)

if is_scalar(df) and is_series_like(meta):
df = type(meta)([df])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up removing the explicit dtype since it caused issues and was quite unnecessary considering that df is already the final result data. The casting/coercing logic of the Series takes care of compat stuff with nulls, etc.

token=name1,
meta=self,
**chunk_kwargs,
enforce_metadata=False,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using map_partitions internally we sometimes rely on the data not being cast to a Series. The cumulative aggregation logic becomes quite cumbersome otherwise. I hope this is not introducing any other weird artefacts 🤞

@fjetter fjetter force-pushed the ensure_map_partitions_returns_series branch from b21d4f2 to 71c29e0 Compare January 10, 2024 13:41
Copy link
Collaborator

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic looks fine, I think the test failures are something that we can fix by adjusting the test

@fjetter fjetter force-pushed the ensure_map_partitions_returns_series branch from 7d4fbd7 to fc5cd91 Compare January 12, 2024 13:36
Copy link
Contributor

github-actions bot commented Jan 12, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ± 0       15 suites  ±0   3h 35m 57s ⏱️ + 13m 1s
 12 934 tests + 1   11 938 ✅ +11     922 💤 ±0   73 ❌  - 10   1 🔥 ±0 
159 683 runs  +12  143 003 ✅ +20  16 508 💤 +2  132 ❌  - 10  40 🔥 ±0 

For more details on these failures and errors, see this check.

Results for commit c60269b. ± Comparison against base commit 1ecf464.

♻️ This comment has been updated with latest results.

@fjetter fjetter force-pushed the ensure_map_partitions_returns_series branch from fc5cd91 to a7f08aa Compare January 15, 2024 08:52
@fjetter fjetter force-pushed the ensure_map_partitions_returns_series branch from 2e95b60 to c60269b Compare January 29, 2024 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants