[SPARK-52771][PS] Fix float32 type widening in truediv/floordiv #51456

Closed
wants to merge 2 commits into from

@xinrong-meng (Member) commented Jul 11, 2025

What changes were proposed in this pull request?

Fix float32 type widening in truediv/floordiv, whether ANSI mode is on or off.

Why are the changes needed?

Ensure pandas API on Spark works correctly whether ANSI mode is on or off.

Note that the issue exists regardless of ANSI mode: dividing a float32 Series returns float64, as shown below.

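>>> import pandas as pd
>>> import numpy as np
>>> import pyspark.pandas as ps
>>> # `spark` is the active SparkSession (predefined in the pyspark shell)
>>> pser = pd.Series([1.1, 2.2, 3.3], dtype=np.float32)
>>> psser = ps.from_pandas(pser)
>>> spark.conf.set("spark.sql.ansi.enabled", False)
>>> psser / 1
0    1.1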
1    2.2
2    3.3
dtype: float64
>>> spark.conf.set("spark.sql.ansi.enabled", True)
>>> psser / 1
0    1.1
1    2.2
2    3.3
dtype: float64
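
For comparison, plain pandas keeps float32 for the same operations, which is the dtype pandas API on Spark would be expected to match. A minimal check, assuming only pandas and NumPy are installed (no Spark session is involved, and the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Same input as in the example above, but evaluated by plain pandas.
pser = pd.Series([1.1, 2.2, 3.3], dtype=np.float32)

# Dividing a float32 Series by a Python int does not widen the dtype.
true_div = pser / 1
floor_div = pser // 1

print(true_div.dtype)
print(floor_div.dtype)
```

Both results report float32 in pandas, so the float64 shown above is a divergence from pandas rather than an ANSI-specific effect.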

Does this PR introduce any user-facing change?

Yes. truediv/floordiv now preserve float32 whether ANSI mode is on or off, as shown below.

>>> import pandas as pd
>>> import numpy as np
>>> import pyspark.pandas as ps
>>>
>>> ps.set_option("compute.fail_on_ansi_mode", False)
>>> ps.set_option("compute.ansi_mode_support", True)
>>>
>>> pser = pd.Series([1.1, 2.2, 3.3], dtype=np.float32)
>>> psser = ps.from_pandas(pser)
>>> psser / 1
0    1.1
1    2.2
2    3.3
dtype: float32
>>> psser // 1
0    1.0
1    2.0
2    3.0
dtype: float32

How was this patch tested?

Unit tests.

The commands below all passed:

SPARK_ANSI_SQL_MODE=true ./python/run-tests --python-executables=python3.11 --testnames "pyspark.pandas.tests.computation.test_binary_ops FrameBinaryOpsTests.test_binary_operator_truediv"
SPARK_ANSI_SQL_MODE=false ./python/run-tests --python-executables=python3.11 --testnames "pyspark.pandas.tests.computation.test_binary_ops FrameBinaryOpsTests.test_binary_operator_truediv"
SPARK_ANSI_SQL_MODE=true ./python/run-tests --python-executables=python3.11 --testnames "pyspark.pandas.tests.computation.test_binary_ops FrameBinaryOpsTests.test_binary_operator_floordiv"
SPARK_ANSI_SQL_MODE=false ./python/run-tests --python-executables=python3.11 --testnames "pyspark.pandas.tests.computation.test_binary_ops FrameBinaryOpsTests.test_binary_operator_floordiv"
SPARK_ANSI_SQL_MODE=true ./python/run-tests --python-executables=python3.11 --testnames "pyspark.pandas.tests.computation.test_binary_ops FrameBinaryOpsTests.test_divide_by_zero_behavior"
SPARK_ANSI_SQL_MODE=false ./python/run-tests --python-executables=python3.11 --testnames "pyspark.pandas.tests.computation.test_binary_ops FrameBinaryOpsTests.test_divide_by_zero_behavior"
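
The test runs above follow one pattern: each of the three tests is run with SPARK_ANSI_SQL_MODE set to true and to false. A rough sketch of driving that matrix from Python, assuming it is invoked from the root of a Spark checkout; the helper names `build_runs` and `run_all` are hypothetical and not part of Spark:

```python
import os
import subprocess

MODULE = "pyspark.pandas.tests.computation.test_binary_ops"
TESTS = [
    "FrameBinaryOpsTests.test_binary_operator_truediv",
    "FrameBinaryOpsTests.test_binary_operator_floordiv",
    "FrameBinaryOpsTests.test_divide_by_zero_behavior",
]


def build_runs():
    """Return (ansi_mode, argv) pairs: every test with ANSI on and off."""
    runs = []
    for test in TESTS:
        for ansi in ("true", "false"):
            argv = [
                "./python/run-tests",
                "--python-executables=python3.11",
                "--testnames",
                f"{MODULE} {test}",
            ]
            runs.append((ansi, argv))
    return runs


def run_all():
    """Run each combination; raises CalledProcessError on the first failure."""
    for ansi, argv in build_runs():
        # Copy the current environment and override only the ANSI switch.
        env = dict(os.environ, SPARK_ANSI_SQL_MODE=ansi)
        subprocess.run(argv, env=env, check=True)
```

Calling `run_all()` would execute the six combinations in sequence; `build_runs()` alone only constructs the command lines, which makes it easy to inspect or extend the matrix.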

Was this patch authored or co-authored using generative AI tooling?

No.

@xinrong-meng changed the title from "[SPARK-52771][PS] Fix float32 type widening in truediv/floordiv under ANSI" to "[SPARK-52771][PS] Fix float32 type widening in truediv/floordiv" on Jul 11, 2025
@xinrong-meng (Member, Author)

@ueshin may I get a review please?

@HyukjinKwon (Member)

Merged to master.

haoyangeng-db pushed a commit to haoyangeng-db/apache-spark that referenced this pull request Jul 22, 2025

Closes apache#51456 from xinrong-meng/div.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>