Issue: ddof Argument in pl.corr is Redundant #16161

Open
fleibfried opened this issue May 10, 2024 · 0 comments

Description
The Polars correlation function pl.corr accepts a ddof argument that sets the delta degrees of freedom used when calculating the correlation coefficient. However, the correlation coefficient is invariant under ddof: it is the covariance divided by the product of the two standard deviations, and the numerator and denominator carry the same 1/(n - ddof) scaling factor, which cancels. The ddof argument is therefore redundant (see also numpy.corrcoef, where bias and ddof are deprecated and have no effect).
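
To make the cancellation concrete, here is a minimal pure-Python sketch of the textbook Pearson formula with an explicit ddof divisor. The helper name `pearson` is hypothetical and this is not how Polars computes the value internally; it only illustrates that the same divisor appears in the covariance and in both variances, so it drops out of the ratio.

```python
import math

def pearson(xs, ys, ddof):
    """Pearson correlation from ddof-scaled covariance and variances (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    scale = n - ddof  # the same divisor is used for cov, var_x and var_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / scale
    var_x = sum((x - mean_x) ** 2 for x in xs) / scale
    var_y = sum((y - mean_y) ** 2 for y in ys) / scale
    # cov / sqrt(var_x * var_y): 'scale' cancels between numerator and denominator
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.5, 2.9, 3.6, 3.8, 4.5, 5.1, 5.3, 5.8, 6.3]

r0 = pearson(x, y, ddof=0)
r1 = pearson(x, y, ddof=1)

# The two results agree up to floating-point rounding
print(math.isclose(r0, r1, rel_tol=1e-12))
```

Any remaining difference between the two results is floating-point rounding, not a statistical effect of ddof.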

Example to Reproduce

import polars as pl

# Create a sample DataFrame
df = pl.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "y": [2.1, 2.5, 2.9, 3.6, 3.8, 4.5, 5.1, 5.3, 5.8, 6.3]
})

# Compute correlation with different ddof values
corr_ddof_0 = df.select(pl.corr("x", "y", ddof=0)).item()
corr_ddof_1 = df.select(pl.corr("x", "y", ddof=1)).item()

# Output the correlation values
print(f"Correlation between 'x' and 'y' with ddof=0: {corr_ddof_0}")
print(f"Correlation between 'x' and 'y' with ddof=1: {corr_ddof_1}")

Expected Behavior
Both calls should return the same correlation coefficient (up to floating-point rounding), because ddof does not affect the result.

Expected Output:

Correlation between 'x' and 'y' with ddof=0: 0.9971627582526871
Correlation between 'x' and 'y' with ddof=1: 0.997162758252687

Suggested Changes

  • Remove the ddof argument from the pl.corr function.
  • If the ddof argument is retained for backward compatibility, consider adding documentation notes about its redundancy in correlation computation.

Environment

  • OS: Windows 10
  • Python Version: 3.10.13
  • Polars Version: 0.20.22