[BUG] corrwith result incorrect #3247

Open
fyrestone opened this issue Sep 8, 2022 · 0 comments
Labels
type: bug Something isn't working

Contributor

fyrestone commented Sep 8, 2022

Describe the bug

Running test_dataframe_corr_with with the environment variable MARS_CI_BACKEND=ray fails:

mars/dataframe/statistics/tests/test_statistics_execution.py:166 (test_dataframe_corr_with)
setup = <mars.deploy.oscar.session.SyncSession object at 0x1572b63a0>

    def test_dataframe_corr_with(setup):
        rs = np.random.RandomState(0)
        raw_df = rs.rand(20, 10)
        raw_df = pd.DataFrame(
            np.where(raw_df > 0.4, raw_df, np.nan), columns=list("ABCDEFGHIJ")
        )
        raw_df2 = rs.rand(20, 10)
        raw_df2 = pd.DataFrame(
            np.where(raw_df2 > 0.4, raw_df2, np.nan), columns=list("ACDEGHIJKL")
        )
        raw_s = rs.rand(20)
        raw_s = pd.Series(np.where(raw_s > 0.4, raw_s, np.nan))
        raw_s2 = rs.rand(10)
        raw_s2 = pd.Series(np.where(raw_s2 > 0.4, raw_s2, np.nan), index=raw_df2.columns)
    
        df = DataFrame(raw_df)
        df2 = DataFrame(raw_df2)
    
        result = df.corrwith(df2)
        pd.testing.assert_series_equal(result.execute().fetch(), raw_df.corrwith(raw_df2))
    
        result = df.corrwith(df2, axis=1)
        pd.testing.assert_series_equal(
            result.execute().fetch(), raw_df.corrwith(raw_df2, axis=1)
        )
    
        result = df.corrwith(df2, method="kendall")
        pd.testing.assert_series_equal(
            result.execute().fetch(), raw_df.corrwith(raw_df2, method="kendall")
        )
    
        df = DataFrame(raw_df, chunk_size=4)
        df2 = DataFrame(raw_df2, chunk_size=6)
        s = Series(raw_s, chunk_size=5)
        s2 = Series(raw_s2, chunk_size=5)
    
        with pytest.raises(Exception):
            df.corrwith(df2, method="kendall").execute()
    
        result = df.corrwith(df2)
>       pd.testing.assert_series_equal(
            result.execute().fetch().sort_index(), raw_df.corrwith(raw_df2).sort_index()
        )

mars/dataframe/statistics/tests/test_statistics_execution.py:207: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/_libs/testing.pyx:53: in pandas._libs.testing.assert_almost_equal
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   AssertionError: Series are different
E   
E   Series values are different (66.66667 %)
E   [index]: [A, B, C, D, E, F, G, H, I, J, K, L]
E   [left]:  [-0.999999999999993, nan, -0.7101574548120652, 0.9999999999999998, -0.9999999999999999, nan, nan, 0.4734592586091424, -0.0859298220144991, nan, nan, nan]
E   [right]: [0.2138665453062885, nan, 0.49514640730296355, 0.12552323646958655, -0.6144321615177657, nan, -0.11419110881167645, 0.4737487368091909, -0.23766427699623854, -0.33167205059238336, nan, nan]

pandas/_libs/testing.pyx:168: AssertionError

However, if the chunk_size arguments are removed, the test passes, so the bug may be in the rechunk/align operations.
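For reference, here is a minimal pandas-only sketch of the label alignment that any chunked corrwith has to preserve. It is not how Mars implements the operation; chunked_corrwith is a hypothetical helper written for illustration. It assumes that splitting the shared columns into chunks and correlating chunk by chunk should give the same answer as the unchunked pandas call.

```python
import numpy as np
import pandas as pd

# Same data as the failing test.
rs = np.random.RandomState(0)
a = rs.rand(20, 10)
a = pd.DataFrame(np.where(a > 0.4, a, np.nan), columns=list("ABCDEFGHIJ"))
b = rs.rand(20, 10)
b = pd.DataFrame(np.where(b > 0.4, b, np.nan), columns=list("ACDEGHIJKL"))

expected = a.corrwith(b)


def chunked_corrwith(left, right, chunk_size):
    """Hypothetical helper: correlate shared columns chunk by chunk.

    Columns are paired by label, never by position, and labels that
    appear in only one frame are filled with NaN at the end.
    """
    common = left.columns.intersection(right.columns)
    parts = []
    for start in range(0, len(common), chunk_size):
        cols = common[start:start + chunk_size]
        parts.append(left[cols].corrwith(right[cols]))
    result = pd.concat(parts)
    # Reindex to the union of labels so non-shared columns become NaN.
    return result.reindex(left.columns.union(right.columns))


result = chunked_corrwith(a, b, chunk_size=3)
pd.testing.assert_series_equal(result, expected.sort_index())
```

Because each column's correlation depends only on that column pair, the chunk size should not change the result as long as chunks are matched by label.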

It's weird that if I copy the code from the unit test and run it directly with the Mars backend, errors also occur:

import numpy as np
import pandas as pd
import mars.dataframe as md
from mars.deploy.oscar.tests.session import new_test_session

sess = new_test_session()
sess.as_default()

rs = np.random.RandomState(0)
raw_df = rs.rand(20, 10)
raw_df = pd.DataFrame(
    np.where(raw_df > 0.4, raw_df, np.nan), columns=list("ABCDEFGHIJ")
)
raw_df2 = rs.rand(20, 10)
raw_df2 = pd.DataFrame(
    np.where(raw_df2 > 0.4, raw_df2, np.nan), columns=list("ACDEGHIJKL")
)

df = md.DataFrame(raw_df)  # without chunk_size: output 1; with chunk_size=4: output 2
df2 = md.DataFrame(raw_df2, chunk_size=6)
result = df.corrwith(df2)
pd.testing.assert_series_equal(
    result.execute().fetch().sort_index(), raw_df.corrwith(raw_df2).sort_index()
)

Output 1 (df without chunk_size)

  File "/Users/po.lb/Work/mars/mars/services/subtask/worker/processor.py", line 473, in run
    await self._execute_graph(chunk_graph)
  File "/Users/po.lb/Work/mars/mars/services/subtask/worker/processor.py", line 237, in _execute_graph
    await to_wait
  File "/Users/po.lb/Work/mars/mars/lib/aio/_threads.py", line 36, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/Users/po.lb/.pyenv/versions/3.8.13/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/po.lb/Work/mars/mars/services/subtask/worker/tests/subtask_processor.py", line 84, in _execute_operand
    self.assert_object_consistent(out, ctx[out.key])
  File "/Users/po.lb/Work/mars/mars/tests/core.py", line 493, in assert_object_consistent
    self.assert_dataframe_consistent(expected, real)
  File "/Users/po.lb/Work/mars/mars/tests/core.py", line 355, in assert_dataframe_consistent
    self.assert_shape_consistent(expected.shape, real.shape)
  File "/Users/po.lb/Work/mars/mars/tests/core.py", line 274, in assert_shape_consistent
    raise AssertionError(
AssertionError: [address=127.0.0.1:62877, pid=4188] shape in metadata (6, 6) is not consistent with real shape (6, 8)

Output 2 (df with chunk_size=4)

Traceback (most recent call last):
  File "/Users/po.lb/Work/mars/t3.py", line 22, in <module>
    pd.testing.assert_series_equal(
  File "/Users/po.lb/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pandas/_testing/asserters.py", line 1077, in assert_series_equal
    _testing.assert_almost_equal(
  File "pandas/_libs/testing.pyx", line 53, in pandas._libs.testing.assert_almost_equal
  File "pandas/_libs/testing.pyx", line 168, in pandas._libs.testing.assert_almost_equal
  File "/Users/po.lb/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pandas/_testing/asserters.py", line 665, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Series are different

Series values are different (66.66667 %)
[index]: [A, B, C, D, E, F, G, H, I, J, K, L]
[left]:  [-0.999999999999993, nan, -0.7101574548120652, 0.9999999999999998, -0.9999999999999999, nan, nan, 0.4734592586091424, -0.0859298220144991, nan, nan, nan]
[right]: [0.2138665453062885, nan, 0.49514640730296355, 0.12552323646958655, -0.6144321615177657, nan, -0.11419110881167645, 0.4737487368091909, -0.23766427699623854, -0.33167205059238336, nan, nan]

Output 2 with the Mars backend is the same as the CI output with the Ray backend.
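To see which labels account for the reported 66.66667 % mismatch, the two printed arrays can be compared directly. This is a small diagnostic sketch that uses only the values quoted in the assertion message; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
index = list("ABCDEFGHIJKL")

# [left]: result fetched from Mars; [right]: result from pandas.
left = pd.Series(
    [-0.999999999999993, nan, -0.7101574548120652, 0.9999999999999998,
     -0.9999999999999999, nan, nan, 0.4734592586091424,
     -0.0859298220144991, nan, nan, nan],
    index=index,
)
right = pd.Series(
    [0.2138665453062885, nan, 0.49514640730296355, 0.12552323646958655,
     -0.6144321615177657, nan, -0.11419110881167645, 0.4737487368091909,
     -0.23766427699623854, -0.33167205059238336, nan, nan],
    index=index,
)

# Treat NaN == NaN as a match, as assert_series_equal does.
differs = ~np.isclose(left.to_numpy(), right.to_numpy(), equal_nan=True)
mismatched = list(left.index[differs])

# Columns present in both input frames of the test.
shared = [c for c in "ABCDEFGHIJ" if c in "ACDEGHIJKL"]

print(mismatched)            # ['A', 'C', 'D', 'E', 'G', 'H', 'I', 'J']
print(mismatched == shared)  # True
```

With these values, the mismatched labels are exactly the eight columns that the two frames share (8 of 12 entries). The four entries that match are the NaN placeholders for the non-shared columns B, F, K and L.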

To Reproduce
  1. Python version: 3.8.13
  2. Mars version: latest master
  3. Versions of crucial packages: pandas==1.3.5
  4. Full stack of the error: see the tracebacks above.
  5. Minimized code to reproduce the error: see the script above.

Expected behavior
With chunk_size set, df.corrwith(df2) should return the same result as raw_df.corrwith(raw_df2), as it does when chunk_size is omitted.


@fyrestone fyrestone added the type: bug Something isn't working label Sep 8, 2022
@fyrestone fyrestone self-assigned this Sep 8, 2022
@fyrestone fyrestone changed the title [BUG] Ray executor corrwith result incorrect [BUG] Ray executor corrwith result is incorrect Sep 8, 2022
@fyrestone fyrestone changed the title [BUG] Ray executor corrwith result is incorrect [BUG] Ray executor corrwith result incorrect Sep 8, 2022
@fyrestone fyrestone removed their assignment Sep 9, 2022
@fyrestone fyrestone changed the title [BUG] Ray executor corrwith result incorrect [BUG] corrwith result incorrect Sep 9, 2022