[BUG] Mars build graph took too much time #3174

Open
chaokunyang opened this issue Jun 29, 2022 · 0 comments

chaokunyang commented Jun 29, 2022

Describe the bug
When executing blockwise operations in Mars that involve many setitem/getitem nodes, Mars takes about 1 minute to build the graph, which is too long.

To Reproduce
To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.7.9

  2. The version of Mars you use: master

  3. Versions of crucial packages, such as numpy, scipy and pandas

  4. Full stack of the error.
    image

  5. Minimized code to reproduce the error.

import math

import numpy as np
import mars.dataframe as md
import mars.tensor as mt

# 1,200,000 rows x 70 columns
df = md.DataFrame(
    mt.random.rand(1_200_000, 70, chunk_size=5000),
    columns=[f"col{i}" for i in range(70)])

# Many setitem/getitem operations: derive 140 extra columns
for i in range(70):
    df[f"col{i+70}"] = df[f"col{i}"].fillna(0)
    df[f"col{i+140}"] = df[f"col{i}"].fillna(0)
for i in range(70):
    df[f"col{i}"] = df[f"col{i}"]/100
df = df.fillna(0)
cols = df.columns.to_pandas().values
df = df[cols[:-1]]
df = df.apply(lambda x: x, axis=1)
df = df.replace('NaN', np.nan)  # replace the string 'NaN' with np.nan
df = df.replace(math.nan, np.nan)  # replace float NaN with np.nan
df = df.fillna(value=np.nan)  # replace None/null with np.nan
df.map_chunk(lambda x: x).execute()

Expected behavior
Graph building should take less than 3 seconds.
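
One rough way to check a time budget like this is to wrap the call in a wall-clock timer. The sketch below is only an illustration: `timed` is a hypothetical helper, not part of Mars, and the workload is a stand-in. Timing the full `.execute()` call from the reproduction would measure graph building plus execution, so it gives an upper bound on graph build time rather than the build time itself.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload. With the reproduction script, the equivalent would be
# timed(df.map_chunk(lambda x: x).execute), which is an assumption about
# how one might measure it, not a Mars-provided API.
result, elapsed = timed(sum, range(1000))
print(f"elapsed: {elapsed:.3f}s")
```

To see where the time goes inside graph building, running the reproduction under a profiler such as `cProfile` would likely be more informative than a single timer.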
