Bad Performance using Python #664

Open
raeudigerRaeffi opened this issue May 12, 2024 · 1 comment

Comments

@raeudigerRaeffi

Hi, we are using a self-hosted version of Piston and we have encountered some major limitations with regard to runtime for Python. Given that Piston advertises itself as efficient and fast, I assume the issue is with us and not with the software.
Our setup is the following:
We use the Piston Docker image with the CLI to install Python. Then we run sudo /piston/packages/python/3.12.0//bin/pip3 install statsmodels plotly plotly-express scikit-learn in order to install custom libraries.
The following environment variables are set:

  • PISTON_RUN_TIMEOUT=80000
  • PISTON_STDERR_LENGTH=800000
  • PISTON_MAX_PROCESS_COUNT=124
  • PISTON_MAX_FILE_SIZE=100000
  • PISTON_OUTPUT_MAX_SIZE=250000

Using this setup, the code shown below takes around 20 seconds to execute for 50 data points in os.environ["data"] (on my machine it takes less than a second).

import os
import json
import pandas as pd
import plotly
import numpy as np
import plotly.express as px
data = json.loads(os.environ["data"])
df = pd.DataFrame(data)
df['order_date'] = pd.to_datetime(df['order_date'], format='%d/%m/%Y %H:%M')
fig = px.scatter(df, x='order_date', y='sales', trendline='ols')
graph_json = plotly.io.to_json(fig)
print({"type": "plot", "variable": graph_json})
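
The script is executed through Piston's HTTP API. Below is a simplified sketch of such a request, assuming the /api/v2/execute request shape documented in the Piston README; the filename, host, and port are placeholders, and how the "data" environment variable reaches the sandbox is not shown here.

import requests

# Read the script shown above (saved as a local file; the name is a placeholder).
with open("analysis.py") as f:
    source = f.read()

# Assumed request shape per the Piston README (/api/v2/execute); host and port
# are placeholders for the self-hosted instance.
response = requests.post(
    "http://localhost:2000/api/v2/execute",
    json={
        "language": "python",
        "version": "3.12.0",
        "files": [{"name": "main.py", "content": source}],
        "run_timeout": 80000,  # matches PISTON_RUN_TIMEOUT above
    },
)
print(response.json()["run"]["stdout"])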
@HexF
Collaborator

HexF commented May 13, 2024

Are you running Piston on the same system as your local test? This could be one factor in the slow performance, though it shouldn't have too large an impact.

I'm thinking this might have to do with Python not caching .pyc files for these libraries.
That is by design, to ensure complete isolation of code with no persistent files across runs.
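
One quick way to check that hypothesis (a small sketch using only the standard library plus one of the installed packages) is to ask where CPython would look for a library's cached bytecode and whether anything is actually there inside the sandbox:

import importlib.util
import os
import pandas

# Path where CPython would cache pandas' compiled bytecode, and whether a
# cache file actually exists in the sandboxed package directory.
cache_path = importlib.util.cache_from_source(pandas.__file__)
print(cache_path, os.path.exists(cache_path))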

I would try to see which lines of code are causing the performance bottleneck. My bet would be on one of the imports.
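
As a rough check (nothing Piston-specific, just standard-library timing), you could time each import separately and run that file through Piston; the module names below are the ones from the script above. Running the interpreter with -X importtime gives a more detailed breakdown on stderr.

import importlib
import time

def timed_import(name):
    # Import a module and report how long it took (wall-clock seconds).
    start = time.perf_counter()
    module = importlib.import_module(name)
    print(f"import {name}: {time.perf_counter() - start:.2f}s")
    return module

# Time each heavy import separately to see where the ~20 seconds actually go.
pd = timed_import("pandas")
plotly = timed_import("plotly")
np = timed_import("numpy")
px = timed_import("plotly.express")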
