
Values not plotted in right order for large dataframes #46

Open
rik-vandervlist-sympower opened this issue Mar 14, 2023 · 0 comments

@rik-vandervlist-sympower

Environment

  • Operating System: Linux (Databricks Runtime 11.3 ML)
  • Python Version: 3.9.5
  • How did you install bamboolib: pip
  • Python packages: See Databricks runtime link

Description of Issue

  • What did you expect to happen?
    To see a time-series line graph of my data.
  • What happened instead?
    The lines of the graph jump all over the place, because the plotted data is not sorted by timestamp.

Reproduction Steps

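  1. Take a large dataset (the generated code below samples 10,000 rows from it).
  2. Make a line plot of the timestamp column against a variable of interest.
  3. The graph comes out mangled: the lines jump back and forth instead of following time order.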
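The problem can be reproduced without bamboolib or plotting. The sketch below applies the same sort_values → sample → sort_index chain that bamboolib generated to a synthetic power_data frame. The data is hypothetical, and it assumes the real rows are stored out of timestamp order, which the report does not state. It then checks whether the rows that would be plotted are in time order.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for power_data: 50,000 rows whose timestamps are
# stored in shuffled order (an assumption about the real dataset's layout).
rng = np.random.default_rng(0)
n_rows = 50_000
timestamps = pd.date_range("2023-01-01", periods=n_rows, freq="min")
power_data = pd.DataFrame({
    "utc_timestamp": rng.permutation(timestamps.to_numpy()),
    "value": rng.normal(size=n_rows),
    "id": rng.choice(["a", "b"], size=n_rows),
})

# Same chain as the bamboolib-generated code, without the plotting call.
plotted = (
    power_data.sort_values(by=["utc_timestamp"], ascending=[True])
    .sample(n=10000, replace=False, random_state=123)
    .sort_index()
)

# sort_index() reorders rows by their original index labels, so the
# timestamp order produced by sort_values() does not survive.
print(plotted["utc_timestamp"].is_monotonic_increasing)

# px.line draws one trace per id in row order, so check each id as well.
print(plotted.groupby("id")["utc_timestamp"].apply(lambda s: s.is_monotonic_increasing))
```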

The generated code looks as follows:

import plotly.express as px
fig = px.line(power_data.sort_values(by=['utc_timestamp'], ascending=[True]).sample(n=10000, replace=False, random_state=123).sort_index(), x='utc_timestamp', y='value', color='id')
fig

Note that .sort_values(by=['utc_timestamp'], ascending=[True]) is applied before .sample(n=10000, replace=False, random_state=123). The chain does end with .sort_index(), but the timestamp column is not the index of my pandas DataFrame. Sorting by the index therefore does not put the sampled rows back in timestamp order. I think applying sort_values after the random sample might resolve this issue.
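Here is a minimal sketch of the reordering suggested above: sample first, then sort by timestamp, with no trailing sort_index(). It uses the same hypothetical synthetic power_data as a stand-in for the real dataset. It is only an illustration of the proposed order, not bamboolib's actual fix.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for power_data with out-of-order timestamps.
rng = np.random.default_rng(0)
n_rows = 50_000
timestamps = pd.date_range("2023-01-01", periods=n_rows, freq="min")
power_data = pd.DataFrame({
    "utc_timestamp": rng.permutation(timestamps.to_numpy()),
    "value": rng.normal(size=n_rows),
    "id": rng.choice(["a", "b"], size=n_rows),
})

# Proposed order: draw the random sample first, then sort by time.
# No sort_index() afterwards, since that would undo the timestamp sort.
plotted = (
    power_data.sample(n=10000, replace=False, random_state=123)
    .sort_values(by=["utc_timestamp"], ascending=[True])
)

print(plotted["utc_timestamp"].is_monotonic_increasing)
```

The resulting frame can then be passed to px.line(plotted, x='utc_timestamp', y='value', color='id'), as in the generated code. Each id's rows are a subsequence of a globally time-sorted frame, so every per-id trace is also in time order.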

What steps have you taken to resolve this already?

See the above!

Anything else?
