Anomalies in the passiv data #141

Open
peterdudfield opened this issue Jan 12, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@peterdudfield
Contributor

Plot of max value per day
Screenshot 2023-01-12 at 08 05 29

suggestions:

  • Capacity could be defined as the 99th percentile of the data, removing anything above it
  • For all data, we could remove values more than 5 standard deviations away from the mean

Note that any removal of data will cause gaps in the data.
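
A minimal, dependency-free sketch of the two suggestions above, in case it helps the discussion. The function names, the default thresholds as parameters, and the choice to mark removed readings as `None` are assumptions for illustration; this is not code from the repository. Note that cutting at the 99th percentile always flags roughly the top 1% of readings, whether or not they are anomalous.

```python
import statistics


def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def mask_anomalies(values, capacity_q=99.0, n_std=5.0):
    """Return (capacity, cleaned); flagged readings are replaced by None.

    A reading is flagged if it exceeds the capacity estimate (the
    capacity_q-th percentile) or lies more than n_std standard
    deviations from the mean.
    """
    capacity = percentile(values, capacity_q)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    cleaned = []
    for v in values:
        too_high = v > capacity
        too_far = std > 0 and abs(v - mean) > n_std * std
        cleaned.append(None if (too_high or too_far) else v)
    return capacity, cleaned


# Hypothetical series: 199 ordinary readings and one spike.
capacity, cleaned = mask_anomalies([1.0] * 199 + [1000.0])
```

The `None` entries are the gaps mentioned in the note: anything downstream has to either tolerate them or fill them.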

@peterdudfield peterdudfield added the enhancement New feature or request label Jan 12, 2023
@jacobbieker
Member

What about taking the mean value of the points around the anomaly and replacing it with that? Would at least keep the gaps filled

@peterdudfield
Contributor Author

What about taking the mean value of the points around the anomaly and replacing it with that? Would at least keep the gaps filled

Yeah, or just linearly interpolate. It depends how big the anomaly is. If it's > 12 hours, then probably just remove the data. If it's one point, I agree, it can just be filled in. But datapipes would be able to handle the gap in the data.
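
A rough sketch of that rule, assuming the anomalies have already been replaced by `None` and the series is evenly sampled. `fill_short_gaps` and `max_gap` are hypothetical names; `max_gap` is counted in samples, so the 12 hour limit would need converting using the actual sampling interval. For a single missing point, linear interpolation gives the same result as taking the mean of the two neighbours.

```python
def fill_short_gaps(values, max_gap=1):
    """Linearly interpolate runs of None that are at most max_gap samples long.

    Longer runs, and runs touching either end of the series, are left as
    None so they can be dropped or handled downstream.
    """
    filled = list(values)
    i, n = 0, len(filled)
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        gap = i - start  # i is now the first valid index after the run, or n
        if start == 0 or i == n or gap > max_gap:
            continue  # nothing to anchor on, or gap too long: keep as None
        left, right = filled[start - 1], filled[i]
        for k in range(gap):
            filled[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return filled


series = [1.0, None, 3.0, None, None, None, 7.0]
print(fill_short_gaps(series, max_gap=1))
# [1.0, 2.0, 3.0, None, None, None, 7.0]
```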

Whenever I get a chance, I'll try to point out the advantages of ocf datapipes.

@peterdudfield
Contributor Author

@simlmx reference here

@jacobbieker
Member

Yeah, currently for training, if there is a gap, the valid time period selection will say there is an issue. But yeah, I agree: if it's for a while, then just remove it all.

@simlmx

simlmx commented Jan 12, 2023

Making sure to use absolute loss |y - pred| instead of squared loss (y-pred)**2 when training my model fixed all the issues with outliers. When using absolute loss, removing the problematic values didn't help in a significant way.
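
A toy illustration of why this can happen, using the simplest possible "model": a single constant prediction. The best constant under squared loss is the mean, which a single spike drags a long way, while the best constant under absolute loss is the median, which barely moves. The data and helper names below are made up for the example and say nothing about the actual model used here.

```python
def squared_loss(y, pred):
    """Mean squared error of a constant prediction."""
    return sum((yi - pred) ** 2 for yi in y) / len(y)


def absolute_loss(y, pred):
    """Mean absolute error of a constant prediction."""
    return sum(abs(yi - pred) for yi in y) / len(y)


def best_constant(y, loss, candidates):
    """Grid search for the constant prediction that minimises the loss."""
    return min(candidates, key=lambda c: loss(y, c))


# Hypothetical readings: the last value is an anomalous spike.
spiked = [1.0, 2.0, 3.0, 4.0, 500.0]
candidates = [x / 10 for x in range(5001)]  # 0.0, 0.1, ..., 500.0

print(best_constant(spiked, squared_loss, candidates))   # 102.0 (the mean)
print(best_constant(spiked, absolute_loss, candidates))  # 3.0 (the median)
```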

@dfulu
Member

dfulu commented Oct 27, 2023

This might be partially solved now. Using a percentile for the estimated capacity was added in #225. The filtering mentioned above isn't included yet.

4 participants