inconsistent query behavior when handling future data #523

Open
localvar opened this issue Mar 6, 2024 · 3 comments
Labels: help wanted (Extra attention is needed)

localvar (Contributor) commented Mar 6, 2024

Describe your question

I generated 100M records in openGemini, with timestamps ranging from 2023 to 2026.

When I execute `select count(x)` and `select count(x) group by time(30d)`, I find that the former takes future data into account, while the latter does not.

I cannot say which one is correct, but they are inconsistent.

IMHO, if openGemini allows users to insert future data, then that data should always be counted, and I consider this a feature.
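The difference described above can be modeled with a small sketch. It assumes openGemini follows the default time range documented for InfluxQL, where a query with `group by time(...)` and no explicit upper bound ends at `now()`, while a plain aggregate has no such implicit bound. That assumption is not confirmed in this thread, and the function names `count_all` and `count_group_by_time` are hypothetical, used only for illustration.

```python
from datetime import datetime, timezone

UTC = timezone.utc


def count_all(timestamps):
    """Model of `select count(x)`: no implicit time bound, every point counts."""
    return len(timestamps)


def count_group_by_time(timestamps, now, upper=None):
    """Model of `select count(x) ... group by time(...)`, summed over all buckets.

    Assumption: when the query gives no explicit upper bound, the range
    ends at now(), so points with later timestamps are never bucketed.
    """
    end = now if upper is None else upper
    return sum(1 for ts in timestamps if ts <= end)


# One sample point per year, mirroring the 2023 to 2026 range in the report.
points = [datetime(year, 1, 1, tzinfo=UTC) for year in (2023, 2024, 2025, 2026)]
now = datetime(2024, 3, 6, tzinfo=UTC)  # the date this issue was opened

print(count_all(points))                 # 4: future points included
print(count_group_by_time(points, now))  # 2: future points dropped
print(count_group_by_time(points, now, upper=datetime(2027, 1, 1, tzinfo=UTC)))  # 4
```

If the assumption holds, adding an explicit upper bound to the grouped query (for example `where time < '2027-01-01T00:00:00Z'`) should make both queries cover the same data.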

image

localvar added the help wanted label Mar 6, 2024

xiangyu5632 (Member) commented

What is your scenario for writing future data? Generally, this problem does not occur in O&M monitoring and IoT scenarios.
Considering write performance, checking whether each record's timestamp is later than the current time would significantly degrade performance.

localvar (Contributor, Author) commented Mar 6, 2024

> What is your scenario for writing future data? Generally, this problem does not occur in O&M monitoring and IoT scenarios. Considering write performance, checking whether each record's timestamp is later than the current time would significantly degrade performance.

I have seen some corner cases, such as the server time being intentionally set back by one year. But that is not my point. My point is that the behavior should be consistent: if we allow future data, then we should always count it.

xiangyu5632 (Member) commented

I agree with you, but whether this changes the original semantics of the operator needs some more discussion.
