Clarify documentation of the timestamp offset for streams #1598

Open
Vic152 opened this issue Feb 2, 2023 · 8 comments

Comments

@Vic152

Vic152 commented Feb 2, 2023

Hey Guys,

We are using streams in RabbitMQ. It was very confusing when we tried to attach to a stream at midnight. It turns out we actually attach to the nearest earlier chunk. I propose the following changes to the documentation:

https://www.rabbitmq.com/streams.html

Where it reads:
Timestamp - a timestamp value specifying the point in time to attach to the log at. It will clamp to the closest offset, if the timestamp is out of range for the stream it will clamp either the start or end of the log respectively. With AMQP 0.9.1, the timestamp used is POSIX time with an accuracy of one second, that is the number of seconds since 00:00:00 UTC, 1970-01-01. Be aware consumers can receive messages published a bit before the specified timestamp.

Replace with:

Timestamp - a timestamp value specifying the point in time to attach to the log at. It will clamp to the closest offset; if the timestamp is out of range for the stream, it will clamp to either the start or end of the log respectively. The chunk size is dynamic and based on the message ingress rate. Be aware consumers can receive messages published a bit before the specified timestamp. Applications consuming RabbitMQ streams should apply a filter to ingest only the messages of interest. With AMQP 0.9.1, the timestamp used is POSIX time with an accuracy of one second, that is the number of seconds since 00:00:00 UTC, 1970-01-01. Note: the timestamp at which a client attaches to the stream is not a message timestamp.

I feel this should clear up the confusion once and for all. (I mean until the next time the implementation changes ;) )
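
To make the midnight scenario concrete, here is a minimal Python sketch of building the consumer argument for a timestamp-based attach. The `x-stream-offset` consumer argument is the one described on the streams page. The helper names `midnight_utc` and `stream_consumer_arguments` are made up for illustration, and how a particular client library encodes the value on the wire is an assumption that is not shown here.

```python
from datetime import datetime, timezone


def midnight_utc(moment: datetime) -> datetime:
    """Return 00:00:00 UTC on the UTC calendar day that contains `moment`."""
    as_utc = moment.astimezone(timezone.utc)
    return as_utc.replace(hour=0, minute=0, second=0, microsecond=0)


def stream_consumer_arguments(attach_at: datetime) -> dict:
    """Build consumer arguments asking for a timestamp-based attach.

    Hypothetical helper: the dict would be handed to the client library's
    consume call (for example as its `arguments` parameter). That step is
    an assumption and is not executed in this sketch.
    """
    # AMQP 0.9.1 timestamps have one-second accuracy, so drop microseconds.
    return {"x-stream-offset": attach_at.replace(microsecond=0)}


attach_at = midnight_utc(datetime(2023, 2, 2, 15, 30, tzinfo=timezone.utc))
args = stream_consumer_arguments(attach_at)

# POSIX time: whole seconds since 00:00:00 UTC, 1970-01-01.
posix_seconds = int(attach_at.timestamp())
print(posix_seconds)
```

Even with an exact midnight timestamp like this one, the consumer can still be handed messages that arrived a bit earlier.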

@kjnilsson
Contributor

I agree we should clarify that the timestamp used is the arrival time, not any other user-provided timestamp.

I'm not sure the suggested paragraph makes this particularly clear, however.

How about something like:

Timestamp - a timestamp value specifying the approximate point in time at which to attach to the stream. RabbitMQ streams record the arrival time of each message and it to determine the attach offset, not any other timestamp provided in the message itself (RabbitMQ streams do not interpret the messages at all). It will clamp to the closest preceding offset; if the timestamp is out of range for the stream, it will clamp to either the start or end of the stream respectively. With AMQP 0.9.1, the timestamp used is POSIX time with an accuracy of one second, that is the number of seconds since 00:00:00 UTC, 1970-01-01. Given the above behaviour, consumers can receive messages published a bit before the specified timestamp.
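
A toy model may help readers picture the clamping described in that paragraph. This is only a sketch of the behaviour as worded here, not RabbitMQ's actual implementation: the `Chunk` tuple, the `attach_offset` function, and the sample numbers are all invented for illustration, and it assumes each chunk carries one arrival timestamp and the offset of its first message.

```python
from bisect import bisect_right
from typing import List, Tuple

# Toy chunk: (arrival_timestamp_seconds, offset_of_first_message_in_chunk).
Chunk = Tuple[int, int]


def attach_offset(chunks: List[Chunk], requested_ts: int, end_offset: int) -> int:
    """Return the offset a consumer would start from in this toy model.

    `chunks` must be sorted by timestamp. `end_offset` is the offset that
    the next message written to the stream would receive.
    """
    if not chunks:
        return end_offset
    timestamps = [ts for ts, _ in chunks]
    if requested_ts <= timestamps[0]:
        # At or before the first chunk: clamp to the start of the stream.
        return chunks[0][1]
    if requested_ts > timestamps[-1]:
        # Later than the last chunk: clamp to the end of the stream.
        return end_offset
    # Otherwise pick the last chunk whose timestamp is <= requested_ts,
    # i.e. the closest preceding chunk boundary.
    index = bisect_right(timestamps, requested_ts) - 1
    return chunks[index][1]


chunks = [(100, 0), (160, 25), (170, 90)]
print(attach_offset(chunks, 165, end_offset=120))
```

In this model a request for time 165 starts at offset 25, the first message of the chunk stamped 160, so the consumer sees messages from up to five seconds before the requested time.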

The implementation for time based stream attachment hasn't changed AFAIK and is unlikely to ever change as there is no reasonable alternative that I can see.

@Vic152
Author

Vic152 commented Feb 2, 2023

"RabbitMQ streams record the arrival time of each message and it to determine the attach offset" <-- something missing here

@kjnilsson I modified my message based on your suggestions. I think this description should be blunt. I mention chunks to make it known that the chunk size is dynamic and unpredictable, so do not base your logic on it. The gist is: you will get some messages from before the timestamp you specified, and you should filter.

Timestamp - a timestamp value specifying an approximate point in time at which to attach to the stream. It will clamp to the closest offset; if the timestamp is out of range for the stream, it will clamp to either the start or end of the log respectively. The chunk size is dynamic and based on the message ingress rate. Be aware consumers can receive messages published a bit before the specified timestamp. Applications consuming RabbitMQ streams should apply a filter to ingest only the messages of interest. With AMQP 0.9.1, the timestamp used is POSIX time with an accuracy of one second, that is the number of seconds since 00:00:00 UTC, 1970-01-01. Note: the timestamp at which a client attaches to the stream is not the arrival timestamp of a particular message, nor any other timestamp provided in the message itself (RabbitMQ streams do not interpret the messages at all).

@acogoluegnes
Contributor

I'd remove "The chunk size is dynamic and based on the message ingress rate." above.

@Vic152
Author

Vic152 commented Feb 2, 2023

> I'd remove "The chunk size is dynamic and based on the message ingress rate." above.

Yeah, I have mixed feelings about it too. It's an internal detail, so you are probably right.

@michaelklishin
Member

@Vic152 may I ask what was the exact problem that "sending at midnight" presented? It's difficult for us to edit without understanding what part was confusing and why.

@Vic152
Author

Vic152 commented Feb 2, 2023

> @Vic152 may I ask what was the exact problem that "sending at midnight" presented? It's difficult for us to edit without understanding what part was confusing and why.

Yeah, the exact time does not matter that much. Say 00:00 UTC: we replay messages from 00:00 UTC of every day.

You ask to attach at a point in time, but you do not really attach there. You attach to the start of a chunk of uncertain size and message count. However, you are guaranteed to attach to the stream before your timestamp. Our problem was that this functionality seemed to work a bit randomly until we understood that it attaches not to a message but to a chunk, and that chunks are dynamic.

The behaviour of streams is not what one would expect. If I say I want to attach at 00:00, I want to attach at 00:00, not a bit before, and certainly not an uncertain, highly dynamic bit before. Of course it is done this way to address the general case, but that is not obvious.

The aim of the doc edit is to stress, and make unequivocally clear, that you do not attach at a particular message arrival timestamp but at some arbitrary, unknown time before it. It is then the responsibility of your application to filter for the exact time.

Also, general suggestion - with my technical writer hat on - use short sentences. Avoid commas if possible.

@michaelklishin
Member

@Vic152 so it's the approximation due to chunked data transfer that was confusing. Thanks.

@Vic152
Author

Vic152 commented Feb 2, 2023

> @Vic152 so it's the approximation due to chunked data transfer that was confusing. Thanks.

That. It would be nice to include guidance in the documentation about how to deal with the behaviour. It would save time if the users were told they need to filter.
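
As a rough idea of what such guidance could look like, here is a minimal Python sketch of an application-side filter. It assumes each delivered message carries a timestamp the application can compare against, for example one set by the publisher, since the stream itself does not interpret messages. The `Message` tuple and the `from_cutoff` function are hypothetical names used only for this example.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, Tuple

# Hypothetical message shape: (application-level timestamp, payload).
Message = Tuple[datetime, bytes]


def from_cutoff(messages: Iterable[Message], cutoff: datetime) -> Iterator[Message]:
    """Yield only the messages whose timestamp is at or after `cutoff`.

    Every message is checked individually, so the filter does not rely
    on the timestamps being strictly ordered within the stream.
    """
    for timestamp, body in messages:
        if timestamp >= cutoff:
            yield timestamp, body


cutoff = datetime(2023, 2, 2, tzinfo=timezone.utc)
delivered = [
    # Arrived before midnight but delivered anyway because of chunking.
    (datetime(2023, 2, 1, 23, 59, 58, tzinfo=timezone.utc), b"early"),
    (datetime(2023, 2, 2, 0, 0, 0, tzinfo=timezone.utc), b"on time"),
    (datetime(2023, 2, 2, 0, 0, 3, tzinfo=timezone.utc), b"later"),
]

kept = [body for _, body in from_cutoff(delivered, cutoff)]
print(kept)
```

The consumer attaches with the same cutoff it filters on, so the early messages are simply skipped and the application starts processing exactly at the requested time.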
