Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider storing 'time' using 'timestamp with time zone' #20

Open
nlochschmidt opened this issue Jan 23, 2021 · 5 comments
Open

Consider storing 'time' using 'timestamp with time zone' #20

nlochschmidt opened this issue Jan 23, 2021 · 5 comments

Comments

@nlochschmidt
Copy link

MessageDB currently relies on timestamp without timezone for the time column. Depending on timezone settings in the database / client, this means that doing calculations in the database with other columns that use timestamp with time zone is doable but more complicated and error-prone than it should be.

From the PostgreSQL Wiki

Don't use timestamp (without time zone) to store UTC times

Storing UTC values in a timestamp without time zone column is, unfortunately, a practice commonly inherited from other databases that lack usable timezone support.
Use timestamp with time zone instead.

Why not?

Because there is no way for the database to know that UTC is the intended timezone for the column values.
This complicates many otherwise useful time calculations.

I realize that this is a breaking change, and of course I was able to simply change it in my own installation, but I think you should at least consider consider switching to the safer type as well.

@sbellware
Copy link
Contributor

Hi, @nlochschmidt. Sorry for the late reply. Things have been busy lately.

The time column should never be used for applicative purposes. It's only there as operational, mechanical reference data. If the time column is being used for calculations with other timestamp columns, then it's effectively being used beyond its intended use, and ultimately outside of it safety envelope.

If a message is to bear any application-specific time data, it should be in the JSON payload stored in the data attribute.

@nlochschmidt
Copy link
Author

Good to know, however it is part of the response of the get_stream_messages and therefore part of the public API, right?

Simply doing technical queries on the time returned can cause issues e.g. to figure out the count of events in certain time intervals 🤷‍♂️

If anything, this issue might help others stumbling upon timestamp issues.

@sbellware
Copy link
Contributor

sbellware commented Feb 3, 2021

Indeed, it is part of the public API. It's exposed ultimately because it shouldn't be obscured. There may be diagnostic use cases for that data. But even in those cases, the values from one message store would only be useful in comparison against other messages from the same message store. So, the data is provided because there's no good reason to obscure it.

But again, that data should not be used for any applicative reasons. Counting messages rows within a certain time interval is a different concern than counting messages within a certain time interval. The first is a mechanical concern and the second is an applicative concern.

In the applicative use case, a time attribute inside the message data JSON object is the one that's pertinent. Otherwise, it would be the performance of Postgres and infrastructure that would be being measured, and there are more appropriate tools for that kind of measurement and monitoring.

In the past, users have built services to detect when an input message and an output message fall outside of a given time threshold. This kind of thing is a measurement of business activity, rather than database/network/hardware performance. That measurement should always be done based on the applicative timestamps that are contained within a message's data attribute, which reflects the real time of a business transaction, rather than reflecting an infrastructural concern.

However, even if the time attribute on the messages row itself was used for some reason, it's still largely safe to compare that time to other rows in the table. Such a use case would typically be interested in the interval between two time points with the same base reference point, and timezone offsets would be largely superfluous.

Taking timestamp on the message row beyond that envelope wouldn't be advisable. It's not advisable, but we're not going to stand in the way. But it's outside of intended and supported use cases, and starts to suggest the data aggregation and data transformation use cases which are the other half of the whole of the architecture.

We may consider this change for a future version, but at present it invites uses and couplings that are largely inconsistent with message store applications.

@nlochschmidt
Copy link
Author

nlochschmidt commented Feb 3, 2021

I've raised the issue mainly because I have often seen the misunderstanding that timestamp with time zone stores a time zone 😬 or that timestamp without time zone aka timestamp is a point in time where it is actually more like "the picture of a clock on the wall" 🤷‍♂️.

I don't agree that it is a good idea to use a type that is easy to misuse, in order to detract certain uses or couplings. Maybe bigint would have been the better choice in that case.

Anyways I have big respect for what you have done with message-db and am looking forward to hopefully use ideas in here for a future project. Thanks a lot 🙏

@sbellware
Copy link
Contributor

The global_position column is that bigint that acts as a vector clock. The time attribute is there as a human-readable value that supports interactive uses and verification. Its definition isn't a misunderstanding. It's quite intentional. The time column is there as contextual information, and its data type should discourage coupling to it for programatic or applicative purposes.

Let us know if there's anything you need with your future projects with Message DB.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants