v03 heads up. #9

Open
petersilva opened this issue Jan 12, 2020 · 17 comments

Comments

@petersilva

Just to let you know: for the past year or so, we have been working on a new message payload format, prompted by limitations in the current ones and by feedback from some international consultations. The current protocol version is identified by a topic tree that starts with v02.post. Over the next year or two, we may migrate to v03.post. The differences in the messages are:

  • AMQP headers are no longer used to store key-value pairs. Instead, the message body is a JSON object. As a result, the formerly anonymous fields in the body of a v02 message are now named key-value pairs in that object: pubTime, baseUrl, and relPath.

  • For encoded fields, such as checksums, the encoding is changed from hexadecimal to base64 (a more compact representation).
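As a rough illustration of the two points above, here is a minimal Python sketch. It assumes the v02 body holds the three fields space-separated in the order pubTime, baseUrl, relPath, and it uses a SHA-512 digest purely as an example checksum; the function names are hypothetical and not part of Sarracenia.

```python
import base64
import hashlib
import json


def parse_v02_body(body: str) -> dict:
    """v02 (assumed layout): three anonymous, space-separated fields."""
    # maxsplit=2 keeps any spaces inside the relative path intact.
    pub_time, base_url, rel_path = body.split(" ", 2)
    return {"pubTime": pub_time, "baseUrl": base_url, "relPath": rel_path}


def parse_v03_body(body: str) -> dict:
    """v03: the whole body is a JSON object of named fields."""
    return json.loads(body)


def hex_to_base64(hex_digest: str) -> str:
    """Re-encode a hex checksum as base64: same bytes, shorter text."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


digest = hashlib.sha512(b"example payload").hexdigest()
print(len(digest), len(hex_to_base64(digest)))
```

A 64-byte SHA-512 digest takes 128 characters in hex but 88 in base64, which is where the size saving comes from.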

https://github.com/MetPX/sarracenia/blob/master/doc/sr_postv3.7.rst

The format might still evolve slightly (new fields?), but we have done some important deployments of v03 and it is looking solid. There is no fire: we haven't looked at a migration strategy yet, and nothing will be sprung on consumers suddenly. I figured you would want to know far ahead of time.

I can supply some alternate data streams if you want a sample.

@ghost

ghost commented Jan 16, 2020

Thank you! Is v03.post available for use right away, or should we simply get a branch ready for testing in the future?

@ghost ghost self-assigned this Jan 16, 2020
@ghost

ghost commented Jan 16, 2020

In theory, we could release this so that v03 is supported behind a flag at first; eventually that flag would become the default, and finally v02 would be dropped.

@ghost

ghost commented Jan 16, 2020

I actually think I have most of this put together already. One question: what is the contentType of the new message format? (I assume it is application/json or application/json; charset=utf-8 or the like, but would like to confirm.)

I have a listener running on v03.post.# right now and I think all of my questions will be answered once I see a message come through :)

@petersilva
Author

If you connect to:

broker amqps://anonymous:anonymous@hpfx.collab.science.gc.ca
exchange xs_pas037_wmosketch_public
topic v03.post.#

You will get a sample v03.post feed. You can use it to confirm that v03 works, but we won't be posting it anywhere else for a while (until v03 has fully gelled). Note that this feed includes embedding, which is a significant change from before.

This feed is extremely experimental and may change at any time. It is being used to work with colleagues at the WMO on developing next-generation WMO data exchange protocols.

( https://github.com/MetPX/wmo_mesh )
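The `topic v03.post.#` line relies on AMQP topic-exchange wildcards: `*` matches exactly one dot-separated word and `#` matches zero or more. The broker performs this matching itself; the sketch below (hypothetical helper names, made-up routing keys) only illustrates which routing keys such a binding would select.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP topic binding pattern selects routing_key.

    Words are separated by dots; '*' matches exactly one word and
    '#' matches zero or more words.
    """
    return _match(pattern.split("."), routing_key.split("."))


def _match(pat: list, key: list) -> bool:
    if not pat:
        # Pattern exhausted: it matches only if the key is exhausted too.
        return not key
    head, rest = pat[0], pat[1:]
    if head == "#":
        # '#' may absorb 0..len(key) words; try every split point.
        return any(_match(rest, key[i:]) for i in range(len(key) + 1))
    if not key:
        return False
    if head == "*" or head == key[0]:
        return _match(rest, key[1:])
    return False


for rk in ("v03.post.WIS.ca.obs", "v03.post", "v02.post.WIS.ca.obs"):
    print(rk, topic_matches("v03.post.#", rk))
```

Because `#` can match zero words, `v03.post.#` also selects the bare key `v03.post`, while anything under `v02.post` is excluded.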

@petersilva
Author

Oh, the content type? In the C version we explicitly set text/plain; in the Python one there is no explicit setting, and I'm guessing text/plain is the default, so basically we aren't using it. Do you think we should use a more specific content type?

@petersilva
Author

Note:

https://stackoverflow.com/questions/477816/what-is-the-correct-json-content-type

So if we went that way, it would be application/json.

@petersilva
Author

petersilva commented Jan 17, 2020

Another question: this protocol is fairly modern, so the spec assumes UTF-8. JSON is usually UTF-8, and UTF-8 is a natural default charset these days (it is already the default in HTML5), so I don't think it needs to be specified. It should be the default, and anyone who wants something else should be the one to specify a charset. I'm thinking about this in the context of sending millions of messages per day: adding a charset costs 12 bytes per message, which adds up to megabytes of traffic per day for no real benefit. As for the content type itself, what would be the benefit of application/json?
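The bandwidth argument is simple multiplication. A minimal sketch, taking the 12 bytes per message from the comment above and using illustrative daily message counts as a stand-in for "millions of messages per day" (the counts are assumptions, not measured traffic):

```python
def daily_overhead_bytes(extra_bytes_per_message: int, messages_per_day: int) -> int:
    """Extra bytes per day from adding a fixed-size suffix to every message."""
    return extra_bytes_per_message * messages_per_day


# 12 bytes/message is from the comment; the message counts are illustrative.
for n in (1_000_000, 5_000_000):
    mb = daily_overhead_bytes(12, n) / 1_000_000
    print(f"{n:,} messages/day -> {mb:.0f} MB/day")
```

At one million messages a day the charset alone would add about 12 MB of daily traffic, scaling linearly with volume.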

@ghost

ghost commented Jan 17, 2020

Thanks for the test feed—I'll point to it and try to capture a message!

Regarding content type: if the message body is JSON, I'd say it's best practice to set application/json. (This is mostly because it keeps the protocol as intuitive as possible. In our case it's also handy: since v02 and v03 use different message formats, the content type lets us handle both with the same code :) )
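One way to handle v02 and v03 with the same code is to dispatch on the content type. The sketch below is an illustration under stated assumptions, not either project's actual code: `parse_notification` is a hypothetical name, the v02 body is assumed to be `pubTime baseUrl relPath` separated by spaces with the other fields in AMQP headers, and it falls back to the topic prefix because v03 messages were at first still sent as text/plain.

```python
import json


def parse_notification(content_type: str, topic: str, headers: dict, body: str) -> dict:
    """Return the message fields as a dict for either protocol version."""
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = (content_type or "").split(";")[0].strip().lower()
    # Fallback: publishers that still send text/plain for v03 JSON bodies.
    if media_type == "application/json" or topic.startswith("v03."):
        return json.loads(body)
    # v02 (assumed layout): named fields in AMQP headers, plus three
    # anonymous, space-separated fields in the body.
    pub_time, base_url, rel_path = body.split(" ", 2)
    fields = dict(headers or {})
    fields.update(pubTime=pub_time, baseUrl=base_url, relPath=rel_path)
    return fields
```

Once every publisher sets application/json, the topic-prefix fallback could be dropped.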

That said, if you're not using the field, I'll update our code to ignore it.

Including the charset in the content type is by no means required, and if you're concerned about per-message bandwidth, omitting it is reasonable (especially for us-ascii, ISO-8859-1/windows-1252, or utf-8). (In fact, if the number of "why am I getting JSON parse errors" emails to our developer support mailbox is any indication, a lot of people don't pay attention to the content type even when it is stated explicitly!)

@ghost

ghost commented Jan 17, 2020

I haven't seen any messages pushed yet, so I can't verify, but in any case we have an experimental ironwallaby/v03 branch which should work.

@petersilva
Author

Oops, the feed was down. It is back up now.

@petersilva
Author

OK, I changed the content_type to application/json in master. It will take a few weeks to get into a release, and perhaps a few months to reach production. At some point, the messages will just start showing up with the right content_type.

@ghost

ghost commented Jan 17, 2020

Thank you! My branch seems to work now. The only issue I ran into was that the time format changed subtly (the addition of a T character to delimit dates and times). This was easy to fix, of course, but the change was so subtle that I didn't notice it in my scan of the documentation.
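A parser that accepts both time formats only needs to check for the T separator. A minimal sketch, assuming v02 stamps look like YYYYMMDDHHMMSS with an optional fractional part, that v03 stamps only add the T between date and time, and that both are UTC (these layouts are assumptions drawn from the comment above, not quoted from the spec):

```python
from datetime import datetime, timezone


def parse_pub_time(stamp: str) -> datetime:
    """Parse 'YYYYMMDDHHMMSS[.frac]' (v02) or 'YYYYMMDDTHHMMSS[.frac]' (v03)."""
    main, _, frac = stamp.partition(".")
    fmt = "%Y%m%dT%H%M%S" if "T" in main else "%Y%m%d%H%M%S"
    dt = datetime.strptime(main, fmt)
    # strptime's %f accepts at most 6 digits, so pad or truncate the
    # fraction to microseconds by hand (extra digits are dropped).
    micro = int((frac + "000000")[:6]) if frac else 0
    return dt.replace(microsecond=micro, tzinfo=timezone.utc)
```

Both `20200117143000.123` and `20200117T143000.123` parse to the same instant, so downstream code never needs to know which protocol version produced the stamp.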

@petersilva
Author

Great!

@petersilva
Author

What's the protocol at this point? Should I close the issue?

@ghost

ghost commented Jan 17, 2020

I still need to finish up and merge in my prototype, but I'll go ahead and close the issue once I've done so.

Would you mind opening a new issue once v03 is live (if ever)?

@petersilva
Author

There will definitely be an announcement on the datamart mailing list, followed by a period of parallel access (both versions available for a month or two), so you will certainly hear about it. The idea of the heads-up is to minimize the length of the parallel period.

@petersilva
Author

Update: why the heck didn't we release this three years ago? I spent a few years working with colleagues at the World Meteorological Organization (WMO), hoping to merge the v03 format with what they intended to produce for pub/sub. It kept sounding like "yes, but...", with tweaks needed here and there, but in the end, last spring, they rejected it wholesale in favour of something more web-service oriented, which conflicts in some ways with file transfer, the focus of Sarracenia.

Since the split, the format has continued to evolve over the past nine months. For the high-performance mirroring use case, we need to transport things other than files: file removal events, directory creation, symbolic links, renames, and so on. Those used to be encoded by conceptually overloading the checksum field, but in versions of v03 since fall 2022 they are represented by a "fileOp" field. The format is shown here:

https://metpx.github.io/sarracenia/Reference/sr_post.7.html
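For consumers, the practical change is to check for fileOp before treating a message as a file to download. A hedged sketch follows: the key names inside fileOp (remove, directory, link, rename) and the meaning of their values are assumptions to verify against the reference page above, and `describe_event` is a hypothetical helper.

```python
def describe_event(msg: dict) -> str:
    """Classify a v03 notification by its (assumed) fileOp contents."""
    op = msg.get("fileOp")
    path = msg.get("relPath", "")
    if not op:
        # No fileOp: an ordinary file that the consumer should fetch.
        return f"file {path}"
    # Assumption: 'rename' holds the old name and relPath is the new one.
    if "rename" in op:
        return f"rename {op['rename']} -> {path}"
    # Assumption: 'link' holds the symbolic link's target.
    if "link" in op:
        return f"symlink {path} -> {op['link']}"
    # Assumption: 'directory' plus 'remove' means rmdir, alone means mkdir.
    if "directory" in op:
        return f"rmdir {path}" if "remove" in op else f"mkdir {path}"
    if "remove" in op:
        return f"remove {path}"
    return f"unknown fileOp {sorted(op)} for {path}"
```

Keeping these events in a dedicated field means the checksum can once again describe only file contents.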

Some optional fields are also likely to be removed: from_cluster and to_clusters will probably be dropped, as they have not proven useful in deployments so far.

The format will be the default for sr3, a version that has been working gradually toward a stable release over the past year and now looks close.

@ghost ghost removed their assignment Apr 19, 2023