Set delivery_mode=2 (persistent) in v2 #1004

Open
reidsunderland opened this issue Mar 28, 2024 · 2 comments
Assignees
Labels
likely-fixed likely fix is in the repository, success not confirmed yet. Priority 4 - Strategic would benefit multiple use cases if resolved ReliabilityRecovery improve behaviour in failure situations. v2only only affects v2 branches.

Comments

@reidsunderland
Member

We recently discovered that v2 does not publish persistent messages:

sarracenia/sarra/sr_amqp.py

Lines 388 to 416 in 5b063af

    if self.hc.use_amqp:
        self.logger.debug("publish AMQP is used")
        if mexp:
            expms = '%s' % mexp
            msg = amqp.Message(message, content_type=ct, application_headers=mheaders,
                               expiration=expms)
        else:
            msg = amqp.Message(message, content_type=ct, application_headers=mheaders)
        self.channel.basic_publish(msg, exchange_name, exchange_key)
        self.channel.tx_commit()
    elif self.hc.use_amqplib:
        self.logger.debug("publish AMQPLIB is used")
        if mexp:
            expms = '%s' % mexp
            msg = amqplib_0_8.Message(message, content_type=ct, application_headers=mheaders,
                                      expiration=expms)
        else:
            msg = amqplib_0_8.Message(message, content_type=ct, application_headers=mheaders)
        self.channel.basic_publish(msg, exchange_name, exchange_key)
        self.channel.tx_commit()
    elif self.hc.use_pika:
        self.logger.debug("publish PIKA is used")
        if mexp:
            expms = '%s' % mexp
            properties = pika.BasicProperties(content_type=ct, delivery_mode=1, headers=mheaders,
                                              expiration=expms)
        else:
            properties = pika.BasicProperties(content_type=ct, delivery_mode=1, headers=mheaders)
        self.channel.basic_publish(exchange_name, exchange_key, message, properties, True)

(delivery_mode is either not set, or set to 1, which is transient).

This means that messages are lost when a broker is restarted.

In sr3, we use delivery_mode=2, persistent.

    AMQP_Message = amqp.Message(raw_body,
                                content_type=content_type,
                                application_headers=headers,
                                expire=ttl,
                                delivery_mode=2)
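
For illustration only, here is a minimal sketch of how the v2 publish path could build its message properties with delivery_mode=2. It is not the change that was actually made. The helper name `publish_properties` and its `headers_key` parameter are hypothetical; the only library facts assumed are that `amqp.Message` and `amqplib` messages take `application_headers`, while `pika.BasicProperties` takes `headers`.

```python
def publish_properties(content_type, headers, expiration_ms=None,
                       persistent=True, headers_key="application_headers"):
    """Build keyword arguments for an AMQP message's basic properties.

    Hypothetical helper, not part of sarracenia.
    headers_key is "application_headers" for amqp / amqplib Message objects
    and "headers" for pika.BasicProperties.
    """
    props = {
        "content_type": content_type,
        headers_key: headers,
        # AMQP 0-9-1 delivery modes: 1 = transient, 2 = persistent.
        "delivery_mode": 2 if persistent else 1,
    }
    if expiration_ms:
        # The expiration property is sent as a string, as in the v2 code above.
        props["expiration"] = "%s" % expiration_ms
    return props


if __name__ == "__main__":
    # Example values only; real callers would pass ct, mheaders and mexp.
    print(publish_properties("text/plain", {"example": "header"}, expiration_ms=60000))
```

Under those assumptions, the three branches could use it as `amqp.Message(message, **publish_properties(ct, mheaders, mexp))` or `pika.BasicProperties(**publish_properties(ct, mheaders, mexp, headers_key="headers"))`, which also removes the duplicated `if mexp:` / `else:` construction.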

@reidsunderland reidsunderland added v2only only affects v2 branches. Priority 4 - Strategic would benefit multiple use cases if resolved labels Mar 28, 2024
@reidsunderland reidsunderland self-assigned this Mar 28, 2024
@reidsunderland
Member Author

With the change made, the v2 flakey_broker test passes, other than sarrac problems:

                TEST RESULTS

                 | content of subdirs of /net/local/home/sunderlandr/sarra_devdocroot |
test 1 success: compare contents of downloaded_by_sub_amqp and downloaded_by_sub_cp are the same
test 2 success: compare contents of downloaded_by_sub_cp and downloaded_by_sub_rabbitmqtt are the same
test 3 success: compare contents of downloaded_by_sub_rabbitmqtt and downloaded_by_sub_u are the same
test 4 success: compare contents of downloaded_by_sub_u and posted_by_shim are the same
test 5 success: compare contents of posted_by_shim and sent_by_tsource2send are the same
test 6 success: compare contents of cfile and cfr are the same
broker state:
                 | dd.weather routing |
test  7 FAILURE: no successful results! expected 2242 number of extended attributes in source tree 0
test  8 success: sr_post         count of posted files (1111) should be same those in the static data directory  (1111)
test  9 success: sr_post         count of rejected files (10) should be same those in the static data directory  (10)
test 10 success: sr_post         (1111) t_dd1 should have the same number of items as t_dd2      (1111)
test 11 success: sr_sarra        (1111) should have the same number of items as one post         (1111)
test 12 success: sr_sarra        (1111) should winnow the same number of items as one post       (1111)
test 13 success: sr_subscribe amqp_f30   (1111) should have the same number of items as sarra            (1111)
                 | watch      routing |
test 14 success: sr_watch                (1111) should be the same as subscribe amqp_f30                  (1111)
test 15 success: sr_sender               (1112) should have the same number of items as sr_watch  (1111)
test 16 success: rabbitmqtt              (1111) should have the same number of items as sr_watch  (1111)
test 17 success: sr_subscribe u_sftp_f60 (1113) should have the same number of items as sr_sender (1112)
test 18 success: sr_subscribe cp_f61     (1113) should have the same number of items as sr_sender (1112)
                 | poll       routing |
 poll sftp_f62 posted 1111  sftp_f63 posted 0
test 19 success: sr_poll sftp_f62+3      (1111) should have the same number of items of sr_sender        (1112)
test 20 success: sr_subscribe q_f71      (1111) should have the same number of items as sr_poll sftp_f62+3 (1111)
                 | flow_post  routing |
test 21 success: sr_post test2_f61       (1111) should have the same number of files of sr_sender        (1112)
test 22 success: sr_subscribe ftp_f70    (1111) should have the same number of items as sr_post test2_f61 (1111)
test 23 FAILURE: no successful results, 2nd item! sr_post test2_f61      (1111) should have about half the number of items as shim_f63       (0)
                 | py infos   routing |
test 24 success: 0 messages received that we don't know what happened.
                 | C          routing |
test 25 FAILURE: no successful results! cpost both pelles should post the same number of messages (0) (0)
test 26 success: cpost both pelles should see same amount of post_rate_limit messages (219) (218)
test 27 FAILURE: no successful results! cpost pelle04 should post 5 times the number of post_rate_limit messages (0) (219)
test 28 FAILURE: no successful results! cdnld_f21 subscribe downloaded (1111) the same number of files that was published by both van_14 and van_15 (0)
test 29 success: veille_f34 should post as many files (1111) as subscribe cdnld_f21 downloaded (1111)
test 30 success: veille_f34 should post as many files (1111) as subscribe cfile_f44 downloaded (1111)
test 31 success: 0 there should be no unacknowledged messages left, but there are 0
test 32 success: 0 there should be no messages ready to be consumed but there are 0
test 33 FAILURE: Overall flakey_broker 27 of 32 passed (sample size: 1111) !

For comparison, here are the results from v2_dev without the persistent delivery mode:

                TEST RESULTS

                 | content of subdirs of /net/local/home/sunderlandr/sarra_devdocroot |
test 1 FAILURE: compare contents of downloaded_by_sub_amqp and downloaded_by_sub_cp differ
test 2 success: compare contents of downloaded_by_sub_cp and downloaded_by_sub_rabbitmqtt are the same
test 3 success: compare contents of downloaded_by_sub_rabbitmqtt and downloaded_by_sub_u are the same
test 4 success: compare contents of downloaded_by_sub_u and posted_by_shim are the same
test 5 success: compare contents of posted_by_shim and sent_by_tsource2send are the same
test 6 success: compare contents of cfile and cfr are the same
broker state:
                 | dd.weather routing |
test  7 FAILURE: no successful results! expected 2242 number of extended attributes in source tree 0
test  8 success: sr_post         count of posted files (1111) should be same those in the static data directory  (1111)
test  9 success: sr_post         count of rejected files (10) should be same those in the static data directory  (10)
test 10 success: sr_post         (1111) t_dd1 should have the same number of items as t_dd2      (1111)
test 11 success: sr_sarra        (1105) should have the same number of items as one post         (1111)
test 12 success: sr_sarra        (1105) should winnow the same number of items as one post       (1111)
test 13 success: sr_subscribe amqp_f30   (1092) should have the same number of items as sarra            (1105)
                 | watch      routing |
test 14 success: sr_watch                (1092) should be the same as subscribe amqp_f30                  (1092)
test 15 success: sr_sender               (1025) should have the same number of items as sr_watch  (1092)
test 16 success: rabbitmqtt              (1025) should have the same number of items as sr_watch  (1092)
test 17 success: sr_subscribe u_sftp_f60 (1025) should have the same number of items as sr_sender (1025)
test 18 success: sr_subscribe cp_f61     (1025) should have the same number of items as sr_sender (1025)
                 | poll       routing |
 poll sftp_f62 posted 1025  sftp_f63 posted 0
test 19 success: sr_poll sftp_f62+3      (1025) should have the same number of items of sr_sender        (1025)
test 20 success: sr_subscribe q_f71      (1025) should have the same number of items as sr_poll sftp_f62+3 (1025)
                 | flow_post  routing |
test 21 success: sr_post test2_f61       (1025) should have the same number of files of sr_sender        (1025)
test 22 success: sr_subscribe ftp_f70    (1025) should have the same number of items as sr_post test2_f61 (1025)
test 23 FAILURE: no successful results, 2nd item! sr_post test2_f61      (1025) should have about half the number of items as shim_f63       (0)
                 | py infos   routing |
test 24 success: 0 messages received that we don't know what happened.
                 | C          routing |
test 25 FAILURE: no successful results! cpost both pelles should post the same number of messages (0) (0)
test 26 success: cpost both pelles should see same amount of post_rate_limit messages (219) (219)
test 27 FAILURE: no successful results! cpost pelle04 should post 5 times the number of post_rate_limit messages (0) (219)
test 28 FAILURE: no successful results! cdnld_f21 subscribe downloaded (1111) the same number of files that was published by both van_14 and van_15 (0)
test 29 success: veille_f34 should post as many files (1111) as subscribe cdnld_f21 downloaded (1111)
test 30 success: veille_f34 should post as many files (1111) as subscribe cfile_f44 downloaded (1111)
test 31 success: 0 there should be no unacknowledged messages left, but there are 0
test 32 success: 0 there should be no messages ready to be consumed but there are 0
test 33 FAILURE: Overall flakey_broker 26 of 32 passed (sample size: 1111) !

I'm guessing the sarrac problems are related to changes in sr_insects, not actual problems...

@petersilva petersilva added the likely-fixed likely fix is in the repository, success not confirmed yet. label Mar 29, 2024
@petersilva
Contributor

fwiw... there is an sr_insects PR that updates the flow tests.
They all pass for v2 on my development box now.
MetPX/sr_insects#39

@petersilva petersilva added the ReliabilityRecovery improve behaviour in failure situations. label Apr 15, 2024