tokenmeta dstore needs to re-query dns on s3 failure #182

Open
matthewdarwin opened this issue Nov 3, 2020 · 5 comments

Comments

@matthewdarwin

tokenmeta keeps trying to connect to an S3 endpoint whose IP address changed (the DNS name is the same), so the same request fails over and over.

(tokenmeta) starting run loop (bstream/eternalsource.go:103)
(tokenmeta) new live joining source (tokenmeta/pipeline.go:31){"start_block": "#150221099 (08f4312b9cd3d38ed1f859f4b2d68e3b6bd600368e20714b3d24e9ce902884e1)"}
(tokenmeta.js) creating new joining source (bstream/joiningsource.go:90)
(tokenmeta.js) Joining Source is now running (bstream/joiningsource.go:175)
(tokenmeta.js) joining state JOINING (bstream/joiningsource.go:541){"block_behind_live": 0, "last_file_block": 0, "last_live_block": 0, "last_merger_block": 0, "buffer_lower_block": 0, "buffer_higher_block": 0}
(tokenmeta.js.live) starting block source consumption (blockstream/source.go:152)
(tokenmeta.js.live) block stream source reading messages (blockstream/source.go:160)
(tokenmeta.js.live) source shutting down (blockstream/source.go:154){"error": "file source failed: reading file existence: ServiceUnavailable: Service Unavailable\n\tstatus code: 503, request id: tx000000000000000041387-005fa15eb3-bb684e-default, host id: "}
(tokenmeta) eternal source failed (bstream/eternalsource.go:125){"error": "file source failed: reading file existence: ServiceUnavailable: Service Unavailable\n\tstatus code: 503, request id: tx000000000000000041387-005fa15eb3-bb684e-default, host id: "}
(tokenmeta) sleeping before restarting underlying source (bstream/eternalsource.go:128){"wait_time": "2s"}
(tokenmeta) starting run loop (bstream/eternalsource.go:103)
(tokenmeta) new live joining source (tokenmeta/pipeline.go:31){"start_block": "#150221099 (08f4312b9cd3d38ed1f859f4b2d68e3b6bd600368e20714b3d24e9ce902884e1)"}
(tokenmeta.js) creating new joining source (bstream/joiningsource.go:90)
(tokenmeta.js) Joining Source is now running (bstream/joiningsource.go:175)
(tokenmeta.js.live) starting block source consumption (blockstream/source.go:152)
(tokenmeta.js.live) block stream source reading messages (blockstream/source.go:160)
(tokenmeta.js) joining state JOINING (bstream/joiningsource.go:541){"block_behind_live": 150529436, "last_file_block": 0, "last_live_block": 150529436, "last_merger_block": 0, "buffer_lower_block": 150529233, "buffer_higher_block": 150529436}
(tokenmeta) eternal source failed (bstream/eternalsource.go:125){"error": "file source failed: reading file existence: ServiceUnavailable: Service Unavailable\n\tstatus code: 503, request id: tx00000000000000004138b-005fa15eba-bb684e-default, host id: "}
(tokenmeta) sleeping before restarting underlying source (bstream/eternalsource.go:128){"wait_time": "2s"}
(tokenmeta.js.live) source shutting down (blockstream/source.go:154){"error": "file source failed: reading file existence: ServiceUnavailable: Service Unavailable\n\tstatus code: 503, request id: tx00000000000000004138b-005fa15eba-bb684e-default, host id: "}
@matthewdarwin
Author

In this case the s3 server is sending back HTTP error 503.

@matthewdarwin
Author

mindreader and merger work fine. Other components seem not to.

@sduchesneau
Contributor

I can't reproduce by simply closing the endpoints:

local setup:

  • minio(1) listening on 127.0.1.1
  • minio(2) listening on 127.0.2.2

DNS: ceph.stepd has an A record pointing to both 127.0.1.1 and 127.0.2.2

a) if I close .1.1, I don't get a failure.
b) if I also close .2.2 I get failures in a loop (every 2 seconds) on tokenmeta:

2020-11-04T09:54:24.890-0500 (tokenmeta.js.live) source shutting down (blockstream/source.go:154) {"error": "file source failed: reading file existence: RequestError: send request failed\ncaused by: Head \"http://ceph.stepd:9000/blocks/0000000700.dbin.zst\": dial tcp 127.0.1.1:9000: connect: connection refused"}
...

c) I edit the DNS entry and add 127.0.3.3 to ceph.stepd, then I start a minio instance listening only on that new address, and see blockmeta successfully passing over that 'reading file existence' step, printing the next log line: reading from blocks store: file does not (yet?) exist, retrying in (bstream/filesource.go:139) {"filename": "0000000700.dbin.zst", "base_filename": "0000000700", "retry_delay": "4s"}

@sduchesneau
Contributor

So locally, when I set the common blocks folder like this:
s3://ceph.stepd:9000/oneblocks?region=none&insecure=true&access_key_id=minio&secret_access_key=miniostorage

tokenmeta will successfully switch to a new endpoint (added in DNS) in case of failure...
I see the same behavior inside the relayer: it successfully connects to the next endpoint added in DNS (the relayer uses the same 'eternal joining source' pattern from the same library).

@matthewdarwin
Author

I am curious what kind of failure you tested. In my test the connection was still active, but returning 503 errors. That might be different than killing the entire service so the TCP connection breaks.
