OffsetOutOfRange errors have returned #2951

Open
bobvandevijver opened this issue Apr 10, 2024 · 12 comments

Comments

@bobvandevijver

Self-Hosted Version

24.3.0

CPU Architecture

x86_64

Docker Version

26.0.0 / 24.0.7

Docker Compose Version

2.25.0 / 2.21.1

Steps to Reproduce

The OffsetOutOfRange errors (discussed before in #1894) have spontaneously returned on 2 out of my 3 self-hosted installations. This is mostly visible due to the alert no longer being executed.

For one of them, I removed the kafka and zookeeper volumes last week to solve the issue, but that fix seems to have been only temporary, as the errors have returned. The other one only caught my attention just now.

As this might be related to #2931 and #2876, I will now remove the kafka and zookeeper volumes again, and replace the rust-consumers with consumer.

I'm also seeing getsentry/snuba#5707 on the other instance, so I will change that one to the non-rust consumers as well.

Expected Result

Well, no errors, and events being processed correctly 😄

Actual Result

sentry-self-hosted-post-process-forwarder-errors-1                 | 11:17:54 [INFO] arroyo.processing.processor: Processor terminated
sentry-self-hosted-post-process-forwarder-transactions-1           | 11:17:54 [INFO] arroyo.processing.processor: New partitions assigned: {Partition(topic=Topic(name='transactions'), index=0): 0}
sentry-self-hosted-post-process-forwarder-transactions-1           | 11:17:54 [INFO] sentry.post_process_forwarder.post_process_forwarder: Starting multithreaded post process forwarder
sentry-self-hosted-post-process-forwarder-errors-1                 | Traceback (most recent call last):
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/bin/sentry", line 8, in <module>
sentry-self-hosted-post-process-forwarder-errors-1                 |     sys.exit(main())
sentry-self-hosted-post-process-forwarder-errors-1                 |              ^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/sentry/runner/__init__.py", line 190, in main
sentry-self-hosted-post-process-forwarder-errors-1                 |     func(**kwargs)
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
sentry-self-hosted-post-process-forwarder-errors-1                 |     return self.main(*args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1                 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
sentry-self-hosted-post-process-forwarder-errors-1                 |     rv = self.invoke(ctx)
sentry-self-hosted-post-process-forwarder-errors-1                 |          ^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
sentry-self-hosted-post-process-forwarder-errors-1                 |     return _process_result(sub_ctx.command.invoke(sub_ctx))
sentry-self-hosted-post-process-forwarder-errors-1                 |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
sentry-self-hosted-post-process-forwarder-errors-1                 |     return _process_result(sub_ctx.command.invoke(sub_ctx))
sentry-self-hosted-post-process-forwarder-errors-1                 |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
sentry-self-hosted-post-process-forwarder-errors-1                 |     return ctx.invoke(self.callback, **ctx.params)
sentry-self-hosted-post-process-forwarder-errors-1                 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
sentry-self-hosted-post-process-forwarder-errors-1                 |     return __callback(*args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1                 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
sentry-self-hosted-post-process-forwarder-errors-1                 |     return f(get_current_context(), *args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1                 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/sentry/runner/decorators.py", line 69, in inner
sentry-self-hosted-post-process-forwarder-errors-1                 |     return ctx.invoke(f, *args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1                 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
sentry-self-hosted-post-process-forwarder-errors-1                 |     return __callback(*args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1                 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
sentry-self-hosted-post-process-forwarder-errors-1                 |     return f(get_current_context(), *args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1                 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/sentry/runner/decorators.py", line 29, in inner
sentry-self-hosted-post-process-forwarder-errors-1                 |     return ctx.invoke(f, *args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1                 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
sentry-self-hosted-post-process-forwarder-errors-1                 |     return __callback(*args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1                 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/sentry/runner/commands/run.py", line 448, in basic_consumer
sentry-self-hosted-post-process-forwarder-errors-1                 |     run_processor_with_signals(processor, consumer_name)
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/sentry/utils/kafka.py", line 46, in run_processor_with_signals
sentry-self-hosted-post-process-forwarder-errors-1                 |     processor.run()
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 322, in run
sentry-self-hosted-post-process-forwarder-errors-1                 |     self._run_once()
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 384, in _run_once
sentry-self-hosted-post-process-forwarder-errors-1                 |     self.__message = self.__consumer.poll(timeout=1.0)
sentry-self-hosted-post-process-forwarder-errors-1                 |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 235, in poll
sentry-self-hosted-post-process-forwarder-errors-1                 |     message = self.__consumer.poll(timeout)
sentry-self-hosted-post-process-forwarder-errors-1                 |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1                 |   File "/usr/local/lib/python3.11/site-packages/arroyo/backends/kafka/consumer.py", line 414, in poll
sentry-self-hosted-post-process-forwarder-errors-1                 |     raise OffsetOutOfRange(str(error))
sentry-self-hosted-post-process-forwarder-errors-1                 | arroyo.errors.OffsetOutOfRange: KafkaError{code=_AUTO_OFFSET_RESET,val=-140,str="fetch failed due to requested offset not available on the broker: Broker: Offset out of range (broker 1001)"}
sentry-self-hosted-post-process-forwarder-errors-1 exited with code 0
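
To see which containers are hitting this error, the compose logs can be filtered. This is only a minimal sketch: it assumes the logs keep the `<container> | <message>` prefix shown above, and the function name `count_offset_errors` is hypothetical, not part of self-hosted.

```shell
# Hypothetical helper: count OffsetOutOfRange exceptions per container.
# Reads `docker compose logs` style output on stdin, where each line is
# "<container name> | <message>".
count_offset_errors() {
  awk -F'|' '/arroyo\.errors\.OffsetOutOfRange/ {
    name = $1
    gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the container name
    count[name]++
  }
  END { for (n in count) print count[n], n }'
}

# Usage (assumes it is run from the self-hosted checkout):
#   docker compose logs --no-color | count_offset_errors
```

Only the final exception line of each traceback matches, so every crash is counted once per container.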

Event ID

No response

@hostalp

hostalp commented Apr 10, 2024

Same here. So far it seems to be the same consumer group and topic.
Consumer group: post-process-forwarder
Topic: events

@aldy505
Collaborator

aldy505 commented Apr 11, 2024

I'm sorry I can't hold this back

image

Jokes aside, I can't reproduce this on my end since I don't use Kafka anymore (I replaced it with Redpanda and have had no errors like this). Does this command still work?

# Shut everything down; only Kafka is started again for the cleanup
sudo docker compose down &&
sudo docker compose up -d --wait kafka &&
# Delete the post-process-forwarder consumer group
sudo docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group post-process-forwarder --delete &&
# Delete the events topic offsets from the post-process-forwarder consumer group
sudo docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group post-process-forwarder --topic events --delete-offsets &&
# Start everything again
sudo docker compose up -d

Let me know if that works.

@hostalp

hostalp commented Apr 11, 2024

@aldy505 Well, that approach could work too; however, what I do in these cases is just a simple offset reset:

docker compose down -v
docker compose --env-file .env.custom up -d kafka
docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --reset-offsets --to-latest --execute --group post-process-forwarder --topic events
docker compose --env-file .env.custom up -d

or you can "optimistically" reset all of them:

docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --reset-offsets --to-latest --execute --all-groups --all-topics
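
Before running a reset with `--execute`, it may help to look at the group's current offsets and to preview what the reset would do. The following is a sketch, not a tested procedure: the wrapper names (`kcg`, `describe_group`, `preview_reset`) and the `GROUP`/`TOPIC` variables are hypothetical, while `--describe` and `--dry-run` are standard `kafka-consumer-groups` options.

```shell
# Hypothetical wrappers around the kafka-consumer-groups CLI in the kafka
# container. GROUP and TOPIC default to the values named in this thread.
GROUP="${GROUP:-post-process-forwarder}"
TOPIC="${TOPIC:-events}"

kcg() {
  docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 "$@"
}

# Show the current offset, log end offset and lag for each partition of the group.
describe_group() {
  kcg --describe --group "$GROUP"
}

# Print the offsets a reset would produce without committing anything.
preview_reset() {
  kcg --reset-offsets --to-latest --dry-run --group "$GROUP" --topic "$TOPIC"
}
```

As with the commands above, the reset preview expects the consumers of that group to be stopped, so only the kafka service should be running when it is called.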

@hubertdeng123
Member

Unsure why this is only happening for post-process-forwarder, since I don't think that was converted to rust-consumer. Wondering if we are missing a --no-strict-offset-reset on the post-process-forwarder containers. Could you try adding that in the docker compose file? @hostalp @bobvandevijver
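
A quick way to check which post-process-forwarder services still lack the flag is to grep the compose file. This sketch assumes each service's consumer command sits on a single `command:` line in docker-compose.yml; the function name `missing_flag` is hypothetical.

```shell
# Hypothetical check: print every `command:` line that starts a
# post-process-forwarder consumer but does not carry the flag yet.
# The first argument is the compose file (defaults to docker-compose.yml).
missing_flag() {
  grep -n 'command:.*post-process-forwarder' "${1:-docker-compose.yml}" \
    | grep -v -e '--no-strict-offset-reset'
}

# Usage:
#   missing_flag docker-compose.yml
```

Each line it prints is a candidate for appending `--no-strict-offset-reset` to the end of the command; no output means every matching command already has it.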

@erfantkerfan
Contributor

+1

@bobvandevijver
Author

It looks like reverting to the non-rust consumers has fixed it for now: I haven't seen the offset issue return since I created the ticket and removed the kafka volumes.

@hostalp

hostalp commented Apr 15, 2024

I, on the other hand, have set the suggested --no-strict-offset-reset flag on all 3 post-process-forwarder consumers; however, it may take days or even weeks to find out whether it really helped.

@magnuslarsen

magnuslarsen commented Apr 16, 2024

I also reverted to the non-rust consumers, but today the 3 post-process-forwarder consumers failed again with OffsetOutOfRange errors.

I will try to go back to the rust-consumers now, and add --no-strict-offset-reset to the post-process-forwarders.

Does this command still work [...] Let me know if that works.

It works :-)

@magnuslarsen

Since adding --no-strict-offset-reset, I've had no crashes (including the post-process-forwarders, which crashed every 5 days or so before I added the option).

For me, this seems to have successfully fixed the issue, with seemingly no side effects :-)

@hostalp

hostalp commented Apr 29, 2024

I concur.

@aldy505
Collaborator

aldy505 commented Apr 29, 2024

@hubertdeng123 @azaslavsky Do you think it's safe to put --no-strict-offset-reset on the containers that don't have it by default (as in, hardcoded in docker-compose.yml)? Can you validate that with the code owners on Slack? Thanks!

@bobvandevijver
Author

Note that I did not add the --no-strict-offset-reset option, I only switched to the non-rust consumers. And the error hasn't returned since for us.
