
Default consumer offset reset policy - earliest #73

Open
srolija opened this issue Jun 1, 2021 · 3 comments

Comments

@srolija

srolija commented Jun 1, 2021

By default, the Kafka consumer's auto.offset.reset policy is configured to latest. But it looks like the implementation in this connector is the reverse: unless configured otherwise, it will start from earliest.

String offsetReset = (String) consumerProperties.get("auto.offset.reset");
if ("latest".equalsIgnoreCase(offsetReset)) {
    logger.trace("Seeking to end");
    consumer.seekToEnd(Collections.singletonList(tp));
} else {
    logger.trace("Seeking to beginning");
    consumer.seekToBeginning(Collections.singletonList(tp));
}

And that value is extracted from the consumer-prefixed properties:

return simpleConfig.originalsWithPrefix("consumer.");
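To make the prefix handling concrete, here is a minimal, self-contained stand-in that mimics what Kafka Connect's AbstractConfig.originalsWithPrefix does by default (return the entries whose keys start with the prefix, with the prefix stripped). The class and config values below are assumptions for illustration, not Mirus code:

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixDemo {
    // Stand-in for AbstractConfig.originalsWithPrefix(String): collects the
    // entries whose keys start with the prefix and strips the prefix.
    static Map<String, Object> originalsWithPrefix(Map<String, Object> originals, String prefix) {
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<String, Object> e : originals.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                result.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical connector config: only the "consumer."-prefixed entry
        // is passed through to the underlying KafkaConsumer.
        Map<String, Object> connectorConfig = new HashMap<>();
        connectorConfig.put("consumer.auto.offset.reset", "latest");
        connectorConfig.put("name", "mirus-source-example");
        Map<String, Object> consumerProps = originalsWithPrefix(connectorConfig, "consumer.");
        System.out.println(consumerProps.get("auto.offset.reset")); // prints "latest"
    }
}
```

So a user-supplied consumer.auto.offset.reset reaches the seek logic quoted above as auto.offset.reset.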

Would it make sense to make the default match the normal consumer, to keep it consistent with regular consumer groups?
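For reference, the earliest default can be overridden per connector by passing the property through with the consumer prefix, as the quoted code suggests. A minimal sketch of such a connector config; apart from the prefixed property, the names and class here are assumptions:

```properties
# Hypothetical Mirus connector config fragment (name and connector.class assumed).
name=mirus-source-example
connector.class=com.salesforce.mirus.MirusSourceConnector
# Passed through to the underlying KafkaConsumer with the prefix stripped,
# restoring the stock Kafka default behavior.
consumer.auto.offset.reset=latest
```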

@pdavidson100
Contributor

Thanks for your suggestion @srolija! The Mirus Source Connector is primarily intended for high-reliability data replication use cases, so our defaults are chosen to minimize the risk of data loss, even at the expense of an increased risk of duplicate data. By using the earliest policy we guarantee that all available data is replicated, even in exceptional circumstances. This policy covers us when new topics are added to the topic regex and, importantly, also in the rare instances where a Kafka bug invalidates the current offset and the consumer offsets are reinitialized. We have occasionally seen this, particularly in earlier Kafka releases, and using latest in that situation would certainly lead to data loss.

@srolija
Author

srolija commented Jun 2, 2021

Completely get it.

Would it then make sense to at least document the differences from the default consumer configuration? I'm asking because we hit an issue where the connector started replicating an enormous topic, and based on the docs we didn't understand that it uses a non-default configuration.

@pdavidson100
Contributor

Yes, that does make sense - we should update the docs.
