
Explain limitation listed in the documentation #591

Open
j-santander opened this issue Oct 19, 2021 · 3 comments

Comments

@j-santander

j-santander commented Oct 19, 2021

I'm starting with Kafka Connect, Hadoop/HDFS and Kerberos... so, I'm probably missing some basic concepts.

In the published documentation (https://docs.confluent.io/kafka-connect-hdfs/current/overview.html#limitations), the following limitation is listed:

The HDFS 2 Sink connector does not permit you to run multiple instances of the connector in the same Kerberos environment.

It is unclear to me what this limitation means. As written, my interpretation was that it is not possible to deploy more than one instance of the HDFS connector on a single Kafka Connect cluster (group of workers) when using Kerberos.

However, I've successfully created two working instances of the HDFS connector (both connected to the same kerberized HDFS and with the same connecting principal, i.e., user), so I'm puzzled.
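For illustration, the two instances looked roughly like this (connector names, topics, hostname, and principal below are placeholders, not my real values; the Kerberos-related property names are the ones from the connector's configuration reference). Both configs point at the same `hdfs.url` and the same principal; only the name and topics differ:

```json
{
  "name": "hdfs-sink-a",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "topics": "topic-a",
    "hdfs.url": "hdfs://namenode:8020",
    "hdfs.authentication.kerberos": "true",
    "connect.hdfs.principal": "connect-user@EXAMPLE.REALM",
    "connect.hdfs.keytab": "/etc/security/keytabs/connect-user.keytab"
  }
}
```

The second instance (`hdfs-sink-b`) is identical except for `name` and `topics`, and both run fine side by side on the same Connect cluster.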

I guess there are many different pieces interacting here, so the limitation might lie elsewhere.

We have:

  • Kafka Connect cluster: A set of nodes sharing the same configuration.
  • Kafka Connect Worker: A node belonging to a cluster.
  • HDFS Sink Connector: An instance of the connector:
    • Deployed within a Kafka Connect cluster
    • Connecting with a Kerberos principal within a Kerberos realm.
    • Mapping a set of topics to an HDFS cluster (URL).
  • HDFS Cluster: A set of storage nodes.
    • Associated to a Kerberos realm.

So, as I said, what is the use case that is not possible?

Thanks very much in advance, and please excuse me if this is too basic a question.

@kpatelatwork
Member

I believe the limitation is "multiple connectors to different Hadoop environments within a single Connect worker", and it's due to #325.

@j-santander
Author

Thanks,

Let me write it in my own words.

Within one Kafka Connect worker, it is not possible to set up connectors that connect to different Hadoop clusters.

Setting up multiple connectors is possible, as long as all of them connect to the same cluster.

Is that accurate enough?

Thanks again.

@kpatelatwork
Member

I am new to this connector, but as far as I know it is "Setting up multiple connectors is possible, as long as all of them connect to the same cluster and with the same Kerberos principal".
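If I understand #325 correctly, the clash comes from Hadoop's `UserGroupInformation` keeping the Kerberos login as process-wide static state, so a single worker JVM can effectively hold only one active principal at a time. A sketch of the unsupported combination (hostname, principal, and paths below are made up for illustration):

```json
{
  "name": "hdfs-sink-cluster-b",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "topics": "topic-b",
    "hdfs.url": "hdfs://namenode-b:8020",
    "hdfs.authentication.kerberos": "true",
    "connect.hdfs.principal": "other-user@OTHER.REALM",
    "connect.hdfs.keytab": "/etc/security/keytabs/other-user.keytab"
  }
}
```

Deploying a connector like this alongside one that uses a different `hdfs.url` and principal on the same worker is, as I understand it, the case the documented limitation warns about.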
