Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[K8S][HELM] Add Hadoop configuration support #6046

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dnskr
Copy link
Contributor

@dnskr dnskr commented Feb 4, 2024

🔍 Description

Issue References 🔗

The PR adds basic support for Hadoop configuration files to be used by Apache Kyuubi, Apache Spark etc.

Describe Your Solution 🔧

The PR adds:

  • Apache Hadoop related ConfigMap that has to be mounted to Kyuubi pods as files.
  • HADOOP_CONF_DIR environment variable to Kyuubi pods.

Notes:

  • The ConfigMap change restarts Kyuubi pods!
  • Only core-site.xml and hdfs-site.xml files are added. Not sure yarn-site.xml, mapred-site.xml or other files should be added. Please, let me know if so.

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Install the chart with configured Spark and the following values:

# Hadoop configuration files
hadoopConf:
  # The value (templated string) is used for core-site.xml file
  # See https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml and Hadoop documentation for more details
  coreSite: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>test.core.site</name>
        <value>test-core-site</value>
      </property>
    </configuration>

  # The value (templated string) is used for hdfs-site.xml file
  # See https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml and Hadoop documentation for more details
  hdfsSite: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>test.hdfs.site</name>
        <value>test-hdfs-site</value>
      </property>
    </configuration>

Run beeline from Kyuubi pod:

./bin/beeline -u 'jdbc:hive2://kyuubi-thrift-binary:10009' -n test-user

Check specified property values:

0: jdbc:hive2://kyuubi-thrift-binary:10009> set test.core.site;
+-----------------+-----------------+
|       key       |      value      |
+-----------------+-----------------+
| test.core.site  | test-core-site  |
+-----------------+-----------------+


0: jdbc:hive2://kyuubi-thrift-binary:10009> set test.hdfs.site;
+-----------------+-----------------+
|       key       |      value      |
+-----------------+-----------------+
| test.hdfs.site  | test-hdfs-site  |
+-----------------+-----------------+

Checklist 📝

Be nice. Be informative.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (f67140e) 61.03% compared to head (1d66b3e) 60.99%.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #6046      +/-   ##
============================================
- Coverage     61.03%   60.99%   -0.05%     
  Complexity       23       23              
============================================
  Files           623      623              
  Lines         37162    37162              
  Branches       5035     5035              
============================================
- Hits          22682    22666      -16     
- Misses        12029    12033       +4     
- Partials       2451     2463      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

{{- tpl . $ | nindent 4 }}
{{- end }}
{{- with .Values.hadoopConf.hdfsSite }}
hdfs-site.xml: |
Copy link
Member

@pan3793 pan3793 Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's likely to put additional configuration files under HADOOP_CONF_DIR in practice, for example, hive-site.xml, hbase-site.xml, so we may want to allow the user to inject custom files.

Another concern is that embeding large-XML into YAML may be not user-friendly, it's easy to mess up the indent.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree. The approach I often see is providing both options: either provide a (templated) inline config - which as you said can get unwieldy fast - or provide the name of an existing config map. The latter allows creating that config map from a file or some other way, without relying on functionality in the chart.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chgl are there examples of some projects that have similar cases?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants