hive_metastore_migration.py fails with AttributeError: 'str' object has no attribute '_jdf' #120

Open
jobennin opened this issue Apr 14, 2022 · 1 comment

@jobennin

Running the HMS migration script with the spark-submit command fails with:
AttributeError: 'str' object has no attribute '_jdf'

which is triggered by the call:
id_type = df.get_schema_type(id_col)

If I change the call to:
id_type = get_schema_type(df, id_col)
I get past this error, but other df-related errors then surface in other functions.
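
For context, here is a minimal sketch of the two call styles. It uses a hypothetical stand-in class (FakeFrame) rather than the real pyspark DataFrame, and it assumes get_schema_type is a module-level function that takes the frame as its first argument, which is what the workaround above suggests. How the script actually attaches the helper to DataFrame is not shown here; the class assignment below is an assumption for illustration only.

```python
class FakeFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (not the real class)."""

    def __init__(self, schema):
        self.schema = schema  # maps column name -> type name


def get_schema_type(df, column_name):
    """Module-level helper: expects the frame first, then the column name."""
    return df.schema[column_name]


# Assumption: the helper is attached to the class so it can be called method-style.
FakeFrame.get_schema_type = get_schema_type

frame = FakeFrame({"DB_ID": "bigint"})

method_style = frame.get_schema_type("DB_ID")     # frame is passed implicitly
function_style = get_schema_type(frame, "DB_ID")  # frame is passed explicitly
```

When the helper is attached to the class like this, both styles receive the same frame. The function style does not depend on that attachment, which may be why it behaves differently in the script.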

This is tested on:

"emr-5.31.0"
"Hadoop":"2.10.0"
"Hive":"2.3.7"
"Spark":"2.4.6"

Full stack trace:
Traceback (most recent call last):
File "/home/hadoop/hive_metastore_migration.py", line 1525, in <module>
main()
File "/home/hadoop/hive_metastore_migration.py", line 1519, in main
etl_from_metastore(sc, sql_context, db_prefix, table_prefix, hive_metastore, options)
File "/home/hadoop/hive_metastore_migration.py", line 1414, in etl_from_metastore
.transform(hive_metastore)
File "/home/hadoop/hive_metastore_migration.py", line 753, in transform
ms_database_params=hive_metastore.ms_database_params)
File "/home/hadoop/hive_metastore_migration.py", line 734, in transform_databases
dbs_with_params = self.join_with_params(df=ms_dbs, df_params=ms_database_params, id_col='DB_ID')
File "/home/hadoop/hive_metastore_migration.py", line 336, in join_with_params
df_params_map = self.transform_params(params_df=df_params, id_col=id_col)
File "/home/hadoop/hive_metastore_migration.py", line 314, in transform_params
return self.kv_pair_to_map(params_df, id_col, key, value, 'parameters')
File "/home/hadoop/hive_metastore_migration.py", line 326, in kv_pair_to_map
id_type = df.get_schema_type(id_col)
File "/home/hadoop/hive_metastore_migration.py", line 199, in get_schema_type
return df.select(column_name).schema.fields[0].dataType
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1327, in select
AttributeError: 'str' object has no attribute '_jdf'
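
One possible reading of the last two frames, not confirmed against the script: the message names '_jdf' rather than 'select', so select itself was found and then ran with a string as self. In Python 3 this happens when a method is looked up on the class instead of on an instance. The sketch below reproduces both messages with a hypothetical stand-in class (FakeFrame), not the real pyspark DataFrame; error_message is likewise a made-up helper for the demonstration.

```python
class FakeFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (not the real class)."""

    def __init__(self):
        self._jdf = object()  # placeholder for the JVM-side handle

    def select(self, *cols):
        return self._jdf  # first access to self._jdf, as in the failing frame


def error_message(call):
    """Run call() and return the text of the AttributeError it raises, if any."""
    try:
        call()
    except AttributeError as exc:
        return str(exc)
    return None


# select looked up on the class: the column name is bound as self.
# Message: 'str' object has no attribute '_jdf'
via_class = error_message(lambda: FakeFrame.select("DB_ID"))

# select looked up on a plain string: the attribute lookup itself fails first,
# so the message names 'select' instead of '_jdf'.
via_str = error_message(lambda: "DB_ID".select("DB_ID"))
```

If this reading is right, the object that get_schema_type receives as df is something whose select resolves to DataFrame.select without being a DataFrame instance, for example the DataFrame class itself. That is a hypothesis to check, not a diagnosis.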

I have also tried EMR v6.5 with Spark v3.1.2 and got the same error. I thought it might be a Spark version issue.
Which Spark version has this script been successful with? Which EMR version?
I launch spark-submit per the README, with the --jdbc* options changed as needed.

@Dearkano

same issue here
