
Issue with Direct Migration from Glue Catalog to Hive Metastore #43

Open

sbellary-chc opened this issue Feb 28, 2019 · 2 comments


sbellary-chc commented Feb 28, 2019

We have a Glue Catalog in our dev AWS account, and I am now trying to migrate it to the Hive metastore of an EMR cluster. (I need to replace the Hive metastore contents with the Glue Catalog metadata so that I can track our Glue Catalog data lineage using Apache Atlas, which is installed on the EMR cluster.)

I followed all the steps of the procedure to migrate the Glue Catalog directly to the Hive metastore, but every time I run the Glue ETL job it fails with a Duplicate entry 'default' for key 'UNIQUE_DATABASE' error. I have tried several variations and still get the same error. See the full error message below:

py4j.protocol.Py4JJavaError: An error occurred while calling o1025.jdbc.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 77.0 failed 4 times, most recent failure: Lost task 1.3 in stage 77.0 (TID 1298, ip-00-00-000-000.ec2.internal, executor 32): java.sql.BatchUpdateException: Duplicate entry 'default' for key 'UNIQUE_DATABASE'
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:642)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:783)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:783)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry 'default' for key 'UNIQUE_DATABASE'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
at com.mysql.jdbc.Util.getInstance(Util.java:360)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2530)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1907)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2141)
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1773)

As per the migration procedure, the "--database-names" parameter value has to be provided as a key-value pair, so I first tried specifying a list of databases separated by a semicolon (;), and I also tried specifying just one database, "default", but neither works. The error above was thrown when I used just the default database.
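For reference, a minimal sketch of how a semicolon-separated "--database-names" value is typically parsed in a PySpark Glue job; the option name follows the migration README, but the parsing code here is an illustrative assumption, not the migration scripts' actual source:

import argparse
import sys

# Sketch: parse a semicolon-separated --database-names job argument.
# argparse maps '--database-names' to the attribute 'database_names'.
parser = argparse.ArgumentParser()
parser.add_argument('--database-names', required=True,
                    help='semicolon-separated database names, e.g. "default;sales"')
args, _ = parser.parse_known_args(sys.argv[1:])

db_names = [name.strip() for name in args.database_names.split(';') if name.strip()]
print(db_names)  # --database-names "default;sales" -> ['default', 'sales']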

Is anyone familiar with this error? Is there any workaround for this issue? Please let me know if I am missing something. Any help is appreciated.
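For context, the error almost certainly comes from the Hive metastore's DBS table, which has a unique index (UNIQUE_DATABASE) on the database name: a freshly provisioned metastore already contains a 'default' row, so appending Glue's 'default' database violates that constraint during the JDBC batch insert seen in the trace. Below is a minimal, untested sketch of one possible workaround: anti-join the databases to be migrated against what already exists in the target metastore before the append. The names databases_df, jdbc_url, and connection_properties are hypothetical placeholders, not identifiers from the actual migration scripts.

from pyspark.sql import SparkSession

# Hypothetical workaround sketch: skip databases whose names already exist
# in the target Hive metastore before appending to its DBS table, so the
# insert cannot trip the UNIQUE_DATABASE index on NAME.
spark = SparkSession.builder.getOrCreate()

jdbc_url = "jdbc:mysql://<metastore-host>:3306/hive"  # placeholder
connection_properties = {                             # placeholder values
    "user": "hive",
    "password": "<pw>",
    "driver": "com.mysql.jdbc.Driver",
}

# Read the database names already present in the metastore.
existing_names = (
    spark.read.jdbc(url=jdbc_url, table="DBS", properties=connection_properties)
         .select("NAME")
)

# left_anti keeps only rows of databases_df (the databases exported from
# Glue) with no matching NAME in the metastore, so 'default' and anything
# else already present is silently skipped.
new_databases = databases_df.join(existing_names, on="NAME", how="left_anti")

new_databases.write.jdbc(url=jdbc_url, table="DBS", mode="append",
                         properties=connection_properties)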

@har5havardhan

Hi,
I am facing the same issue. Did you manage to find any workaround for this?

@jainanuj07

Hi, I am facing the same issue. Has anyone found a solution and been able to successfully migrate from Glue to a Hive MySQL metastore?
