toDF() isn't working on the shell #1

Open
rush4ratio opened this issue Aug 16, 2017 · 9 comments

@rush4ratio

rush4ratio commented Aug 16, 2017

I get the same error (see the attached screenshot) when trying orgs.toDF().show() or memberships.select_fields(['organization_id']).toDF().distinct().show().
[screenshot attachment: todf]

@mohitsax
Contributor

Thanks for using AWS Glue.

Please refer to step 5 in the AWS Glue documentation on using a REPL shell: http://docs.aws.amazon.com/glue/latest/dg/tutorial-development-endpoint-repl.html
To resolve this error, stop the existing SparkContext and create a new one through GlueContext:
from awsglue.context import GlueContext
from pyspark.context import SparkContext
spark.stop()
glueContext = GlueContext(SparkContext.getOrCreate())
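
Once the GlueContext has been rebuilt this way, toDF() on a DynamicFrame should return a Spark DataFrame; a minimal usage sketch, with placeholder Data Catalog names rather than ones taken from this issue:

# placeholder names; substitute your own Data Catalog database and table
orgs = glueContext.create_dynamic_frame.from_catalog(
    database="my_database", table_name="organizations")
orgs.toDF().show()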

If you have further questions, you can also use the AWS Glue Forum: https://forums.aws.amazon.com/forum.jspa?forumID=262

@rush4ratio
Author

Thanks for the suggestion. I've tried it but unfortunately I still get the same error.

@mohitsax
Contributor

Thanks for trying out the fix. We were not able to reproduce the error on a REPL shell after applying the fix above.
Could you please open a support ticket?

@Sergeant007

The fix with spark.stop() worked for me. Let me also post the exact error message here for better indexing by search engines:

Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/glue/metastore_db.

@yupinh

yupinh commented Nov 30, 2017

One workaround is to disable Hive support when the SparkContext is initialized:

# switch the SQL catalog from the Hive (Derby-backed) implementation to the in-memory one
newconf = sc._conf.set("spark.sql.catalogImplementation", "in-memory")
sc.stop()
sc = SparkContext.getOrCreate(newconf)
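
Roughly the same effect can be had by rebuilding the session through SparkSession.builder; a sketch of that alternative form, assuming it replaces the three lines above and the shell's original spark session is still running:

from pyspark.sql import SparkSession
# stop the running Hive-backed session, then rebuild it with the in-memory catalog
spark.stop()
spark = (SparkSession.builder
         .config("spark.sql.catalogImplementation", "in-memory")
         .getOrCreate())
sc = spark.sparkContext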

Let me know if this causes you additional problems.

@laurikoobas

The spark.stop() "fix" worked for me as well.
The specific error message was:

ERROR Schema: Failed initialising database.
Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to start database 'metastore_db'

@jgoeglein

I ran into this as well. Why isn't the development environment set up to support this from the beginning? There's a lot of Glue documentation out there that uses .toDF() and doesn't work out of the box (for example, the first example in https://github.com/aws-samples/aws-glue-samples/blob/master/FAQ_and_How_to.md).

@zalmane

zalmane commented Sep 2, 2019

Just ran into this. The above did not work for me.
Ended up starting pyspark with the following flag:
./bin/gluepyspark --conf spark.sql.catalogImplementation=in-memory
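
The same property can also be set once instead of being passed on every launch; a sketch, assuming the standard conf/spark-defaults.conf location under the Spark installation:

# conf/spark-defaults.conf
spark.sql.catalogImplementation    in-memory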

@davehowell

apache/spark@ac9c053 is a Spark patch that should fix this in the Spark 3.0.0 release.

moomindani pushed a commit that referenced this issue Mar 2, 2023
Fixed Python library dependency on the Delta Lake example notebook