This repository has been archived by the owner on Apr 27, 2022. It is now read-only.

PySpark cannot find Python #3

Open
jest opened this issue Nov 19, 2019 · 2 comments

Comments

jest commented Nov 19, 2019

The python3 APK package installs only the /usr/bin/python3 binary, but by default PySpark searches PATH for a binary named python. This results in a kernel error when enabling Spark in a notebook:

2019-11-19T10:55:15.696952916Z /usr/bin/find-spark-home: line 40: python: command not found
2019-11-19T10:55:15.697407078Z /usr/bin/spark-submit: line 27: /bin/spark-class: No such file or directory

I see two solutions to this problem, both in the Dockerfile:

  1. Either link python to python3:
    RUN cd /usr/bin && ln -s python3 python
    
  2. Or set the PYSPARK_PYTHON variable (not tested; see https://stackoverflow.com/questions/30279783/apache-spark-how-to-use-pyspark-with-python-3):
    ENV PYSPARK_PYTHON python3
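
To check which of the two fixes is in effect inside the container, a small diagnostic along these lines might help. It is only a sketch: the function name resolve_pyspark_python is hypothetical, and the lookup order (PYSPARK_DRIVER_PYTHON, then PYSPARK_PYTHON, then plain python) is an assumption based on the find-spark-home error above, not a confirmed description of every Spark version.

```python
import os
import shutil


def resolve_pyspark_python(env=None, which=shutil.which):
    """Return (name, path) for the interpreter the Spark launcher would likely call.

    Assumed lookup order (hypothetical, inferred from the error log):
    PYSPARK_DRIVER_PYTHON, then PYSPARK_PYTHON, then the bare name "python".
    `path` is None when that name cannot be found on PATH.
    """
    env = os.environ if env is None else env
    name = env.get("PYSPARK_DRIVER_PYTHON") or env.get("PYSPARK_PYTHON") or "python"
    return name, which(name)


if __name__ == "__main__":
    name, path = resolve_pyspark_python()
    if path is None:
        print(f"'{name}' is not on PATH; symlink it or set PYSPARK_PYTHON")
    else:
        print(f"PySpark would likely use {name} at {path}")
```

With neither fix applied on an image that ships only python3, this should report that python is not on PATH, which matches the "command not found" line in the log.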
    
Author

jest commented Nov 24, 2019

I also tested solution 2 and can confirm it works.

Owner

Vilos92 commented Dec 31, 2019

Hi @jest,

Thanks for your suggestions on this and the other issue, and sorry for the slow response! Things have been quite busy for me since late November, and I hadn't been checking this repo.

Both of your suggestions seem great, and I'll look to incorporate them later this week. Hope you have a Happy New Year! 🎊
