Clarify instructions of Kedro with venv and Pipenv #2360

Closed
astrojuanlu opened this issue Feb 24, 2023 · 5 comments
Labels
Component: Documentation 📄 Issue/PR for markdown and API documentation

Comments

@astrojuanlu
Member

Description

I read in the documentation (emphasis mine):

Create a directory for working with Kedro within your virtual environment:
mkdir kedro-environment && cd kedro-environment
Next, to create a new virtual environment in this directory, run:
python -m venv env/kedro-environment # macOS / Linux

It's not at all clear to me what's happening there. If I take these steps literally, I'd end up with a venv installed in kedro-environment/env/kedro-environment, which seems repetitive and unusual.
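A minimal POSIX shell sketch makes the nesting concrete. It runs the two documented steps literally in a scratch directory, assuming `python3` is on the PATH (the docs write `python`). It only shows where the environment lands and does not install Kedro.

```shell
#!/bin/sh
# Run the documented macOS / Linux steps literally in a throwaway
# directory to see where the virtual environment ends up.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Step 1 from the docs: create a directory and cd into it
mkdir kedro-environment && cd kedro-environment

# Step 2 from the docs: create the venv.
# --without-pip is added here only to keep the demo fast and offline;
# it does not change where the environment is created.
python3 -m venv --without-pip env/kedro-environment

# Every venv has a pyvenv.cfg at its root, so locating it shows the layout
find "$workdir" -name pyvenv.cfg
```

The printed path ends in kedro-environment/env/kedro-environment/pyvenv.cfg, so the environment sits two levels below the directory the user just created.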

There might be adjustments needed for Pipenv as well, but given that the ecosystem of Python workflow tools is right now expanding rather than consolidating and that Pipenv is the least likely of them to "win", I have reservations about even having it mentioned.

Context

The confusion that arises when using Kedro with in-tree (local) virtual environments was discussed some time ago in #681, and it looks difficult to resolve because of limitations in cookiecutter. Therefore, it would be good if these instructions were brought up to date.

Tangentially related to the request for Poetry support #1722

@stichbury
Contributor

I think removing the words "...within your virtual environment" would be a satisfactory tweak to the ambiguity.

@astrojuanlu
Member Author

astrojuanlu commented Feb 27, 2023

That would be a good first step, but it's not clear at which stage or from which directory the user should run kedro new. That's the bootstrapping problem that was mentioned in gh-681:

juan_cano@M-PH9T4K3P3C /tmp> mkdir testdir  # Create the project directory
juan_cano@M-PH9T4K3P3C /tmp> python3 -m venv testdir/.venv  # Install the venv inside
juan_cano@M-PH9T4K3P3C /tmp> source testdir/.venv/bin/activate.fish  # Activate
(.venv) juan_cano@M-PH9T4K3P3C /tmp> pip install kedro &>/dev/null  # Install Kedro
(.venv) juan_cano@M-PH9T4K3P3C /tmp> kedro new --verbose  # Cannot create a Kedro project

Project Name
============
Please enter a human readable name for your new project.
Spaces, hyphens, and underscores are allowed.
 [New Kedro Project]: testdir
Traceback (most recent call last):
  File "/private/tmp/testdir/.venv/lib/python3.9/site-packages/kedro/framework/cli/starters.py", line 351, in _create_project
    result_path = cookiecutter(template=template_path, **cookiecutter_args)
  File "/private/tmp/testdir/.venv/lib/python3.9/site-packages/cookiecutter/main.py", line 114, in cookiecutter
    result = generate_files(
  File "/private/tmp/testdir/.venv/lib/python3.9/site-packages/cookiecutter/generate.py", line 291, in generate_files
    project_dir, output_directory_created = render_and_create_dir(
  File "/private/tmp/testdir/.venv/lib/python3.9/site-packages/cookiecutter/generate.py", line 223, in render_and_create_dir
    raise OutputDirExistsException(msg)
cookiecutter.exceptions.OutputDirExistsException: Error: "/private/tmp/testdir" directory already exists

We could advise users to create the venv in a parent directory, then run kedro new inside:

juan_cano@M-PH9T4K3P3C /tmp> mkdir projectdir && cd projectdir  # Create parent directory *and* cd
juan_cano@M-PH9T4K3P3C /t/projectdir> python3 -m venv .venv  # Create venv in it
juan_cano@M-PH9T4K3P3C /t/projectdir> source .venv/bin/activate.fish  # Activate
(.venv) juan_cano@M-PH9T4K3P3C /t/projectdir> pip install kedro &>/dev/null  # Install kedro
(.venv) juan_cano@M-PH9T4K3P3C /t/projectdir> kedro new

Project Name
============
Please enter a human readable name for your new project.
Spaces, hyphens, and underscores are allowed.
 [New Kedro Project]: proj

The project name 'proj' has been applied to: 
...
(.venv) juan_cano@M-PH9T4K3P3C /t/projectdir> ls proj
README.md       data/           logs/           pyproject.toml  src/
conf/           docs/           notebooks/      setup.cfg

@stichbury
Contributor

I went back and found your previous comments about this section of the docs for good measure, so we can make all the changes at once.

"We suggest you create a new Python virtual environment for each new Kedro project you work on to isolate its dependencies from those of other projects." It's still unclear to me whether I should install kedro globally (with pipx or similar) or install it in the project's virtual environment.

  • But, in the latter case, there's a chicken-and-egg problem: the directory is created by kedro new, and I might want to create the .venv there.
  • The sentence "It is also possible to install Kedro using conda install -c conda-forge kedro" appears right after conda and then pip have already been explained.

@stichbury
Contributor

I have removed this whole thing temporarily because I've removed the FAQ page. I think this text should be reinstated on the main install page, in the virtual environment manager section, and rewritten as you outline above. So let's prioritise getting this ticket into an upcoming release to pair it with the #1985 and #2261 onboarding enhancements.

@astrojuanlu
Member Author

Follow up: #3281

Projects
Status: Done
Development

No branches or pull requests

2 participants