Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Install optional dependencies by default #405

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

arjxn-py
Copy link
Contributor

@arjxn-py arjxn-py commented May 7, 2024

This PR aims to give user the total control with pip install ragna i.e. which will correspond to current ragna[all] & also proposes a proof of concept for ragna-base which will give user the fine-grained control i.e. will further correspond to current ragna.
Related to #397


NOTE

  • This PR is still work in progress and potentially has much to be added.
  • Currently I'm trying to stick to these setuptools guidelines
  • Two packages like this can be achieved with two separate directories & pyproject.toml but that'll be a pain for maintenance, hence I'll try to achieve this in as minimal changes as possible
  • The related functions and methods will also be modified relatively once the core functionality is achieved.

@arjxn-py arjxn-py marked this pull request as draft May 7, 2024 02:16
@pmeier
Copy link
Member

pmeier commented May 7, 2024

Two packages like this can be achieved with two separate directories & pyproject.toml but that'll be a pain for maintenance, hence I'll try to achieve this in as minimal changes as possible

I don't think we can avoid that. Quoting from the setuptools documentation

Currently the following fields can be listed as dynamic: version, classifiers, description, entry-points, scripts, gui-scripts and readme.

Note that there is no mention of

ragna/pyproject.toml

Lines 8 to 9 in e397acb

[project]
name = "Ragna"

which we need to have Ragna and ragna-base.

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 8, 2024

I don't think we can avoid that. Quoting from the setuptools documentation

I wonder if you're suggesting for a monorepo architecture, that is absolutely possible. So I guess we'll need to have two different directories i.e. ragna & ragna-base + add custom commands/workflows to install ragna/ragna-base, test both, and release both. But at this point I want to give this approach a little try first and keep monorepo as a final resort as I am positive that it'd solve the issue.

@pmeier
Copy link
Member

pmeier commented May 8, 2024

I wonder if you're suggesting for a monorepo architecture, that is absolutely possible. So I guess we'll need to have two different directories i.e. ragna & ragna-base

My idea would be to have a packaging directory in the project that holds all the configuration for ragna-base. The files in the root of the repository should correspond to ragna.

add custom commands/workflows to install ragna/ragna-base

The install should be fairly straight-forward. For a local development environemnt we run pip install -e ./packaging && pip install . in the project root. In CI, we can drop the -e.

test both

Could you elaborate what you want to test? Do you mean our test suite or something else?

If you mean the test suite, why do you want to test both? We should just require ragna to be installed and test ragna-base with all optional dependencies installed.

But at this point I want to give this approach a little try first and keep monorepo as a final resort as I am positive that it'd solve the issue.

Sure. If we have a clean way to avoid all of the above, I prefer that as well.

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 9, 2024

Could you elaborate what you want to test? Do you mean our test suite or something else?

Yes, I was thinking that the test suite might break in case of ragna-base when some method dependent to optional dependency would be called. But since it is being suggested to run it with all optional dependencies installed for ragna-base as well, It'd not fail.

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 9, 2024

But since it is being suggested to run it with all optional dependencies installed for ragna-base as well, It'd not fail.

Still, I'm not sure if this'd raise and import issue for optional dependency methods (if it'd we would need to add informative warnings and error and also test ragna-base without optional dependencies to prevent breaking unwanted methods & keep API stable.)

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 11, 2024

From these couple of days I've been trying to configure a concurrent setup.py that co-exist with pyproject.toml with almost same configurations. At the moment, the idea behind this is to have isolated builds with pyproject.toml & setup.py for ragna & ragna-base respectively, where setup.py also seems to be building the wheels for ragna-base for me locally with name ragna_base-0.0.0-py3-none-any.whl & desired base dependencies only, in the absence of pyproject.toml.
I am currently figuring out to have isolated builds in the presence of pyproject.toml, in case I do not manage to figure that out I believe it could be handled in the build workflow then.
I'd be really happy to have any thoughts on it. Thanks.

@pmeier
Copy link
Member

pmeier commented May 13, 2024

Depending on a setup.py is a non-starter. Right now we are using setuptools, but I don't want to lock us into that. If we depend on setup.py we cannot ever us a different build backend. Also note this footnote from the setuptools documentation

New projects are advised to avoid setup.py configurations (beyond the minimal stub) when custom scripting during the build is not necessary. Examples are kept in this document to help people interested in maintaining or contributing to existing packages that use setup.py. Note that you can still keep most of configuration declarative in setup.cfg or pyproject.toml and use setup.py only for the parts not supported in those files (e.g. C extensions)

Meaning, we could do it, but even the official maintainers are advising against it. The exception that they make is for building C extensions, but we are not doing that.

For this PR this means: if we cannot do it with just a pyproject.toml, we need to go with the approach I described in #405 (comment) or something along the lines of it. I'll ask around if someone has a different idea here, but I guess this is the best we can do right now.

@arjxn-py
Copy link
Contributor Author

Oh thanks Philip for making it clear in the start. Fortunately I got one more solution while scrolling web today (thanks to this comment) which looks much promising to me as compared to setup.py solution & monorepo architecture where we'd need to maintain almost double the codebase compared to now.
Plus I think this approach is pretty scalable too (in case we'd want more ragna-x package in future), just one more pyproject.toml & @nox.session. I'm pushing related changes in sometime.
Happy to have thoughts on it!

@pmeier
Copy link
Member

pmeier commented May 13, 2024

monorepo architecture where we'd need to maintain almost double the codebase compared to now.

I don't understand that. Basically, the only duplication would be the metadata in the pyproject.toml.

Note that the new ragna package is only a meta package. Meaning, it doesn't contain any code of its own, but only has some dependencies. So we don't duplicate anything but the metadata.

…while symlinking with any of them dynamically
@arjxn-py
Copy link
Contributor Author

With the current changes I'm able to build Ragna with nox -s build i.e. :

Ragna==0.1.0.dev199+gf9b8c39.d20240513110506
├── aiofiles [required: Any, installed: 23.2.1]
├── chromadb [required: >=0.4.13, installed: 0.5.0]
├── emoji [required: Any, installed: 2.11.1]
├── fastapi [required: Any, installed: 0.111.0]
├── httpx [required: Any, installed: 0.27.0]
├── httpx-sse [required: Any, installed: 0.4.0]
├── ijson [required: Any, installed: 3.2.3]
├── lancedb [required: >=0.2, installed: 0.6.13]
├── packaging [required: Any, installed: 24.0]
├── panel [required: ==1.3.8, installed: 1.3.8]
├── pyarrow [required: Any, installed: 15.0.0]
├── pydantic [required: >=2, installed: 2.7.1]
├── pydantic_core [required: Any, installed: 2.18.2]
├── pydantic-settings [required: >=2, installed: 2.2.1]
├── PyJWT [required: Any, installed: 2.8.0]
├── PyMuPDF [required: >=1.23.6, installed: 1.24.3]
├── python-docx [required: Any, installed: 1.1.2]
├── python-multipart [required: Any, installed: 0.0.9]
├── python-pptx [required: Any, installed: 0.6.23]
├── questionary [required: Any, installed: 2.0.1]
├── redis [required: Any, installed: 5.0.4]
├── rich [required: Any, installed: 13.7.1]
├── SQLAlchemy [required: >=2, installed: 2.0.30]
├── starlette [required: Any, installed: 0.37.2]
├── tiktoken [required: Any, installed: 0.6.0]
├── tomlkit [required: Any, installed: 0.12.5]
├── typer [required: Any, installed: 0.12.3]
└── uvicorn [required: Any, installed: 0.29.0]

And ragna-base with nox -s build-base i.e. :

ragna-base==0.1.0.dev199+gf9b8c39.d20240513105812
├── aiofiles [required: Any, installed: 23.2.1]
├── emoji [required: Any, installed: 2.11.1]
├── fastapi [required: Any, installed: 0.111.0]
├── httpx [required: Any, installed: 0.27.0]
├── packaging [required: Any, installed: 24.0]
├── panel [required: ==1.3.8, installed: 1.3.8]
├── pydantic [required: >=2, installed: 2.7.1]
├── pydantic_core [required: Any, installed: 2.18.2]
├── pydantic-settings [required: >=2, installed: 2.2.1]
├── PyJWT [required: Any, installed: 2.8.0]
├── python-multipart [required: Any, installed: 0.0.9]
├── questionary [required: Any, installed: 2.0.1]
├── redis [required: Any, installed: 5.0.4]
├── rich [required: Any, installed: 13.7.1]
├── SQLAlchemy [required: >=2, installed: 2.0.30]
├── starlette [required: Any, installed: 0.37.2]
├── tomlkit [required: Any, installed: 0.12.5]
├── typer [required: Any, installed: 0.12.3]
└── uvicorn [required: Any, installed: 0.29.0]

@arjxn-py
Copy link
Contributor Author

We can also get rid of requirements.txt & requirements-base.txt as I can specify the dependencies in pyprojects or noxfile directly.

@arjxn-py
Copy link
Contributor Author

I don't understand that. Basically, the only duplication would be the metadata in the pyproject.toml.

Note that the new ragna package is only a meta package. Meaning, it doesn't contain any code of its own, but only has some dependencies. So we don't duplicate anything but the metadata.

I see. Sorry, with the comment above I misunderstood that we'd had to have ragna/ inside packaging/ which would have acted as a separate src for ragna-base

@agriyakhetarpal
Copy link
Member

Hi, @arjxn-py. I had another idea about how to go about doing this, so I am posting this comment as requested by @pmeier. I would say that having the changes in noxfile.py are great, but it also adds an extra dependency on nox switching between the pyproject.toml files (might not be a problem though), plus, you can quite easily construct your own minimal CLI-based script to do this rather than relying on nox. Using nox will create redundant virtual environments with the invocation of each session unless you specify the venv_backend to be None.

Even though the setuptools documentation might not recommend setup.py for pure Python projects, having a minimal, shim-like setup.py file with most of the metadata still present inside pyproject.toml, is a reasonable compromise. This way, I can imagine we can drop adding two separate pyproject.toml files and function with one setup.py + one pyproject.toml.

The dynamic dependencies currently noted in pyproject.toml (requirements.txt) can be read and delimited to be used as metadata in setup.py itself, i.e., what I propose is of the following form:

import os
from setuptools import setup

with open('requirements.txt') as f:
    ragna_dependencies = f.read().splitlines()

with open('requirements-base.txt') as f:
    base_dependencies = f.read().splitlines()

name = 'ragna-base' if os.environ.get("BUILD_RAGNA_BASE") else 'ragna'
dependencies = base_dependencies if os.environ.get("BUILD_RAGNA_BASE") else ragna_dependencies

setup(
    # ...
    name=name,
    version='',
    packages=['ragna'],  # or find_packages() with include= and exclude= attributes
    install_requires=dependencies,
    # and so on
)

while the rest of the project metadata stays in pyproject.toml, because builders will construct and collate PEP 621 metadata from both of these files (and setup.cfg as well, but that's not being added here).

This is where you can switch between particular requirements at the time of running the build frontend (python -m build) to build the source distribution and the wheel with this environment variable (I don't recommend using --config-settings because setuptools does not seem to handle them very well in comparison to other build backends). This approach might require some changes to test_importable.py and the package structure with the use of setuptools.find_packages(), so that both ragna and ragna-base do not break on import ragna respectively (I am assuming as someone who hasn't used ragna before that both ragna and ragna-base will provide their public API under ragna and a separate ragna_base namespace is not being proposed). It is to be noted that this approach does restrict the build backend to setuptools and might not be useful if ragna were to switch to something like hatchling sometime later.


P.S. Sorry, I meant to post this yesterday – I didn't hit the "Comment" button and this comment stayed as a draft, oops! Now that I think of it, to avoid using setup.py you can also use the toml package and "construct" your pyproject.toml file through it with a helper script (similar to the noxfile.py you added, but with PEP 723 style metadata or something), i.e., dropping a particular TOML table or adding items to it at the time of building the base package. I would recommend modifying the importable GitHub Actions job in order to verify that it passes for both ragna and ragna-base, in any case (you can either use a matrix or split into two jobs).

P.P.S. Another option is to look at how fastapi and typer projects do it with their PDM backend – however, their implementation has not been published as a separate project and remains restricted to internal use cases, and adapting to that could take a bit of extra effort.

@arjxn-py
Copy link
Contributor Author

Hey @agriyakhetarpal, thanks for such a descriptive comment, much helpful. I'd be happy to integrate the suggested changes and would potentially get rid of nox.

P.P.S. Another option is to look at how fastapi and typer projects do it with their PDM backend – however, their implementation has not been published as a separate project and remains restricted to internal use cases, and adapting to that could take a bit of extra effort.

Just to let you know, I've tried to figure out what they've been doing and not been able to get more information about the significant of environment variable they're using and how they're handling it.

@arjxn-py
Copy link
Contributor Author

arjxn-py commented May 22, 2024

For current changes Build for Ragna package & Build for ragna-base package both builds for ragna in spite of me trying to overwrite the name and dependencies in shim setup.py

In next commit I'll compare it to the approach using nox

@arjxn-py
Copy link
Contributor Author

With the nox, the build seems pretty fine to me here are the trees for Ragna & ragna-base

Note that I'd suggest to not add nox as a dependency as we can resolve that in workflow only because of it's minor usage, plus we can also recommend users to use the traditional python -m build while building locally to avoid them creating nox virtual environments (in the meantime I'm also looking into disable the creation of venv by setting venv_backend to none)

And if we want to further get rid of the noxfile, i'd be happy to open a separate issue about that to dig into it further.

But overall we've one proof of concept ready :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants