Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APE 22: Public API #85

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

APE 22: Public API #85

wants to merge 10 commits into from

Conversation

nstarman
Copy link
Member

@nstarman nstarman commented Apr 28, 2023

Up for discussion!

Very much a work in progress. Hopefully refined by discussion at the upcoming conference.

Signed-off-by: nstarman <nathanielstarkman@gmail.com>
Copy link
Member

@astrofrog astrofrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments/questions so far - one of the main things that bothers me in general and that I don't think there is a solution to is that I can do e.g.:

from astropy.cosmology.realizations import ScienceState

Obviously this isn't public API, and ScienceState isn't in __all__, but there's not real way to prevent users from relying on it, and we can't rename all imports in a module to include a _ prefix. So I think that we should probably also make it so that our API docs are also an authoritative source of public API and ensure that it's consistent with the other rules. We should also make sure that all public APIs are documented in the docs (we don't check that this is the case right now).

APE_public.rst Outdated
public API." SciPy allows modules to lack an ``__all__`` attribute, meaning a
user and their tools must understand the nuances of the previous rules. Having
an ``__all__`` attribute in every module is simpler, unambiguous, and better for
introspection by both users and automated systems.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth mentioning at this point how a user would know that a function/class is in __all__. Is it not possible to explicitly import functions/classes in a module that are not in __all__? If so then having something in __all__ does not stop e.g. CoPilot from suggesting code that uses a function not in __all__.

APE_public.rst Outdated
2. All modules must have an ``__all__`` attribute, even if it is empty. The
``__all__`` attribute defines the public and private interface of the module
in which it is defined. Anything in ``__all__`` is public, including
underscore-prefixed symbols. Anything not in ``__all__`` is private in that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although I think we should probably disallow underscore-prefixed symbols in __all__?

Copy link
Member Author

@nstarman nstarman May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that practically there should not, but I strongly think __all__ should be definitive, so if we have an underscore-prefixed object that we want to make public the mandatory steps are (in this order):

  1. put it in __all__.
  2. update the docs to reflect __all__.
  3. super strongly encouraged to remove the underscore prefix (updating steps 1 & 2).

This is adopting the Scipy disambiguation of PEP 8, adding primarily the mandate that empty __all__ be included in modules with no Public API.

APE_public.rst Outdated Show resolved Hide resolved
APE_public.rst Outdated Show resolved Hide resolved
APE_public.rst Outdated
- Clearly state if a documented object is actually private.
3. **Add prefixes**: 1. Add prefixes to all modules that are not public. 2. Add
prefixes to all classes, functions, and attributes that are not public.
:note:`I'm less enthusiastic on this point.`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of this as long as - as described above - we don't need to add underscore to symbols that are already in an underscore module for instance (so e.g. nothing in astropy.io.fits._tiled_compression needs to have an underscore prefix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with that. I have reservations about going all in on underscore prefixing everything. Not needing to underscore symbols in private modules (which still have an __all__) SGTM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astrofrog, resolve?

@astrofrog
Copy link
Member

astrofrog commented Apr 28, 2023

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what __all__ is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

@saimn
Copy link

saimn commented Apr 29, 2023

The problem with __all__ is that it only affects import * (and for a long time it was it's only meaning, the pep8 section about public/private interface was added later, python/peps@7dba60c). So it doesn't prevent importing from a module, and autocompletion doesn't use it.
So the only way to enforce public/private API is by renaming private modules with an underscore. That's what Scipy did.

@saimn
Copy link

saimn commented Apr 29, 2023

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

@astrofrog
Copy link
Member

astrofrog commented Apr 29, 2023

Maybe in that case .core should technically be private? (._core)

@nstarman
Copy link
Member Author

nstarman commented Apr 30, 2023

Maybe in that case .core should technically be private? (._core)

👍. That is the suggestion of PEP 8 -- and that .core should also have a blank __all__ = [].

@nstarman
Copy link
Member Author

nstarman commented Apr 30, 2023

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

So this would be one of the largest changes to Astropy. PEP8 and thus Scipy and static typing all say that __all__ refers to what is public in that module. With this in mind, doing

# __init__.py
# No __all__ is defined
from .core import *
# core.py
__all__ = ["Foo"]

class Foo: ...

Means that Foo is public in astropy.convolution.core and private in astropy.convolution. This is contrary to how Astropy intends, where we are saying __all__in astropy.convolution.core means that it is actually private in astropy.convolution.core and public in astropy.convolution. This is confusing for many reasons. If we were to adopt PEP8 (as suggested in this draft APE) then the previous example would look like

# __init__.py
__all__ = ["Foo"]  # Foo is public in this module, even if it is defined elsewhere.
from .core import Foo
# core.py
__all__ = []  # nothing is public in this module. Please look elsewhere.

class Foo: ...

Essentially we need to move the contents of various __all__ to where the code is actually public, leaving behind empty __all__ to indicate where no code is public.
Caveat private modules defining non-empty __all__ is fine. This enables *-imports in the public modules.
Thanks @astrofrog for the clarification, which is now detailed in the APE.

Update

The better option is to rename core.py to _core.py and add an __all__ to __init__ like so.

# __init__.py
from . import _core, ...
from ._core import *
...

__all__ = [] + _core.__all__ + ...
# _core.py (formerly core.py)
__all__ = ["Foo"]

class Foo: ...

This still supports * imports if you want them and retains 100% unambiguity about what is public and where.

@nstarman
Copy link
Member Author

nstarman commented Apr 30, 2023

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what all is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

I think Yes. To make sure we're on the same page, I think communicating it this way to users should be the logical consequence of the deeper rules:

  • That __all__ is authoritative
  • That we follow PEP 8 public versus internal interfaces, e.g. with undercore prefixes
  • That the docs are up-to-date.

Having all these rules means that a user can only get to public symbols though other public symbols -- that "anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere" and "anything in the API docs" (unless explicitly stated) is public. It's not the source or our definition of public API, it's the consequence.

@nstarman
Copy link
Member Author

nstarman commented Apr 30, 2023

The problem with all is that it only affects import * (and for a long time it was it's only meaning, the pep8 section about public/private interface was added later, python/peps@7dba60c). So it doesn't prevent importing from a module, and autocompletion doesn't use it.
So the only way to enforce public/private API is by renaming private modules with an underscore. That's what Scipy did.

I agree, __all__ is not enough to prevent autocomplete, though some autocomplete does use __all__: see https://ipython.readthedocs.io/en/stable/config/options/terminal.html#configtrait-IPCompleter.limit_to__all__.
I also agree we should rename modules with an underscore, as part of adhering to PEP 8.
It should be noted that adding underscores does not actually "enforce" public/private API as Python does not have true language-level features for public vs internal interfaces. Like __all__, single underscores are convention and Scipy says that __all__ takes precedence over underscores. In this APE I propose that we adopt a Scipy-like rule set where __all__ takes precedence over underscores and we use both according to PEP 8.

@saimn
Copy link

saimn commented May 1, 2023

@astrofrog - Maybe in that case .core should technically be private? (._core)

If we want to control strictly what's public and private yes, basically all submodules should be private (renamed with underscore) and public API exported in the subpackages' __init__.py. That's what Scipy did.

@nstarman - So this would be one of the largest changes to Astropy.

With this solution yes, this would require a lot of changes, and may be painful to do. So I don't like this solution.
But you also seem to agree with renaming with underscores, though I think those are two different solutions.

To summarize:

  • Option 1: use __all__ = [] in submodules, and import explicitly public functions/classes in subpackages' __init__.py and list those in __all__.
  • Option 2: rename submodules with underscore, keep their list of public functions/classes in __all__ (a lot of them already have it) and just change the import in subpackages' __init__.py (from .module import *from ._module import *). That's what Scipy did. [1]

I don't like option 1 because it moves the list of exported functions from the module itself to another place, and requires a lot of changes which can be prone to errors. Option 2 is more reasonable.

Then as you say, the underscore prefix is also just a convention, but that's the closest thing to a private scope in Python's land. And autocomplete respect it, so when users browse the functions in their shell they will see only public API. As a users I never checked the content of __all__ of a package, I use autocompletion and the docs.

[1] They also kept module.py with deprecation warnings because there is a lot of code using import from e.g. scipy.ndimage.morphology instead of scipy.ndimage. We may want to do that in specific cases, but I don't think we would need it to do that in a systemic way.

@pllim pllim changed the title Public API APE 22: Public API May 2, 2023
@eteq
Copy link
Member

eteq commented May 2, 2023

(Some of this developed from discussion at the coordination meeting (including @nstarman, @saimn , @astrofrog , @tepickering, @pllim, @nden, @WilliamJamieson), although I don't think I can say that all of my points above are consensus of those folks, it's some mix of that and just straight up my own opinion.)

I very much agree with @astrofrog's point here:

After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what all is, so when we communicate with a user, should we tell them that public API is anything in the API docs, or e.g. anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere?

Which I think has up until this point (in an uncodified way) is that whatever the docs say is the public API. So I think it makes sense to codify that as the "true answer". @nstarman's point was that if we follow the rules here, it's the same between those, and that it's only an aberration if these are different. But I was/am concerned about the inevitable state when something isn't working right. So I think what we settled on is that we start by saying as of when this APE is accepted, the docs are the "true" public API, but this APE presents a plan to get to a state where the rules highlighted in this APE lead naturaly to the docs just reflecting the same thing as these rules.

I still personally think we should have it true that the final source of authority is the documentation, because that's more user-facing of a contract, as @astrofrog says. But I think if we say that's the reality now, and we might re-visit it after this APEs plan is implemented, that's a reasonable compromise.

Two more opinions to offer:

  • A consequence of this is that modules like astropy/quantity/quantity.py become astropy/quantity/_quantity.py. I think leading underscore module names are ugly. That's subjective, but still.
  • Another consequence is that the actual public API __all__ s are in different files than the thing-to-be-made-public E.g., Quantity would be in astropy/quantity/__init__.py instead of astropy/quantity/_quantity.py. I don't like that because it means a small change in one file requires one to understand the full API structure to know which __all__ to add it to. I'm not sure that's annoying enough to justify changing anything, but it's a complaint I want to register and think about how we might get around it.

And one question:

  • Does this apply to coordinated packages in addition to the core? In principle it should, but that might be signing up for a lot more work because they are probably more of a mess than the core...

@pllim
Copy link
Member

pllim commented May 3, 2023

Does this apply to coordinated packages

I would say not. Even for things like removing astropy-helpers, it was a tedious campaign with writing up a transition guide and opening some PRs downstream. For things like this that would break API, it is a non-starter.

Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
@nstarman nstarman requested a review from astrofrog July 31, 2023 00:44
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
@nstarman
Copy link
Member Author

@astrofrog @eteq @saimn @pllim, I've updated this APE based on the discussions we had at the conference an in this thread. LMK what you think!

Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
@nstarman
Copy link
Member Author

nstarman commented Aug 1, 2023

What do you propose for submodules that currently define __all__ but from which people should not import directly, e.g. astropy.convolution.core (and many others). This is the scheme that is mostly used currently in Astropy.

@saimn I've been working on this over at astropy.cosmology. We've successfully transitioned .utils -> ._utils, io -> ._io, and I'm working on the rest. The code is clearer from a user's perspective since there's only one obvious place to import thing from and all the hidden modules aren't tab-completion discoverable.

@nstarman
Copy link
Member Author

nstarman commented Aug 1, 2023

  • Option 2: rename submodules with underscore, keep their list of public functions/classes in __all__ (a lot of them already have it) and just change the import in subpackages' __init__.py (from .module import *from ._module import *). That's what Scipy did. [1]

I also like Option 2 a lot. It works because _module is not made public in __init__, so even though _module defines an __all__ and makes it's contents locally / contextually public that is within a private module. It's impossible to publicly navigate to the contents of _module, only what is exported to __init__.

With this as the template, the example from #85 (comment) becomes

# __init__.py
__all__ = ["Foo"]  # Foo is public in this module, even if it is defined in `_core`, which is private.
from ._core import Foo
# _core.py
__all__ = ["Foo"]  # Foo is "public" in this module, but this module is private.

class Foo: ...

Another consequence is that the actual public API all s are in different files than the thing-to-be-made-public E.g., Quantity would be in astropy/quantity/init.py instead of astropy/quantity/_quantity.py. I don't like that because it means a small change in one file requires one to understand the full API structure to know which all to add it to. I'm not sure that's annoying enough to justify changing anything, but it's a complaint I want to register and think about how we might get around it.

@eteq, I believe @astrofrog's comment largely answers this question.

@nstarman
Copy link
Member Author

nstarman commented Aug 1, 2023

So I think what we settled on is that we start by saying as of when this APE is accepted, the docs are the "true" public API, but this APE presents a plan to get to a state where the rules highlighted in this APE lead naturaly to the docs just reflecting the same thing as these rules.

@eteq, I agree.

But I was/am concerned about the inevitable state when something isn't working right.

I added a section on a pre-commit CI check. It actually looks to be fairly simple to check that a public module has corresponding documentation since we have docs/api that collects our documented objects. I believe we can go further and make a two-way check to also check that something in docs/api is also in __all__.
Given all this, we can make it 🤞 impossible for the docs to not reflect the public API as defined in the code.

Copy link
Member Author

@nstarman nstarman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments.

not have a uniform and systematic approach to communicating what is public vs
internal, then we cannot expect users and especially their tools to know what is
public vs internal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From astropy/astropy#15169, I recall another source of issues that has bitten astropy in the past: I/O registration. When deep sub-modules look public, e.g. astropy.cosmology.flrw.lambdacdm.LambdaCDM then that is generally used as the class pathway in I/O rather than the better astropy.cosmology.LambdaCDM. When the class is moved it breaks the I/O and needs a whole backwards-compatibility logic to support files using the previous path. Cleaning up public vs internal, e.g. to astropy.cosmology._flrw.lambdacdm.LambdaCDM would make this an obviously bad way to serialize the class and code authors would get it "right" (astropy.cosmology.LambdaCDM) the first time. Clear public API makes for more stable code.

probably deserves a few sentences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly orthogonal to the issue here: really, one needs to set __module__ to avoid this - registration just followed that, not a choice made by coders. Note that numpy in fact does this - which partially is annoying, since one no longer knows where the function is defined.

Copy link
Member Author

@nstarman nstarman Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me the pain point was YAML serialization because I used f"{class.__module__}.{class.__qualname__}", which looks reasonable as code, but led to the problems described above. If, when making the tests, I had noticed a private path I would have fixed the problem from the start.

I believe this applies more generally, if something is defined in a private module but imported to a public location then it may be imported from there again, e.g. as part of I/O. Serializing using only the public import location is much better than what I did, which seemed reasonable at the time. Setting __module__ is one way to force public locations in serialization (which I agree has issues); a path map, or str regex / replace operation are other means.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that one is an issue for examples and sphinx too - I really dislike that when we made representations a module (which was definitely a good idea!), we had to make so many updates to the docs. It would also be nice to use regex for things like __construct_mixin_classes

For me, this is still separate from the APE, since this has annoyed me amply already! But I can see that coming from an environment where files with leading underscores have more meaning, you could have avoided the issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed we can leave it out of this APE. I think the point I made "If, when making the tests, I had noticed a private path I would have fixed the problem from the start." means adopting this APE will automatically force most of these I/O issues to be fixed

APE22.rst Outdated Show resolved Hide resolved
APE22.rst Outdated Show resolved Hide resolved
APE22.rst Outdated Show resolved Hide resolved
APE22.rst Outdated Show resolved Hide resolved
@nstarman nstarman reopened this Aug 19, 2023
@astrofrog
Copy link
Member

So just to put a radically different idea out there, at least to consider as an alternative, what if we considered that public API was defined solely as objects defined in __all__ and only in a subset of the __init__.py files - we could then add something to the effect of:

warnings.warn('This module is private API and should not be relied on', PrivateAPIWarning)

at the top of most files in the code base, while in __init__.py files defining any public API we would not emit this warning and instead do e.g.:

# This file defines public API

__all__ = ['SomeClass', 'a_public_utility']

with ignore_private_api_warnings():
    from .core import SomeClass
    from .utils import a_public_utility

The advantage of this approach is that

  • we wouldn't need to worry about cluttering the code base with underscores
  • we wouldn't need to break things for users at any point, but they would know they are using API that can break at anytime (in a sense it is a bit like a permanent deprecation phase)
  • we wouldn't need to have a deprecation phase where we need to have both modules with underscores and compatibility ones without underscores, which will temporarily cause a lot of clutter in the code base
  • this approach would arguably be more explicit and easier to learn for contributors rather than learning the rules about underscores and so on
  • permanent warnings for users relying on private API

We could likely use pre-commit to ensure that modules either include a comment such as the one above (This module defines public API) or include a PrivateAPIWarning.

The main disadvantages I can see with this approach are:

  • a potential small performance impact related to emitting and catching warnings at import time, but this may not be significant, and would have to be assessed.
  • this would be a different approach to other packages in the scientific python ecosystem

I'm not strongly attached to this idea, but I do think it merits consideration.

@mhvk
Copy link
Contributor

mhvk commented Aug 21, 2023

@astrofrog - I certainly like the simplicity and the much smaller amount of work involved!

An even less intrusive version would of course be to have warning in the module docstring. But the advantage of your version is that the warning could include the suggestion to raise an issue if a user believes the piece of code they are importing should really be part of the public API.

If we edit docstrings, one advantage is that it would show up in sphinx, for those modules that we have typeset just because it keeps the organization more logical (e.g., https://docs.astropy.org/en/latest/units/ref_api.html#astropy-units-equivalencies-module). Obviously, though, this does not preclude adding the warning! (Or, indeed, having module names starting with an underscore)

@pllim
Copy link
Member

pllim commented Aug 21, 2023

I am a little 👎 on module level warning. It is going to be a pain to filter it out everywhere internally.

@astrofrog
Copy link
Member

I am a little 👎 on module level warning. It is going to be a pain to filter it out everywhere internally.

We wouldn't need to filter warnings everywhere, just in the few __init__.py files that would contain public API, because then at the end of the day, whatever the user imports from the public API, any private API warning would be hidden.

If tests test non-public API then they would see warnings, but arguably we could even simply ignore the private API warnings globally for all tests in the pytest config in setup.cfg.

@WilliamJamieson
Copy link

I am a little 👎 on module level warning. It is going to be a pain to filter it out everywhere internally.

I am with @pllim with this. I would rather see us follow the same direction as numpy and scipy by prefixing modules and subpackages where appropriate to separate out the private API (there is also a clear path here for formally deprecating things too). Adding a warning like this will make it so that we are constantly having to catch that warning internally when accessing the private API. E.G. I create a _utils module to collect some useful utilities for a given subpackage that are used throughout, but should not be public, this would mean I would have to catch the warning every time I used something from _utils because there would be no reason to ever add its functionality to an __init__.py.

@astrofrog
Copy link
Member

this would mean I would have to catch the warning every time I used something from _utils because there would be no reason to ever add its functionality to an init.py.

I'm happy to also follow the numpy/scipy route but I just want to answer the above statement - it isn't correct that you would have to filter the warning everywhere internally. It doesn't matter what private methods/functions call what private methods/functions, at the end of the day all a user will do is import public API and we just need to filter the warning in the __init__.py file exposing that public API. Everything that happens internally is irrelevant. The only exception to this is if functions/methods dynamically import private modules at run-time rather than import time (which arguably we should minimize the use of).

@nstarman
Copy link
Member Author

nstarman commented Aug 24, 2023

I do think it merits consideration.

It's good to circle back to this, as it was where this proposal started before the Astropy conference!
I guess it's a sign of how complex this issue can be. Were we to start writing Astropy from scratch today, I don't believe we'd be having this debate as we'd follow PEP-8 and typing practices and protect user's tab-autocomplete, etc. The issue, is that we are not starting tabula rasa and given our current muddled situation, how do we proceed? I guess philosophically I'm always inclined to go where we would if starting tabular rasa, and to try to chart a path of minimal pain, even if there is some pain.

Here are some of the reasons the proposal evolved from the conference.

what if we considered that public API was defined solely as objects defined in __all__

That is the central gist of this APE. __all__ is the "ground truth" from which we derive our API.
However, while __all__ has the immense benefits of being very explicit and instrospectable by developers, it is severely lacking for the needs of users. Users have 2 primary means to discover functionality within a library: by external channels — documentation, examples, tutorials, word of mouth — and by inspecting the code, most easily accomplished by tab autocomplete. I don't think it controversial to say that many of us tab-filter in IDEs to find a relevant function or even module. One notable issue with __all__ is that since it starts with an underscore it is not found by tab completion. And even if it were, how many would consult it when tab-autocomplete gives a full list of available objects.

So while for developers __all__ is excellent, for users we need to make their means of finding and understanding public API match __all__. How do we do that? I'm happy we agree on the documentation, that we can create CI checks that ensure __all__ matches the docs. But what about simple exploration of the library? Is astropy.units.quantity public? To a user it very reasonably appears to be. Like us, numpy and scipy have this problem in spades (numpy.core is ostensibly private, but if it's so visibly public, is it really private?).

Using __all__ is important. It's step 2 of this APE! But alone it is a half measure. Warnings is one way to try to fix this, but I think we can implement our solution at a deeper level, where users will not see private API in the first place, rather then be warned after they try to use it.

at the top of most files in the code base,

I'm unsure of this. Any use of warnings.simplefilter is liable to trip every single file in Astropy.

we wouldn't need to worry about cluttering the code base with underscores

We'd have the same number of files, just with a single underscore added. I guess I don't agree that qualifies as clutter.

  • we wouldn't need to break things for users at any point, but they would know they are using API that can break at anytime (in a sense it is a bit like a permanent deprecation phase)
    -we wouldn't need to have a deprecation phase where we need to have both modules with underscores and compatibility ones without underscores, which will temporarily cause a lot of clutter in the code base

To address two points simultaneously. With deprecated compatibility modules we also wouldn't need to break things for users at any point during the deprecation period.
I agree that deprecation modules add clutter. It would be nice to avoid that, but at least the cleanup when the deprecation period elapse will be easy and only require deleting whole files.

this approach would arguably be more explicit and easier to learn for contributors rather than learning the rules about underscores and so on

For three reasons, I actually think the opposite.

  1. The first is that contributors rarely make new modules, so will not themselves need to deal with underscores.
  2. The second is that contributors overwhelmingly will have used terminal editors or IDEs with tab-autocomplete. On all systems I'm aware of tab-autocomplete filters underscore-prefixes (unless the user specifically starts with an underscore before pressing tab). In effect contributors will have had years of practical experience. (not that they'll need it, because of point 1).
  3. The third reason isn't a new point as you brought it up in the list of disadvantages. To reiterate then, in this APE we would be adopting a common standard, not making a new one. With a common standard everyone who has contributed to any other code base with will already know the rules. With a new standard, no one will have prior familiarity.

permanent warnings for users relying on private API.

This part I like! But from an above point, any use of warnings.simplefilter is liable to trip every single file in Astropy.

@mhvk
Copy link
Contributor

mhvk commented Aug 24, 2023

But what about simple exploration of the library? Is astropy.units.quantity public? To a user it very reasonably appears to be. Like us, numpy and scipy have this problem in spades (numpy.core is ostensibly private, but if it's so visibly public, is it really private?).

This to me is the strongest argument for change, but it would seem to require only one part of what is in this APE: have __all__ in __init__.py. I checked by comparing astropy/units/__init__.py (which does not have it) and astropy/coordinates/representations/__init__.py (which does have it -- thanks to you!): I find that (in ipython):

import astropy.units as u, astropy.coordinates as coord
u.qu<TAB>  # Indeed, I see `quantity`, which a user does not have to see
coord.representations.s<TAB>  # I do not see `spherical`

So, we can use __all__ in __init__.py to make clear to the user what is private; this part I'm fine with (even though for units it may be a bit of a pain, given how many entries it has -- arguably making the probability really small that someone will actually discover things by chance by tab-completion and then import something that should be private and which we actually change).

But this lessens the argument for the rename: We're now left with people exploring in editors, and those we can just as easily get with documentation (module-level docstrings would be good to have regardless, and now that the module is already "private", the docstring can actually be aimed at developers, as I've hoped would happen a long time ago: #8930).

So, I remain unconvinced about the APE as a whole, a lot of effort for a problem that, in practice, just has not occurred much (for numpy, it also has not happened all that much -- though more than for us; I think the driver there is mostly that they really wanted to reorganize their structure, which has lots of historical baggage, and thus in a way they are starting from a more or less blank slate -- note this is happening for numpy 2.0, where they are allowing API breaks!).

I am much happier with just formalizing the existing policy, by documenting it and adding __all__ to every subpackage's __init__.py, and adding notes to modules. This seems to get us 99% of the way, will cost substantially less review time (the real effort in any of this) and will have no cost to finger memory of existing maintainers.

Since making __all__ in each __init__.py seems less controversial, perhaps we can just get started with that?

@nstarman
Copy link
Member Author

nstarman commented Aug 24, 2023

I find that (in ipython):

Depends strongly on the editor. PEP-8 offers multiple means by which to define public versus private API, which is annoying and why this APE isn't just the sentence "let's do that!". In your findings, IPython command line appears to use __all__. In this screenshot, Jupyter notebook also uses underscores for modules (also valid a la PEP-8), and so finds spherical.
This original proposal was to do just __all__, which is still an improvement over our current situation. But in discussion we found, as this screenshot shows, that __all__ is only a partial measure. There should be one "truth" (__all__ in both this APE and your and @astrofrog's suggestions), but fully separating public from private API requires __all__, editing the docs, and module names. I'm happy that we __all__ agree 😄 on the first two. The module names is clearly more contentious.

CleanShot 2023-08-24 at 16 48 55@2x



note this is happening for numpy 2.0, where they are allowing API breaks!).

What about Astropy 7.0? Do we need an APE to switch to a more strict version of SemVer 😆.

now that the module is already "private", the docstring can actually be aimed at developers, as I've hoped would happen a long time ago: #8930).

Truly private modules would be excellent for hosting developer docs, I agree!

I am much happier with just formalizing the existing policy, by documenting it and adding all to every subpackage's init.py, and adding notes to modules. This seems to get us 99% of the way, will cost substantially less review time (the real effort in any of this) and will have no cost to finger memory of existing maintainers.
...
Since making all in each init.py seems less controversial, perhaps we can just get started with that?

I'm wholly on board with doing this (this is step 2 & 3 in this APE). It seems most folks that have looked at this APE are in agreement about the importance of adding __all__, which is good! Most of this PR, then, is uncontroversial.

I, and it seems some others, would like to do more, which is the controversial step 5 of the implementation.
If that step is removed / pushed to a subsequent PR, some of this APE will need to be rewritten as we will only be addressing a subset of the problems this APE tries to resolve. If the community / CoCo thinks it's best to separate steps 1-4 from 5, I would appreciate help on the rewrite.

@mhvk
Copy link
Contributor

mhvk commented Aug 24, 2023

Indeed, we agreed on __all__ for __init__.py, happy to just get going with that (maybe units last? The rest is far more obvious).

I'm still less sure about what we do for the modules. I'm afraid it's gotten too late here to actually look at the APE, so if this is nonsense, just ignore, but I still really dislike __all__ = [] too. Instead, as we have right now, __all__ can give immediate information for someone opening the file about what in it is actually exposed (even if at a different level for the user). It also makes constructing __all__ in __init__.py easy (or a good auto-sanity check if imports become explicit), and helps enable generation of documention for modules like units.equivalencies.

p.s. Will be on holidays for the next bit... At least, that will give others a chance to pipe in!

Copy link

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks extremely solid and a well motivated solution to a real problem, so I 100% get behind this APE. Here are a couple, mostly superficial, remarks and suggestions.


author: Nathaniel Starkman

date-created: 2013 November 5 <replace with the date you submit the APE>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this date can be set already ?

APE22.rst Outdated Show resolved Hide resolved
APE22.rst Outdated Show resolved Hide resolved
APE22.rst Outdated Show resolved Hide resolved
Comment on lines +218 to +220
public. This means that a module can define an ``__all__`` attribute but if the
module itself is not public, then anything in the ``__all__`` attribute cannot
be publicly accessed from outside the module.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this sentence. Does this imply that any private module should set __all__ = [], or does it mean that __all__, within a private module, defines an interface that's meant for internal use only ? (now that I wrote it down, the latter seems way more likely, but I think there's room to improve the phrasing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the latter. If you can't publicly navigate to a module, it doesn't matter what's "public" within that module. The full path must be public.
The issue with relying on this alone is what you discovered in that Issue (trying to find the one) where we have modules that look public (because they are not underscored) so it's possible to construct an import path that looks totally public, but actually isn't.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can't publicly navigate to a module, it doesn't matter what's "public" within that module.

I think that's what confused me; since it doesn't matter, why should we care what's in a private module's __all__ ? Wouldn't it be simpler to recommend/rule that __all__ must be set to [] in private modules ?

The issue with relying on this alone is what you discovered in that Issue (trying to find the one)

For reference, I believe you meant to link astropy/astropy#15666

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might still be useful to be able to indicate in a private sub-module which things are meant to be importable in other parts of astropy, so __all__ could be useful in that way rather than always setting to [].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! This is demonstrated in lines 281-287

Copy link
Member Author

@nstarman nstarman Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the examples of good module layouts.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think I understand and agree with everything said in this thread, but I still can't wrap my head around the phrasing. I think my main issue is that I don't know what's the technical definition for "publicly accessed"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astrofrog, see https://github.com/astropy/astropy/pull/15109/files as an example doing what you're talking about. I agree, this is super useful.

APE22.rst Outdated Show resolved Hide resolved
APE22.rst Outdated Show resolved Hide resolved
APE22.rst Outdated Show resolved Hide resolved
APE22.rst Show resolved Hide resolved
APE22.rst Show resolved Hide resolved
Copy link
Member Author

@nstarman nstarman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @neutrinoceros!

APE22.rst Show resolved Hide resolved
Co-authored-by: Clément Robert <cr52@protonmail.com>
@nstarman
Copy link
Member Author

nstarman commented Dec 6, 2023

@eerovaher @WilliamJamieson @taldcroft @eteq @pllim, I would appreciate some more eyes on this APE, if you have the time in this busy season. As we often follow the lead of NumPy I think this APE is very timely given their refactor to quite nearly follow this APE.

@pllim
Copy link
Member

pllim commented Dec 7, 2023

@nstarman , not sure if I have time to ponder this soon. Can this wait till the Coordination Meeting or is that too long to wait?

@nstarman
Copy link
Member Author

nstarman commented Dec 8, 2023

@pllim, it can definitely wait to be approved. I'm not sure who would have the time to lead this effort if this APE were accepted sooner. But it would be great to have this be essentially finalized by the time of the Meeting.

Signed-off-by: nstarman <nstarman@users.noreply.github.com>
@pllim
Copy link
Member

pllim commented Jan 29, 2024

@nstarman , apparently APE 22 is taken by #87 . You will have to rename your file... but at this point, I am not sure to what. Maybe @eteq can advise. See #87 (comment)

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this again and must admit I find the prospect of yet another major refactoring rather off-putting, especially as it aims to solve what seems a minor problem; certainly, at the user level we have had few if any complaints as we (re)move code, suggesting that users have no issues finding things they need and not rely (overly) on private API.

We should really think carefully about the costs of these refactoring exercises. I'd estimate the implementation, including reviews, is going to be of order 10^2 hours (certainly more than 10^1, surely less than 10^3, though perhaps 10^1.5 is more realistic). Do we really want to spend the equivalent of $ 5...10k on this? (Though I guess we have already spent several 10k $ equivalent on ruff "rules"; has that been really worth it? What else could we have done with that amount of expensive developer time?)

I also think we (and thus the APE) should consider seriously alternatives that start from current practice, that anything under the basic subpackages like astropy.coordinates should be taken to be private unless explicitly stated otherwise, and just try for formalize that in a way is good enough, without worrying too much about perfection (PEP 8 is explicit about why that is a bad idea). Can this not be made explicit without much effort (and no renaming)? E.g., how about the following alternative, with three nicely incremental steps:

  1. We explicitly document current practice that everything under subpackages is private and add a corresponding comment in all their top level __init__.py files (making appropriate exceptions in io and utils).
  2. We add __all__ to all submodule __init__.py files that include the public items, including public submodules of the subpackages.
  3. We slowly add __all__ to the rest of astropy, to indicate to ourselves which parts are meant to be used outside a given module.

``baz.__all__``, but what if ``baz`` is private?


Astropy partially implements all three of the conventions: some private classes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes the problem sound much worse than it is: astropy actually has a clear, basic definition of what is public, which is everything directly in a submodule like astropy.coordinates. All imports should be from there, none from lower levels. In common usage the only exception for this is io.fits (all other io is accessed only through QTable.read/write), and some of the things in utils (which could certainly use clarification in their respective files).

Anyway, specifically for here, I think it would help the case if the text accurately described the current state, and if the example of things that were unclear actually referred to our code base rather than a hypothetical one, and gave some links to issues where our current layout was clearly the source of confusion/problems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

astropy actually has a clear, basic definition of what is public, which is everything directly in a submodule like astropy.coordinates.

Do you know where this is written down?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it pretty obvious from the documentation - see the reference sections of time, coordinates, timeseries - though of course the moment I look at others, I see that nddata is not quite as good, and modeling has definite public modules (though clearly indicated as such).

I do stick with my main point that we actually have had very few cases where users were confused about where to import things from.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not too difficult to find counter-examples from astroquery, specutils, photutils and ccdproc, and these are all coordinated packages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting these should all be converted to a particular scheme too?Then we better get more feedback on this APE 😺 I doubt I'm the only one who feels time is better spent elsewhere (and sadly this is not something that can be done without impacting everybody).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presented a few counter-examples from coordinated packages to the claim

...that we actually have had very few cases where users were confused about where to import things from.

We have more control over coordinated packages, so if those are being confused about importing then I'm guessing the confusion is even worse in non-coordinated packages.


Let's consider an example::

src/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I feel any example should be from actual astropy code, not something fake.

Copy link
Member Author

@nstarman nstarman May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"truth is stranger than fiction. Fiction has to make sense." Mark Twain.
I'm happy to go find Astropy examples, but illustrative examples are useful pedagogy. When I update this with real examples I'll probably keep the illustrative one since it puts all scenarios in one grokkable example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: difficulties in public/private API definitions in astropy/astropy#16519


**We do nothing:**

This is the status quo. It is not a good option because it does not solve the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option has the quite large benefit that there won't be yet another large set of PRs to review that have nothing to do with science (plus the need to retrain with new file names).

Copy link
Member Author

@nstarman nstarman May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK. It's a fair bit of work for whomever does the file renaming (not because renaming files is hard but because of CI and the docs). In my experience reviewing a PR that's just a big pile of file renames (using git-mv so it doesn't even affect the git history) is SUPER easy.

Also, if you take a look at https://github.com/GalacticDynamics/galax/tree/main/src/galax/potential, I structured this submodule so there's the public API declared right at the top, then everything in https://github.com/GalacticDynamics/galax/tree/main/src/galax/potential/_potential looks exactly like it would were this Astropy. This boiled down to a folder rename, a __init__.py (and since I'm doing some things statically and with lazy loading, a __init__.pyi). This is easy to review.

APE22.rst Show resolved Hide resolved
issue. The aforementioned problems of not knowing what is stable and
supported, and what is not, remain.

**We allow** ``__all__`` **to be optional:**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another alternative, where we just formalize current practice: that anything under the basic submodules like astropy.coordinates should be taken to be private unless explicitly stated otherwise. And we can add __all__ incrementally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very supportive of doing what you suggest in another APE! If such an APE existed I would make this APE a draft until the other APE were implemented. I still think this APE is the right way to go, but since the APE you are proposing is a subset of this one, so it's 🆗 by me to get the ball rolling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: IDK when they did this, but scipy now has underscore-prefixed module names everywhere. They fixed their public/private problem. e.g. https://github.com/scipy/scipy/tree/main/scipy/fft

@nstarman
Copy link
Member Author

nstarman commented May 31, 2024

Note: if we take cues from "upstream" packages, scipy now has underscore-prefixed names for modules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants