Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UX] Use of AutomaticParameterWarning and "auto" parameters #245

Open
eddiebergman opened this issue Jan 30, 2024 · 7 comments
Open

[UX] Use of AutomaticParameterWarning and "auto" parameters #245

eddiebergman opened this issue Jan 30, 2024 · 7 comments
Assignees
Labels
decision A decision needs to be made ux Something to make user experience better

Comments

@eddiebergman
Copy link
Contributor

eddiebergman commented Jan 30, 2024

As things have developed, there's been a few instances where we can set some default in an automatic fashion, however it may not be clear to the user that we are making a hueristic choice, and it would be better if they specified this.

While it was not declared anywhere, I think it makes sense to make some rule set that essentially:

# amltk.exceptions
class AutomaticParameterWarning(UserWarning): ...

class AutomaticBlubWarning(AutomaticParameterWarning):
    """We can deduce that we often want to 'blub' but this can backfire silently when x, y, z."""
    
    
# amltk.blub
from amltk.exceptions import AutomaticBlubWarning

def function_that_blubs(is_blub: bool | Literal["auto"] = "auto") -> BlubBatch:
	if is_blub == "auto":
        warnings.warn(
        	"blub blub blub. To silence this warning, please specify"
        	" function_that_blubs(is_blub=...)",
            AutomaticBlubWarning,
            stacklevel=2,
        )

This should remain an issue until:

  • The UserWarning and automatic parameters in AMLTK have been unified
    • UserWarning derived from the use of "auto" should be replaced be a unique warning type.
  • Functions relying on the f(x: T | None = None) that make heuristic choices should be converted to use f(x: T | Literal["auto"] = "auto"). It's important to distinguish the absence of a value with the activation of an automatic heuristic. It might be good to give examples here:
# Here the `None` indicates the absence of a value and some default directory/name is fine.
def f(working_dir: Path | None = None):
	"""Uses timestamp if left as None"""
	...

# Here the `None` is indicating the absence of a choice.
# This applies a heuristic to make a choice and should be indicated as such
def run(display: bool | None = None):
	"""Display the progress. By default, this is true in a notebook, otherwise not."""
	...

# Prefer instead:
def run(display: bool | Literal["auto"] = "auto"): ...
@eddiebergman eddiebergman added decision A decision needs to be made ux Something to make user experience better labels Jan 30, 2024
@eddiebergman eddiebergman self-assigned this Jan 30, 2024
@eddiebergman
Copy link
Contributor Author

@LennartPurucker Please let me know what you think of this when you have time. It's come up a few times now

@LennartPurucker
Copy link
Collaborator

I am actually not sure about the first case either:

# Here the `None` indicates the absence of a value and some default directory/name is fine.
def f(working_dir: Path | None = None):
	"""Uses timestamp if left as None"""
	...

If this is a private function, None is good. But if this is a public function, then I would also set this to "auto" or the option "timestamp". Right now, from the function definition, while using the function in an editor, I could not tell what happens by default without reading the docstring.

Sidenote: IMO, the typical behavior of defaulting to timestamp from a None input is very unintuitive sometimes and actually not fully thread/multi-processing safe! Thus, I would rather know about this as a user. If the function prints the paths anyway, an additional warning might not be needed.

@LennartPurucker
Copy link
Collaborator

LennartPurucker commented Jan 31, 2024

Otherwise, this is a very good thing. Also, I am not yet fully sure about the code overhead of custom warnings. I guess custom warnings are better code quality and allow for catching/ignoring them, right?

@eddiebergman
Copy link
Contributor Author

I am actually not sure about the first case either:

# Here the `None` indicates the absence of a value and some default directory/name is fine.
def f(working_dir: Path | None = None):
	"""Uses timestamp if left as None"""
	...

If this is a private function, None is good. But if this is a public function, then I would also set this to "auto" or the option "timestamp". Right now, from the function definition, while using the function in an editor, I could not tell what happens by default without reading the docstring.

Sidenote: IMO, the typical behavior of defaulting to timestamp from a None input is very unintuitive sometimes and actually not fully thread/multi-processing safe! Thus, I would rather know about this as a user. If the function prints the paths anyway, an additional warning might not be needed.

I agree in principle but as a matter of ergonomics, I would still call this None. I can't articulate why (which is why there is this issue, trying to figure it out) but it's something along the lines of a default parameter can be provided which works well in most cases, which otherwise would require a user to explicitly provide one, or the function would not work.

In contrast to the other example, we are using a hueristic to make a choice amongst a discrete set, which could otherwise cause a lot of hidden behavior to occur. Another example being the threadpoolctl heuristic.

The question is where to draw a line. An optional parameter where the default parameter value is None is different than an auto parameter.

I guess I would say that an optional with None as the default will be set to some default which is independant of the system you are on and the other parameters, i.e. it will always be a timestamped path. Where as an auto will apply some hueristic depending on the other parameters or the system in which the program is being run.

Objections, comments, feedback?

@eddiebergman
Copy link
Contributor Author

Otherwise, this is a very good thing. Also, I am not yet fully sure about the code overhead of custom warnings. I guess custom warnings are better code quality and allow for catching/ignoring them, right?

I would consider it low overhead on the development side and yes, it allows for distuinguishing them more easily. Anything that wants to interact with that warning can do so by type rather than by content. This allows us to for example, change the wording in the warning without breaking any system that relies on the type for functionality.

This also applies to errors:

# Needed to make sure a certain exception occurs
# Will break if someone decides to make it a proper sentence and capitalize "Blub"
with pytest.raises(ValueError, match="blub should be provided"):
    ...

# Only breaks if the underlying exception type changes
with pytest.raises(BlueError):
    ...

You can see this in effect in one of the doc hooks, which disables warnings from the running examples in the doc strings:

@mkdocs.plugins.event_priority(-50)
def on_startup(**kwargs: Any):
# We get a load of deprecation warnings from SMAC
warnings.filterwarnings("ignore", category=DeprecationWarning)
# We ignore AutoWarnings as our example tend to rely on
# a lot of the `"auto"` parameters
warnings.filterwarnings("ignore", category=AutomaticParameterWarning)
# ConvergenceWarning from sklearn
warnings.filterwarnings("ignore", module="sklearn")

@LennartPurucker
Copy link
Collaborator

From our in-person discussion:

Cases:

  • Optional (e.g., sample weights)
  • Defaults (e.g., working dir)
  • Autos (e.g., visualization)

@eddiebergman
Copy link
Contributor Author

eddiebergman commented Feb 2, 2024

So warnings can be done at Runtime and I don't want to automate this in anyway with concrete types... too much overhead. However I would assume most people would have access to some sort of LSP that will give them the type definitions. We can inject this information there at the very least.

But I thought at least from type definitions we can be a bit more explicit, for example:

Screenshot_2024-02-02_13-42-40

It should have a near-zero runtime impact and is mainly for type checking, you can see a few variants here:

from __future__ import annotations

from pathlib import Path
from typing import TypeAlias, TypeVar

T = TypeVar("T")

# Basically says these types are either given a type `T` or are `None`
# The reason for `| None` is so that setting the default value to `None` works correctly
# in signatures
Auto: TypeAlias = type[T] | None
Default: TypeAlias = type[T] | None



# Usage 1
def f1(display: Auto[bool] = None) -> None:
    if display is None:
        # Make some automatic choice
        ...

# Or this? _Auto would be a singleton defined once
_Auto = None 

def f2(display: Auto[bool] = _Auto) -> None:
    if display is _Auto:
        # Make some automatic choice
        ...

# Same works for default
def h(path: Default[Path] = None) -> None:
    if path is None:
        # Create some default path
        ...

I'll be honest, I feel all of this will just make things more complicated and confusing. I would rather in the end just use the Path | None = None idiom.

Maybe could do the following to keep it as simple as possible

Auto = None
def f(p: Path | None = Auto):
	pass

The LSP still gives enough info and it doesn't deref to be None. (At least for pyright which is the core of vscode's pylance, so they should have similar type information displayed)

Screenshot_2024-02-02_13-49-42

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
decision A decision needs to be made ux Something to make user experience better
Projects
None yet
Development

No branches or pull requests

2 participants