
Fix mypy type errors by inlining stale stub file and fixing ABC contract implementations. #1491

Merged
merged 4 commits into from Jan 16, 2021


obi1kenobi
Contributor

Description

This PR is only about improving type hint coverage and quality. It contains no new features and makes no substantive changes to existing functionality, so it should be fully covered by existing tests. This is similar in spirit to #1397 and #1398.

I noticed that several of the type errors reported by mypy (see below) occur because inference_data.py has an associated type stub file, inference_data.pyi, and the two files have drifted significantly apart. When both files exist, mypy uses the type hints from the (outdated) stub file instead of the annotations in the source file, so it complained about code that no longer matched the stub. Since arviz is Python 3+ only, inline annotations work everywhere, so I addressed this by merging the stub file into the source file itself, preserving all the type hints while deleting the stub file.

In addition, the InferenceData class only partially implemented the Mapping ABC contract, which mypy also flagged. The main issues were that InferenceData implemented only a subset of the Mapping methods and had incorrect type signatures for the view-based ones (keys(), items() and values() returned iterables instead of the required view objects).
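For readers less familiar with the Mapping contract, here is a minimal sketch of what mypy expects. The class GroupContainer is a hypothetical stand-in, not arviz's actual InferenceData code. A Mapping subclass must implement __getitem__, __iter__ and __len__; the ABC then supplies __contains__, get, keys, items and values, and keys/items/values return view objects (KeysView, ItemsView, ValuesView) rather than plain iterables.

```python
from collections import abc
from typing import Dict, ItemsView, Iterator, KeysView, Mapping, ValuesView


class GroupContainer(Mapping[str, object]):
    """Hypothetical stand-in for InferenceData: maps group names to group objects."""

    def __init__(self, **groups: object) -> None:
        self._groups: Dict[str, object] = dict(groups)

    # The three abstract methods every Mapping subclass must implement.
    def __getitem__(self, key: str) -> object:
        return self._groups[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._groups)

    def __len__(self) -> int:
        return len(self._groups)

    # Optional: Mapping already provides these, but if they are overridden,
    # they must return view objects (not plain iterables) to match the ABC.
    def keys(self) -> KeysView[str]:
        return abc.KeysView(self)

    def items(self) -> ItemsView[str, object]:
        return abc.ItemsView(self)

    def values(self) -> ValuesView[object]:
        return abc.ValuesView(self)
```

With the three abstract methods in place, membership tests (`"posterior" in container`) and `container.get(...)` come for free from the Mapping mixins.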

All in all, this PR resolves the following mypy errors:

arviz/stats/stats.py:1164: error: "InferenceData" has no attribute "groups"  [attr-defined]
arviz/stats/stats.py:1166: error: Unsupported right operand type for in ("InferenceData")  [operator]
arviz/stats/stats.py:1167: error: Value of type "InferenceData" is not indexable  [index]
arviz/stats/stats.py:1168: error: Unsupported right operand type for in ("InferenceData")  [operator]
arviz/stats/stats.py:1169: error: Value of type "InferenceData" is not indexable  [index]
arviz/stats/stats.py:1171: error: "InferenceData" has no attribute "groups"  [attr-defined]
arviz/stats/stats.py:1172: error: Value of type "InferenceData" is not indexable  [index]
arviz/stats/stats.py:1172: error: "InferenceData" has no attribute "groups"  [attr-defined]
arviz/stats/stats.py:1174: error: "InferenceData" has no attribute "groups"  [attr-defined]
arviz/stats/stats.py:1176: error: Value of type "InferenceData" is not indexable  [index]
arviz/data/io_dict.py:9: error: Module 'arviz.data.inference_data' has no attribute 'WARMUP_TAG'  [attr-defined]

Unfortunately, because mypy no longer ignores inference_data.py (the previously preferred inference_data.pyi no longer exists), the following mypy errors now appear in code I did not touch. I would love some guidance from someone more experienced with the xarray.Dataset type, since only some of the method accesses on the neighboring lines trigger complaints like these.

arviz/data/inference_data.py:1101: error: "Type[Dataset]" has no attribute "mean"  [attr-defined]
arviz/data/inference_data.py:1102: error: "Type[Dataset]" has no attribute "median"  [attr-defined]
arviz/data/inference_data.py:1103: error: "Type[Dataset]" has no attribute "min"  [attr-defined]
arviz/data/inference_data.py:1104: error: "Type[Dataset]" has no attribute "max"  [attr-defined]
arviz/data/inference_data.py:1105: error: "Type[Dataset]" has no attribute "cumsum"  [attr-defined]
arviz/data/inference_data.py:1106: error: "Type[Dataset]" has no attribute "sum"  [attr-defined]

If anyone has any ideas why mypy is able to determine that e.g. xr.Dataset.set_coords exists, but xr.Dataset.sum does not, I'd love to hear them and apply the corresponding fixes to this PR or a future one.

In case anyone is curious (maybe @ColCarroll?), after this PR is merged, arviz will be 29 mypy errors away from being able to use typing_copilot's ratcheting mechanism, which ensures that type coverage stays consistent and keeps improving over time. A number of those errors can be resolved simply by applying # type: ignore. Wherever I can determine that it is safe to do so, I intend to open a PR applying the suppression. I could definitely use some help with the trickier cases if anyone more knowledgeable about this library is interested in taking a look!

Checklist

  • Follows official PR format
  • Code style correct (follows pylint and black guidelines)

@obi1kenobi
Contributor Author

This PR appears to be caught in a catch-22 at the moment. The problematic line is from typing_extensions import Literal, and the question is whether that import should be inside an if TYPE_CHECKING: conditional:

  • If the import is made conditional, then pylint complains that Literal is used but not defined, having failed to realize that it's only a type hint.
  • If the import is not made conditional, the benchmark suite fails because it does not appear to have typing_extensions installed.

I'm going to assume that a # pylint: disable comment suppressing the false positive is preferable here. Please let me know if you'd prefer an alternative solution, and I'd be happy to switch to it.
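As an illustration of the conditional-import pattern under discussion, here is a minimal sketch. The function select_vars and its parameters are hypothetical, not arviz code. Because the annotation that mentions Literal is written as a string, Python never evaluates it at runtime, so the module imports even where typing_extensions is not installed; only a type checker such as mypy follows the guarded import.

```python
import re
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Only type checkers such as mypy treat TYPE_CHECKING as True; at runtime
    # this import is skipped, so typing_extensions need not be installed.
    from typing_extensions import Literal


def select_vars(
    names: List[str], pattern: str, mode: "Literal['like', 'regex']" = "like"
) -> List[str]:
    """Hypothetical helper: keep names matching pattern by substring or regex."""
    # The mode annotation is a string, so Python never evaluates Literal here.
    if mode == "like":
        return [name for name in names if pattern in name]
    return [name for name in names if re.search(pattern, name)]
```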

@obi1kenobi
Contributor Author

Ah, the catch-22 is somewhat worse than I feared, and is not resolved by a pylint rule suppression:

  • Literal cannot only be imported during type checking, since the Literal["like", "regex"] type hint appears to be evaluated at module import even in non-type-checking runs.
  • However, if Literal is always imported (not just for type checking), then the benchmark tests fail since they don't install typing_extensions.

Would it be okay to add typing_extensions as a package dependency? I think that would resolve the problem.
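The eager evaluation described in the first bullet can be reproduced in isolation. In this sketch (the helper try_define and the functions f and g are hypothetical), each def statement runs in a fresh namespace where Literal is deliberately not imported: the unquoted annotation raises NameError as soon as the def executes, while the quoted one defines without error. This assumes the module does not use from __future__ import annotations, and that the interpreter is older than Python 3.14, where PEP 649 defers annotation evaluation by default.

```python
import sys

# Python 3.14 (PEP 649) defers annotation evaluation by default; older
# interpreters evaluate annotations eagerly when the def statement runs.
EAGER_ANNOTATIONS = sys.version_info < (3, 14)

UNQUOTED = '''
def f(mode: Literal["like", "regex"] = "like"):
    return mode
'''

QUOTED = '''
def g(mode: "Literal['like', 'regex']" = "like"):
    return mode
'''


def try_define(source: str) -> str:
    """Run a def statement in a fresh namespace where Literal is not imported."""
    namespace: dict = {}
    try:
        exec(source, namespace)
    except NameError as err:
        return f"NameError: {err}"
    return "defined OK"
```

On an eagerly evaluating interpreter, try_define(UNQUOTED) reports the NameError while try_define(QUOTED) succeeds.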

@codecov

codecov bot commented Jan 9, 2021

Codecov Report

Merging #1491 (94ca8af) into master (2d202de) will decrease coverage by 0.14%.
The diff coverage is 73.49%.


@@            Coverage Diff             @@
##           master    #1491      +/-   ##
==========================================
- Coverage   91.97%   91.83%   -0.15%     
==========================================
  Files         105      105              
  Lines       11239    11300      +61     
==========================================
+ Hits        10337    10377      +40     
- Misses        902      923      +21     
Impacted Files Coverage Δ
arviz/data/inference_data.py 83.92% <73.49%> (-1.90%) ⬇️
arviz/data/base.py 97.70% <0.00%> (ø)
arviz/data/io_json.py 66.66% <0.00%> (ø)
arviz/data/io_pystan.py 96.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2d202de...94ca8af.

@obi1kenobi
Contributor Author

At the moment it seems that typing_extensions is only installed in dev environments, because black depends on it. Otherwise the arviz package does not have a dependency on typing_extensions, which strikes me as odd because the stub file I inlined depended on it.

My instinct would be to add typing_extensions as a package dependency, but I didn't want to do that here unless the maintainers of this package explicitly approve of it. So I'm looking forward to your feedback :)

Member

@OriolAbril OriolAbril left a comment


I don't really know much about typing but it looks good.

My guess is that mean, sum, etc. are not recognized because they are implemented in ImplementsDatasetReduce and inherited by the Dataset class, which is probably too convoluted for mypy.


     @staticmethod
-    def from_netcdf(filename):
+    def from_netcdf(filename: str) -> "InferenceData":
Member


Why are strings sometimes used for the types and sometimes the raw type? I'm trying to understand how typing works.

Contributor Author


That's a good question. Type hints with string literals like "InferenceData" here represent so-called forward references: https://www.python.org/dev/peps/pep-0484/#forward-references

Forward references allow us to refer to types that are not yet defined at the point where the annotation appears. On this particular line, the from_netcdf function is being defined within the InferenceData class, which means the class is not fully constructed yet, i.e. the name InferenceData does not yet refer to anything in the current scope. If we used InferenceData unquoted here, Python would raise a NameError when evaluating the annotation, because the name is not defined yet. Writing the annotation as a string prevents Python from evaluating it at definition time, while mypy, which analyzes the whole module statically, resolves the name against the fully defined class, so the reference works just fine.

This is why on line 1334 we can use InferenceData without the quotes: that line is located outside and after the InferenceData class definition, so that symbol is defined and present in the scope at the point where it is being used.

For "Literal[True]" and similar cases, we use quotes for a related but slightly different reason: the import of Literal on line 42 is only performed if the special value typing.TYPE_CHECKING is true. However, annotations that are not string literals are always evaluated at runtime, even when no type-checking is being performed. This means that if we had written Literal[True] instead of its string equivalent, evaluating the Literal[True] expression in a non-type-checking run would raise a NameError, since Literal is not imported in that situation.
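Here is a small self-contained sketch of the forward-reference case described above. The class Trace and the function load are hypothetical, not arviz's API. The quoted return annotation is stored as a plain string while the class body runs; typing.get_type_hints later resolves it to the real class, which is roughly the runtime analogue of what mypy does statically.

```python
from typing import get_type_hints


class Trace:
    """Hypothetical class, used only to illustrate forward references."""

    def __init__(self, filename: str) -> None:
        self.filename = filename

    @staticmethod
    def from_file(filename: str) -> "Trace":
        # While this class body is executing, the name Trace is not bound yet,
        # so on interpreters that evaluate annotations eagerly (before Python
        # 3.14) an unquoted -> Trace here would raise NameError.
        return Trace(filename)


# Outside and after the class body the name exists, so no quotes are needed.
def load(filename: str) -> Trace:
    return Trace.from_file(filename)
```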

Member


Thanks!

Contributor Author

@obi1kenobi obi1kenobi left a comment


Thanks for the review @OriolAbril! Do you have any thoughts on including typing_extensions as a package dependency in a future PR? That would ensure that users of this package are able to type-check with it correctly, instead of needing to make typing_extensions available in their environment by installing it separately from this package. If you think that's okay, I'm happy to open a PR for it.

> My guess is that mean, sum... are not recognized because they are implemented in ImplementsDatasetReduce and inherited by the Dataset class which is probably too convoluted for mypy

Ah yes, I see the problem: this line in xarray defines those methods on Dataset dynamically, and such dynamic method definitions are not visible to mypy. This is not an actionable mypy error for this package (it's solvable on the xarray side but not here), so the best approach for now is to suppress those errors with # type: ignore. I'll open a PR for that as soon as this one is merged.
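To illustrate why mypy cannot see such methods, here is a toy sketch of the general idea of attaching methods to a class at import time with setattr. The class Table, the helper _make_reduce and the function total are hypothetical and far simpler than xarray's actual mechanism. Static analysis does not execute the loop, so mypy would report calls such as table.sum() as attr-defined errors; a scoped # type: ignore[attr-defined] silences only that error code on that one line.

```python
from typing import Callable, List, Tuple

Reducer = Callable[[List[float]], float]


class Table:
    """Hypothetical container; its class body defines no reduction methods."""

    def __init__(self, values: List[float]) -> None:
        self.values = values


def _make_reduce(func: Reducer) -> Callable[[Table], float]:
    def method(self: Table) -> float:
        return func(self.values)

    return method


_REDUCTIONS: List[Tuple[str, Reducer]] = [("sum", sum), ("min", min), ("max", max)]

# Attach methods at import time. They exist at runtime, but mypy does not
# execute this loop, so it treats Table.sum etc. as undefined attributes.
for _name, _func in _REDUCTIONS:
    setattr(Table, _name, _make_reduce(_func))


def total(table: Table) -> float:
    # Suppress only the attr-defined error code, and only on this line.
    return table.sum()  # type: ignore[attr-defined]
```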


@OriolAbril
Member

> Do you have any thoughts on including typing_extensions as a package dependency in a future PR?

I did not know about the package until today, so everything I know about it comes from a quick search. Is it needed only for compatibility with Python 3.6, or also for experimental typing features not yet present in 3.7 and 3.8? I would have no problem adding the dependency, but I'm not sure it's worth the effort if the next release drops Python 3.6.

Also, to double-check: we can merge this as is, without waiting for the typing_extensions dependency or xarray changes, right?

@obi1kenobi
Contributor Author

It's not just for 3.6: in general, new typing features are backported to older Python versions via typing_extensions. For example, Literal was added to the standard typing module in Python 3.8, so using it on 3.7 requires typing_extensions. As a result, typing_extensions is a very common dependency of packages that support multiple Python versions, even ones that don't support 3.6.

This PR should be okay to merge as-is: Literal is only imported when type-checking, and mypy already pulls in typing_extensions because it depends on it. Still, if you're okay with it, let's play it safe and add typing_extensions as a package dependency. I think it would be confusing to ship code that imports a package that is not a dependency of this package, even if we are confident it will work out in practice. It's simply a risk we don't need to take :)
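For reference, here is a sketch of one widely used way to consume such a backport. It is an illustration only, not necessarily what arviz adopted, and the alias Mode and the function describe are hypothetical. Unlike the TYPE_CHECKING-guarded import, this imports Literal at runtime, so on Python 3.7 it only works if typing_extensions is installed, which is exactly why declaring the dependency matters. mypy understands sys.version_info checks and picks the branch matching its target Python version.

```python
import sys

if sys.version_info >= (3, 8):
    from typing import Literal
else:
    # Python 3.7 and older: Literal is only available from the backport,
    # so typing_extensions must be installed (e.g. as a declared dependency).
    from typing_extensions import Literal

# Hypothetical alias, used purely for illustration.
Mode = Literal["like", "regex"]


def describe(mode: Mode) -> str:
    return f"filtering with {mode!r}"
```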

@OriolAbril OriolAbril merged commit 8cb60f7 into arviz-devs:master Jan 16, 2021
@obi1kenobi obi1kenobi deleted the inline_type_stub_file branch January 16, 2021 05:45