Fix mypy type errors by inlining stale stub file and fixing ABC contract implementations. #1491

obi1kenobi · 2021-01-09T06:40:18Z

Description

This PR is only about improving type hint coverage and quality. It contains no new features and makes no substantive changes to existing functionality, so it should be fully covered by existing tests. This is similar in spirit to #1397 and #1398.

I noticed that several of the type errors reported by mypy (see below) arise because inference_data.py has an associated type stub file, inference_data.pyi, and the two files have drifted significantly apart. As a result, mypy was using the type hints from the outdated stub file and reporting errors against them. Since arviz is Python 3+ only, I addressed this problem by merging the stub file into the source file itself, preserving all the type hints while deleting the stub file.

In addition, the InferenceData object only partially implemented the Mapping ABC contract, which mypy also did not appreciate. The main issues were that InferenceData implemented only a subset of the Mapping methods and had incorrect type signatures for the view-based ones (returning iterables instead of the required view objects).
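
To illustrate what the Mapping ABC contract involves, here is a minimal sketch. GroupContainer is a hypothetical stand-in, not the actual InferenceData implementation: it defines only the three abstract methods, and the Mapping mixin then supplies keys(), items() and values(), which return view objects rather than plain iterables.

```python
from collections.abc import ItemsView, KeysView, ValuesView
from typing import Dict, Iterator, Mapping


class GroupContainer(Mapping[str, object]):
    """Hypothetical Mapping-like container (illustration only, not arviz code)."""

    def __init__(self, groups: Dict[str, object]) -> None:
        self._groups = dict(groups)

    # The three abstract methods required by the Mapping ABC.
    def __getitem__(self, key: str) -> object:
        return self._groups[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._groups)

    def __len__(self) -> int:
        return len(self._groups)


container = GroupContainer({"posterior": 1, "prior": 2})

# keys(), items() and values() come from the Mapping mixin and are views.
keys_view = container.keys()
items_view = container.items()
values_view = container.values()
```

If a class overrides keys(), items() or values() itself, the overrides presumably need to keep returning KeysView, ItemsView and ValuesView (or subclasses) for mypy to accept them as compatible with Mapping.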

All in all, this PR resolves the following mypy errors:

arviz/stats/stats.py:1164: error: "InferenceData" has no attribute "groups"  [attr-defined]
arviz/stats/stats.py:1166: error: Unsupported right operand type for in ("InferenceData")  [operator]
arviz/stats/stats.py:1167: error: Value of type "InferenceData" is not indexable  [index]
arviz/stats/stats.py:1168: error: Unsupported right operand type for in ("InferenceData")  [operator]
arviz/stats/stats.py:1169: error: Value of type "InferenceData" is not indexable  [index]
arviz/stats/stats.py:1171: error: "InferenceData" has no attribute "groups"  [attr-defined]
arviz/stats/stats.py:1172: error: Value of type "InferenceData" is not indexable  [index]
arviz/stats/stats.py:1172: error: "InferenceData" has no attribute "groups"  [attr-defined]
arviz/stats/stats.py:1174: error: "InferenceData" has no attribute "groups"  [attr-defined]
arviz/stats/stats.py:1176: error: Value of type "InferenceData" is not indexable  [index]
arviz/data/io_dict.py:9: error: Module 'arviz.data.inference_data' has no attribute 'WARMUP_TAG'  [attr-defined]

Unfortunately, since mypy no longer ignores the inference_data.py file altogether (the previously preferred inference_data.pyi no longer exists), the following mypy errors now surface in code I did not touch. I would love some guidance on this from someone more experienced with the xarray.Dataset type, since only some of the method accesses on the neighboring lines raise mypy complaints like this.

arviz/data/inference_data.py:1101: error: "Type[Dataset]" has no attribute "mean"  [attr-defined]
arviz/data/inference_data.py:1102: error: "Type[Dataset]" has no attribute "median"  [attr-defined]
arviz/data/inference_data.py:1103: error: "Type[Dataset]" has no attribute "min"  [attr-defined]
arviz/data/inference_data.py:1104: error: "Type[Dataset]" has no attribute "max"  [attr-defined]
arviz/data/inference_data.py:1105: error: "Type[Dataset]" has no attribute "cumsum"  [attr-defined]
arviz/data/inference_data.py:1106: error: "Type[Dataset]" has no attribute "sum"  [attr-defined]

If anyone has any ideas why mypy is able to determine that e.g. xr.Dataset.set_coords exists, but xr.Dataset.sum does not, I'd love to hear them and apply the corresponding fixes to this PR or a future one.

In case anyone is curious (maybe @ColCarroll?), after merging this PR, arviz would be 29 mypy errors away from being able to use typing_copilot's ratcheting mechanism for ensuring type coverage remains consistent and continues to improve over time. A number of those issues are solvable simply by applying # type: ignore -- wherever I am able to determine that it is safe to do so, I intend to open a PR applying the suppression. I could definitely use some help if anyone more knowledgeable about this library is interested in taking a look at some of the trickier cases!

Checklist

Follows official PR format
Code style correct (follows pylint and black guidelines)

obi1kenobi · 2021-01-09T06:58:01Z

It appears that this PR is caught in a bit of a catch-22 situation at the moment. The problematic line is from typing_extensions import Literal, and the question is whether that import should be inside an if TYPE_CHECKING: conditional or not:

If the import is made conditional, then pylint complains that Literal is used but not defined, having failed to realize that it's only a type hint.
If the import is not made conditional, the benchmark suite fails because it does not appear to have typing_extensions installed.

I'm going to assume that a # pylint: disable-like rule suppression for the false-positive error is preferable here. Please let me know if there's an alternative solution you'd prefer instead, and I'd be happy to switch to that.

obi1kenobi · 2021-01-09T07:14:37Z

Ah, the catch-22 is somewhat worse than I feared, and is not resolved by a pylint rule suppression:

Literal cannot be imported only during type checking, since the Literal["like", "regex"] type hint appears to be evaluated at module import even in non-type-checking runs.
However, if Literal is always imported (not just for type checking), then the benchmark tests fail since they don't install typing_extensions.

Would it be okay to add typing_extensions as a package dependency? I think that would resolve the problem.
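
For reference, one way to keep the import conditional is to write the annotation as a string so that it is never evaluated at import time. This is a sketch only: filter_names and its parameters are hypothetical names, not code from this PR.

```python
import re
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Seen only by type checkers; this import never runs at runtime,
    # so typing_extensions does not need to be installed to import the module.
    from typing_extensions import Literal


def filter_names(
    names: List[str],
    pattern: str,
    # Quoted annotation: stored as a plain string, not evaluated on import.
    mode: "Literal['like', 'regex']" = "like",
) -> List[str]:
    """Hypothetical helper: keep names matching pattern by substring or regex."""
    if mode == "regex":
        return [name for name in names if re.search(pattern, name)]
    return [name for name in names if pattern in name]


matches = filter_names(["mu", "tau", "mu_raw"], "mu")
```

The trade-off is that anyone type-checking against the package still needs typing_extensions available in their environment.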

codecov · 2021-01-09T07:29:51Z

Codecov Report

Merging #1491 (94ca8af) into master (2d202de) will decrease coverage by 0.14%.
The diff coverage is 73.49%.

@@            Coverage Diff             @@
##           master    #1491      +/-   ##
==========================================
- Coverage   91.97%   91.83%   -0.15%     
==========================================
  Files         105      105              
  Lines       11239    11300      +61     
==========================================
+ Hits        10337    10377      +40     
- Misses        902      923      +21

Impacted Files	Coverage Δ
arviz/data/inference_data.py	83.92% <73.49%> (-1.90%)
arviz/data/base.py	97.70% <0.00%> (ø)
arviz/data/io_json.py	66.66% <0.00%> (ø)
arviz/data/io_pystan.py	96.00% <0.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

obi1kenobi · 2021-01-09T17:30:21Z

At the moment it seems that typing_extensions is only installed in dev environments, because black depends on it. Otherwise the arviz package does not have a dependency on typing_extensions, which strikes me as odd because the stub file I inlined depended on it.

My instinct would be to add typing_extensions as a package dependency, but I didn't want to do that here unless the maintainers of this package explicitly approve of it. So I'm looking forward to your feedback :)

OriolAbril

I don't really know much about typing but it looks good.

My guess is that mean, sum... are not recognized because they are implemented in ImplementsDatasetReduce and inherited by the Dataset class which is probably too convoluted for mypy

OriolAbril · 2021-01-09T22:14:08Z

arviz/data/inference_data.py


    @staticmethod
-    def from_netcdf(filename):
+    def from_netcdf(filename: str) -> "InferenceData":


Why are strings sometimes used for the types and sometimes the raw type? Trying to understand how typing works


obi1kenobi

Thanks for the review @OriolAbril! Do you have any thoughts on including typing_extensions as a package dependency in a future PR? That would ensure that users of this package are able to type-check with it correctly, instead of needing to make typing_extensions available in their environment by installing it separately from this package. If you think that's okay, I'm happy to open a PR for it.

My guess is that mean, sum... are not recognized because they are implemented in ImplementsDatasetReduce and inherited by the Dataset class which is probably too convoluted for mypy

Ah yes, I see the problem: this line causes those methods to be defined on Dataset, but the dynamic method definitions are not visible to mypy. This is not an actionable mypy error for this package (it's solvable on the xarray side but not here), and the best approach for this package is to suppress those issues with # type: ignore for now. I'll open a PR for that as soon as this PR is merged.
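
The general pattern can be sketched as follows. This is not xarray's actual code: Reducible and _make_reducer are hypothetical names used only to show why methods attached at runtime exist when the program runs but are invisible to a static checker.

```python
from typing import Callable, Iterable, List


class Reducible:
    """Hypothetical class whose reduction methods are attached at runtime."""

    def __init__(self, values: Iterable[float]) -> None:
        self.values: List[float] = list(values)


def _make_reducer(func: Callable[[List[float]], float]) -> Callable[[Reducible], float]:
    def reducer(self: Reducible) -> float:
        return func(self.values)

    return reducer


# Methods are injected dynamically; mypy only reads the class body above,
# so it has no way of knowing that these attributes exist.
for _name, _func in {"sum": sum, "min": min, "max": max}.items():
    setattr(Reducible, _name, _make_reducer(_func))

# Works at runtime, but mypy would report an attr-defined error without the ignore.
total = Reducible([1, 2, 3]).sum()  # type: ignore[attr-defined]
```

A narrowly scoped ignore such as the one on the last line suppresses only the attr-defined error, so other mistakes on the same line would still be reported.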

obi1kenobi · 2021-01-10T00:33:57Z

arviz/data/inference_data.py


    @staticmethod
-    def from_netcdf(filename):
+    def from_netcdf(filename: str) -> "InferenceData":


That's a good question. Type hints with string literals like "InferenceData" here represent so-called forward references: https://www.python.org/dev/peps/pep-0484/#forward-references

Forward references allow us to refer to types and values that are not currently in scope. On this particular line, the from_netcdf function is being defined within the InferenceData class, which means that the InferenceData class is not fully constructed yet, i.e. the InferenceData name does not yet refer to anything in the current scope. If we attempted to use InferenceData here, it would cause a NameError because the symbol is not defined yet. If we use a forward reference instead, we signal to mypy that it should look up what that name refers to when type-checking rather than when defining the class. This avoids the problem: by the time type-checking starts, the class has already been fully defined and its name can be resolved just fine.

This is why on line 1334 we can use InferenceData without the quotes: that line is located outside and after the InferenceData class definition, so that symbol is defined and present in the scope at the point where it is being used.

For "Literal[True]" and similar cases, we use quotes for a related but slightly different reason: the import of Literal on line 42 is only performed if the typing.TYPE_CHECKING special value is true. However, type hints that are not forward references are always executed -- even when not performing type-checking. This means that if we had used Literal[True] instead of using the string literal equivalent, executing the Literal[True] expression in a non-type-checking run would cause a NameError since Literal is not imported in that situation.
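
A small self-contained sketch of the first case, using a hypothetical Trace class rather than InferenceData: inside the class body the return type is quoted, while a function defined after the class can use the bare name.

```python
from typing import get_type_hints


class Trace:
    """Hypothetical class used only to illustrate forward references."""

    def __init__(self, name: str) -> None:
        self.name = name

    @staticmethod
    def from_name(name: str) -> "Trace":
        # Quoted: the name Trace is not bound yet while the class body runs.
        return Trace(name)


def rename(trace: Trace, name: str) -> Trace:
    # Unquoted: the class is fully defined by the time this line executes.
    return Trace(name)


# The string annotation can be resolved to the real class after the fact.
hints = get_type_hints(Trace.from_name)
```

Here get_type_hints resolves the string "Trace" against the module's globals, which is roughly what a type checker does statically.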

OriolAbril · 2021-01-10T22:53:51Z

Do you have any thoughts on including typing_extensions as a package dependency in a future PR?

I did not know about the package until today so everything I know about it comes from a quick search about it. Is this package needed for compatibility with python 3.6? Or also for experimental typing features not yet present in 3.7-8? I would have no problem adding the dependency, but I'm not sure it's worth the effort if next release drops python 3.6

Also, to double check. We can merge this as is without having to wait for typing_extensions dependency or xarray changes right?

obi1kenobi · 2021-01-10T23:18:52Z

Do you have any thoughts on including typing_extensions as a package dependency in a future PR?

I did not know about the package until today so everything I know about it comes from a quick search about it. Is this package needed for compatibility with python 3.6? Or also for experimental typing features not yet present in 3.7-8? I would have no problem adding the dependency, but I'm not sure it's worth the effort if next release drops python 3.6

Also, to double check. We can merge this as is without having to wait for typing_extensions dependency or xarray changes right?

It's not just for 3.6: typing-related improvements are generally made available as backports through typing_extensions. For example, Literal was stabilized in Python 3.8, which means that using it in 3.7 requires typing_extensions. As a result, typing_extensions is a very common dependency of packages that support multiple Python versions, even if they don't support 3.6.
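
As an illustration of how packages commonly handle this (a sketch, not what this PR does), the import can be gated on the interpreter version so that typing_extensions is needed only where the standard library lacks Literal. FilterMode and describe are hypothetical names.

```python
import sys

if sys.version_info >= (3, 8):
    # Literal is part of the standard library from Python 3.8 onwards.
    from typing import Literal
else:
    # Older interpreters need the backport package installed.
    from typing_extensions import Literal

# Hypothetical alias for the two filter modes mentioned in this thread.
FilterMode = Literal["like", "regex"]


def describe(mode: FilterMode) -> str:
    """Hypothetical helper that maps a mode to a human-readable label."""
    return "substring match" if mode == "like" else "regular expression"


label = describe("regex")
```

With this pattern the annotation can stay unquoted, since Literal is always importable at runtime, provided the backport is declared as a dependency for the older Python versions.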

This PR should be okay to merge as-is (since type-checking with mypy will pull in typing_extensions already as mypy depends on it), but let's play it safe and add typing_extensions as a package dependency if you're okay with it? I think it would be confusing to ship code that imports a package that is not a dependency of this package, even if we have high confidence it'll work out in practice. It's simply a risk we don't need to take :)

obi1kenobi added 2 commits January 9, 2021 01:14

Inline stale stub file and fix ABC contract implementations.

d22dac5

Always import ItemsView and ValuesView since we inherit from them.

4e03986

obi1kenobi mentioned this pull request Jan 9, 2021

Resolve a few straightforward mypy type errors. #1492

Merged

2 tasks

obi1kenobi force-pushed the inline_type_stub_file branch from fdc79e6 to 4e03986 Compare January 9, 2021 07:15

Conditionally import Literal and use it as a forward reference.

d8e482b

OriolAbril approved these changes Jan 9, 2021


Add typing_extensions dependency.

94ca8af

OriolAbril requested a review from canyon289 January 10, 2021 23:25

ColCarroll mentioned this pull request Jan 11, 2021

Enable type checking and adding typing_copilot to dev/CI environment #1496

Closed

OriolAbril merged commit 8cb60f7 into arviz-devs:master Jan 16, 2021

obi1kenobi deleted the inline_type_stub_file branch January 16, 2021 05:45
