Dependencies of from_parquet should be included or made explicit in docs #2952

Open
danielhundhausen opened this issue Jan 16, 2024 · 1 comment · May be fixed by #2967
Labels
docs Improvements or additions to documentation

Comments

@danielhundhausen

Which documentation?

Other (please explain)?

What needs to be documented?

When setting up a fresh installation of awkward@2.5.2 (and earlier) and trying to run

import awkward as ak

_ = ak.from_parquet("foo.parquet")

the dependencies pyarrow, fsspec and pandas have to be installed by hand first for this minimal example to work. As far as I can see, this is not in the documentation. If this is the desired behaviour, to keep the footprint of awkward small for users who do not need this function, I suggest adding a section to the docs that lists the necessary dependencies.
If this is not the desired behaviour, it would be convenient to add the mentioned dependencies to pyproject.toml.
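Until this is documented, a fresh environment can be checked before calling ak.from_parquet. The following is a minimal sketch using only the standard library; the helper name missing_packages and the constant FROM_PARQUET_REQUIREMENTS are made up for illustration, and the package list is simply the one reported above for awkward 2.5.2.

```python
from importlib.util import find_spec

# Packages reported in this issue as needed by ak.from_parquet (awkward 2.5.2).
FROM_PARQUET_REQUIREMENTS = ("pyarrow", "fsspec", "pandas")


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(FROM_PARQUET_REQUIREMENTS)
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All packages needed by ak.from_parquet are importable.")
```

find_spec only looks the package up without importing it, so the check is cheap and has no side effects.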

Thanks for considering!

@danielhundhausen danielhundhausen added the docs Improvements or additions to documentation label Jan 16, 2024
@jpivarski
Member

(Below are rough notes for fixing the problem.)

The fsspec library shows up in surprising ways, so it should probably become a strict dependency. However, fsspec uses the same trick of "being a lightweight dependency" by requesting other modules as needed—otherwise, it would depend on every remote-protocol library in the universe. If someone tries to call ak.from_json with an s3:// URI, they'll first be asked to install fsspec and then, separately, they'll be asked to install s3fs, which would be annoying.

So as a policy decision, let's make fsspec a strict dependency, so that users only get a request to install things once. (Unless they're using ak.from_parquet like @danielhundhausen and get asked to install pyarrow and s3fs or whatever. Sorry!)
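To make the effect of that policy concrete, here is a rough sketch that lists, in order, the install requests a user with a fresh environment would hit for a given URI. Everything in it is illustrative: install_prompts and PROTOCOL_BACKENDS are hypothetical names, and the table covers only two protocols, whereas fsspec keeps its own, much larger registry.

```python
from urllib.parse import urlsplit

# Illustrative subset only: s3fs and gcsfs are the usual fsspec backends
# for these two URI schemes.
PROTOCOL_BACKENDS = {"s3": "s3fs", "gs": "gcsfs"}


def install_prompts(uri, fsspec_is_strict):
    """Return, in order, the packages a fresh install would ask the user for."""
    prompts = []
    if not fsspec_is_strict:
        # First failure: fsspec itself is missing.
        prompts.append("fsspec")
    backend = PROTOCOL_BACKENDS.get(urlsplit(uri).scheme)
    if backend is not None:
        # Second failure: fsspec asks for the protocol-specific backend.
        prompts.append(backend)
    return prompts


if __name__ == "__main__":
    print(install_prompts("s3://bucket/data.json", fsspec_is_strict=False))
    print(install_prompts("s3://bucket/data.json", fsspec_is_strict=True))
```

With fsspec as a strict dependency, the s3:// case drops from two separate requests to one, and a plain local path needs none.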

The other runtime dependencies should remain runtime dependencies, since each only affects a small, logically related set of functions (ak.to_arrow requires pyarrow, etc.). @agoose77 and I came up with a way to include this information in the ak._dispatch.high_level_function decorator, so that it can be added to the documentation and tested for up front, while the information itself (which functions depend on which libraries) is specified in only one place.
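The general idea of declaring dependencies in one place can be sketched with a plain decorator. This is not the actual ak._dispatch.high_level_function signature, which these notes do not spell out; the names requires and REQUIREMENTS are hypothetical, and the stdlib json module stands in for an optional package so the example runs anywhere.

```python
import functools
from importlib.util import find_spec

# Hypothetical registry: function name -> tuple of optional packages.
REQUIREMENTS = {}


def requires(*packages):
    """Declare a function's optional packages once; reuse for docs and checks."""

    def decorator(func):
        # Single source of truth, usable by documentation tooling or tests.
        REQUIREMENTS[func.__name__] = packages
        note = "Requires: " + ", ".join(packages)
        func.__doc__ = (func.__doc__ or "").rstrip() + "\n\n" + note

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Check up front, before any work is done.
            missing = [p for p in packages if find_spec(p) is None]
            if missing:
                raise ImportError(
                    f"{func.__name__} needs: pip install {' '.join(missing)}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@requires("json")  # stdlib stand-in for an optional package
def to_text(data):
    """Serialize data to a JSON string."""
    import json

    return json.dumps(data)
```

The same tuple feeds the registry, the docstring note, and the runtime check, so the three cannot drift apart.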

From a grep, below are all of the non-stdlib, non-dependency imports in src/awkward. We'll be able to take fsspec off the list when it becomes a strict dependency for Awkward (targeting version 2.6.0 on February 1, 2024, since a new strict dependency needs a new minor version). Some of the dependencies aren't confined to one ak.* function because they implement something like a backend, or pass data into cppyy or Numba, which can only happen if you've already imported cppyy or Numba. Some of these imports go through helper functions (import_*) that provide the "you need version x.y.z" error message.
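For readers unfamiliar with the import_* pattern, the following is a rough sketch of what such a helper can look like. It is not the code in awkward._connect; import_optional and version_tuple are hypothetical names, and the version comparison is deliberately simplistic (leading digits of each dotted component only).

```python
import importlib
from itertools import takewhile


def version_tuple(text):
    """Turn '2.5.2' into (2, 5, 2); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def import_optional(module_name, feature, minimum_version=None):
    """Import an optional module or raise ImportError with install advice."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ImportError(
            f"to use {feature}, you must install {module_name}:\n\n"
            f"    pip install {module_name}"
        ) from err
    if minimum_version is not None:
        # Assumes the module exposes __version__; treat absence as "0".
        found = getattr(module, "__version__", "0")
        if version_tuple(found) < version_tuple(minimum_version):
            raise ImportError(
                f"{feature} needs {module_name} >= {minimum_version}, found {found}"
            )
    return module
```

Passing the calling function's name as feature is what lets the error message say which ak.* function triggered the request.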

Unique grep results, indicating which files they were found in:

src/awkward/_connect/cuda/__init__.py:    import cupy
src/awkward/_connect/jax/__init__.py:import jax.numpy
src/awkward/_connect/jax/reducers.py:import jax
src/awkward/_connect/jax/trees.py:import jax
src/awkward/_connect/numba/arrayview_cuda.py:from numba.core.errors import NumbaTypeError
src/awkward/_connect/numba/arrayview_cuda.py:import numba
src/awkward/_connect/numba/arrayview.py:from numba.core.errors import NumbaTypeError
src/awkward/_connect/numba/arrayview.py:import numba
src/awkward/_connect/numba/arrayview.py:import numba.core.typing
src/awkward/_connect/numba/arrayview.py:import numba.core.typing.ctypes_utils
src/awkward/_connect/numba/builder.py:from numba.core.errors import NumbaTypeError
src/awkward/_connect/numba/builder.py:    import llvmlite.ir.types
src/awkward/_connect/numba/builder.py:import numba
src/awkward/_connect/numba/builder.py:import numba.core.typing
src/awkward/_connect/numba/builder.py:import numba.core.typing.ctypes_utils
src/awkward/_connect/numba/growablebuffer.py:import numba
src/awkward/_connect/numba/growablebuffer.py:import numba.core.typing.npydecl
src/awkward/_connect/numba/layoutbuilder.py:from numba.core.errors import NumbaTypeError
src/awkward/_connect/numba/layoutbuilder.py:import numba
src/awkward/_connect/numba/layoutbuilder.py:import numba.core.typing.npydecl
src/awkward/_connect/numba/layout.py:from numba.core.errors import NumbaTypeError, NumbaValueError
src/awkward/_connect/numba/layout.py:import llvmlite.ir
src/awkward/_connect/numba/layout.py:    import llvmlite.ir.types
src/awkward/_connect/numba/layout.py:import numba
src/awkward/_connect/numexpr.py:        import numexpr
src/awkward/_connect/pyarrow.py:        import fsspec
src/awkward/_connect/pyarrow.py:    import pyarrow
src/awkward/_connect/pyarrow.py:    import pyarrow.compute as out
src/awkward/_connect/pyarrow.py:    import pyarrow.parquet as out
src/awkward/_connect/rdataframe/from_rdataframe.py:import cppyy
src/awkward/_connect/rdataframe/from_rdataframe.py:import ROOT
src/awkward/_connect/rdataframe/to_rdataframe.py:import ROOT
src/awkward/cppyy.py:        import cppyy
src/awkward/highlevel.py:            from IPython.utils.wildcard import dict_dir
src/awkward/highlevel.py:        import cppyy
src/awkward/highlevel.py:        import numba
src/awkward/jax.py:    import jax  # noqa: TID251
src/awkward/jax.py:        import jax  # noqa: TID251, F401
src/awkward/numba/__init__.py:        import numba
src/awkward/numba/__init__.py:    import numba
src/awkward/numba/layoutbuilder.py:        import numba
src/awkward/operations/ak_from_feather.py:    import pyarrow.feather
src/awkward/operations/ak_from_json.py:            import fsspec
src/awkward/operations/ak_from_parquet.py:    import fsspec.parquet
src/awkward/operations/ak_from_parquet.py:    import pyarrow.parquet as pyarrow_parquet
src/awkward/operations/ak_to_dataframe.py:        import pandas
src/awkward/operations/ak_to_feather.py:    import pyarrow.feather
src/awkward/operations/ak_to_json.py:                import fsspec
src/awkward/types/_awkward_datashape_parser.py:        from .lexer import Token
src/awkward/_backends/cupy.py:        cupy = cuda.import_cupy("Awkward Arrays with CUDA")
src/awkward/_connect/cuda/__init__.py:    cupy = import_cupy("Awkward Arrays with CUDA")
src/awkward/_connect/cuda/__init__.py:def import_cupy(name="Awkward Arrays with CUDA"):
src/awkward/_connect/cuda/_kernel_signatures.py:cupy = import_cupy("Awkward Arrays with CUDA")
src/awkward/_connect/cuda/_kernel_signatures.py:from awkward._connect.cuda import import_cupy
src/awkward/_connect/numexpr.py:    numexpr = _import_numexpr()
src/awkward/_connect/pyarrow.py:    import_pyarrow_parquet(name)
src/awkward/contents/content.py:        pyarrow = awkward._connect.pyarrow.import_pyarrow("to_arrow")
src/awkward/jax.py:def import_jax():
src/awkward/_kernels.py:        cupy = ak_cuda.import_cupy("Awkward Arrays with CUDA")
src/awkward/_nplikes/cupy.py:        self._module = ak._connect.cuda.import_cupy("Awkward Arrays with CUDA")
src/awkward/_nplikes/jax.py:        jax = ak.jax.import_jax()
src/awkward/operations/ak_from_parquet.py:    pyarrow_parquet = awkward._connect.pyarrow.import_pyarrow_parquet("ak.from_parquet")
src/awkward/operations/ak_to_parquet.py:    fsspec = awkward._connect.pyarrow.import_fsspec("ak.to_parquet")
src/awkward/operations/ak_to_parquet.py:    pyarrow_parquet = awkward._connect.pyarrow.import_pyarrow_parquet("ak.to_parquet")
src/awkward/operations/str/akstr_capitalize.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_capitalize.py:    pc = import_pyarrow_compute("ak.str.capitalize")
src/awkward/operations/str/akstr_center.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_center.py:    pc = import_pyarrow_compute("r")
src/awkward/operations/str/akstr_count_substring.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_count_substring.py:    pc = import_pyarrow_compute("ak.str.count_substring")
src/awkward/operations/str/akstr_count_substring_regex.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_count_substring_regex.py:    pc = import_pyarrow_compute("ak.str.count_substring_regex")
src/awkward/operations/str/akstr_ends_with.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_ends_with.py:    pc = import_pyarrow_compute("h")
src/awkward/operations/str/akstr_extract_regex.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_extract_regex.py:    pc = import_pyarrow_compute("x")
src/awkward/operations/str/akstr_find_substring.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_find_substring.py:    pc = import_pyarrow_compute("ak.str.find_substring")
src/awkward/operations/str/akstr_find_substring_regex.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_find_substring_regex.py:    pc = import_pyarrow_compute("ak.str.find_substring_regex")
src/awkward/operations/str/akstr_index_in.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_index_in.py:    pc = import_pyarrow_compute("ak.str.index_in")
src/awkward/operations/str/akstr_is_alnum.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_alnum.py:    pc = import_pyarrow_compute("m")
src/awkward/operations/str/akstr_is_alpha.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_alpha.py:    pc = import_pyarrow_compute("a")
src/awkward/operations/str/akstr_is_ascii.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_ascii.py:    pc = import_pyarrow_compute("i")
src/awkward/operations/str/akstr_is_decimal.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_decimal.py:    pc = import_pyarrow_compute("l")
src/awkward/operations/str/akstr_is_digit.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_digit.py:    pc = import_pyarrow_compute("t")
src/awkward/operations/str/akstr_is_in.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_in.py:    pc = import_pyarrow_compute("ak.str.is_in")
src/awkward/operations/str/akstr_is_lower.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_lower.py:    pc = import_pyarrow_compute("r")
src/awkward/operations/str/akstr_is_numeric.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_numeric.py:    pc = import_pyarrow_compute("c")
src/awkward/operations/str/akstr_is_printable.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_printable.py:    pc = import_pyarrow_compute("e")
src/awkward/operations/str/akstr_is_space.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_space.py:    pc = import_pyarrow_compute("e")
src/awkward/operations/str/akstr_is_title.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_title.py:    pc = import_pyarrow_compute("e")
src/awkward/operations/str/akstr_is_upper.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_is_upper.py:    pc = import_pyarrow_compute("r")
src/awkward/operations/str/akstr_join_element_wise.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_join_element_wise.py:    pc = import_pyarrow_compute("ak.str.join_element_wise")
src/awkward/operations/str/akstr_join.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_join.py:    pc = import_pyarrow_compute("ak.str.join")
src/awkward/operations/str/akstr_length.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_length.py:    pc = import_pyarrow_compute("h")
src/awkward/operations/str/akstr_lower.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_lower.py:    pc = import_pyarrow_compute("r")
src/awkward/operations/str/akstr_lpad.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_lpad.py:    pc = import_pyarrow_compute("d")
src/awkward/operations/str/akstr_ltrim.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_ltrim.py:    pc = import_pyarrow_compute("m")
src/awkward/operations/str/akstr_ltrim_whitespace.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_ltrim_whitespace.py:    pc = import_pyarrow_compute("e")
src/awkward/operations/str/akstr_match_like.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_match_like.py:    pc = import_pyarrow_compute("ak.str.match_like")
src/awkward/operations/str/akstr_match_substring.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_match_substring.py:    pc = import_pyarrow_compute("ak.str.match_substring")
src/awkward/operations/str/akstr_match_substring_regex.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_match_substring_regex.py:    pc = import_pyarrow_compute("x")
src/awkward/operations/str/akstr_repeat.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_repeat.py:    pc = import_pyarrow_compute("ak.str.repeat")
src/awkward/operations/str/akstr_replace_slice.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_replace_slice.py:    pc = import_pyarrow_compute("e")
src/awkward/operations/str/akstr_replace_substring.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_replace_substring.py:    pc = import_pyarrow_compute("g")
src/awkward/operations/str/akstr_replace_substring_regex.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_replace_substring_regex.py:    pc = import_pyarrow_compute("x")
src/awkward/operations/str/akstr_reverse.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_reverse.py:    pc = import_pyarrow_compute("e")
src/awkward/operations/str/akstr_rpad.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_rpad.py:    pc = import_pyarrow_compute("d")
src/awkward/operations/str/akstr_rtrim.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_rtrim.py:    pc = import_pyarrow_compute("m")
src/awkward/operations/str/akstr_rtrim_whitespace.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_rtrim_whitespace.py:    pc = import_pyarrow_compute("e")
src/awkward/operations/str/akstr_slice.py:    pc = import_pyarrow_compute("ak.str.slice")
src/awkward/operations/str/akstr_split_pattern.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_split_pattern.py:    pc = import_pyarrow_compute("ak.str.split_pattern")
src/awkward/operations/str/akstr_split_pattern_regex.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_split_pattern_regex.py:    pc = import_pyarrow_compute("ak.str.split_pattern_regex")
src/awkward/operations/str/akstr_split_whitespace.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_split_whitespace.py:    pc = import_pyarrow_compute("ak.str.split_whitespace")
src/awkward/operations/str/akstr_starts_with.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_starts_with.py:    pc = import_pyarrow_compute("ak.str.starts_with")
src/awkward/operations/str/akstr_swapcase.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_swapcase.py:    pc = import_pyarrow_compute("e")
src/awkward/operations/str/akstr_title.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_title.py:    pc = import_pyarrow_compute("e")
src/awkward/operations/str/akstr_to_categorical.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_to_categorical.py:    pc = import_pyarrow_compute("ak.str.to_categorical")
src/awkward/operations/str/akstr_trim.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_trim.py:    pc = import_pyarrow_compute("m")
src/awkward/operations/str/akstr_trim_whitespace.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_trim_whitespace.py:    pc = import_pyarrow_compute("e")
src/awkward/operations/str/akstr_upper.py:    from awkward._connect.pyarrow import import_pyarrow_compute
src/awkward/operations/str/akstr_upper.py:    pc = import_pyarrow_compute("r")
