deps: require pyarrow for pandas support #314

cguardia · 2020-10-09T04:48:01Z

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

- Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea.
- Ensure the tests and linter pass.
- Code coverage does not decrease (if any source code was changed).
- Appropriate docs were updated (if necessary).

Fixes #265 🦕

refs googleapis#265

tswast · 2020-10-09T16:20:05Z

tests/unit/test_client.py

@@ -7839,7 +7835,6 @@ def test_load_table_from_dataframe_unknown_table(self):
        )

    @unittest.skipIf(pandas is None, "Requires `pandas`")
-    @unittest.skipIf(fastparquet is None, "Requires `fastparquet`")
    def test_load_table_from_dataframe_no_pyarrow_warning(self):


I'm a bit surprised to see this test passing. I guess we still have some code that falls back to pandas' default parquet serialization?

Can you look into whether we can remove that code path?
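Removing the fallback means failing fast when pyarrow is missing instead of silently using DataFrame.to_parquet(). A minimal sketch of such a guard and a unit test for it follows. The helper name check_pyarrow_available and its error message are hypothetical, not the library's actual code; the module is passed in as an argument so the missing-pyarrow case can be tested without uninstalling anything.

```python
import unittest

try:
    import pyarrow
except ImportError:  # optional dependency: record its absence instead of failing at import
    pyarrow = None


def check_pyarrow_available(pyarrow_module=pyarrow):
    """Hypothetical guard: raise instead of falling back to DataFrame.to_parquet()."""
    if pyarrow_module is None:
        raise ValueError(
            "pyarrow is required to upload a DataFrame; "
            "install the 'pandas' extra to get it."
        )
    return pyarrow_module


class CheckPyarrowAvailableTest(unittest.TestCase):
    def test_missing_pyarrow_raises(self):
        # Simulate an environment without pyarrow by passing None explicitly.
        with self.assertRaises(ValueError):
            check_pyarrow_available(None)

    def test_present_module_is_returned(self):
        # Any non-None module object is passed through unchanged.
        fake_module = object()
        self.assertIs(check_pyarrow_available(fake_module), fake_module)
```

With a guard like this, a test such as test_load_table_from_dataframe_no_pyarrow_warning would check for the error rather than for a warning followed by a fallback.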

Related: We should be able to simplify this docstring now:

python-bigquery/google/cloud/bigquery/client.py

Lines 2134 to 2147 in cbcb4b8

parquet_compression (Optional[str]):
    [Beta] The compression method to use if intermittently
    serializing ``dataframe`` to a parquet file.

    If ``pyarrow`` and job config schema are used, the argument
    is directly passed as the ``compression`` argument to the
    underlying ``pyarrow.parquet.write_table()`` method (the
    default value "snappy" gets converted to uppercase).
    https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow-parquet-write-table

    If either ``pyarrow`` or job config schema are missing, the
    argument is directly passed as the ``compression`` argument
    to the underlying ``DataFrame.to_parquet()`` method.
    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet

tswast · 2020-10-09T16:23:28Z

setup.py

@@ -53,7 +53,6 @@
        "pyarrow >= 1.0.0, < 2.0dev",
    ],
    "tqdm": ["tqdm >= 4.7.4, <5.0.0dev"],
-    "fastparquet": ["fastparquet", "python-snappy", "llvmlite>=0.34.0"],


I'd like to see us add "pyarrow" to the "pandas" extras now, since it's needed for both uploads and downloads of DataFrames.

We could refactor the "pyarrow >= 1.0.0, < 2.0dev" string into a variable, since it's now going to appear three times in setup.py.
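A sketch of that refactor follows. It is hypothetical: only the pyarrow and tqdm pins come from this review, the choice of extras that share the pin is an assumption, and each extra's other requirements are omitted, so the real setup.py may differ.

```python
# Hypothetical excerpt of setup.py. Defining the pin once keeps the
# extras that share it from drifting apart.
pyarrow_dep = "pyarrow >= 1.0.0, < 2.0dev"

extras = {
    # Other bqstorage requirements are omitted from this sketch.
    "bqstorage": [pyarrow_dep],
    # pyarrow is needed for both DataFrame uploads and downloads.
    # (The pandas version pin is omitted here.)
    "pandas": ["pandas", pyarrow_dep],
    "tqdm": ["tqdm >= 4.7.4, <5.0.0dev"],
}

# An "all" extra built from the others; the set lists pyarrow only once.
extras["all"] = sorted({req for reqs in extras.values() for req in reqs})

# In setup.py, this dict would be passed as setuptools.setup(extras_require=extras, ...).
```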

cguardia · 2020-10-11T06:38:02Z

@tswast OK, this was more involved than I expected at first. Here's my second attempt.

tswast

Thanks!

build: drop fastparquet from extras dependencies

487c19d

refs googleapis#265

cguardia requested a review from a team October 9, 2020 04:48

google-cla bot added the cla: yes label Oct 9, 2020

tswast requested changes Oct 9, 2020

cguardia added 2 commits October 11, 2020 00:43

move pyarrow to pandas extras, remove unused code paths

08f7805

Merge branch 'master' into 265-drop-fastparquet

9fd5a8d

tswast approved these changes Oct 12, 2020

Merge branch 'master' into 265-drop-fastparquet

5dd13cd

tswast changed the title ~~build: drop fastparquet from extras dependencies~~ deps: require pyarrow for pandas support Oct 12, 2020

tswast added the automerge label Oct 12, 2020

gcf-merge-on-green bot merged commit 801e4c0 into googleapis:master Oct 12, 2020

gcf-merge-on-green bot removed the automerge label Oct 12, 2020

tswast mentioned this pull request Jul 21, 2021

fix!: use nullable Int64 and boolean dtypes in to_dataframe #786

Merged
