Import pyarrow-hotfix #669
Frustratingly, linux-libtiledb-dev failed with a segmentation fault during the Python tests, both here and on the run on my fork. I restarted both.
linux-libtiledb-dev is consistently failing. Could any of the recent commits to libtiledb be affecting this? Also, I manually triggered a nightly build off of the main branch on my fork to see whether these failures are at all related to my PR.
Unfortunately the nightly on main passed 😭
Maybe it's the version of dask. The nightly on main installed dask 2023.9.3 from its cache, whereas my PR is installing dask 2024.2.1.
Restricting the upper bound for dask didn't fix it. Any ideas?
Based on the test failures:
The failing test is trying to open an array (which doesn't exist) with TileDB-VCF/apis/python/tests/test_tiledbvcf.py Lines 975 to 976 in b3df71e
I don't know why this causes a segfault, but the use of This patch simplifies the test:
diff --git a/apis/python/tests/test_tiledbvcf.py b/apis/python/tests/test_tiledbvcf.py
index 2e4fd3bd..87fd41b6 100755
--- a/apis/python/tests/test_tiledbvcf.py
+++ b/apis/python/tests/test_tiledbvcf.py
@@ -966,23 +966,12 @@ def test_disable_ingestion_tasks(tmp_path):
     if platform.system() != "Linux":
         return

-    # query allele_count array with TileDB
+    # ensure the allele_count and variant_stats arrays are not created
     ac_uri = os.path.join(tmp_path, "dataset", "allele_count")
+    assert not os.path.exists(ac_uri)

-    contig = "1"
-    region = slice(69896)
-    with pytest.raises(Exception):
-        with tiledb.open(ac_uri) as A:
-            df = A.query(attrs=["alt", "count"], dims=["pos"]).df[contig, region]
-
-    # query variant_stats array with TileDB
     vs_uri = os.path.join(tmp_path, "dataset", "variant_stats")
-
-    contig = "1"
-    region = slice(12140)
-    with pytest.raises(Exception):
-        with tiledb.open(vs_uri) as A:
-            df = A.query(attrs=["allele", "ac"], dims=["pos"]).df[contig, region]
+    assert not os.path.exists(vs_uri)


 def test_ingestion_tasks(tmp_path):
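The simplified check in the patch above can be tried in isolation. The following is only a minimal sketch: the helper name `stats_arrays_absent` is hypothetical and not part of the test suite, and a temporary directory stands in for pytest's `tmp_path`. It tests for the array directories on disk instead of opening them with TileDB, which is the idea behind the patch.

```python
import os
import tempfile


def stats_arrays_absent(dataset_uri):
    """Return True if neither optional stats array exists under dataset_uri.

    Hypothetical helper: the array names come from the patch above.
    """
    return not any(
        os.path.exists(os.path.join(dataset_uri, name))
        for name in ("allele_count", "variant_stats")
    )


# A temporary directory stands in for pytest's tmp_path fixture.
with tempfile.TemporaryDirectory() as tmp_path:
    dataset_uri = os.path.join(tmp_path, "dataset")
    os.makedirs(dataset_uri)

    # Nothing was ingested, so neither stats array should be present.
    assert stats_arrays_absent(dataset_uri)

    # Creating one of the array directories flips the result.
    os.makedirs(os.path.join(dataset_uri, "allele_count"))
    assert not stats_arrays_absent(dataset_uri)
```

Checking the filesystem keeps the test independent of how tiledb-py behaves when asked to open a missing array.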
The next failure is also related to using
Doesn't that suggest there is a compatibility problem between the release version of tiledb-py and the dev release of libtiledb? Also, I've started getting errors about missing the package https://github.com/TileDB-Inc/TileDB-VCF/actions/runs/8250806005/job/22566300642?pr=669#step:4:13
Yes, I believe so. I couldn't explain why the regular nightly is passing, but the regular nightly may be configured differently.
29f537d to 5d34f8c
Quick update: I fixed the dask import errors in #673. However, the linux-libtiledb-dev build is still segfaulting during the tiledbvcf-py tests, just as before. There is also a new error while running the libtiledbvcf unit tests on macOS:
-------------------------------------------------------------------------------
TileDB-VCF: Test Resume Ingest and Export With Contig Merge
-------------------------------------------------------------------------------
/Users/runner/work/TileDB-VCF/TileDB-VCF/libtiledbvcf/test/src/unit-vcf-export.cc:1744
...............................................................................
/Users/runner/work/TileDB-VCF/TileDB-VCF/libtiledbvcf/test/src/unit-vcf-export.cc:1744: FAILED:
{Unknown expression after the reported line}
due to unexpected exception with message:
Error loading metadata; 'version' field has invalid value.
===============================================================================
test cases: 78 | 77 passed | 1 failed
assertions: 6790 | 6789 passed | 1 failed
This PR doesn't touch the libtiledbvcf source code, so I'm hoping this is spurious. I restarted it.
Confirmed. It was spurious.
5d34f8c to fce5a6e
Still failing the tiledbvcf-py tests:
File "/home/runner/work/TileDB-VCF/TileDB-VCF/TileDB-VCF/apis/python/tests/test_tiledbvcf.py", line 40 in check_if_compatible
File "/home/runner/work/TileDB-VCF/TileDB-VCF/TileDB-VCF/apis/python/tests/test_tiledbvcf.py", line 992 in test_ingestion_tasks
TileDB-VCF/apis/python/tests/test_tiledbvcf.py Lines 38 to 40 in 28c5e5d
TileDB-VCF/apis/python/tests/test_tiledbvcf.py Line 1003 in 28c5e5d
Currently tiledb-py 0.26.0 is getting installed because the conda env is cached. My next idea is to delete the cache so that a more recent tiledb-py is installed.
Note that the nightly linux-libtiledb-dev failed on my fork because I deleted the GitHub Actions cache: https://github.com/jdblischak/TileDB-VCF/actions/runs/8353245587/job/22864662362
In other words, I think the only reason it is passing on the main repo is that the conda env is cached.
b344a7f to df1f14e
Rebased onto #679
df1f14e to 9e88352
I fixed the segfault (caused by running To minimize this PR as much as possible (since we can remove pyarrow-hotfix once we can install pyarrow 14+ in TileDB Cloud), I only import pyarrow-hotfix once during the initialization of pyarrow when tiledbvcf-py is imported. I assume this is sufficient to apply the hotfix.
No Azure build was triggered for my last commit. Closing and reopening.
Reopening successfully triggered a new Azure build: https://dev.azure.com/TileDB-Inc/CI/_build/results?buildId=38579&view=results
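As an illustration of importing the hotfix once at package import time, here is a hedged sketch. It is not the code in this PR: the helper names are hypothetical, the version gate is an added assumption (the PR may simply import the module unconditionally), and the 14.0.2 threshold is taken from the PR description. The only external name used is the `pyarrow_hotfix` module, which is the import name of the pyarrow-hotfix package.

```python
import importlib
import re

# Threshold taken from the PR description ("pyarrow >=14.0.2"); an assumption here.
FIXED_VERSION = (14, 0, 2)


def parse_version(version):
    """Return the first three numeric release components as a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as "14" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_hotfix(pyarrow_version):
    """Return True if the given pyarrow version predates FIXED_VERSION."""
    return parse_version(pyarrow_version) < FIXED_VERSION


def apply_hotfix_if_needed(pyarrow_version):
    """Import pyarrow_hotfix when needed; return True if it was imported."""
    if not needs_hotfix(pyarrow_version):
        return False
    try:
        # Importing the module is what applies the hotfix.
        importlib.import_module("pyarrow_hotfix")
    except ImportError:
        return False
    return True
```

In a package's `__init__.py` this would be called once, for example as `apply_hotfix_if_needed(pyarrow.__version__)`, so later imports of the package do not repeat the work.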
Until we can install pyarrow >=14.0.2 in the cloud conda environments, we should ensure that pyarrow-hotfix is installed and imported. From its PyPI Usage section:
xref: TileDB-Inc/tiledb-vcf-feedstock#115
Also, I noticed that pyarrow isn't included in setup_requires:
TileDB-VCF/apis/python/setup.py Lines 284 to 292 in b3df71e
Which seems unlikely to be true given that pyarrow is imported in setup.py:
TileDB-VCF/apis/python/setup.py Line 243 in b3df71e
In general, is there a reason that setup_requires, install_requires, and test_requires aren't used more in this repo? I expected to see at least pyarrow in install_requires and dask in test_requires.
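To make the question concrete, here is a hypothetical sketch of how those dependencies might be declared. The package names come from this discussion, but where they are placed is an assumption and does not reflect the repo's actual setup.py. It uses a "test" extra because setuptools has deprecated its `tests_require` keyword.

```python
# Hypothetical dependency lists: names are from the discussion above, but
# their placement is an assumption, not the contents of the repo's setup.py.
INSTALL_REQUIRES = ["pyarrow", "pyarrow-hotfix"]
TEST_REQUIRES = ["dask", "pytest"]

# Keyword arguments that a setup.py could pass to setuptools.setup().
SETUP_KWARGS = {
    "install_requires": INSTALL_REQUIRES,
    # A "test" extra in place of the deprecated tests_require keyword.
    "extras_require": {"test": TEST_REQUIRES},
}
```

With such an extra, the test dependencies could be installed with `pip install ".[test]"`.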