I ran the example script and it started downloading v4/validation.parquet.
My wifi was slow and my computer went to sleep. When I woke it up, the program had hung because the wifi had disconnected. I killed the program and ran it again to "resume the download".
I had to manually delete v4/validation.parquet, since the numerai SDK was not able to correctly resume the download.
Below is the output of the run that resumed the download.
2023-01-29 11:22:59,695 INFO numerapi.utils: resuming download
/home/raynos/.local/lib/python3.8/site-packages/urllib3/connectionpool.py:1043: InsecureRequestWarning: Unverified HTTPS request is being made to host 'numerai-datasets-us-west-2.s3.amazonaws.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
v4/validation.parquet: 40%|█████▋ | 463M/1.15G [00:00<00:00, 3.82GB/s]
v4/validation.parquet: 1.15GB [01:05, 17.4MB/s]
2023-01-29 11:24:07,248 INFO numerapi.utils: starting download
v4/live_409.parquet: 3.42MB [00:01, 1.90MB/s]
Below is the output of the run that tried to use the data file from the resumed download.
2023-01-29 11:24:20,449 INFO numerapi.utils: starting download
v4/features.json: 562kB [00:00, 727kB/s]
Reading minimal training data
Traceback (most recent call last):
  File "./example_model.py", line 52, in <module>
    validation_data = pd.read_parquet('v4/validation.parquet',
  File "/home/raynos/.local/lib/python3.8/site-packages/pandas/io/parquet.py", line 493, in read_parquet
    return impl.read(
  File "/home/raynos/.local/lib/python3.8/site-packages/pandas/io/parquet.py", line 240, in read
    result = self.api.parquet.read_table(
  File "/home/raynos/.local/lib/python3.8/site-packages/pyarrow/parquet.py", line 1996, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File "/home/raynos/.local/lib/python3.8/site-packages/pyarrow/parquet.py", line 1831, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 323, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2311, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 114, in pyarrow.lib.check_status
OSError: Couldn't deserialize thrift: TProtocolException: Invalid data
Deserializing page header failed.
I don't know if it's possible to do an integrity check with a checksum in the "resuming download" branch, but doing so would let you verify whether the resumed download succeeded or was corrupted, and delete the file if it was corrupted.
Leaving the corrupted file behind gives me a thrift protocol error, since the parquet file is no longer valid.
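A minimal sketch of the kind of check suggested above, using only the Python standard library. It assumes an expected SHA-256 digest for the file is available from somewhere; whether the numerai API exposes one is not known here. The names `sha256_of_file` and `verify_or_delete` are hypothetical and are not part of numerapi.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a multi-GB file never sits in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_or_delete(path, expected_sha256):
    """Hypothetical helper: keep the file only if its checksum matches.

    `expected_sha256` is an assumption: it must come from a trusted source
    that publishes a digest for the dataset file.
    Returns True if the file is intact, False if it was deleted.
    """
    if sha256_of_file(path) == expected_sha256.lower():
        return True
    # A mismatch means the download is corrupted; remove it so the
    # next run starts a fresh download instead of reusing bad data.
    os.remove(path)
    return False
```

Called after the "resuming download" step finishes, something like `verify_or_delete("v4/validation.parquet", expected)` would remove a corrupted file up front, so the failure would not first surface later as a thrift error in `pd.read_parquet`.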