Make wfdb.rdann, wfdb.rdrecord, wfdb.rdsamp also accept pathlib.Path objects as the input record_name #346

wenh06 · 2022-03-18T08:15:02Z

Currently, the functions wfdb.rdann, wfdb.rdrecord, wfdb.rdsamp only accept str for the input record_name, as they have str specific operations. wfdb.rdheader currently accepts pathlib.Path objects as input record_name since the os.path module is able to handle pathlib.Path as well as str type input.

For example, on a linux machine

import pathlib, wfdb
sp = "/home/wenhao/Jupyter/wenhao/data/CPSC2021/training_I/data_0_1"
pp = pathlib.Path(sp)
wfdb.rdrecord(pp)

would raise error

AttributeError                            Traceback (most recent call last)
<ipython-input-6-754a0c10c458> in <module>
----> 1 wfdb.rdrecord(pp)

~/.local/lib/python3.6/site-packages/wfdb/io/record.py in rdrecord(record_name, sampfrom, sampto, channels, physical, pn_dir, m2s, smooth_frames, ignore_skew, return_res, force_channels, channel_names, warn_empty)
   3426         pn_dir = posixpath.join(dir_list[0], get_version(dir_list[0]), *dir_list[1:])
   3427 
-> 3428     if record_name.endswith('.edf'):
   3429         record = edf2mit(record_name, pn_dir=pn_dir, record_only=True)
   3430     elif record_name.endswith('.wav'):

AttributeError: 'PosixPath' object has no attribute 'endswith'

Sometimes one would like to work locally with pathlib, which was introduced in Python 3.4 (PEP 428), to better handle the file paths.

…as record_name

bemoody · 2022-03-21T16:33:16Z

Thank you! This looks like a great improvement.

I'm not entirely familiar with pathlib so I'm a little unsure of some of the details.

One thing to be wary of is that "file names" or "record names" might be either filesystem paths or URL paths. Some existing functions permit record_name to include a subdirectory when streaming data from PhysioNet; other functions don't allow that, and silently ignore any directory name in record_name (which I consider a bug.)

Here, for example, it looks like you're passing a pathlib.Path object to download._stream_annotation. Is that a good idea or not? I guess that posixpath.join will automatically convert the argument back into a string, but will it correctly use slashes if running on a Windows platform? And does that mean that rdann would now accept backslashes in URLs, only on Windows?

If we do want the download functions to accept a pathlib.Path as an alternative to str then that should be documented.

I'm not necessarily opposed to either saying "pathlib.Path may be used only for local files" or saying "pathlib.Path may be used either for local files or for URLs", but I do think it's important to handle URL arguments in the same way on all hosts.

wenh06 · 2022-03-23T16:03:08Z

Thank you! This looks like a great improvement.

I'm not entirely familiar with pathlib so I'm a little unsure of some of the details.

One thing to be wary of is that "file names" or "record names" might be either filesystem paths or URL paths. Some existing functions permit record_name to include a subdirectory when streaming data from PhysioNet; other functions don't allow that, and silently ignore any directory name in record_name (which I consider a bug.)

Here, for example, it looks like you're passing a pathlib.Path object to download._stream_annotation. Is that a good idea or not? I guess that posixpath.join will automatically convert the argument back into a string, but will it correctly use slashes if running on a Windows platform? And does that mean that rdann would now accept backslashes in URLs, only on Windows?

If we do want the download functions to accept a pathlib.Path as an alternative to str then that should be documented.

I'm not necessarily opposed to either saying "pathlib.Path may be used only for local files" or saying "pathlib.Path may be used either for local files or for URLs", but I do think it's important to handle URL arguments in the same way on all hosts.

You're correct. On a Windows system, if you pass a Path object (actually a WindowsPath object) for the file_name, then error might occur when this file_name includes a subdirectory for

url = posixpath.join(config.db_index_url, pn_dir, file_name)

in the function download._stream_annotation. Such error can be avoided by forcing converting file_name to a PosixPath using pathlib.PurePosixPath as

url = posixpath.join(config.db_index_url, pn_dir, pathlib.PurePosixPath(file_name))

A drawback of pathlib is that it cannot handle the URLs correctly (mainly the double slashes), for example

>>> pathlib.PurePosixPath('https://physionet.org/files/')
PurePosixPath('https:/physionet.org/files')
>>> db_index_url = 'https://physionet.org/files/'
>>> sp = "./temp/13918_AV_psd.mat"
>>> pp = pathlib.Path(sp)
>>> db_index_url / pathlib.PurePosixPath("mitdb" / pp)
PurePosixPath('https:/physionet.org/files/mitdb/temp/13918_AV_psd.mat')

The following image is an example tested on a Windows machine

wenh06 · 2022-03-23T17:36:04Z

One thing to be wary of is that "file names" or "record names" might be either filesystem paths or URL paths. Some existing functions permit record_name to include a subdirectory when streaming data from PhysioNet; other functions don't allow that, and silently ignore any directory name in record_name (which I consider a bug.)

I also think this is a bug:

>>> wfdb.rdrecord("x_mitdb/x_111", pn_dir="mitdb", sampto=2000)
NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mitdb/1.0.0/x_111.hea
>>> wfdb.rdrecord("x_111", pn_dir="mitdb/1.0.0/x_mitdb/", sampto=2000)
<wfdb.io.record.Record at 0x7f6cd4ee7c50>

wenh06 · 2022-03-23T17:43:37Z

I think there's a lot of work to do to better deal with the paths. For example, I found currently wfdb detects whether a pn_dir contains the version number using

'/' not in db_dir

or

'.' not in pn_dir

I think it's better to specify a pattern for the database version and use re.search to detect it

DB_VERSION_PATTERN = re.compile("\d+.\d+\.\d+")

… number or not

cx1111 · 2022-04-25T16:12:34Z

Thanks very much for identifying the bugs. We're going to think a bit more about how we want the API to be in the next major version, such that the behavior is more explicit. Especially when adding cloud data sources.

Ivorforce

Would love to see this merged!

Ivorforce · 2022-05-27T13:50:09Z

wfdb/io/annotation.py

    # local file
    if pn_dir is None:
-        with open(record_name + '.' + extension, 'rb') as f:
+        with open(file_name, 'rb') as f:


paths offer a .open function, too, which might be more readable than using the global one.

Ivorforce · 2022-05-27T13:52:21Z

wfdb/io/record.py

@@ -4373,7 +4377,7 @@ def wrsamp(record_name, fs, units, sig_name, p_signal=None, d_signal=None,

    """
    # Check for valid record name
-    if '.' in record_name: 
+    if '.' in str(record_name): 


As directories might contain dots, maybe this wants to check for record_name.name instead.

wenh06 · 2022-05-29T15:18:36Z

Thanks very much for identifying the bugs. We're going to think a bit more about how we want the API to be in the next major version, such that the behavior is more explicit. Especially when adding cloud data sources.

I agree with you. I tried continuing to correct (enhance) path operations in other places for several hours but finally gave up. I hope that path operations would be designed in a more uniform and modern way in the next major version.

make wfdb.rdann, wfdb.rdrecord, wfdb.rdsamp also accept pathlib.Path …

69418c3

…as record_name

wenh06 added 3 commits March 24, 2022 01:46

add DB_VERSION_PATTERN to detection whether a pn_dir contains version…

1c8aff7

… number or not

allow functions in wfdb.io.download to handle pathlib objects

0c9725a

Merge remote-tracking branch 'upstream/master'

913b238

Ivorforce reviewed May 27, 2022

View reviewed changes

alistairewj mentioned this pull request Nov 14, 2023

Accessing a file in a subfolder on PhysioNet using pn_dir #467

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make wfdb.rdann, wfdb.rdrecord, wfdb.rdsamp also accept pathlib.Path objects as the input record_name #346

Make wfdb.rdann, wfdb.rdrecord, wfdb.rdsamp also accept pathlib.Path objects as the input record_name #346

wenh06 commented Mar 18, 2022

bemoody commented Mar 21, 2022

wenh06 commented Mar 23, 2022 •

edited

wenh06 commented Mar 23, 2022

wenh06 commented Mar 23, 2022

cx1111 commented Apr 25, 2022

Ivorforce left a comment

Ivorforce May 27, 2022

Ivorforce May 27, 2022

wenh06 commented May 29, 2022

Make wfdb.rdann, wfdb.rdrecord, wfdb.rdsamp also accept pathlib.Path objects as the input record_name #346

Are you sure you want to change the base?

Make wfdb.rdann, wfdb.rdrecord, wfdb.rdsamp also accept pathlib.Path objects as the input record_name #346

Conversation

wenh06 commented Mar 18, 2022

bemoody commented Mar 21, 2022

wenh06 commented Mar 23, 2022 • edited

wenh06 commented Mar 23, 2022

wenh06 commented Mar 23, 2022

cx1111 commented Apr 25, 2022

Ivorforce left a comment

Choose a reason for hiding this comment

Ivorforce May 27, 2022

Choose a reason for hiding this comment

Ivorforce May 27, 2022

Choose a reason for hiding this comment

wenh06 commented May 29, 2022

wenh06 commented Mar 23, 2022 •

edited