Rounding errors in to_dataframe #444

Open
bemoody opened this issue Mar 29, 2023 · 0 comments

The wfdb.Record.to_dataframe function generates a DataFrame from a Record object. The index of the resulting DataFrame is the elapsed or absolute time of each sample.

This code, however, will have significant rounding errors over a long record:

        if self.base_datetime is not None:
            index = pd.date_range(
                start=self.base_datetime,
                periods=self.sig_len,
                freq=pd.Timedelta(seconds=1 / self.fs),
            )
        else:
            index = pd.timedelta_range(
                start=pd.Timedelta(0),
                periods=self.sig_len,
                freq=pd.Timedelta(seconds=1 / self.fs),
            )

For example:

$ python3
>>> import wfdb
>>> r = wfdb.rdrecord('81739927', pn_dir='mimic4wdb/0.1.0/waves/p100/p10014354/81739927')
>>> str(r.base_datetime)
'2148-08-16 09:00:17.566000'
>>> r.fs
62.4725
>>> r.sig_len
6661120
>>> r.to_dataframe()
                             I     II    III      V  aVR     Pleth      Resp
2148-08-16 09:00:17.566000 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.582007 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.598014 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.614021 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.630028 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
...                         ..    ...    ...    ...  ...       ...       ...
2148-08-17 14:37:22.033805 NaN -0.220 -0.285 -0.025  NaN  0.404297  0.487477
2148-08-17 14:37:22.049812 NaN -0.030  0.005  0.025  NaN  0.396484  0.530238
2148-08-17 14:37:22.065819 NaN -0.065 -0.030 -0.015  NaN  0.386475  0.574832
2148-08-17 14:37:22.081826 NaN -0.265 -0.255 -0.125  NaN  0.375977  0.621258
2148-08-17 14:37:22.097833 NaN -0.550 -0.610 -0.355  NaN  0.366211  0.664020

[6661120 rows x 7 columns]
>>> str(r.get_absolute_time(6661119))
'2148-08-17 14:37:22.384920'

$ wfdbtime -r mimic4wdb/0.1.0/waves/p100/p10014354/81739927/ s6661119
       s6661119    29:37:04.819 [14:37:22.385 17/08/2148]

Here, get_absolute_time is correct to the nearest microsecond and the wfdbtime command is correct to the nearest millisecond. to_dataframe, however, is off by 0.287 seconds.
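The size of the discrepancy can be checked with a little arithmetic. The sketch below assumes the step between consecutive index entries is exactly 0.016007 s, as read off the to_dataframe output above; the variable names are illustrative and not part of wfdb.

```python
# Values taken from the record in this report.
fs = 62.4725
sig_len = 6661120

# Exact sampling interval versus the interval visible in the DataFrame index.
exact_step = 1 / fs        # about 0.0160070431 s
observed_step = 0.016007   # assumption: read off the printed index above

# The per-sample shortfall is tiny, but it is added once per sample.
per_sample_error = exact_step - observed_step
drift = (sig_len - 1) * per_sample_error

print(f"per-sample error: {per_sample_error * 1e9:.1f} ns")
print(f"drift at last sample: {drift:.3f} s")
```

A shortfall of a few tens of nanoseconds per sample, accumulated over more than six million samples, accounts for the gap between the last index entry and get_absolute_time.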

I think this would be avoided by using start and end arguments to date_range or timedelta_range, rather than using start and freq.
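A minimal sketch of that idea, not a tested patch for wfdb: build_time_index is a hypothetical helper, and its parameters stand in for the Record attributes base_datetime, fs, and sig_len. The end point is computed once from the sample count, so no rounded step is accumulated.

```python
import datetime

import pandas as pd


def build_time_index(base_datetime, fs, sig_len):
    """Hypothetical helper: index with sig_len evenly spaced entries."""
    # Elapsed time of the last sample, computed in a single division.
    end_offset = pd.Timedelta(seconds=(sig_len - 1) / fs)
    if base_datetime is not None:
        start = pd.Timestamp(base_datetime)
        # start + end + periods: pandas spaces the entries linearly.
        return pd.date_range(start=start, end=start + end_offset, periods=sig_len)
    return pd.timedelta_range(start=pd.Timedelta(0), end=end_offset, periods=sig_len)


# Values from the record in this report.
index = build_time_index(
    datetime.datetime(2148, 8, 16, 9, 0, 17, 566000), 62.4725, 6661120
)
print(index[-1])

elapsed = build_time_index(None, 62.4725, 6661120)
print(elapsed[-1])
```

With this approach the last entry should agree with get_absolute_time(sig_len - 1) to within the resolution of the index, and each intermediate entry is rounded independently rather than inheriting the error of all earlier steps.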
