
Issues with inferring / managing sampling rate #82

Open
TomDonoghue opened this issue Apr 16, 2024 · 0 comments
We've encountered some issues with how Combinato manages sampling rates, which has led to it giving incorrect results (timestamps and block sizes).

The first issue is that, when given h5 files, Combinato silently assumes a sampling rate of 32K (if not overridden):

if 'sr' in openfiles[jname].root.__members__:
    sr = openfiles[jname].root.sr[0]
else:
    sr = 32000.

The second issue is that, regardless of the passed-in info, the block sizes are extracted with a hard-coded sampling rate of 32K:

starts = list(range(0, size, 32000*5*60))

32K is not the sampling rate of all files - to my knowledge it is standard for NeuraLynx, but not for BlackRock (which is 30K). We ran into this by getting wrong timestamps, as Combinato was assuming 32K for h5 files that came from BlackRock - clearly we should have been passing in an explicit value for the sampling rate, but I would not expect this to be silently guessed at by the code (and I did not find any documentation that described this behavior).

Also, as far as I can tell, even when passing in the sampling rate, the block size extraction would still be wrong, as it assumes 32K, and so would be a bit off when the real sampling rate is 30K (ending up as about 5.33 minutes instead of 5).
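To illustrate the size of the discrepancy, here is a quick check (assuming a true sampling rate of 30 kHz, as for BlackRock files):

```python
# Hard-coded block length in samples, as used in the block extraction
block_samples = 32000 * 5 * 60  # intended as 5 minutes at 32 kHz

# Actual duration of that block if the data were recorded at 30 kHz
true_sr = 30000
duration_min = block_samples / true_sr / 60
print(duration_min)  # ~5.33 minutes instead of the intended 5
```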

A related question: do any of the output files store the sampling rate that was used? I couldn't find this clearly in any of the output files - it would be nice to be able to probe what sampling rate (and thus what timestamp definition) was used.

In terms of fixes:

  • In the h5 loading, I think Combinato should at least be explicit / verbose about inferring the sampling rate, perhaps even failing without an explicit specification
  • In the block extraction, the sampling rate should be read, rather than hard-coded (though I'm not clear at first glance how to read this value in that function)
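A minimal sketch of what these two fixes could look like (the function names, the `override_sr` parameter, and the error message are my own illustration, not Combinato's actual API):

```python
def get_sampling_rate(root, override_sr=None):
    # Prefer an explicitly passed value, then one stored in the file;
    # raise (rather than silently assuming 32K) when neither exists
    if override_sr is not None:
        return float(override_sr)
    if 'sr' in root.__members__:
        return float(root.sr[0])
    raise ValueError('No sampling rate stored in file; '
                     'please specify one explicitly')

def block_starts(size, sr, block_minutes=5):
    # Derive the block size from the actual sampling rate,
    # instead of the hard-coded range(0, size, 32000*5*60)
    block_samples = int(sr * 60 * block_minutes)
    return list(range(0, size, block_samples))
```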

As a side note - reading matfiles seems to be a bit more explicit (it prints out the effective sampling rate) - though there is also a default value here, of 24K, and I'm a little unclear why it's this value (different from the value assumed for h5 files):

try:
    sr = data['sr'].ravel()[0]
    insert = 'stored'
except KeyError:
    sr = DEFAULT_MAT_SR
    insert = 'default'
print('Using ' + insert + ' sampling rate ({} kHz)'.format(sr/1000.))
