
Issues with inferring / managing sampling rate #82

Open
TomDonoghue opened this issue Apr 16, 2024 · 0 comments
We've encountered some issues with how Combinato manages sampling rates, which has led to it giving incorrect results (timestamps and block sizes).

The first issue is that, when given h5 files, Combinato silently assumes a sampling rate of 32K (if not overridden):

if 'sr' in openfiles[jname].root.__members__:
    sr = openfiles[jname].root.sr[0]
else:
    sr = 32000.

The second issue is that, regardless of the passed-in info, the block sizes are extracted with a hard-coded sampling rate of 32K:

starts = list(range(0, size, 32000*5*60))

32K is not the sampling rate of all files - to my knowledge it is standard for NeuraLynx, but not for BlackRock (which is 30K). We ran into this by getting wrong timestamps, as Combinato was assuming 32K for h5 files that came from BlackRock - clearly we should have been passing in an explicit value for the sampling rate, but I would not expect this to be silently guessed at by the code (and I did not find any documentation that described this behavior).

Also, as far as I can tell, even when passing in the sampling rate, the block size extraction would still be wrong, as it assumes 32K, and so would be a bit off when the real sampling rate is 30K (ending up as about 5.33 minutes instead of 5).
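To illustrate the size of the discrepancy, here is a quick check (assuming a true sampling rate of 30 kHz, as for BlackRock files):

```python
# Hard-coded block length in samples, as used in the block extraction
block_samples = 32000 * 5 * 60  # intended as 5 minutes at 32 kHz

# Actual duration of that block if the data were recorded at 30 kHz
true_sr = 30000
duration_min = block_samples / true_sr / 60
print(duration_min)  # ~5.33 minutes instead of the intended 5
```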

A related question: do any of the output files store the sampling rate that was used? I couldn't find this clearly in any of the output files - it would be nice to be able to probe what sampling rate (and thus what timestamp definition) was used.

In terms of fixes:

  • In the h5 loading, I think Combinato should at least be explicit / verbose about inferring the sampling rate, perhaps even failing without an explicit specification
  • In the block extraction, the sampling rate should be read, rather than hard-coded (though I'm not clear at first glance how to read this value in that function)
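A minimal sketch of what these two fixes could look like (the function names, the `override_sr` parameter, and the error message are my own illustration, not Combinato's actual API):

```python
def get_sampling_rate(root, override_sr=None):
    # Prefer an explicitly passed value, then one stored in the file;
    # raise (rather than silently assuming 32K) when neither exists
    if override_sr is not None:
        return float(override_sr)
    if 'sr' in root.__members__:
        return float(root.sr[0])
    raise ValueError('No sampling rate stored in file; '
                     'please specify one explicitly')

def block_starts(size, sr, block_minutes=5):
    # Derive the block size from the actual sampling rate,
    # instead of the hard-coded range(0, size, 32000*5*60)
    block_samples = int(sr * 60 * block_minutes)
    return list(range(0, size, block_samples))
```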

As a side note - reading matfiles seems to be a bit more explicit (it prints out the effective sampling rate) - though there is also a default value here, of 24K, and I'm a little unclear why it's this value (different from the value assumed for h5 files):

try:
    sr = data['sr'].ravel()[0]
    insert = 'stored'
except KeyError:
    sr = DEFAULT_MAT_SR
    insert = 'default'
print('Using ' + insert + ' sampling rate ({} kHz)'.format(sr/1000.))
