ENH: Enable models for sparsely sampled fMRI series #376

effigies · 2019-02-04T22:06:21Z

Closes #252.

codecov · 2019-02-05T02:32:56Z

Codecov Report

Merging #376 into master will decrease coverage by 28.01%.
The diff coverage is 19.23%.

@@             Coverage Diff             @@
##           master     #376       +/-   ##
===========================================
- Coverage   73.79%   45.78%   -28.02%     
===========================================
  Files          23       23               
  Lines        2492     2514       +22     
  Branches      621      628        +7     
===========================================
- Hits         1839     1151      -688     
- Misses        471     1240      +769     
+ Partials      182      123       -59

Flag	Coverage Δ
#unittests	`45.78% <19.23%> (-28.02%)`	⬇️

Impacted Files	Coverage Δ
bids/analysis/analysis.py	`29.57% <0%> (-59.21%)`	⬇️
bids/variables/io.py	`42.71% <21.42%> (-32.42%)`	⬇️
bids/variables/entities.py	`73.33% <66.66%> (-14.31%)`	⬇️
bids/analysis/transformations/base.py	`17.87% <0%> (-68.72%)`	⬇️
bids/analysis/auto_model.py	`26.15% <0%> (-61.54%)`	⬇️
bids/variables/kollekshuns.py	`32.85% <0%> (-50.72%)`	⬇️
bids/analysis/transformations/munge.py	`44.44% <0%> (-46.79%)`	⬇️
bids/variables/variables.py	`42.22% <0%> (-46.23%)`	⬇️
bids/analysis/transformations/compute.py	`45.71% <0%> (-40.96%)`	⬇️
... and 4 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13c857f...90b861d.

codecov · 2019-02-05T02:32:56Z

Codecov Report

Merging #376 into master will decrease coverage by 0.31%.
The diff coverage is 47.88%.

@@            Coverage Diff             @@
##           master     #376      +/-   ##
==========================================
- Coverage   62.29%   61.98%   -0.32%     
==========================================
  Files          27       27              
  Lines        4554     4611      +57     
  Branches     1173     1189      +16     
==========================================
+ Hits         2837     2858      +21     
- Misses       1433     1462      +29     
- Partials      284      291       +7

Flag	Coverage Δ
#unittests	`61.98% <47.88%> (-0.32%)`	⬇️

Impacted Files	Coverage Δ
bids/variables/kollekshuns.py	`83.57% <100%> (ø)`	⬆️
bids/variables/entities.py	`87.77% <100%> (+0.13%)`	⬆️
bids/variables/variables.py	`83.54% <36.36%> (-4.85%)`	⬇️
bids/variables/io.py	`72.24% <37.5%> (-3.01%)`	⬇️
bids/analysis/analysis.py	`86.91% <50%> (-1.87%)`	⬇️
bids/analysis/transformations/compute.py	`82.25% <57.89%> (-4.41%)`	⬇️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24ef1c8...2d32955.

effigies · 2019-02-08T22:03:04Z

@satra @tyarkoni I think this is ready for review. There's almost certainly a better way to do the integration, but this should work.

In passing, I had to fix the thing where run_info sometimes was the string "events" instead of a list of RunInfo objects. I'll split that out on Monday, and address it to the right issues.

bids/analysis/transformations/compute.py

bids/variables/io.py

satra · 2019-02-09T17:16:17Z

bids/variables/variables.py

+        else:
+            from scipy.interpolate import interp1d
+            x = np.arange(n)
+            f = interp1d(x, self.values.values.ravel(), kind=kind)


This is OK for upsampling but not for downsampling. Depending on the frequency content, this will introduce aliasing.

I'm not sure avoiding aliasing is in scope... or at least, I'm not sure we want to implicitly start low-pass filtering the input signal without the user's explicit consent. I'd be okay adding an explicit TemporalFilter transformation (actually I think this is already in the spec and just not yet implemented), and the user can then call that themselves before the resampling step if they like. But doing some magic internally to pick a suitable filter or otherwise avoid aliasing means (a) reproducibility across packages is limited, and (b) we're taking on the responsibility for producing sensible results, and to my mind this really falls on the user.
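To make the aliasing concern and the proposed division of labor concrete, here is a minimal sketch using NumPy/SciPy (not pybids code). `resample_by_interpolation` mimics the `interp1d` approach in the diff above; `explicit_lowpass` is a hypothetical stand-in for the not-yet-implemented TemporalFilter transformation, assuming a zero-phase Butterworth filter is an acceptable choice. A 9 Hz sine resampled from 100 Hz to 10 Hz comes out as a full-amplitude 1 Hz alias unless the user filters first.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import butter, filtfilt


def resample_by_interpolation(values, old_sr, new_sr, kind="linear"):
    """Resample a uniformly sampled signal by interpolation alone.

    No anti-aliasing is applied, so downsampling folds any energy above the
    new Nyquist frequency (new_sr / 2) back into the passband.
    """
    n = len(values)
    x = np.arange(n) / old_sr                           # original times (s)
    n_new = int(np.floor((n - 1) / old_sr * new_sr)) + 1
    x_new = np.arange(n_new) / new_sr                   # new times (s)
    return interp1d(x, values, kind=kind)(x_new)


def explicit_lowpass(values, sr, cutoff_hz, order=4, padlen=None):
    """Zero-phase Butterworth low-pass, applied only at the user's request.

    Hypothetical stand-in for an explicit TemporalFilter transformation.
    """
    b, a = butter(order, cutoff_hz / (sr / 2.0))        # cutoff normalized to Nyquist
    return filtfilt(b, a, values, padlen=padlen)


# A 9 Hz sine sampled at 100 Hz for 10 s, resampled to 10 Hz (Nyquist 5 Hz)
old_sr, new_sr = 100.0, 10.0
t = np.arange(1001) / old_sr
signal = np.sin(2 * np.pi * 9 * t)

naive = resample_by_interpolation(signal, old_sr, new_sr)
t_new = np.arange(len(naive)) / new_sr

# Filtering first (cutoff 4 Hz, below the new Nyquist) suppresses the alias
filtered = resample_by_interpolation(
    explicit_lowpass(signal, old_sr, cutoff_hz=4.0, padlen=300), old_sr, new_sr)

# Without filtering, the 9 Hz sine masquerades as a full-amplitude 1 Hz sine
print(np.allclose(naive, -np.sin(2 * np.pi * t_new), atol=1e-6))
print(np.abs(filtered).max())
```

Keeping the filter as a separate, explicit step means the resampling code stays a plain interpolation, and the filter type and cutoff remain the user's documented choice.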

bids/variables/variables.py

bids/analysis/transformations/compute.py

tyarkoni · 2019-02-11T20:34:27Z

bids/analysis/transformations/compute.py

+                # Use a unit that fits a whole number of times into both
+                # the interscan interval (TR) and the integration window (TA)
+                dt = gcd(TR, TA)
+                if dt > SR:


I think I'm probably missing something, but if we have to give up on a value that neatly bins TR and TA, why not just use SR at that point? E.g., if dt = 50 and SR = 10, then this will give us dt = 10, so we're at the SR anyway. If dt = 50 and SR = 17, we get dt = 25, which both prevents neat binning and is bigger than we wanted. Is the idea that at least every dt//SR^th bin will align nicely this way?

25 also permits neat binning...

Oh, duh. The inversion tripped me up. That is indeed the thing I was missing. :)
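The exchange hinges on `SR` in the snippet above being a sampling *interval* in milliseconds rather than a rate; that is a reading of the thread, not something the diff excerpt shows. Under that assumption, and with TR and TA as integer milliseconds so that `math.gcd` applies, a minimal sketch of the interval choice reproduces the numbers discussed (gcd = 50 ms). The function names are hypothetical.

```python
from math import gcd


def choose_interval_ms(tr_ms, ta_ms, target_ms):
    """Pick a sampling interval (ms) that bins TR and TA a whole number of times.

    target_ms is the interval implied by the collection's sampling rate
    (e.g. 10 Hz -> 100 ms); it is an interval, not a rate.
    """
    dt = gcd(tr_ms, ta_ms)          # largest interval dividing both TR and TA
    if dt > target_ms:
        # Split dt into k = dt // target_ms equal pieces. dt / k still fits
        # evenly into dt (hence into TR and TA), and it is the finest such
        # subdivision that is not finer than the target interval.
        dt = dt / (dt // target_ms)
    return dt


def interval_to_rate_hz(dt_ms):
    return 1000.0 / dt_ms


# TR = 2000 ms, TA = 1950 ms -> gcd = 50 ms
print(choose_interval_ms(2000, 1950, 17))   # 50 // 17 = 2 pieces of 25 ms
print(choose_interval_ms(2000, 1950, 10))   # 50 // 10 = 5 pieces of 10 ms
print(choose_interval_ms(2000, 1950, 100))  # gcd already finer than target
```

When gcd(TR, TA) is already at or below the target interval, it is returned unchanged, so the effective rate can be higher than the collection's (50 ms, i.e. 20 Hz, against a 10 Hz default). That is the behavior questioned in the discussion about respecting user-set rates.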

That said, I'm still not sure we shouldn't just use the SR. Since the user could have set the sampling rate on the collection manually, the compromise solution here risks explicitly ignoring their intentions. This isn't an entirely benign decision: in subsequent transformations, if a mismatch in sampling rates is detected, that will trigger a resampling in at least one variable when none may have been needed. They could also have code that depends on knowing what the SR is.

I think maybe we need to come up with a general policy about this, because it crops up in other places too (e.g., the issue about what to do if event files contain very short durations). I.e., the general question is, in a situation where the user appears to be picking sampling rates that are objectively inefficient or result in loss of information, should we be paternalistic and fix it for them, or just let them know they can probably do better?

One thing we could also do, though it's a bit annoying, is internally maintain a flag that indicates whether or not the user has explicitly overridden the default sampling rate. If SR=10 purely because that's the default, overriding it seems fine. If the user explicitly set it themselves, maybe we don't want to pick a different value for them.

It feels like we're trying to thread a needle where we make smart default choices based on our expertise on the one hand, and on the other, load a gun, cock it and point it at the foot of anyone who looks like they might know what they're doing...

Currently we reserve the footgun for people who use ToDense to get themselves a sampling interval that is relatively prime to gcd(TR, TA). If someone knows enough to manipulate the default sampling rate, they can learn that Convolve with a sparse acquisition paradigm will choose a slightly different interval than with a continuous acquisition paradigm unless the variable is already dense.

But perhaps there's a better general approach here? I admit that I chose this simply because building a boxcar function and taking a mean is easier than learning how to do this with interp1d.
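The integration is described here as building a boxcar function and taking a mean, and the commit list includes "RF: Use sparse matrix for integration". A minimal sketch of that idea (not the PR's code) with `scipy.sparse`: each row of a sparse matrix holds equal weights over one volume's acquisition window, so a single matrix-vector product averages the dense regressor over every window. The window placement [k*TR, k*TR + TA) and the requirement that TR and TA be whole multiples of the sampling interval are assumptions of this sketch; `integrate_over_ta` is a hypothetical name.

```python
import numpy as np
from scipy import sparse


def integrate_over_ta(dense_values, sr_hz, tr_s, ta_s, n_vols):
    """Average a dense regressor over each volume's acquisition window.

    Assumes volume k is acquired during [k*TR, k*TR + TA) and that TR and
    TA are whole multiples of the sampling interval 1 / sr_hz.
    """
    per_tr = int(round(tr_s * sr_hz))      # samples per TR
    per_ta = int(round(ta_s * sr_hz))      # samples per acquisition window
    if per_ta < 1 or per_ta > per_tr:
        raise ValueError("need 0 < TA <= TR at this sampling rate")
    n_samples = len(dense_values)
    if (n_vols - 1) * per_tr + per_ta > n_samples:
        raise ValueError("dense series is too short for n_vols acquisitions")

    # Row k has weight 1/per_ta on the samples inside volume k's window
    starts = np.arange(n_vols) * per_tr
    rows = np.repeat(np.arange(n_vols), per_ta)
    cols = (starts[:, None] + np.arange(per_ta)).ravel()
    weights = np.full(rows.shape, 1.0 / per_ta)
    boxcar = sparse.csr_matrix((weights, (rows, cols)),
                               shape=(n_vols, n_samples))
    return boxcar @ np.asarray(dense_values, dtype=float)


# 10 Hz regressor, TR = 2 s, TA = 1 s, three volumes
print(integrate_over_ta(np.arange(60), 10.0, 2.0, 1.0, 3))
```

The sparse matrix depends only on TR, TA, the sampling rate and the number of volumes, so it could be built once per run and reused for every regressor.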

The general issue potentially affects anyone who sets the sampling_rate explicitly in BIDSLayout.get_collections or load_variables. A sophisticated user could certainly know enough about their dataset to think "I'll use a sampling rate of 50, because that's high enough to account for my short-duration events, but still manageable computationally". Thereafter, I think that user could reasonably expect that if they call ToDense or Convolve without explicitly specifying a different sampling rate, the resulting variable will have an effective sampling rate of 50.

I think making decisions for the user internally is fine if there's an explicit argument in the transformation that implies as much. I.e., if ToDense has default sampling_rate='auto', then the docstring can just explain that we will act intelligently unless a numeric value is passed, and the result may not equal what's currently set in the collection. That seems fine. The issue is that in this case, there's no provision to specify the sampling rate in Convolve. I was against doing that because I don't think the spec should have that argument, but I guess we can make it a pybids-only argument, if only for the sake of making it clear to a sophisticated user what's going on.
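A minimal sketch of how a `sampling_rate='auto'` default could be resolved so that only 'auto' licenses the library to deviate from the collection's rate. The function name and the `choose_auto` callback are hypothetical; the TA-aware strategy is passed in rather than reimplemented here.

```python
from numbers import Number


def resolve_sampling_rate(sampling_rate, collection_hz, choose_auto):
    """Resolve a hypothetical ToDense/Convolve ``sampling_rate`` argument.

    - a positive number: used verbatim, even if it differs from collection_hz
    - 'auto': delegated to choose_auto(collection_hz), which may return a
      different rate (for example, one that bins TR and TA evenly)
    """
    if isinstance(sampling_rate, str):
        if sampling_rate == "auto":
            return float(choose_auto(collection_hz))
        raise ValueError("unknown sampling_rate string: %r" % sampling_rate)
    if isinstance(sampling_rate, Number) and not isinstance(sampling_rate, bool):
        if sampling_rate <= 0:
            raise ValueError("sampling_rate must be positive")
        return float(sampling_rate)
    raise TypeError("sampling_rate must be a positive number or 'auto'")


auto = lambda hz: 40.0   # stand-in for TA-aware logic
print(resolve_sampling_rate(50, 10.0, auto))      # explicit value wins
print(resolve_sampling_rate("auto", 10.0, auto))  # library may pick another rate
```

The docstring carries the contract discussed above: a numeric value is always honored, while 'auto' documents that the result may not match the collection's current rate.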

tyarkoni · 2019-02-11T21:53:08Z

Hey, unrelated to the above thread, doesn't this implementation run counter to the earlier suggestion that we handle as much of the densification logic as possible in the (implicit) to_dense call, housed in the Variable classes? I guess I'm not seeing why this stuff needs to go in Convolve. If we put it here, we will then need to put very similar-looking code in every other transformation that needs to implicitly convert from sparse to dense (accounting for the potential use of sparse acquisition). Is the thought that this is just temporary, and it'll eventually be refactored into the Variable hierarchy? If so, I'm fine with it. Just checking to see if that's what you're thinking.

effigies · 2019-02-12T16:40:10Z

I'm trying to think through this. Variables have no notion of a "default" sampling rate, as currently written, so moving this logic into to_dense() really only works if we have a global default and the condition that to_dense(None) (or similar) produces intelligent behavior. This doesn't seem to work nicely with the idea of a collection-level default sampling rate that may or may not be changed by a user...

Just to make clear, my distinction between typical and power users is pretty much whether they're doing this in JSON or Python. For the latter, most things are possible, so getting to the right behaviors for model writers is my priority.

tyarkoni · 2019-02-12T18:00:34Z

It doesn't necessarily have to live in to_dense, but it does need to be abstracted out of Convolve, because there are other transformations that will need to deal with the same issue (e.g., if you try to orthogonalize a sparse variable with respect to a dense one, the sparse one will be converted via to_dense). Actually, the way I'd intended this to work is that all conversion between variables is done before handing them to the Transformation-specific _transform logic.

If you look at the logic in the base Transformer class, which is admittedly kind of ugly at the moment, there's a place where _densify_variables is called. At this point, there's always an associated BIDSVariableCollection, so that's where the logic to determine the sampling rate should be going. The one (minor) hitch here is that there are currently no other Transformations that require dense variables. Currently, if you pass a dense=True argument to any transformation (undocumented, because not in spec), it will force any variables stored in the _densify class attribute to dense format before passing them to _transform. But in this case, Convolve must receive a dense variable, and there's no way to ensure that. The path of least resistance is probably to add a _force_dense attribute to the base Transformation class that mandates that input variables be densified before passing. Then that would route the inputs through _densify_variable, where the sampling rate would be determined via your existing logic, and then we just call to_dense() as currently implemented.

As far as I can tell, this would elegantly solve the problem while adhering to the general principle that the transformation-specific logic should be kept as simple as possible, and the complexity should be hidden in the base class as much as possible (to make it easier to implement new transformations and minimize the maintenance burden). What do you think?
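The proposed pattern can be sketched with toy classes. `ToyVariable`, `ToyTransformation`, `ToyConvolve` and `ToyRename` are hypothetical stand-ins, not the pybids class hierarchy, and `_force_dense` is the flag name floated in the comment above. The point is that the base class decides whether and at what rate to densify, so each subclass's `_transform` can assume its input format.

```python
class ToyVariable:
    """Hypothetical stand-in for a run variable (sparse unless dense=True)."""

    def __init__(self, name, dense=False, sampling_rate=None):
        self.name = name
        self.dense = dense
        self.sampling_rate = sampling_rate

    def to_dense(self, sampling_rate):
        # A real implementation would render events onto a time grid here
        return ToyVariable(self.name, dense=True, sampling_rate=sampling_rate)


class ToyTransformation:
    """Base class: all sparse-to-dense conversion happens here."""

    _force_dense = False  # subclasses that require dense input set this

    def __init__(self, collection_sampling_rate):
        self.collection_sampling_rate = collection_sampling_rate

    def _choose_sampling_rate(self, variable):
        # Placeholder: TA-aware logic (e.g. based on gcd(TR, TA)) would go here
        return self.collection_sampling_rate

    def _densify_variable(self, variable):
        if variable.dense:
            return variable
        return variable.to_dense(self._choose_sampling_rate(variable))

    def __call__(self, variables, dense=False):
        if dense or self._force_dense:
            variables = [self._densify_variable(v) for v in variables]
        return [self._transform(v) for v in variables]

    def _transform(self, variable):
        raise NotImplementedError


class ToyConvolve(ToyTransformation):
    _force_dense = True  # Convolve must always receive dense input

    def _transform(self, variable):
        # Transformation-specific logic stays simple: input is guaranteed dense
        if not variable.dense:
            raise TypeError("expected a dense variable")
        return variable


class ToyRename(ToyTransformation):
    def _transform(self, variable):
        return ToyVariable(variable.name + "_renamed", variable.dense,
                           variable.sampling_rate)
```

With this layout, adding another transformation that needs dense input only requires setting the flag, not copying the densification code.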

effigies · 2019-02-12T18:45:47Z

I think that seems like a decent approach. So a question: Should I clean up the tests here and get this merged, or start on that refactor? I think for @satra's purposes, it might be best to get a working sparse implementation in ASAP.

tyarkoni · 2019-02-12T19:10:21Z

Merging this as-is seems fine. The refactor will probably be pretty straightforward, and if any problems arise, they'll likely be coming from integration with 0.8 (though I think it should be fine), so we may as well hold off until that's ready.

effigies · 2019-02-12T19:28:15Z

Okay. I think I fixed up the failing test. @satra @mgxd Are you good with this? Have you had a chance to patch this into FitLins to test your stuff at all?

satra · 2019-02-12T19:30:03Z

@effigies, @mgxd - let's try that (testing this branch) with FitLins before merging.

Mathias: it could be a basic speech vs. baseline model.

effigies · 2019-02-14T16:27:04Z

@mgxd Have you had a chance to give this a shot? Anything I can do to help?

effigies · 2019-02-19T18:21:25Z

I've merged the 0.8 upgrade in, so any work on this going forward should be done in the context of poldracklab/fitlins@7ee02a2. @mgxd @satra Ping me for anything I can do to help.

tyarkoni · 2019-03-05T16:38:51Z

@effigies given that this will need to be updated to reflect #411, I wonder if this is also a good time to move the sr computation logic into to_dense() in SparseRunVariable. I think that will probably also make your life easier refactoring Convolve.

effigies · 2019-03-05T16:50:02Z

Okay. Once that's in, I'll try to re-assess and see if I can see the path forward your way. (I can't remember if it ever clicked, so I guess it didn't click very well...)

tyarkoni · 2019-03-05T16:58:58Z

The general idea was that computing the sr to take TA into account is going to be necessary any time densification occurs, not just in Convolve. So I think the only thing Convolve should be doing is checking to see if sparse acquisition was used, and if so, triggering to_dense.

Maybe it makes sense to add a helper method in SparseRunVariable that does all of that internally—i.e., something like a more sensibly named densifyIfSparseAcquisition().

Assuming the densification is done implicitly, I think you then won't need to do much else in Convolve; @adelavega's code should just treat the densified variable like any other DenseRunVariable, and compute the oversampling rate as usual. But I might be overlooking something.
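A minimal sketch of the suggested helper on a toy event-based variable. `ToySparseRunVariable` and `densify_if_sparse_acquisition` are hypothetical (the comment itself only floats a name), TA < TR is taken as the test for sparse acquisition, and the TA-aware choice of `sampling_rate` is left to the caller here.

```python
import numpy as np


class ToySparseRunVariable:
    """Hypothetical event-based variable; all times are in seconds."""

    def __init__(self, onsets, durations, amplitudes, run_duration, tr, ta):
        self.onsets = np.asarray(onsets, dtype=float)
        self.durations = np.asarray(durations, dtype=float)
        self.amplitudes = np.asarray(amplitudes, dtype=float)
        self.run_duration = run_duration
        self.tr = tr
        self.ta = ta

    @property
    def sparse_acquisition(self):
        # TA shorter than TR means volumes are acquired with silent gaps
        return self.ta < self.tr

    def to_dense(self, sampling_rate):
        """Render events as a step function sampled at sampling_rate (Hz)."""
        n = int(np.ceil(self.run_duration * sampling_rate))
        values = np.zeros(n)
        for onset, duration, amp in zip(self.onsets, self.durations,
                                        self.amplitudes):
            start = int(round(onset * sampling_rate))
            stop = int(round((onset + duration) * sampling_rate))
            values[start:stop] = amp  # later events overwrite any overlap
        return values

    def densify_if_sparse_acquisition(self, sampling_rate):
        """Densify only when sparse acquisition was used; else return self."""
        if self.sparse_acquisition:
            return self.to_dense(sampling_rate)
        return self


sparse_run = ToySparseRunVariable([1.0], [0.5], [2.0], run_duration=4.0,
                                  tr=2.0, ta=1.0)
dense = sparse_run.densify_if_sparse_acquisition(10.0)  # 40 samples at 10 Hz
```

Convolve would then only call the helper and, if it gets a dense series back, proceed exactly as it does for any other dense variable.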

adelavega · 2019-03-05T17:36:41Z

> Assuming the densification is done implicitly, I think you then won't need to do much else in Convolve; @adelavega's code should just treat the densified variable like any other DenseRunVariable, and compute the oversampling rate as usual.

I just realized that I hadn't tested with Dense variables yet, and it will probably fail because I'm getting the onsets from var, not df. I'll change that in a sec.

tyarkoni · 2019-03-05T18:12:48Z

Yet another good reason to prioritize #320 (not that I needed more)! Will try to get to that shortly.

adelavega · 2019-03-05T18:25:28Z

Yep, I am adding at least a few tests for convolve as part of #411

ENH: Calculate TA and store in run_info

90b861d

effigies force-pushed the enh/sparse_sampling branch from 1374b89 to 90b861d February 5, 2019 02:32

effigies added 6 commits February 5, 2019 09:58

RF: Set default TA = TR, update RunInfo calls in tests

9057a7c

ENH: Upsample convolution, begin calculating downsample

4921a0a

FIX: Pass new variable args as kwargs in split()

ff319d1

ENH: Choose sampling rate to accommodate TR, TA and target SR

03cbeb0

RF: Thread integration_window through BIDSRunVariableCollection

42384fc

ENH: Perform integration over TA

979a4a2

effigies changed the title ~~[WIP] ENH: Enable models for sparsely sampled fMRI series~~ ENH: Enable models for sparsely sampled fMRI series Feb 8, 2019

satra reviewed Feb 9, 2019

bids/analysis/transformations/compute.py

satra reviewed Feb 9, 2019

bids/variables/io.py (outdated)

satra reviewed Feb 9, 2019

bids/variables/variables.py (outdated)

tyarkoni mentioned this pull request Feb 9, 2019

Support sparse acquisition #252

Open

effigies mentioned this pull request Feb 11, 2019

FIX: Pass new variable args as kwargs in split() #386

Merged

effigies added 2 commits February 11, 2019 13:49

FIX: TA within 1ms of TR is just TR

998ce37

RF: Use sparse matrix for integration

10af708

tyarkoni reviewed Feb 11, 2019

effigies and others added 3 commits February 11, 2019 15:56

RF: Drop unneeded TA is None guard

b6d8271

DOC: Drop missing arguments from docstring

0f0531c

Merge branch 'master' into enh/sparse_sampling

09e8d9b

FIX: Make sure acquisition_time is defined

aa72c46

PY2: gcd

258f243

Merge branch 'master' into enh/sparse_sampling

2d32955

tyarkoni mentioned this pull request Mar 6, 2019

resample: ValueError: x and y arrays must be equal in length along interpolation axis. #361

Closed

effigies mentioned this pull request Mar 11, 2019

ENH: Enable models for sparsely sampled fMRI series #414

Open

effigies mentioned this pull request Apr 19, 2019

REL: 0.3.0 poldracklab/fitlins#132

Merged

yarikoptic mentioned this pull request Mar 16, 2023

FOI: sample con/solidation report bids-standard/maintenance-tools#10

Open
