Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Enable models for sparsely sampled fMRI series #376

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

effigies
Copy link
Collaborator

@effigies effigies commented Feb 4, 2019

Closes #252.

@codecov
Copy link

codecov bot commented Feb 5, 2019

Codecov Report

Merging #376 into master will decrease coverage by 28.01%.
The diff coverage is 19.23%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #376       +/-   ##
===========================================
- Coverage   73.79%   45.78%   -28.02%     
===========================================
  Files          23       23               
  Lines        2492     2514       +22     
  Branches      621      628        +7     
===========================================
- Hits         1839     1151      -688     
- Misses        471     1240      +769     
+ Partials      182      123       -59
Flag Coverage Δ
#unittests 45.78% <19.23%> (-28.02%) ⬇️
Impacted Files Coverage Δ
bids/analysis/analysis.py 29.57% <0%> (-59.21%) ⬇️
bids/variables/io.py 42.71% <21.42%> (-32.42%) ⬇️
bids/variables/entities.py 73.33% <66.66%> (-14.31%) ⬇️
bids/analysis/transformations/base.py 17.87% <0%> (-68.72%) ⬇️
bids/analysis/auto_model.py 26.15% <0%> (-61.54%) ⬇️
bids/variables/kollekshuns.py 32.85% <0%> (-50.72%) ⬇️
bids/analysis/transformations/munge.py 44.44% <0%> (-46.79%) ⬇️
bids/variables/variables.py 42.22% <0%> (-46.23%) ⬇️
bids/analysis/transformations/compute.py 45.71% <0%> (-40.96%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13c857f...90b861d. Read the comment docs.

@codecov
Copy link

codecov bot commented Feb 5, 2019

Codecov Report

Merging #376 into master will decrease coverage by 0.31%.
The diff coverage is 47.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #376      +/-   ##
==========================================
- Coverage   62.29%   61.98%   -0.32%     
==========================================
  Files          27       27              
  Lines        4554     4611      +57     
  Branches     1173     1189      +16     
==========================================
+ Hits         2837     2858      +21     
- Misses       1433     1462      +29     
- Partials      284      291       +7
Flag Coverage Δ
#unittests 61.98% <47.88%> (-0.32%) ⬇️
Impacted Files Coverage Δ
bids/variables/kollekshuns.py 83.57% <100%> (ø) ⬆️
bids/variables/entities.py 87.77% <100%> (+0.13%) ⬆️
bids/variables/variables.py 83.54% <36.36%> (-4.85%) ⬇️
bids/variables/io.py 72.24% <37.5%> (-3.01%) ⬇️
bids/analysis/analysis.py 86.91% <50%> (-1.87%) ⬇️
bids/analysis/transformations/compute.py 82.25% <57.89%> (-4.41%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24ef1c8...2d32955. Read the comment docs.

@effigies effigies changed the title [WIP] ENH: Enable models for sparsely sampled fMRI series ENH: Enable models for sparsely sampled fMRI series Feb 8, 2019
@effigies
Copy link
Collaborator Author

effigies commented Feb 8, 2019

@satra @tyarkoni I think this is ready for review. There's almost certainly a better way to do the integration, but this should work.

In passing, I had to fix the thing where run_info sometimes was the string "events" instead of a list of RunInfo objects. I'll split that out on Monday, and address it to the right issues.

bids/variables/io.py Outdated Show resolved Hide resolved
else:
from scipy.interpolate import interp1d
x = np.arange(n)
f = interp1d(x, self.values.values.ravel(), kind=kind)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok for upsampling not fine for downsampling. depending on frequency content this will introduce aliasing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure avoiding aliasing is in scope... or at least, I'm not sure we want to implicitly start low-pass filtering the input signal without the user's explicit consent. I'd be okay adding an explicit TemporalFilter transformation (actually I think this is already in the spec and just not yet implemented), and the user can then call that themselves before the resampling step if they like. But doing some magic internally to pick a suitable filter or otherwise avoid aliasing means (a) reproducibility across packages is limited, and (b) we're taking on the responsibility for producing sensible results, and to my mind this really falls on the user.

bids/variables/variables.py Outdated Show resolved Hide resolved
bids/analysis/transformations/compute.py Outdated Show resolved Hide resolved
# Use a unit that fits an whole number of times into both
# the interscan interval (TR) and the integration window (TA)
dt = gcd(TR, TA)
if dt > SR:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm probably missing something, but if we have to give up on a value that neatly bins TR and TA, why not just use SR at that point? E.g., if dt = 50 and SR = 10, then this will give us dt = 10, so we're at the SR anyway. If dt = 50 and SR = 17, we get dt = 25, which both prevents neat binning and is bigger than we wanted. Is the idea that at least every dt//SR^th bin will align nicely this way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

25 also permits neat binning...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, duh. The inversion tripped me up. That is indeed the thing I was missing. :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I'm still not sure we shouldn't just use the SR. Since the user could have set the sr on the collection manually, the compromise solution here risks explicitly ignoring their intentions. This isn't an entirely benign decision, because in subsequent transformations, if a mismatch in sampling rates is detected, that will trigger a resampling in at least one variable, when potentially none might have been needed. They could also have code that depends on knowing what the SR is.

I think maybe we need to come up with a general policy about this, because it crops up in other places too (e.g., the issue about what to do if event files contain very short durations). I.e., the general question is, in a situation where the user appears to be picking sampling rates that are objectively inefficient or result in loss of information, should we be paternalistic and fix it for them, or just let them know they can probably do better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we could also do, though it's a bit annoying, is internally maintain a flag that indicates whether or not the user has explicitly overridden the default sampling rate. If SR=10 purely because that's the default, overriding it seems fine. If the user explicitly set it themselves, maybe we don't want to pick a different value for them.

Copy link
Collaborator Author

@effigies effigies Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like we're trying to thread a needle where we make smart default choices based on our expertise on the one hand, and on the other, load a gun, cock it and point it at the foot of anyone who looks like they might know what they're doing...

Currently we reserve the footgun for people who use ToDense to get themselves a sampling interval that is relatively prime to gcd(TR, TA). If someone knows enough to manipulate the default sampling rate, they can learn that Convolve with a sparse acquisition paradigm will choose a slightly different interval than with a continuous acquisition paradigm unless the variable is already dense.

But perhaps there's a better general approach here? I admit that I chose this simply because building a boxcar function and taking a mean is easier than learning how to do this with interp1d.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general issue potentially affects anyone who sets the sampling_rate explicitly in BIDSLayout.get_collections or load_variables. A sophisticated user could certainly know enough about their dataset to think "I'll use a sampling rate of 50, because that's high enough to account for my short-duration events, but still manageable computationally". Thereafter, I think that user could reasonably expect that if they call ToDense or Convolve without explicitly specifying a different sampling rate, the resulting variable will have an effective sampling rate of 50.

I think making decisions for the user internally is fine if there's an explicit argument in the transformation that implies as much. I.e., if ToDense has default sampling_rate='auto', then the docstring can just explain that we will act intelligently unless a numeric value is passed, and the result may not equal what's currently set in the collection. That seems fine. The issue is that in this case, there's no provision to specify the sampling rate in Convolve. I was against doing that because I don't think the spec should have that argument, but I guess we can make it a pybids-only argument, if only for the sake of making it clear to a sophisticated user what's going on.

@tyarkoni
Copy link
Collaborator

tyarkoni commented Feb 11, 2019

Hey, unrelated to the above thread, doesn't this implementation run counter to the suggestion previously that we handle as much of the densification logic as possible in the (implicit) to_dense call, housed in the Variable classes? I guess I'm not seeing why this stuff needs to go in Convolve. If we put it here, we will then need to put very similar-looking code in every other transformation that needs to implicitly convert from sparse to dense (accounting for the potential use of sparse acquisition). Is the thought that this is just temporary, and it'll eventually be refactored into the Variable hierarchy? If so, I'm fine with it. Just checking to see if that's what you're thinking.

@effigies
Copy link
Collaborator Author

I'm trying to think through this. Variables have no notion of a "default" sampling rate, as currently written, so moving this logic into to_dense() really only works if we have a global default and the condition that to_dense(None) (or similar) produces intelligent behavior. This doesn't seem to work nicely with the idea of a collection-level default sampling rate that may or may not be changed by a user...

Just to make clear, my distinction between typical and power users is pretty much whether they're doing this in JSON or Python. For the latter, most things are possible, so getting to the right behaviors for model writers is my priority.

@tyarkoni
Copy link
Collaborator

It doesn't necessarily have to live in to_dense, but it does need to abstracted out of Convolve, because there are other transformations that will need to deal with the same issue (e.g., if you try to orthogonalize a sparse variable with respect to a dense one, the sparse will be converted via to_dense. Actually, the way I'd intended this to work is that all conversion between variables is done before handing them to the Transformation-specific _transform logic.

If you look at the logic in the base Transformer class, which is admittedly kind of ugly at the moment, there's a place where _densify_variables is called. At this point, there's always an associated BIDSVariableCollection, so that's where the logic to determine the sampling rate should be going. The one (minor) hitch here is that there are currently no other Transformations that require dense variables. Currently, if you pass a dense=True argument to any transformation (undocumented, because not in spec), it will force any variables stored in the _densify class attribute to dense format before passing them to _transform. But in this case, Convolve must receive a dense variable, and there's no way to ensure that. The path of least resistance is probably to add a _force_dense attribute to the base Transformation class that mandates that input variables be densified before passing. Then that would route the inputs through _densify_variable, where the sampling rate would be determined via your existing logic, and then we just call to_dense() as currently implemented.

As far as as I can tell, this would elegantly solve the problem while adhering to the general principle that the transformation-specific logic should be kept as simple as possible, and the complexity should be hidden in the base class as much as possible (to make it easier to implement new transformations and minimize the maintenance burden). What do you think?

@effigies
Copy link
Collaborator Author

I think that seems like a decent approach. So a question: Should I clean up the tests here and get this merged, or start on that refactor? I think for @satra's purposes, it might be best to get a working sparse implementation in ASAP.

@tyarkoni
Copy link
Collaborator

Merging this as-is seems fine. The refactor will probably be pretty straightforward, and if any problems arise, they'll likely be coming from integration with 0.8 (though I think it should be fine), so we may as well hold off until that's ready.

@effigies
Copy link
Collaborator Author

Okay. I think I fixed up the failing test. @satra @mgxd Are you good with this? Have you had a chance to patch this into FitLins to test your stuff at all?

@satra
Copy link
Contributor

satra commented Feb 12, 2019

@effigies, @mgxd - let's try that (testing this branch) with FITLNS before merging.

mathias: it could be a basic speech vs baseline model.

@effigies
Copy link
Collaborator Author

@mgxd Have you had a chance to give this a shot? Anything I can do to help?

@effigies
Copy link
Collaborator Author

I've merged the 0.8 upgrade in, so any work on this going forward should be done in the context of poldracklab/fitlins@7ee02a2. @mgxd @satra Ping me for anything I can do to help.

@tyarkoni
Copy link
Collaborator

tyarkoni commented Mar 5, 2019

@effigies given that this will need to be updated to reflect #411, I wonder if this is also a good time to move the sr computation logic into to_dense() in SparseRunVariable. I think that will probably also make your life easier refactoring Convolve.

@effigies
Copy link
Collaborator Author

effigies commented Mar 5, 2019

Okay. Once that's in, I'll try to re-assess and see if I can see the path forward your way. (I can't remember if it ever clicked, so I guess it didn't click very well...)

@tyarkoni
Copy link
Collaborator

tyarkoni commented Mar 5, 2019

The general idea was that computing the sr to take TA into account is going to be necessary any time densification occurs, not just in Convolve. So I think the only thing Convolve should be doing is checking to see if sparse acquisition was used, and if so, triggering to_dense.

Maybe it makes sense to add a helper method in SparseRunVariable that does all of that internally—i.e., something like a more sensibly named densifyIfSparseAcquisition().

Assuming the densification is done implicitly, I think you then won't need to do much else in Convolve; @adelavega's code should just treat the densified variable like any other DenseRunVariable, and compute the oversampling rate as usual. But I might be overlooking something.

@adelavega
Copy link
Collaborator

Assuming the densification is done implicitly, I think you then won't need to do much else in Convolve; @adelavega's code should just treat the densified variable like any other DenseRunVariable, and compute the oversampling rate as usual.

I just realized that I hadn't tested with Dense variables yet, and it will probably fail because I'm getting the onsets from var, not df. I'll change that in a sec.

@tyarkoni
Copy link
Collaborator

tyarkoni commented Mar 5, 2019

Yet another good reason to prioritize #320 (not that I needed more)! Will try to get to that shortly.

@adelavega
Copy link
Collaborator

Yep, I am adding at least a few tests for convolve as part of #411

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants