-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dials.filter_blanks #2232
base: main
Are you sure you want to change the base?
Add dials.filter_blanks #2232
Conversation
To remove blank images from scans in an automated way to allow use within a pipeline. Fixes #2231. Chose not to modify detect_blanks as this is a substantial API change
Thanks! I am running this on my 460 crystal MicroED dataset. Test1: on integrated.{expt,refl} |
# FIXME find a more graceful way of handling this...
if len(valid_ranges) > 1:
sys.exit("Currently multiple blank regions unsupported") At the moment many crystals are rejected by this line, although probably such crystals are not good anyway (i.e. very weak). |
Ah, no, I will need to find a proper fix here then. I was not sure how to wrangle the experiment identifiers. I will contemplate some more. |
Test1: This is already proving useful! As you can see, weird R factors in some shells (0.92 - 0.90 and 0.88 - 0.86) are gone. Note: the number of crystals are slightly different due to the above issue. After filtering: (39 crystals)
Before filtering: (43 crystals)
|
@biochem-fan please try again - I think I have a fix in for > 1 non-blank region but would welcome verification |
Anyone who looks at this please could you check the indexing around |
Test2 (filtering on
Not as good as Test 1 (filtering on I will re-run my tests on your new code. |
Those with multiple blank ranges are mostly rubbish.
Perhaps having a filter (in the pipeline, not necessarily within this command) by the number of spots and/or the number of frames within a valid range might be useful. |
For example, perhaps |
In this case, 3 frames in the middle were removed, which I think are false negatives.
|
An option to trim only at the start and the end might be good (just as in the delta cc half filter in |
I think I need to check the workings & make sure that the experiment identifier assignment is being handled correctly: quite possibly something going quite wrong here. |
Codecov Report
@@ Coverage Diff @@
## main #2232 +/- ##
==========================================
- Coverage 80.51% 80.46% -0.05%
==========================================
Files 587 589 +2
Lines 66907 67019 +112
Branches 8896 8918 +22
==========================================
+ Hits 53868 53927 +59
- Misses 10977 11019 +42
- Partials 2062 2073 +11 |
Just created myself a slightly artificial example case with this script
and everything did appear to work as I expect -> I will have a closer look to make sure I have all my sums right. There was an error in one of the commits where I mis-assigned an experiment ID |
Also:
I will keep checking |
@graeme-winter For example,
In this case, dials/src/dials/command_line/filter_blanks.py Line 121 in b65599c
valid array has the size of 45.
However the loop below sets flags at the original index, invalidating wrong frames and causing out-of-bound errors when the frames at the end are trimmed. See dials/src/dials/command_line/filter_blanks.py Lines 136 to 139 in b65599c
Replacing the array initialization to below seems to fix the issue but could you please look at this? valid = array("b", [0 for _ in range(scan.get_array_range()[1])])
for i in range(*scan.get_array_range()):
valid[i] = 1 |
@biochem-fan this is relevant, though dismal, reading about the array/image range idiosyncrasies in dxtbx and DIALS cctbx/dxtbx#186 |
In particular, the difference between these concepts is not so much the difference between 0- and 1-based indexing, but rather whether you are labelling images with ordinal numbers, or considering a "z position" in a scan. It turns out that the Clearly it seems I gave up with cctbx/dxtbx#186, but if I get a clear run at it I would like to pick it up again. |
Hmm, this is very confusing. What is the index for slicing, then? My above fix still gives out-of-errors in some cases. dials/src/dials/command_line/filter_blanks.py Line 178 in b65599c
|
I think scan slicing is done by ordinal image number. I'm not completely sure, but that's the impression I got from https://github.com/cctbx/dxtbx/blob/bb5f2e18067e0963f0beff6b449afaccf4cd0e2f/src/dxtbx/model/boost_python/scan.cc#L325. In that case, perhaps the problem here is that |
New program filter_blanks
To remove blank images from scans in an automated way to allow use within a pipeline. Fixes #2231. Chose not to modify detect_blanks as this is a substantial API change