Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reflection_table.extract_shoeboxes unexpectedly returns empty shoeboxes #2639

Open
LuisA92 opened this issue Apr 5, 2024 · 2 comments
Open
Assignees

Comments

@LuisA92
Copy link

LuisA92 commented Apr 5, 2024

Hello DIALS developers,

I'm encountering an issue where the reflection_table.extract_shoeboxes method is unexpectedly returning a large number of empty shoeboxes when processing rotation method data (1439 images). This issue does not occur if I process a smaller subset of the dataset.

I am using the following script to extract the shoeboxes: make_shoeboxes.py.

Steps to reproduce the issue

DIALS Version: DIALS 3.16.1-gf88b4b963-release
Dataset:
Rotation Data
0.5 degree oscillations
        1439 images
Images: https://data.sbgrid.org/dataset/816/

I use the following script to download the data and its corresponding pixel mask, and to process the data using DIALS:

#!/bin/bash

# save cwd
START_DIR=$(pwd)

# Download the data
rsync -av rsync://data.sbgrid.org/10.15785/SBGRID/816 .

# Verify data integrity
cd 816 ; shasum -c files.sha

#Download and move pixels.mask
wget https://data.sbgrid.org/upload/thumbnails/816/processing_bundle-816.tar.gz
tar -xzvf processing_bundle-816.tar.gz
mv processing/pixels.mask .

# Setup processing directory
mkdir pass1_
cd pass1_

# Space group number
sg=96

# Path to images
images="$START_DIR/816/301_helical_1_????.cbf"

# Import images
dials.import $images \
             geometry.detector.mosflm_beam_centre=290.5,225.2 \
             geometry.scan.image_range=1,1439\
             mask=../pixels.mask

# Find strong spots
dials.find_spots imported.expt spotfinder.filter.d_max=6

# Index
dials.index imported.expt strong.refl indexing.known_symmetry.space_group=$sg

# Global refinement
dials.refine indexed.expt indexed.refl refinement.parameterisation.scan_varying=True

# Predict
dials.predict refined.expt

I then use the following command to extract the shoeboxes:

cctbx.python make_shoeboxes.py predicted.refl imported.expt nx=6 ny=6 nz=2

Then, I use the following Python script to check the shoebox sizes and to count how many shoeboxes of each size there are:

from dials.array_family import flex
import numpy as np

# reflection table
table = flex.reflection_table.from_file("./shoeboxes.refl")

# shoebox sizes based off number of pixels in each shoebox
shoebox_sizes = list(
    map(lambda x: x.values().as_numpy_array().shape, table["shoebox"])
)

# count unique shoebox sizes
sizes, counts = np.unique(shoebox_sizes, return_counts=True)
print(f'shoebox sizes: {sizes}\n        counts: {counts}')

I get the following output:

bash-3.2$ python shoebox_sizes.py 
shoebox sizes: [  0 845]
       counts: [422358  44543]

To process half the dataset, I remove images 301_helical_1_0701.cbf through 301_helical_1_1439.cbf from the $START_DIR/816 directory, and I change the geometry.scan.image_range in the processing script with the following:

# Import images
dials.import $images \
             geometry.detector.mosflm_beam_centre=290.5,225.2 \
             geometry.scan.image_range=1,700\
             mask=../pixels.mask

I then use the following command to extract the shoeboxes:

cctbx.python make_shoeboxes.py predicted.refl imported.expt nx=6 ny=6 nz=2

and I get the following output from the Python script to check the shoebox sizes:

bash-3.2$ python shoebox_sizes.py 
shoebox sizes: [845]
       counts: [226302]
@graeme-winter graeme-winter self-assigned this Apr 11, 2024
@graeme-winter
Copy link
Contributor

OK, an update on this (I have been looking at it)

I suspect that the fundamental issue is not that we are not extracting all the shoeboxes, it is that the shoebox array is too large to save in a messagepack dataset file 🤦‍♂️

I add a check to the shape in memory before write:

Ethics-Gradient work :( [main] $ caffeinate python3 make_shoeboxes.py nx=6 ny=6 nz=2  predicted.refl imported.expt 
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
668020 / 676675 kept
Extracting 564476900 pixels
100%|███████████████████████████████████████| 1800/1800 [02:24<00:00, 12.47it/s]
Read 1800 images
Read / extract time: 141.59889888763428.1f / 2.6 seconds
Extracted shoebox shapes:
(5, 13, 13): 668020

Then check the file shape we re-load

Ethics-Gradient work :) [main] $ python3 check.py 
(0, 0, 0): 422358
(5, 13, 13): 245662

Ergo I believe this is a constraint of how we save the data

@LuisA92 I could save it to an HDF5 file if that would work for you?

@LuisA92
Copy link
Author

LuisA92 commented Apr 12, 2024

Thank you @graeme-winter, for taking the time to look at this.

Ah, your suspicion makes sense. Saving to an HDF5 file will work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants