`reflection_table.extract_shoeboxes` unexpectedly returns empty shoeboxes #2639

LuisA92 · 2024-04-05T20:42:09Z

Hello DIALS developers,

I'm encountering an issue where the reflection_table.extract_shoeboxes method is unexpectedly returning a large number of empty shoeboxes when processing rotation method data (1439 images). This issue does not occur if I process a smaller subset of the dataset.

I am using the following script to extract the shoeboxes: make_shoeboxes.py.

Steps to reproduce the issue

DIALS Version: DIALS 3.16.1-gf88b4b963-release
Dataset:
Rotation Data
0.5 degree oscillations
1439 images
Images: https://data.sbgrid.org/dataset/816/

I use the following script to download the data and its corresponding pixel mask, and to process the data using DIALS:

#!/bin/bash

# save cwd
START_DIR=$(pwd)

# Download the data
rsync -av rsync://data.sbgrid.org/10.15785/SBGRID/816 .

# Verify data integrity
cd 816 ; shasum -c files.sha

#Download and move pixels.mask
wget https://data.sbgrid.org/upload/thumbnails/816/processing_bundle-816.tar.gz
tar -xzvf processing_bundle-816.tar.gz
mv processing/pixels.mask .

# Setup processing directory
mkdir pass1_
cd pass1_

# Space group number
sg=96

# Path to images
images="$START_DIR/816/301_helical_1_????.cbf"

# Import images
dials.import $images \
             geometry.detector.mosflm_beam_centre=290.5,225.2 \
             geometry.scan.image_range=1,1439\
             mask=../pixels.mask

# Find strong spots
dials.find_spots imported.expt spotfinder.filter.d_max=6

# Index
dials.index imported.expt strong.refl indexing.known_symmetry.space_group=$sg

# Global refinement
dials.refine indexed.expt indexed.refl refinement.parameterisation.scan_varying=True

# Predict
dials.predict refined.expt

I then use the following command to extract the shoeboxes:

cctbx.python make_shoeboxes.py predicted.refl imported.expt nx=6 ny=6 nz=2

Then, I use the following Python script to check the shoebox sizes and to count how many shoeboxes of each size there are:

from dials.array_family import flex
import numpy as np

# reflection table
table = flex.reflection_table.from_file("./shoeboxes.refl")

# shoebox sizes based off number of pixels in each shoebox
shoebox_sizes = list(
    map(lambda x: x.values().as_numpy_array().shape, table["shoebox"])
)

# count unique shoebox sizes
sizes, counts = np.unique(shoebox_sizes, return_counts=True)
print(f'shoebox sizes: {sizes}\n        counts: {counts}')

I get the following output:

bash-3.2$ python shoebox_sizes.py 
shoebox sizes: [  0 845]
       counts: [422358  44543]

To process half the dataset, I remove images 301_helical_1_0701.cbf through 301_helical_1_1439.cbf from the $START_DIR/816 directory, and I change the geometry.scan.image_range in the processing script with the following:

# Import images
dials.import $images \
             geometry.detector.mosflm_beam_centre=290.5,225.2 \
             geometry.scan.image_range=1,700\
             mask=../pixels.mask

I then use the following command to extract the shoeboxes:

cctbx.python make_shoeboxes.py predicted.refl imported.expt nx=6 ny=6 nz=2

and I get the following output from the Python script to check the shoebox sizes:

bash-3.2$ python shoebox_sizes.py 
shoebox sizes: [845]
       counts: [226302]

The text was updated successfully, but these errors were encountered:

graeme-winter · 2024-04-11T10:04:46Z

OK, an update on this (I have been looking at it)

I suspect that the fundamental issue is not that we are not extracting all the shoeboxes, it is that the shoebox array is too large to save in a messagepack dataset file 🤦‍♂️

I add a check to the shape in memory before write:

Ethics-Gradient work :( [main] $ caffeinate python3 make_shoeboxes.py nx=6 ny=6 nz=2  predicted.refl imported.expt 
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
668020 / 676675 kept
Extracting 564476900 pixels
100%|███████████████████████████████████████| 1800/1800 [02:24<00:00, 12.47it/s]
Read 1800 images
Read / extract time: 141.59889888763428.1f / 2.6 seconds
Extracted shoebox shapes:
(5, 13, 13): 668020

Then check the file shape we re-load

Ethics-Gradient work :) [main] $ python3 check.py 
(0, 0, 0): 422358
(5, 13, 13): 245662

Ergo I believe this is a constraint of how we save the data

@LuisA92 I could save it to an HDF5 file if that would work for you?

LuisA92 · 2024-04-12T11:43:12Z

Thank you @graeme-winter, for taking the time to look at this.

Ah, your suspicion makes sense. Saving to an HDF5 file will work!

graeme-winter self-assigned this Apr 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`reflection_table.extract_shoeboxes` unexpectedly returns empty shoeboxes #2639

`reflection_table.extract_shoeboxes` unexpectedly returns empty shoeboxes #2639

LuisA92 commented Apr 5, 2024

graeme-winter commented Apr 11, 2024

LuisA92 commented Apr 12, 2024

reflection_table.extract_shoeboxes unexpectedly returns empty shoeboxes #2639

reflection_table.extract_shoeboxes unexpectedly returns empty shoeboxes #2639

Comments

LuisA92 commented Apr 5, 2024

Steps to reproduce the issue

graeme-winter commented Apr 11, 2024

LuisA92 commented Apr 12, 2024

`reflection_table.extract_shoeboxes` unexpectedly returns empty shoeboxes #2639

`reflection_table.extract_shoeboxes` unexpectedly returns empty shoeboxes #2639