Add a `small_cell` indexer to `dials.index` #2477

dagewa · 2023-07-28T14:22:30Z

WIP.

Currently, it runs, but indexing fails to assign indices for the cctbx.xfel.small_cell_process test cases. I'm not sure what the problem is.

Test procedure

First run the cctbx.xfel.small_cell_process test

cd $(dials.data get -q 4fluoro_cxi)/lcls_2022_smSFX_workshop_data/indexing/
cctbx.xfel.small_cell_process params_1.phil ../ten_cbfs/*.cbf

Now split the output to get reference geometry for each of the individual images

dials.split_experiments idx-0000_integrated.expt

Then import the first image like this, ensuring the detector geometry and wavelength come from the reference

dials.import ../ten_cbfs/cxily6520_r0164_02081.cbf reference_geometry=split_0.expt

Find spots using the same parameters as in params_1.phil

dials.find_spots imported.expt\
  lookup.mask=../calibration/psana_border_hot_shift4.mask\
  filter.min_spot_size=2\
  kernel_size=2,2\
  global_threshold=10

Now try to index

dials.index imported.expt strong.refl\
  method=small_cell\
  unit_cell=6.0226,7.2623,29.5203,90,90,90\
  space_group=C222

We get some output from small_cell, showing that a cell model has been found, but unfortunately indices can't be assigned :-(

$$$ stills_indexer::choose_best_orientation_matrix, candidate 0 initial outlier identification
$$$ stills_indexer::identify_outliers
Couldn't refine candidate 0, ValueError: Empty reflections table provided to ReflectionManager
No suitable indexing solution found

I'm wondering if @phyy-nx or @dwpaley might know what is wrong here?

@phyy-nx

FAO @phyy-nx @dwpaley. If `libtbx.phil` is imported here instead of `iotbx.phil` then cctbx.xfel.small_cell_process will break!

…l, and convert orientation to Crystal

cctbx.xfel.small_cell_process working with this branch, not the change in 4499187

is achieved by passing the params as None, as then the Strategy __init__ will parse the phil params. Weird setup, but that's the way it is.

phyy-nx · 2023-08-01T22:04:05Z

Ok, looking at this today. I found that at the top of the log for the original version it says this:

Spots on d-rings:  12
Total sf spots:    14

But for the dials.index version it says this:

Spots on d-rings:  10
Total sf spots:    14

This means that fewer spots are landing at the right d-spacings. This step happens way before any graph/pair matching and after spotfinding. To me implies there is either something wrong with how the geometry is being passed in or something wrong with the spotfinding. Looking at these options next.

phyy-nx · 2023-08-02T00:33:27Z

Ok, I've found the source of the 10 vs. 12 spots on rings, the wavelength. Use use_beam_reference=False for dials.import:

dials.import ../ten_cbfs/cxily6520_r0164_02081.cbf reference_geometry=split_0.expt use_beam_reference=False

The next problem seems to be dials.index mapping the input unit cell to a different setting. If I print out target_symmetry_primitive right in the __init__ function of the SmallCell strategy class, I get this:

crystal.symmetry(
    unit_cell=(4.717327423, 4.717327423, 29.5203, 90, 90, 100.6624023),
    space_group_symbol="C m m m (x-y,x+y,z)"
  )

While this does generate the same list of d-spacings, it does not generate the same list of operators. C222 generates a set of operators where h, k, and l are all in the same order, which this code relies on. However the operators generated by "C m m m (x-y,x+y,z)" include operators with transpositions of h, k, and l, which the above code doesn't consider. This diff re-writes that section to support more complex operators:

   @return list of original indices, list of asymmetric indices
   """
   sym = symmetry(unit_cell=ori.unit_cell(),space_group=phil.small_cell.spacegroup)
-  ops = []
-  for op in sym.space_group().expand_inv(sgtbx.tr_vec((0,0,0))).all_ops(): # this gets the spots related by inversion, aka Bijvoet mates
-    r = op.r().as_hkl()
-    subops = r.split(',')
-    tmpop = [1,1,1]
-    if '-' in subops[0]: tmpop[0] = -1
-    if '-' in subops[1]: tmpop[1] = -1
-    if '-' in subops[2]: tmpop[2] = -1
-
-    if tmpop not in ops:
-      ops.append(tmpop)
+  ops = [op.r() for op in sym.space_group().expand_inv(sgtbx.tr_vec((0,0,0))).all_ops()] # this gets the spots related by inversion, aka Bijvoet mates

   asu_indices = sym.build_miller_set(anomalous_flag=False, d_min = resolution)
   asu_indices_with_dups = []
   original_indicies = []
   for idx in asu_indices.indices():
     for op in ops:
-      orig_idx = (idx[0]*op[0],idx[1]*op[1],idx[2]*op[2])
+      orig_idx = op * idx
       if orig_idx not in original_indicies:
         original_indicies.append(orig_idx)
         asu_indices_with_dups.append(idx)

With that, I get the same graph/connections as I do on the cctbx.xfel.small_cell_process side. My next bug was due to resetting the imageset_id incorrectly, fixed with this diff in small_cell.py:

@@ -428,7 +418,8 @@ def small_cell_index_lattice_detail(experiments, reflections, horiz_phil):
   reflections['s0_proj'] = s0_projs

   from dials.algorithms.indexing.assign_indices import AssignIndicesGlobal
-  reflections['imageset_id'] = reflections['id']
+  if 'imageset_id' not in reflections:
+    reflections['imageset_id'] = reflections['id']
   reflections.centroid_px_to_mm(experiments)
   reflections.map_centroids_to_reciprocal_space(experiments)

With these changes we get a crystal model successfully, but dials.index fails in the refinement step. So the next thing to think about is that dials.index will invoke the indexer or the stills indexer, but neither are appropriate for small cell indexing. They both do complicated refinements. cctbx.xfel.small_cell_process skips them both.

My first thought is to bypass the refinement steps in the indexer or the stills indexer, but I haven't got that working yet.

dagewa · 2023-08-02T08:50:45Z

Thanks for the detailed analysis @phyy-nx! I would not have figured all that out so quickly. I guess you are putting the changes mentioned above into small_cell.py so I'll wait for you. Regarding refinement, I wonder if #2456 shows the way, or perhaps setting max_iterations=0?

phyy-nx · 2023-08-02T22:59:52Z

Got it! indexing.refinement_protocol.mode=None was part of the answer, but the full command that worked is

dials.index imported.expt strong.refl  method=small_cell  unit_cell=6.0226,7.2623,29.5203,90,90,90  space_group=C222 indexer=sequences indexing.refinement_protocol.mode=None refinement.parameterisation.crystal.fix=cell min_nref_per_parameter=1 reflections.outlier.algorithm=null

This requires cctbx/cctbx_project#914 which has the above code edits.

Hm, some of these extra parameters are the defaults for cctbx.xfel.small_cell_process/dials.stills_process:
https://github.com/dials/dials/blob/main/src/dials/command_line/stills_process.py#L304-L335
https://github.com/cctbx/cctbx_project/blob/master/xfel/small_cell/command_line/small_cell_process.py#L25
Maybe it would be convenient if when using small cell in dials.index that these defaults were automatically configured?

dagewa · 2023-08-03T09:17:41Z

Ah, nice one! For reference, this worked for me, with DIALS on the small_cell_wrapper branch and cctbx on the small_cell_fixes branch:

cp -r $(dials.data get -q 4fluoro_cxi)/lcls_2022_smSFX_workshop_data .
cd lcls_2022_smSFX_workshop_data/indexing/
cctbx.xfel.small_cell_process params_1.phil ../ten_cbfs/*.cbf
dials.split_experiments idx-0000_integrated.expt
dials.import ../ten_cbfs/cxily6520_r0164_02081.cbf\
  reference_geometry=split_0.expt use_beam_reference=False
dials.find_spots imported.expt\
  lookup.mask=../calibration/psana_border_hot_shift4.mask\
  filter.min_spot_size=2\
  kernel_size=2,2\
  global_threshold=10
dials.index imported.expt strong.refl method=small_cell\
  unit_cell=6.0226,7.2623,29.5203,90,90,90  space_group=C222\
  indexer=sequences indexing.refinement_protocol.mode=None\
  refinement.parameterisation.crystal.fix=cell\
  min_nref_per_parameter=1 reflections.outlier.algorithm=null

End result looks like a sensible solution:

Detector 1 RMSDs by panel:
+---------+--------+----------+----------+-----------------+
|   Panel |   Nref |   RMSD_X |   RMSD_Y |   RMSD_DeltaPsi |
|      id |        |     (px) |     (px) |           (deg) |
|---------+--------+----------+----------+-----------------|
|       1 |      1 |  0.31042 | 0.3738   |        0.034673 |
|       2 |      2 |  0.21429 | 0.22716  |        0.27764  |
|       5 |      1 |  0.24732 | 0.32019  |        0.16836  |
|       6 |      4 |  0.25749 | 0.83455  |        0.36003  |
|       7 |      1 |  0.11642 | 0.043923 |        0.25426  |
+---------+--------+----------+----------+-----------------+

Refined crystal models:
model 1 (9 reflections):
Crystal:
    Unit cell: 6.023, 7.262, 29.520, 90.000, 90.000, 90.000
    Space group: C 2 2 2
    U matrix:  {{ 0.1011,  0.9932, -0.0571},
                {-0.9217,  0.0719, -0.3811},
                {-0.3744,  0.0912,  0.9228}}
    B matrix:  {{ 0.1660,  0.0000,  0.0000},
                { 0.0000,  0.1377,  0.0000},
                { 0.0000,  0.0000,  0.0339}}
    A = UB:    {{ 0.0168,  0.1368, -0.0019},
                {-0.1530,  0.0099, -0.0129},
                {-0.0622,  0.0126,  0.0313}}
+------------+-------------+---------------+-------------+
|   Imageset |   # indexed |   # unindexed | % indexed   |
|------------+-------------+---------------+-------------|
|          0 |           9 |             5 | 64.3%       |
+------------+-------------+---------------+-------------+

dagewa · 2023-08-03T10:30:23Z

I'm thinking about the parameter defaults. I have an ulterior motive here, which is to try small_cell on ED stills, potentially with larger unit cells than the example here. I'd like to find some defaults or automatic configuration that works appropriately across the various use cases. In general, I don't like min_nref_per_parameter=1. I don't really trust the results of refinement when the observation to parameter ratio is small, especially when outlier rejection is switched off. I'd rather automatically fix more parameters. After fixing the cell (which I certainly agree is appropriate here) I think the detector tilt/twist (Tau2 and Tau3) should be fixed as they are poorly determined by the data.

phyy-nx · 2023-08-03T16:49:13Z

Oh that's a good point. In fact, dials.stills_process fixes the detector even for protein still images. I'd fix the wavelength, cell, and detector, and only refine the orientation (3 parameters total). For the above image, that would be 3 reflections per parameter.

That said, for X-ray small molecule, I still think min_nref_per_parameter=1 is needed. Not only can the small cell algorithm index from 3 spots, but in fact we have histogrammed the number of indexed spots per image that went into one of our XFEL datasets and the mode was 3 spots per image!

dagewa · 2023-08-04T13:45:45Z

I was wondering about how the small_cell lattice search method could override general refinement parameters for indexing. But actually I think it wouldn't be too nice to allow that. I think it's a job for the higher level caller (the dials.index user, or a pipeline like dials.stills_process).

One possibility though is that we instead just include a step of orientation-only refinement as part of the find_crystal_models call, that effectively sets those defaults. I guess that should be enough for the reflections that were indexed as part of the small_cell algorithm to again have their indices assigned when the AssignIndicesStrategy is used. Then the user can choose whether to do further refinement with whatever parameters they think are appropriate, or just set indexing.refinement_protocol.mode=None (or "repredict_only").

Another issue I'm thinking about is that the use case of indexing a single still pattern while refining detector parameters is indeed not very sensible, but it could be reasonable to refine detector parameters against a set of indexed cells. I suppose that is more a job for dials.refine though.

codecov · 2023-08-04T14:32:22Z

Codecov Report

Attention: 39 lines in your changes are missing coverage. Please review.

Comparison is base (6ec457a) 78.52% compared to head (67a7b1c) 78.43%.
Report is 13 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2477      +/-   ##
==========================================
- Coverage   78.52%   78.43%   -0.09%     
==========================================
  Files         609      610       +1     
  Lines       75096    75145      +49     
  Branches    10723    10277     -446     
==========================================
- Hits        58971    58943      -28     
- Misses      13953    14024      +71     
- Partials     2172     2178       +6

phyy-nx · 2023-08-04T17:30:55Z

I was wondering about how the small_cell lattice search method could override general refinement parameters for indexing. But actually I think it wouldn't be too nice to allow that. I think it's a job for the higher level caller (the dials.index user, or a pipeline like dials.stills_process).

Agreed

One possibility though is that we instead just include a step of orientation-only refinement as part of the find_crystal_models call, that effectively sets those defaults. I guess that should be enough for the reflections that were indexed as part of the small_cell algorithm to again have their indices assigned when the AssignIndicesStrategy is used. Then the user can choose whether to do further refinement with whatever parameters they think are appropriate, or just set indexing.refinement_protocol.mode=None (or "repredict_only").

That sounds doable. One trick is that the reason I had to set refinement parameters (refinement.parameterisation.crystal.fix=cell min_nref_per_parameter=1 reflections.outlier.algorithm=null) even though refinement should have been off (indexing.refinement_protocol.mode=None), is because there is still a refinement done by the ModelEvaluation object. That would need to be controlled somehow too.

Another issue I'm thinking about is that the use case of indexing a single still pattern while refining detector parameters is indeed not very sensible, but it could be reasonable to refine detector parameters against a set of indexed cells. I suppose that is more a job for dials.refine though.

Agreed

dagewa · 2023-08-07T09:13:42Z

there is still a refinement done by the ModelEvaluation object. That would need to be controlled somehow too.

Ah yes, thanks! I was looking for that, but couldn't find it. It looks like this refinement should have a fallback mode when there are very few reflections, like in this case. At the moment it just gives up with the solution. The simplest approach may be to have another try here:

dials/src/dials/algorithms/indexing/model_evaluation.py

Lines 339 to 340 in 93c1364

    
           except (RuntimeError, ValueError): 
        
               return

before finally giving up only in the case that small_cell-appropriate refinement does not succeed. I'll take a look at this now.

dagewa · 2023-08-07T16:28:06Z

Just a note about adding a step of orientation-only refinement at the end of small_cell - I remember that LowResSpotMatch does something like this too. However, it doesn't use DIALS refinement, it calculates the best-fit $\mathbf{U}$ matrix using scitbx.math.superpose, by comparing the observed relps with the reference relps calculated using the known cell.

dials/src/dials/algorithms/indexing/lattice_search/low_res_spot_match.py

Lines 533 to 542 in 93c1364

    
           @staticmethod 
        
           def _fit_U_from_superposed_points(reference, other): 
        
               # Add the origin to both sets of points 
        
               reference.append((0, 0, 0)) 
        
               other.append((0, 0, 0)) 
        
               # Find U matrix that takes ideal relps to the reference 
        
               fit = superpose.least_squares_fit(reference, other) 
        
               return fit.r

If I can remember the details I'll try to add this too. That way you get a better $\mathbf{U}$ matrix going into index assignment, so more indexed spots available for the refinement done in ModelEvaluation

phyy-nx · 2023-08-07T16:34:14Z

Hi, I think small_cell currently does something similar to the above code?
https://github.com/cctbx/cctbx_project/blob/master/xfel/small_cell/solve_orientation.py#L40

dagewa · 2023-08-07T17:10:26Z

Yes, probably similar. I'm trying to work out why I'm the test example the small_cell algorithm ends with 6 indexed spots, but when the crystal model gets passed to AssignIndices only 3 actually get indexed

Co-authored-by: Aaron Brewster <asbrewster@lbl.gov>

dagewa · 2023-08-08T16:09:09Z

Ok, I think I've worked out the weirdness. If you add these lines to the end of the SmallCell.find_crystal_models method:

        from dials.algorithms.indexing.assign_indices import AssignIndicesGlobal
        experiments[0].crystal=crystal
        AssignIndicesGlobal(tolerance=0.3)(reflections, experiments)
        print(list(reflections["id"]))
        AssignIndicesGlobal(tolerance=0.3)(reflections, experiments)
        print(list(reflections["id"]))

then we are doing the index assignment like dials.index will do, but we do it twice. The results are a bit surprising:

[-1, -1, -1, -1, -1, -1, -1, 0, -1, -1, -1, -1, 0, 0]
[0, -1, 0, -1, 0, -1, 0, -1, -1, 0, -1, 0, -1, -1]

The first time round, only 3 reflections are indexed, which is exactly what I've been seeing in dials.index. The second time we get 6 reflections indexed!

The problem is that AssignIndicesGlobal makes a selection based on id:

dials/src/dials/algorithms/indexing/assign_indices.py

Line 30 in 93c1364

sel = inside_resolution_limit & (reflections["id"] == -1)

But small_cell_index_lattice_detail has already altered the reflections["id"], so we're not indexing a clean reflection list.

If instead of the above edit I reset the ids to -1:

        reflections["id"] *= 0
        reflections["id"] -= 1

then run dials.index, then I see 9 reflections indexed, not 3!

...
Working set:  11 12 0 2 9 4 6 7 
Indexed set:  0 2 4 6 9 11 
Done refining unit cell.  No new spots to add.
Candidate solutions:
+---------------------------------+----------+----------------+------------+-------------+-------------------+-----------+-----------------+-----------------+
| unit_cell                       |   volume |   volume score |   #indexed |   % indexed |   % indexed score |   rmsd_xy |   rmsd_xy score |   overall score |
|---------------------------------+----------+----------------+------------+-------------+-------------------+-----------+-----------------+-----------------|
| 4.72 4.72 29.52 90.0 90.0 100.7 |      646 |              0 |          9 |          64 |                 0 |      0.05 |               0 |               0 |
+---------------------------------+----------+----------------+------------+-------------+-------------------+-----------+-----------------+-----------------+

dagewa · 2023-08-08T16:13:08Z

I haven't quite decided what to do about this yet. We could copy the reflection list before calling small_cell_index_lattice_detail, to avoid the side-effects. However I'm not totally convinced that AssignIndicesGlobal is doing the right thing anyway. It resets all Miller indices to 0,0,0 but then apparently only attempts to index reflections with id == -1. So it seems to "unindex" already indexed reflections.

phyy-nx · 2023-08-08T16:40:49Z

Hi, just chiming in, it looks like you are chasing the same leads we are. dials.index uses several rounds of AssignIndiciesGlobal, the first with a tolerance of 0.1, and the rest with 0.3. That indexes 9 reflections. But cctbx.xfel.small_cell_process only does the one round of AssignIndicesGlobal, with a tolerance of 0.1. So it only indexes 5 reflections. Looking at the extra 4 reflections in the image viewer and the predictions look weird, but that might be because of other bugs. But, importantly, we reduced the tolerance to 0.1 for small cell on purpose, from 0.3, in cctbx/cctbx_project@df85d76

From @dwpaley: we should be measuring the tolerance in reciprocal angstroms, but we use fractional coordinates. For small cells, this actually puts us closer to the volumes checked for protein data. For this image there could be multiple crystals in the shot, so it's impossible to know which tolerance is better. A tighter tolerance might be more realistic.

dagewa · 2023-08-08T16:51:35Z

Hi Aaron, away from the computer now, but just wanted to add that I'm not worried about the tolerance but the non-idempotent behaviour of the index assignment. Looks like there are bugs to me

dagewa · 2023-08-08T16:53:04Z

Indeed if you keep calling the index assignment it ends up toggling between two outputs!

phyy-nx · 2023-08-08T16:58:29Z

Ha, wow. Definitely a bug. Digging deeper with @dwpaley.

…erly

orientation by superposing ideal relps over the observed relp positions. This is seen to improve the model going into the refinement done by ModelEvaluation.

dagewa · 2023-08-09T10:14:02Z

@phyy-nx @dwpaley I've put in a workaround for #2485, but it may be something we can remove if that issue is fixed within AssignIndices.

Also, I wanted to bring 3761b4a to your attention. To test this I had a look at the output of refinement done by ModelEvaluation. Without this additional post-small_cell $\mathbf{U}$ matrix polishing I got this output:

Refinement steps:
+--------+--------+----------+----------+-----------------+
|   Step |   Nref |   RMSD_X |   RMSD_Y |   RMSD_DeltaPsi |
|        |        |     (mm) |     (mm) |           (deg) |
|--------+--------+----------+----------+-----------------|
|      0 |      9 | 0.036485 | 0.053279 |         0.31702 |
|      1 |      9 | 0.026735 | 0.053942 |         0.30935 |
|      2 |      9 | 0.024164 | 0.048755 |         0.29908 |
|      3 |      9 | 0.020731 | 0.044542 |         0.29311 |
|      4 |      9 | 0.018689 | 0.044066 |         0.29199 |
+--------+--------+----------+----------+-----------------+

With the additional polishing step I saw this:

Refinement steps:
+--------+--------+----------+----------+-----------------+
|   Step |   Nref |   RMSD_X |   RMSD_Y |   RMSD_DeltaPsi |
|        |        |     (mm) |     (mm) |           (deg) |
|--------+--------+----------+----------+-----------------|
|      0 |      9 | 0.03447  | 0.055502 |         0.30801 |
|      1 |      9 | 0.025447 | 0.054495 |         0.30319 |
|      2 |      9 | 0.02375  | 0.048833 |         0.29666 |
|      3 |      9 | 0.020577 | 0.04458  |         0.29276 |
|      4 |      9 | 0.018656 | 0.044083 |         0.29197 |
+--------+--------+----------+----------+-----------------+

The models end up at basically the same place, but including the superpose refinement gives a better start point, at least in $X$ and $\Delta \Psi$.

I have not tested this any more extensively than the single image example in #2477 (comment), but if it is something that proves generally useful then perhaps it should move from DIALS indexing's SmallCell to small_cell.py-proper.

phyy-nx · 2023-08-09T17:56:50Z

Hi, I think what you are showing is that there is a improvement when using scitbx.math.superpose.least_squares_fit instead of np.linalg.lstsq, since it's essentially the same inputs to both functions (hkl and xyz). In fact, I think you could replace np.linalg.lstsq in solve_orientation.py with superpose.least_squares_fit and reproduce your results. Could you try that?

dwpaley · 2023-08-09T18:57:07Z

@dagewa There's something weird going on here or thereabouts:

dials/src/dials/algorithms/indexing/lattice_search/__init__.py

Line 188 in 93c1364

sel = self.reflections["id"] == -1

Do you understand this code? Currently this is selecting the unindexed reflections and attempting to index them. If we don't index any of those, or if there were no unindexed reflections in the first place, then choose_best_orientation_matrix will return None,None. That can't possibly be right, right? It kills my small_cell indexing for 5 out of 10 shots in the ten_cbfs dataset. Any thoughts? Changing == to != on the given line (i.e. repeat indexing on the already-indexed reflections) makes the code run, but I'm wary because I don't know what this was supposed to do.

dwpaley · 2023-08-09T23:25:30Z

Ok, @phyy-nx and I figured this out. The small_cell lattice search was breaking convention by marking reflections as indexed. If we reset reflections['id'] to -1 (and reflections['miller_index'] to (0,0,0)) before leaving SmallCell.find_crystal_models, then this problem goes away.

dagewa · 2023-08-10T12:37:52Z

Hi @dwpaley I think you're describing what I also found and fixed in 5417d18, right?

dagewa · 2023-08-10T14:12:20Z

Hi, I think what you are showing is that there is a improvement when using scitbx.math.superpose.least_squares_fit instead of np.linalg.lstsq, since it's essentially the same inputs to both functions (hkl and xyz). In fact, I think you could replace np.linalg.lstsq in solve_orientation.py with superpose.least_squares_fit and reproduce your results. Could you try that?

As a test I did that (inefficiently) with this change:

diff --git a/xfel/small_cell/solve_orientation.py b/xfel/small_cell/solve_orientation.py
index eb87513672..2b677a837d 100644
--- a/xfel/small_cell/solve_orientation.py
+++ b/xfel/small_cell/solve_orientation.py
@@ -49,6 +49,21 @@ class small_cell_orientation:
   print("Summed squared residuals of x,y,z for %d spots in 1/angstroms: %.7f, %.7f, %.7f"%(N,self.residuals[0],self.residuals[1],self.residuals[2]))
   Amatrix = sqr(solution.flatten()).transpose()
 
+  # Recalculate using superpose
+  from scitbx import matrix
+  from scitbx.math import superpose
+  Bmat = matrix.sqr(self.sym.unit_cell().fractionalization_matrix()).transpose()
+  observed_relps = [(0.0, 0.0, 0.0)]
+  predicted_relps = [(0.0, 0.0, 0.0)]
+  for hkl, xyz in zip(self.miller_indices, self.u_vectors):
+      predicted_relps.append(Bmat * hkl)
+      observed_relps.append(xyz)
+  fit = superpose.least_squares_fit(
+      flex.vec3_double(observed_relps), flex.vec3_double(predicted_relps)
+  )
+  Umat = fit.r
+  Amatrix = Umat * Bmat
+
   from cctbx import crystal_orientation
   ori = crystal_orientation.crystal_orientation(Amatrix, crystal_orientation.basis_type.reciprocal)
   return ori

It's not quite the same though. Without looking at it more closely, when it is called here there are 8 reflections in the fit, whereas when it was called at the end of SmallCell.find_crystal_models on the result of the small_cell calculation then there are only 6 reflections.

Here is the output of ModelEvaluation's refinement when the superpose fit is done using code in this diff, and not done again in SmallCell.find_crystal_models:

Refinement steps:
+--------+--------+----------+----------+-----------------+
|   Step |   Nref |   RMSD_X |   RMSD_Y |   RMSD_DeltaPsi |
|        |        |     (mm) |     (mm) |           (deg) |
|--------+--------+----------+----------+-----------------|
|      0 |      9 | 0.041318 | 0.053508 |         0.29512 |
|      1 |      9 | 0.029652 | 0.052035 |         0.29297 |
|      2 |      9 | 0.024172 | 0.047599 |         0.2919  |
|      3 |      9 | 0.019932 | 0.044554 |         0.29188 |
|      4 |      9 | 0.018439 | 0.044178 |         0.29191 |
+--------+--------+----------+----------+-----------------+

Now the start point looks a bit worse in $X$ and $Y$ but better in $\Delta \Psi$.

If I do the superpose-type refinement both in small_cell_orientation and at the end of SmallCell.find_crystal_models, then ModelEvaluation's refinement looks like this:

Refinement steps:
+--------+--------+----------+----------+-----------------+
|   Step |   Nref |   RMSD_X |   RMSD_Y |   RMSD_DeltaPsi |
|        |        |     (mm) |     (mm) |           (deg) |
|--------+--------+----------+----------+-----------------|
|      0 |      9 | 0.033549 | 0.060927 |         0.29802 |
|      1 |      9 | 0.024463 | 0.056772 |         0.29507 |
|      2 |      9 | 0.023265 | 0.049877 |         0.29269 |
|      3 |      9 | 0.020322 | 0.044901 |         0.29202 |
|      4 |      9 | 0.018605 | 0.044141 |         0.29192 |
+--------+--------+----------+----------+-----------------+

Good start point in $X$ and $\Delta \Psi$ but slightly worse in $Y$

These results might be all much of a muchness. It doesn't actually affect the end result - we get the same number of indexed reflections and basically the same crystal model any which way.

dagewa · 2023-08-10T14:27:17Z

The reason the results are different between the approaches is interesting though. As I see it, the numpy.linalg.lstsq fit is a way of discovering the $\mathbf{A}$ matrix mapping from $\vec{h}$ to $\vec{r}$. Then a rotation matrix is extracted from that.

By contrast the superpose method goes from an ideal set of relps to the observed set of relps finding the rotation matrix that minimises the RMSD between the two sets. I thought this used the Kabsch algorithm, but looking at the code I see it apparently uses the Kearsley method by default (but these methods apparently give the same result anyway).

My intuition is that the superpose approach might be superior because we already know the $\mathbf{B}$ matrix so we don't need to discover that part of the mapping. But, again, I have only looked at one case with between 6 and 8 spots so it might be too early to generalise 😀

dwpaley · 2023-08-10T14:56:20Z

Yes, thanks, I should have kept up with this thread better :)

From your 9 indexed reflections and your large-ish Δψ values I think you're looking at the first shot in the ten_cbfs dataset. I would caution that 1-2 of those 9 appear to be bogus. My final orientations appeared better with outlier rejection enabled for candidate refinement:

indexing.stills.indexer=sequences
indexing.refinement_protocol.mode=refine_shells
refinement.parameterisation.crystal.fix=cell
refinement.parameterisation.auto_reduction.min_nref_per_parameter=1
refinement.reflections.outlier.algorithm=tukey
refinement.reflections.outlier.minimum_number_of_reflections=3

Then RMSD_DeltaPsi and RMSD_Y are better by ~half (for the refined subset), but more importantly the integration shoeboxes are in the right places.

We have a lot of progress and I'm putting together a summary, but for now just wanted to mention the outlier issue for this specific image. We're really excited about this here! We are tentatively planning a retirement party for cctbx.xfel.small_cell_process.

dagewa · 2023-08-10T15:11:04Z

Thanks @dwpaley! Yes, I really need to look at the wider picture with more images. I've been stuck in the weeds with the details so far...

dwpaley · 2023-08-10T18:51:10Z

@dagewa Are you ok with 578a4ce? If so, then with the following phil:

dispatch {
  hit_finder {
    minimum_number_of_reflections = 3
    maximum_number_of_reflections = 120
  }
}
input {
  reference_geometry = split_geo/split_0000.expt
  sync_reference_geom = False
}
output {
  experiments_filename = %s_strong.expt
  strong_filename = %s_strong.refl
}
spotfinder {
  lookup {
    mask = mask/psana_border_hot_shift4.mask
  }
  filter {
    min_spot_size = 2
  }
  threshold {
    dispersion {
      kernel_size = 2 2
      global_threshold = 10
    }
  }
}
integration {
  lookup {
    mask = mask/psana_border_hot_shift4.mask
  }
}

indexing.known_symmetry {
  unit_cell=6.0226,7.2623,29.5203,90,90,90
  space_group=C222
}
indexing.stills.indexer=sequences
indexing.refinement_protocol.mode=refine_shells
refinement.parameterisation.crystal.fix=cell
refinement.parameterisation.auto_reduction.min_nref_per_parameter=1
refinement.reflections.outlier.algorithm=tukey
refinement.reflections.outlier.minimum_number_of_reflections=3

indexing.stills {
  set_mosaic_half_deg_value = .05
  set_domain_size_ang_value = 300
}
indexing.method=small_cell

I can process all of the ten_cbfs like this:

$ for f in data/*.cbf; do dials.stills_process params_dsp.phil $f ; done

dagewa · 2023-08-11T09:37:10Z

Hi @dwpaley yes, that looks fine to me at this stage (though in general I think we'd prefer to float the import).

Longer term I think we might want a wider conversation about convergence of workflows and the data model. The fact that we have things like different classes for various mosaicity models is, IMHO, an indication of a problematic design. Honestly I think the base Crystal should have sufficient attributes to describe its physical properties of interest (including mosaicity), and these can be None until we know otherwise. Then the user sets their preferred parameters, which makes a choice of algorithms, and the algorithms themselves modify behaviour based on what attributes are set and their values.

Another target on my long range radar are things like indexing.stills.indexer=sequences. This is super annoying as it tells the user exactly nothing about what this choice does! Even as a developer, I tend to forget the differences between the stills and sequences indexers, so occasionally I try this and then try not-this and just look at output.

Ultimately I'd like the software just to do the Right Thing in each situation, governed by short and directly meaningful user parameters. It's an idealistic goal, but I think getting the feeling that is going on is one reason why people love programs like XDS.

Anyway, that's probably a much longer reply than you expected 😉 (you can tell I'm getting ready to put down tools for my summer break). I put it here because I think the small_cell indexing method has been a good example of collaborative work that brings improvements all round.

dwpaley · 2023-08-11T14:43:50Z

@dagewa Agree with everything here and thanks for the collaboration on this. One of my last points would be to eliminate the phil-juggling here:

dials/src/dials/algorithms/indexing/lattice_search/small_cell.py

Lines 54 to 59 in 578a4ce

    
           self._params.small_cell.powdercell = ( 
        
               target_symmetry_primitive.unit_cell().parameters() 
        
           ) 
        
           self._params.small_cell.spacegroup = str( 
        
               target_symmetry_primitive.space_group_info() 
        
           )

because it fails to pass along any other small_cell params that may have been provided. I tentatively think this means deprecating small_cell.powdercell and small_cell.spacegroup (through a cctbx PR) and instead using the values in indexing.known_symmetry. We'd like to use some slight thoughtfulness about it to avoid breaking our old phil files if possible. (I.e. if params.small_cell.powdercell is not None: assert params.indexing.known_symmetry.unit_cell is None etc).

I know your month is ending in a couple hours; while I haven't discussed it with @phyy-nx, I don't necessarily think the above needs to prevent this PR from getting merged.

dagewa · 2023-08-11T14:53:04Z

Hi @dwpaley agreed sounds sensible. I think this PR shouldn't go in until the xfel import can be replaced with a serialtbx import. Another thing it is missing is some tests. But if those things happen to be completed in the next few weeks in my absence, then I'm totally happy for it to be merged.

dwpaley · 2023-09-12T16:35:59Z

Just wanted to mention that for whatever reason (not yet investigated) this branch provided 4x indexing rate compared to cctbx.small_cell indexing for a small-molecule SSX data set. We continue to be very excited about this :)

ndevenish · 2023-11-30T16:08:44Z

@dagewa do you know what the state of this is?

dagewa · 2023-11-30T18:57:22Z

It is waiting for input from @phyy-nx and @dwpaley to

understand the failure cases compared to the original implementation
move small_cell code to serialtbx, so that this PR won't reintroduce a dependency on xfel

dagewa · 2024-05-15T08:40:12Z

@phyy-nx @dwpaley Where did we get to on this? I remember reaching some kind of conclusion when we met during your visit here, but I don't have notes.

dagewa added 7 commits July 14, 2023 09:19

Skeleton for a small_cell indexer

2b21ff6

Handle None return from small_cell_index_detail

1e9bd4e

Fix horrible subtle bug

4499187

FAO @phyy-nx @dwpaley. If `libtbx.phil` is imported here instead of `iotbx.phil` then cctbx.xfel.small_cell_process will break!

Use small_cell_index_lattice_detail instead of small_cell_index_detai…

f191439

…l, and convert orientation to Crystal

Tidy up

826113b

@phyy-nx @dwpaley this appears to be the actual fix that gets

0c1352c

cctbx.xfel.small_cell_process working with this branch, not the change in 4499187

We need the small_cell scope to exist in the extracted object. This

548815d

is achieved by passing the params as None, as then the Strategy __init__ will parse the phil params. Weird setup, but that's the way it is.

phyy-nx mentioned this pull request Aug 2, 2023

Small cell fixes cctbx/cctbx_project#914

Merged

Merge branch 'main' into small_cell_wrapper

b09f6b2

With sync_reference_geom=False, update the detector in the imageset too.

cdcf78e

Co-authored-by: Aaron Brewster <asbrewster@lbl.gov>

dagewa mentioned this pull request Aug 9, 2023

Selection on id in AssignIndicesStrategy classes leads to surprising behaviour #2485

Open

dagewa added 2 commits August 9, 2023 11:00

Workaround for #2485 to ensure downstream index assignment works prop…

5417d18

…erly

Post small cell, perform an additional refinement of the crystal

3761b4a

orientation by superposing ideal relps over the observed relp positions. This is seen to improve the model going into the refinement done by ModelEvaluation.

make the sequences indexer respect stills.set_domain and .set_mosaic

578a4ce

Merge branch 'main' into small_cell_wrapper

67a7b1c

github-actions bot added the PR: merge conflicts label Feb 14, 2024

Add a small_cell indexer to dials.index #2477

Are you sure you want to change the base?

Add a small_cell indexer to dials.index #2477

Conversation

dagewa commented Jul 28, 2023

Test procedure

phyy-nx commented Aug 1, 2023

phyy-nx commented Aug 2, 2023

dagewa commented Aug 2, 2023

phyy-nx commented Aug 2, 2023

dagewa commented Aug 3, 2023

dagewa commented Aug 3, 2023

phyy-nx commented Aug 3, 2023

dagewa commented Aug 4, 2023

codecov bot commented Aug 4, 2023 • edited

Codecov Report

phyy-nx commented Aug 4, 2023

dagewa commented Aug 7, 2023

dagewa commented Aug 7, 2023

phyy-nx commented Aug 7, 2023

dagewa commented Aug 7, 2023

dagewa commented Aug 8, 2023

dagewa commented Aug 8, 2023

phyy-nx commented Aug 8, 2023

dagewa commented Aug 8, 2023

dagewa commented Aug 8, 2023

phyy-nx commented Aug 8, 2023

dagewa commented Aug 9, 2023

phyy-nx commented Aug 9, 2023

dwpaley commented Aug 9, 2023

dwpaley commented Aug 9, 2023

dagewa commented Aug 10, 2023

dagewa commented Aug 10, 2023

dagewa commented Aug 10, 2023 • edited

dwpaley commented Aug 10, 2023

dagewa commented Aug 10, 2023

dwpaley commented Aug 10, 2023

dagewa commented Aug 11, 2023

dwpaley commented Aug 11, 2023

dagewa commented Aug 11, 2023

dwpaley commented Sep 12, 2023

ndevenish commented Nov 30, 2023

dagewa commented Nov 30, 2023

dagewa commented May 15, 2024

Add a `small_cell` indexer to `dials.index` #2477

Add a `small_cell` indexer to `dials.index` #2477

codecov bot commented Aug 4, 2023 •

edited

dagewa commented Aug 10, 2023 •

edited