Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a small_cell indexer to dials.index #2477

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from
Draft

Conversation

dagewa
Copy link
Member

@dagewa dagewa commented Jul 28, 2023

WIP.

Requires cctbx/cctbx_project#912.

Currently, it runs, but indexing fails to assign indices for the cctbx.xfel.small_cell_process test cases. I'm not sure what the problem is.

Test procedure

First run the cctbx.xfel.small_cell_process test

cd $(dials.data get -q 4fluoro_cxi)/lcls_2022_smSFX_workshop_data/indexing/
cctbx.xfel.small_cell_process params_1.phil ../ten_cbfs/*.cbf

Now split the output to get reference geometry for each of the individual images

dials.split_experiments idx-0000_integrated.expt

Then import the first image like this, ensuring the detector geometry and wavelength come from the reference

dials.import ../ten_cbfs/cxily6520_r0164_02081.cbf reference_geometry=split_0.expt

Find spots using the same parameters as in params_1.phil

dials.find_spots imported.expt\
  lookup.mask=../calibration/psana_border_hot_shift4.mask\
  filter.min_spot_size=2\
  kernel_size=2,2\
  global_threshold=10

Now try to index

dials.index imported.expt strong.refl\
  method=small_cell\
  unit_cell=6.0226,7.2623,29.5203,90,90,90\
  space_group=C222

We get some output from small_cell, showing that a cell model has been found, but unfortunately indices can't be assigned :-(

$$$ stills_indexer::choose_best_orientation_matrix, candidate 0 initial outlier identification
$$$ stills_indexer::identify_outliers
Couldn't refine candidate 0, ValueError: Empty reflections table provided to ReflectionManager
No suitable indexing solution found

I'm wondering if @phyy-nx or @dwpaley might know what is wrong here?

FAO @phyy-nx @dwpaley. If `libtbx.phil` is imported here instead of `iotbx.phil`
then cctbx.xfel.small_cell_process will break!
cctbx.xfel.small_cell_process working with this branch, not
the change in 4499187
is achieved by passing the params as None, as then the Strategy __init__
will parse the phil params. Weird setup, but that's the way it is.
@phyy-nx
Copy link
Member

phyy-nx commented Aug 1, 2023

Ok, looking at this today. I found that at the top of the log for the original version it says this:

Spots on d-rings:  12
Total sf spots:    14

But for the dials.index version it says this:

Spots on d-rings:  10
Total sf spots:    14

This means that fewer spots are landing at the right d-spacings. This step happens way before any graph/pair matching and after spotfinding. To me implies there is either something wrong with how the geometry is being passed in or something wrong with the spotfinding. Looking at these options next.

@phyy-nx
Copy link
Member

phyy-nx commented Aug 2, 2023

Ok, I've found the source of the 10 vs. 12 spots on rings, the wavelength. Use use_beam_reference=False for dials.import:

dials.import ../ten_cbfs/cxily6520_r0164_02081.cbf reference_geometry=split_0.expt use_beam_reference=False

The next problem seems to be dials.index mapping the input unit cell to a different setting. If I print out target_symmetry_primitive right in the __init__ function of the SmallCell strategy class, I get this:

crystal.symmetry(
    unit_cell=(4.717327423, 4.717327423, 29.5203, 90, 90, 100.6624023),
    space_group_symbol="C m m m (x-y,x+y,z)"
  )

While this does generate the same list of d-spacings, it does not generate the same list of operators. C222 generates a set of operators where h, k, and l are all in the same order, which this code relies on. However the operators generated by "C m m m (x-y,x+y,z)" include operators with transpositions of h, k, and l, which the above code doesn't consider. This diff re-writes that section to support more complex operators:

   @return list of original indices, list of asymmetric indices
   """
   sym = symmetry(unit_cell=ori.unit_cell(),space_group=phil.small_cell.spacegroup)
-  ops = []
-  for op in sym.space_group().expand_inv(sgtbx.tr_vec((0,0,0))).all_ops(): # this gets the spots related by inversion, aka Bijvoet mates
-    r = op.r().as_hkl()
-    subops = r.split(',')
-    tmpop = [1,1,1]
-    if '-' in subops[0]: tmpop[0] = -1
-    if '-' in subops[1]: tmpop[1] = -1
-    if '-' in subops[2]: tmpop[2] = -1
-
-    if tmpop not in ops:
-      ops.append(tmpop)
+  ops = [op.r() for op in sym.space_group().expand_inv(sgtbx.tr_vec((0,0,0))).all_ops()] # this gets the spots related by inversion, aka Bijvoet mates

   asu_indices = sym.build_miller_set(anomalous_flag=False, d_min = resolution)
   asu_indices_with_dups = []
   original_indicies = []
   for idx in asu_indices.indices():
     for op in ops:
-      orig_idx = (idx[0]*op[0],idx[1]*op[1],idx[2]*op[2])
+      orig_idx = op * idx
       if orig_idx not in original_indicies:
         original_indicies.append(orig_idx)
         asu_indices_with_dups.append(idx)

With that, I get the same graph/connections as I do on the cctbx.xfel.small_cell_process side. My next bug was due to resetting the imageset_id incorrectly, fixed with this diff in small_cell.py:

@@ -428,7 +418,8 @@ def small_cell_index_lattice_detail(experiments, reflections, horiz_phil):
   reflections['s0_proj'] = s0_projs

   from dials.algorithms.indexing.assign_indices import AssignIndicesGlobal
-  reflections['imageset_id'] = reflections['id']
+  if 'imageset_id' not in reflections:
+    reflections['imageset_id'] = reflections['id']
   reflections.centroid_px_to_mm(experiments)
   reflections.map_centroids_to_reciprocal_space(experiments)

With these changes we get a crystal model successfully, but dials.index fails in the refinement step. So the next thing to think about is that dials.index will invoke the indexer or the stills indexer, but neither are appropriate for small cell indexing. They both do complicated refinements. cctbx.xfel.small_cell_process skips them both.

My first thought is to bypass the refinement steps in the indexer or the stills indexer, but I haven't got that working yet.

@dagewa
Copy link
Member Author

dagewa commented Aug 2, 2023

Thanks for the detailed analysis @phyy-nx! I would not have figured all that out so quickly. I guess you are putting the changes mentioned above into small_cell.py so I'll wait for you. Regarding refinement, I wonder if #2456 shows the way, or perhaps setting max_iterations=0?

@phyy-nx
Copy link
Member

phyy-nx commented Aug 2, 2023

Got it! indexing.refinement_protocol.mode=None was part of the answer, but the full command that worked is

dials.index imported.expt strong.refl  method=small_cell  unit_cell=6.0226,7.2623,29.5203,90,90,90  space_group=C222 indexer=sequences indexing.refinement_protocol.mode=None refinement.parameterisation.crystal.fix=cell min_nref_per_parameter=1 reflections.outlier.algorithm=null

This requires cctbx/cctbx_project#914 which has the above code edits.

Hm, some of these extra parameters are the defaults for cctbx.xfel.small_cell_process/dials.stills_process:
https://github.com/dials/dials/blob/main/src/dials/command_line/stills_process.py#L304-L335
https://github.com/cctbx/cctbx_project/blob/master/xfel/small_cell/command_line/small_cell_process.py#L25
Maybe it would be convenient if when using small cell in dials.index that these defaults were automatically configured?

@dagewa
Copy link
Member Author

dagewa commented Aug 3, 2023

Ah, nice one! For reference, this worked for me, with DIALS on the small_cell_wrapper branch and cctbx on the small_cell_fixes branch:

cp -r $(dials.data get -q 4fluoro_cxi)/lcls_2022_smSFX_workshop_data .
cd lcls_2022_smSFX_workshop_data/indexing/
cctbx.xfel.small_cell_process params_1.phil ../ten_cbfs/*.cbf
dials.split_experiments idx-0000_integrated.expt
dials.import ../ten_cbfs/cxily6520_r0164_02081.cbf\
  reference_geometry=split_0.expt use_beam_reference=False
dials.find_spots imported.expt\
  lookup.mask=../calibration/psana_border_hot_shift4.mask\
  filter.min_spot_size=2\
  kernel_size=2,2\
  global_threshold=10
dials.index imported.expt strong.refl method=small_cell\
  unit_cell=6.0226,7.2623,29.5203,90,90,90  space_group=C222\
  indexer=sequences indexing.refinement_protocol.mode=None\
  refinement.parameterisation.crystal.fix=cell\
  min_nref_per_parameter=1 reflections.outlier.algorithm=null

End result looks like a sensible solution:

Detector 1 RMSDs by panel:
+---------+--------+----------+----------+-----------------+
|   Panel |   Nref |   RMSD_X |   RMSD_Y |   RMSD_DeltaPsi |
|      id |        |     (px) |     (px) |           (deg) |
|---------+--------+----------+----------+-----------------|
|       1 |      1 |  0.31042 | 0.3738   |        0.034673 |
|       2 |      2 |  0.21429 | 0.22716  |        0.27764  |
|       5 |      1 |  0.24732 | 0.32019  |        0.16836  |
|       6 |      4 |  0.25749 | 0.83455  |        0.36003  |
|       7 |      1 |  0.11642 | 0.043923 |        0.25426  |
+---------+--------+----------+----------+-----------------+

Refined crystal models:
model 1 (9 reflections):
Crystal:
    Unit cell: 6.023, 7.262, 29.520, 90.000, 90.000, 90.000
    Space group: C 2 2 2
    U matrix:  {{ 0.1011,  0.9932, -0.0571},
                {-0.9217,  0.0719, -0.3811},
                {-0.3744,  0.0912,  0.9228}}
    B matrix:  {{ 0.1660,  0.0000,  0.0000},
                { 0.0000,  0.1377,  0.0000},
                { 0.0000,  0.0000,  0.0339}}
    A = UB:    {{ 0.0168,  0.1368, -0.0019},
                {-0.1530,  0.0099, -0.0129},
                {-0.0622,  0.0126,  0.0313}}
+------------+-------------+---------------+-------------+
|   Imageset |   # indexed |   # unindexed | % indexed   |
|------------+-------------+---------------+-------------|
|          0 |           9 |             5 | 64.3%       |
+------------+-------------+---------------+-------------+

@dagewa
Copy link
Member Author

dagewa commented Aug 3, 2023

I'm thinking about the parameter defaults. I have an ulterior motive here, which is to try small_cell on ED stills, potentially with larger unit cells than the example here. I'd like to find some defaults or automatic configuration that works appropriately across the various use cases. In general, I don't like min_nref_per_parameter=1. I don't really trust the results of refinement when the observation to parameter ratio is small, especially when outlier rejection is switched off. I'd rather automatically fix more parameters. After fixing the cell (which I certainly agree is appropriate here) I think the detector tilt/twist (Tau2 and Tau3) should be fixed as they are poorly determined by the data.

@phyy-nx
Copy link
Member

phyy-nx commented Aug 3, 2023

Oh that's a good point. In fact, dials.stills_process fixes the detector even for protein still images. I'd fix the wavelength, cell, and detector, and only refine the orientation (3 parameters total). For the above image, that would be 3 reflections per parameter.

That said, for X-ray small molecule, I still think min_nref_per_parameter=1 is needed. Not only can the small cell algorithm index from 3 spots, but in fact we have histogrammed the number of indexed spots per image that went into one of our XFEL datasets and the mode was 3 spots per image!

@dagewa
Copy link
Member Author

dagewa commented Aug 4, 2023

I was wondering about how the small_cell lattice search method could override general refinement parameters for indexing. But actually I think it wouldn't be too nice to allow that. I think it's a job for the higher level caller (the dials.index user, or a pipeline like dials.stills_process).

One possibility though is that we instead just include a step of orientation-only refinement as part of the find_crystal_models call, that effectively sets those defaults. I guess that should be enough for the reflections that were indexed as part of the small_cell algorithm to again have their indices assigned when the AssignIndicesStrategy is used. Then the user can choose whether to do further refinement with whatever parameters they think are appropriate, or just set indexing.refinement_protocol.mode=None (or "repredict_only").

Another issue I'm thinking about is that the use case of indexing a single still pattern while refining detector parameters is indeed not very sensible, but it could be reasonable to refine detector parameters against a set of indexed cells. I suppose that is more a job for dials.refine though.

@codecov
Copy link

codecov bot commented Aug 4, 2023

Codecov Report

Attention: 39 lines in your changes are missing coverage. Please review.

Comparison is base (6ec457a) 78.52% compared to head (67a7b1c) 78.43%.
Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2477      +/-   ##
==========================================
- Coverage   78.52%   78.43%   -0.09%     
==========================================
  Files         609      610       +1     
  Lines       75096    75145      +49     
  Branches    10723    10277     -446     
==========================================
- Hits        58971    58943      -28     
- Misses      13953    14024      +71     
- Partials     2172     2178       +6     

@phyy-nx
Copy link
Member

phyy-nx commented Aug 4, 2023

I was wondering about how the small_cell lattice search method could override general refinement parameters for indexing. But actually I think it wouldn't be too nice to allow that. I think it's a job for the higher level caller (the dials.index user, or a pipeline like dials.stills_process).

Agreed

One possibility though is that we instead just include a step of orientation-only refinement as part of the find_crystal_models call, that effectively sets those defaults. I guess that should be enough for the reflections that were indexed as part of the small_cell algorithm to again have their indices assigned when the AssignIndicesStrategy is used. Then the user can choose whether to do further refinement with whatever parameters they think are appropriate, or just set indexing.refinement_protocol.mode=None (or "repredict_only").

That sounds doable. One trick is that the reason I had to set refinement parameters (refinement.parameterisation.crystal.fix=cell min_nref_per_parameter=1 reflections.outlier.algorithm=null) even though refinement should have been off (indexing.refinement_protocol.mode=None), is because there is still a refinement done by the ModelEvaluation object. That would need to be controlled somehow too.

Another issue I'm thinking about is that the use case of indexing a single still pattern while refining detector parameters is indeed not very sensible, but it could be reasonable to refine detector parameters against a set of indexed cells. I suppose that is more a job for dials.refine though.

Agreed

@dagewa
Copy link
Member Author

dagewa commented Aug 7, 2023

there is still a refinement done by the ModelEvaluation object. That would need to be controlled somehow too.

Ah yes, thanks! I was looking for that, but couldn't find it. It looks like this refinement should have a fallback mode when there are very few reflections, like in this case. At the moment it just gives up with the solution. The simplest approach may be to have another try here:

except (RuntimeError, ValueError):
return

before finally giving up only in the case that small_cell-appropriate refinement does not succeed. I'll take a look at this now.

@dagewa
Copy link
Member Author

dagewa commented Aug 7, 2023

Just a note about adding a step of orientation-only refinement at the end of small_cell - I remember that LowResSpotMatch does something like this too. However, it doesn't use DIALS refinement, it calculates the best-fit $\mathbf{U}$ matrix using scitbx.math.superpose, by comparing the observed relps with the reference relps calculated using the known cell.

@staticmethod
def _fit_U_from_superposed_points(reference, other):
# Add the origin to both sets of points
reference.append((0, 0, 0))
other.append((0, 0, 0))
# Find U matrix that takes ideal relps to the reference
fit = superpose.least_squares_fit(reference, other)
return fit.r

If I can remember the details I'll try to add this too. That way you get a better $\mathbf{U}$ matrix going into index assignment, so more indexed spots available for the refinement done in ModelEvaluation

@phyy-nx
Copy link
Member

phyy-nx commented Aug 7, 2023

Hi, I think small_cell currently does something similar to the above code?
https://github.com/cctbx/cctbx_project/blob/master/xfel/small_cell/solve_orientation.py#L40

@dagewa
Copy link
Member Author

dagewa commented Aug 7, 2023

Yes, probably similar. I'm trying to work out why I'm the test example the small_cell algorithm ends with 6 indexed spots, but when the crystal model gets passed to AssignIndices only 3 actually get indexed

Co-authored-by: Aaron Brewster <asbrewster@lbl.gov>
@dagewa
Copy link
Member Author

dagewa commented Aug 8, 2023

Ok, I think I've worked out the weirdness. If you add these lines to the end of the SmallCell.find_crystal_models method:

        from dials.algorithms.indexing.assign_indices import AssignIndicesGlobal
        experiments[0].crystal=crystal
        AssignIndicesGlobal(tolerance=0.3)(reflections, experiments)
        print(list(reflections["id"]))
        AssignIndicesGlobal(tolerance=0.3)(reflections, experiments)
        print(list(reflections["id"]))

then we are doing the index assignment like dials.index will do, but we do it twice. The results are a bit surprising:

[-1, -1, -1, -1, -1, -1, -1, 0, -1, -1, -1, -1, 0, 0]
[0, -1, 0, -1, 0, -1, 0, -1, -1, 0, -1, 0, -1, -1]

The first time round, only 3 reflections are indexed, which is exactly what I've been seeing in dials.index. The second time we get 6 reflections indexed!

The problem is that AssignIndicesGlobal makes a selection based on id:

sel = inside_resolution_limit & (reflections["id"] == -1)

But small_cell_index_lattice_detail has already altered the reflections["id"], so we're not indexing a clean reflection list.

If instead of the above edit I reset the ids to -1:

        reflections["id"] *= 0
        reflections["id"] -= 1

then run dials.index, then I see 9 reflections indexed, not 3!

...
Working set:  11 12 0 2 9 4 6 7 
Indexed set:  0 2 4 6 9 11 
Done refining unit cell.  No new spots to add.
Candidate solutions:
+---------------------------------+----------+----------------+------------+-------------+-------------------+-----------+-----------------+-----------------+
| unit_cell                       |   volume |   volume score |   #indexed |   % indexed |   % indexed score |   rmsd_xy |   rmsd_xy score |   overall score |
|---------------------------------+----------+----------------+------------+-------------+-------------------+-----------+-----------------+-----------------|
| 4.72 4.72 29.52 90.0 90.0 100.7 |      646 |              0 |          9 |          64 |                 0 |      0.05 |               0 |               0 |
+---------------------------------+----------+----------------+------------+-------------+-------------------+-----------+-----------------+-----------------+

@dagewa
Copy link
Member Author

dagewa commented Aug 8, 2023

I haven't quite decided what to do about this yet. We could copy the reflection list before calling small_cell_index_lattice_detail, to avoid the side-effects. However I'm not totally convinced that AssignIndicesGlobal is doing the right thing anyway. It resets all Miller indices to 0,0,0 but then apparently only attempts to index reflections with id == -1. So it seems to "unindex" already indexed reflections.

@phyy-nx
Copy link
Member

phyy-nx commented Aug 8, 2023

Hi, just chiming in, it looks like you are chasing the same leads we are. dials.index uses several rounds of AssignIndiciesGlobal, the first with a tolerance of 0.1, and the rest with 0.3. That indexes 9 reflections. But cctbx.xfel.small_cell_process only does the one round of AssignIndicesGlobal, with a tolerance of 0.1. So it only indexes 5 reflections. Looking at the extra 4 reflections in the image viewer and the predictions look weird, but that might be because of other bugs. But, importantly, we reduced the tolerance to 0.1 for small cell on purpose, from 0.3, in cctbx/cctbx_project@df85d76

From @dwpaley: we should be measuring the tolerance in reciprocal angstroms, but we use fractional coordinates. For small cells, this actually puts us closer to the volumes checked for protein data. For this image there could be multiple crystals in the shot, so it's impossible to know which tolerance is better. A tighter tolerance might be more realistic.

@dagewa
Copy link
Member Author

dagewa commented Aug 8, 2023

Hi Aaron, away from the computer now, but just wanted to add that I'm not worried about the tolerance but the non-idempotent behaviour of the index assignment. Looks like there are bugs to me

@dagewa
Copy link
Member Author

dagewa commented Aug 8, 2023

Indeed if you keep calling the index assignment it ends up toggling between two outputs!

@phyy-nx
Copy link
Member

phyy-nx commented Aug 8, 2023

Ha, wow. Definitely a bug. Digging deeper with @dwpaley.

orientation by superposing ideal relps over the observed relp
positions. This is seen to improve the model going into the
refinement done by ModelEvaluation.
@dagewa
Copy link
Member Author

dagewa commented Aug 9, 2023

@phyy-nx @dwpaley I've put in a workaround for #2485, but it may be something we can remove if that issue is fixed within AssignIndices.

Also, I wanted to bring 3761b4a to your attention. To test this I had a look at the output of refinement done by ModelEvaluation. Without this additional post-small_cell $\mathbf{U}$ matrix polishing I got this output:

Refinement steps:
+--------+--------+----------+----------+-----------------+
|   Step |   Nref |   RMSD_X |   RMSD_Y |   RMSD_DeltaPsi |
|        |        |     (mm) |     (mm) |           (deg) |
|--------+--------+----------+----------+-----------------|
|      0 |      9 | 0.036485 | 0.053279 |         0.31702 |
|      1 |      9 | 0.026735 | 0.053942 |         0.30935 |
|      2 |      9 | 0.024164 | 0.048755 |         0.29908 |
|      3 |      9 | 0.020731 | 0.044542 |         0.29311 |
|      4 |      9 | 0.018689 | 0.044066 |         0.29199 |
+--------+--------+----------+----------+-----------------+

With the additional polishing step I saw this:

Refinement steps:
+--------+--------+----------+----------+-----------------+
|   Step |   Nref |   RMSD_X |   RMSD_Y |   RMSD_DeltaPsi |
|        |        |     (mm) |     (mm) |           (deg) |
|--------+--------+----------+----------+-----------------|
|      0 |      9 | 0.03447  | 0.055502 |         0.30801 |
|      1 |      9 | 0.025447 | 0.054495 |         0.30319 |
|      2 |      9 | 0.02375  | 0.048833 |         0.29666 |
|      3 |      9 | 0.020577 | 0.04458  |         0.29276 |
|      4 |      9 | 0.018656 | 0.044083 |         0.29197 |
+--------+--------+----------+----------+-----------------+

The models end up at basically the same place, but including the superpose refinement gives a better start point, at least in $X$ and $\Delta \Psi$.

I have not tested this any more extensively than the single image example in #2477 (comment), but if it is something that proves generally useful then perhaps it should move from DIALS indexing's SmallCell to small_cell.py-proper.

@phyy-nx
Copy link
Member

phyy-nx commented Aug 9, 2023

Hi, I think what you are showing is that there is a improvement when using scitbx.math.superpose.least_squares_fit instead of np.linalg.lstsq, since it's essentially the same inputs to both functions (hkl and xyz). In fact, I think you could replace np.linalg.lstsq in solve_orientation.py with superpose.least_squares_fit and reproduce your results. Could you try that?

@dwpaley
Copy link
Contributor

dwpaley commented Aug 9, 2023

@dagewa There's something weird going on here or thereabouts:

sel = self.reflections["id"] == -1

Do you understand this code? Currently this is selecting the unindexed reflections and attempting to index them. If we don't index any of those, or if there were no unindexed reflections in the first place, then choose_best_orientation_matrix will return None,None. That can't possibly be right, right? It kills my small_cell indexing for 5 out of 10 shots in the ten_cbfs dataset. Any thoughts? Changing == to != on the given line (i.e. repeat indexing on the already-indexed reflections) makes the code run, but I'm wary because I don't know what this was supposed to do.

@dwpaley
Copy link
Contributor

dwpaley commented Aug 9, 2023

Ok, @phyy-nx and I figured this out. The small_cell lattice search was breaking convention by marking reflections as indexed. If we reset reflections['id'] to -1 (and reflections['miller_index'] to (0,0,0)) before leaving SmallCell.find_crystal_models, then this problem goes away.

@dagewa
Copy link
Member Author

dagewa commented Aug 10, 2023

Hi @dwpaley I think you're describing what I also found and fixed in 5417d18, right?

@dagewa
Copy link
Member Author

dagewa commented Aug 10, 2023

Hi, I think what you are showing is that there is a improvement when using scitbx.math.superpose.least_squares_fit instead of np.linalg.lstsq, since it's essentially the same inputs to both functions (hkl and xyz). In fact, I think you could replace np.linalg.lstsq in solve_orientation.py with superpose.least_squares_fit and reproduce your results. Could you try that?

As a test I did that (inefficiently) with this change:

diff --git a/xfel/small_cell/solve_orientation.py b/xfel/small_cell/solve_orientation.py
index eb87513672..2b677a837d 100644
--- a/xfel/small_cell/solve_orientation.py
+++ b/xfel/small_cell/solve_orientation.py
@@ -49,6 +49,21 @@ class small_cell_orientation:
   print("Summed squared residuals of x,y,z for %d spots in 1/angstroms: %.7f, %.7f, %.7f"%(N,self.residuals[0],self.residuals[1],self.residuals[2]))
   Amatrix = sqr(solution.flatten()).transpose()
 
+  # Recalculate using superpose
+  from scitbx import matrix
+  from scitbx.math import superpose
+  Bmat = matrix.sqr(self.sym.unit_cell().fractionalization_matrix()).transpose()
+  observed_relps = [(0.0, 0.0, 0.0)]
+  predicted_relps = [(0.0, 0.0, 0.0)]
+  for hkl, xyz in zip(self.miller_indices, self.u_vectors):
+      predicted_relps.append(Bmat * hkl)
+      observed_relps.append(xyz)
+  fit = superpose.least_squares_fit(
+      flex.vec3_double(observed_relps), flex.vec3_double(predicted_relps)
+  )
+  Umat = fit.r
+  Amatrix = Umat * Bmat
+
   from cctbx import crystal_orientation
   ori = crystal_orientation.crystal_orientation(Amatrix, crystal_orientation.basis_type.reciprocal)
   return ori

It's not quite the same though. Without looking at it more closely, when it is called here there are 8 reflections in the fit, whereas when it was called at the end of SmallCell.find_crystal_models on the result of the small_cell calculation then there are only 6 reflections.

Here is the output of ModelEvaluation's refinement when the superpose fit is done using code in this diff, and not done again in SmallCell.find_crystal_models:

Refinement steps:
+--------+--------+----------+----------+-----------------+
|   Step |   Nref |   RMSD_X |   RMSD_Y |   RMSD_DeltaPsi |
|        |        |     (mm) |     (mm) |           (deg) |
|--------+--------+----------+----------+-----------------|
|      0 |      9 | 0.041318 | 0.053508 |         0.29512 |
|      1 |      9 | 0.029652 | 0.052035 |         0.29297 |
|      2 |      9 | 0.024172 | 0.047599 |         0.2919  |
|      3 |      9 | 0.019932 | 0.044554 |         0.29188 |
|      4 |      9 | 0.018439 | 0.044178 |         0.29191 |
+--------+--------+----------+----------+-----------------+

Now the start point looks a bit worse in $X$ and $Y$ but better in $\Delta \Psi$.

If I do the superpose-type refinement both in small_cell_orientation and at the end of SmallCell.find_crystal_models, then ModelEvaluation's refinement looks like this:

Refinement steps:
+--------+--------+----------+----------+-----------------+
|   Step |   Nref |   RMSD_X |   RMSD_Y |   RMSD_DeltaPsi |
|        |        |     (mm) |     (mm) |           (deg) |
|--------+--------+----------+----------+-----------------|
|      0 |      9 | 0.033549 | 0.060927 |         0.29802 |
|      1 |      9 | 0.024463 | 0.056772 |         0.29507 |
|      2 |      9 | 0.023265 | 0.049877 |         0.29269 |
|      3 |      9 | 0.020322 | 0.044901 |         0.29202 |
|      4 |      9 | 0.018605 | 0.044141 |         0.29192 |
+--------+--------+----------+----------+-----------------+

Good start point in $X$ and $\Delta \Psi$ but slightly worse in $Y$

These results might be all much of a muchness. It doesn't actually affect the end result - we get the same number of indexed reflections and basically the same crystal model any which way.

@dagewa
Copy link
Member Author

dagewa commented Aug 10, 2023

The reason the results are different between the approaches is interesting though. As I see it, the numpy.linalg.lstsq fit is a way of discovering the $\mathbf{A}$ matrix mapping from $\vec{h}$ to $\vec{r}$. Then a rotation matrix is extracted from that.

By contrast the superpose method goes from an ideal set of relps to the observed set of relps finding the rotation matrix that minimises the RMSD between the two sets. I thought this used the Kabsch algorithm, but looking at the code I see it apparently uses the Kearsley method by default (but these methods apparently give the same result anyway).

My intuition is that the superpose approach might be superior because we already know the $\mathbf{B}$ matrix so we don't need to discover that part of the mapping. But, again, I have only looked at one case with between 6 and 8 spots so it might be too early to generalise 😀

@dwpaley
Copy link
Contributor

dwpaley commented Aug 10, 2023

Yes, thanks, I should have kept up with this thread better :)

From your 9 indexed reflections and your large-ish Δψ values I think you're looking at the first shot in the ten_cbfs dataset. I would caution that 1-2 of those 9 appear to be bogus. My final orientations appeared better with outlier rejection enabled for candidate refinement:

indexing.stills.indexer=sequences
indexing.refinement_protocol.mode=refine_shells
refinement.parameterisation.crystal.fix=cell
refinement.parameterisation.auto_reduction.min_nref_per_parameter=1
refinement.reflections.outlier.algorithm=tukey
refinement.reflections.outlier.minimum_number_of_reflections=3

Then RMSD_DeltaPsi and RMSD_Y are better by ~half (for the refined subset), but more importantly the integration shoeboxes are in the right places.

We have a lot of progress and I'm putting together a summary, but for now just wanted to mention the outlier issue for this specific image. We're really excited about this here! We are tentatively planning a retirement party for cctbx.xfel.small_cell_process.

@dagewa
Copy link
Member Author

dagewa commented Aug 10, 2023

Thanks @dwpaley! Yes, I really need to look at the wider picture with more images. I've been stuck in the weeds with the details so far...

@dwpaley
Copy link
Contributor

dwpaley commented Aug 10, 2023

@dagewa Are you ok with 578a4ce? If so, then with the following phil:

dispatch {
  hit_finder {
    minimum_number_of_reflections = 3
    maximum_number_of_reflections = 120
  }
}
input {
  reference_geometry = split_geo/split_0000.expt
  sync_reference_geom = False
}
output {
  experiments_filename = %s_strong.expt
  strong_filename = %s_strong.refl
}
spotfinder {
  lookup {
    mask = mask/psana_border_hot_shift4.mask
  }
  filter {
    min_spot_size = 2
  }
  threshold {
    dispersion {
      kernel_size = 2 2
      global_threshold = 10
    }
  }
}
integration {
  lookup {
    mask = mask/psana_border_hot_shift4.mask
  }
}

indexing.known_symmetry {
  unit_cell=6.0226,7.2623,29.5203,90,90,90
  space_group=C222
}
indexing.stills.indexer=sequences
indexing.refinement_protocol.mode=refine_shells
refinement.parameterisation.crystal.fix=cell
refinement.parameterisation.auto_reduction.min_nref_per_parameter=1
refinement.reflections.outlier.algorithm=tukey
refinement.reflections.outlier.minimum_number_of_reflections=3

indexing.stills {
  set_mosaic_half_deg_value = .05
  set_domain_size_ang_value = 300
}
indexing.method=small_cell

I can process all of the ten_cbfs like this:

$ for f in data/*.cbf; do dials.stills_process params_dsp.phil $f ; done

@dagewa
Copy link
Member Author

dagewa commented Aug 11, 2023

Hi @dwpaley yes, that looks fine to me at this stage (though in general I think we'd prefer to float the import).

Longer term I think we might want a wider conversation about convergence of workflows and the data model. The fact that we have things like different classes for various mosaicity models is, IMHO, an indication of a problematic design. Honestly I think the base Crystal should have sufficient attributes to describe its physical properties of interest (including mosaicity), and these can be None until we know otherwise. Then the user sets their preferred parameters, which makes a choice of algorithms, and the algorithms themselves modify behaviour based on what attributes are set and their values.

Another target on my long range radar are things like indexing.stills.indexer=sequences. This is super annoying as it tells the user exactly nothing about what this choice does! Even as a developer, I tend to forget the differences between the stills and sequences indexers, so occasionally I try this and then try not-this and just look at output.

Ultimately I'd like the software just to do the Right Thing in each situation, governed by short and directly meaningful user parameters. It's an idealistic goal, but I think getting the feeling that is going on is one reason why people love programs like XDS.

Anyway, that's probably a much longer reply than you expected 😉 (you can tell I'm getting ready to put down tools for my summer break). I put it here because I think the small_cell indexing method has been a good example of collaborative work that brings improvements all round.

@dwpaley
Copy link
Contributor

dwpaley commented Aug 11, 2023

@dagewa Agree with everything here and thanks for the collaboration on this. One of my last points would be to eliminate the phil-juggling here:

self._params.small_cell.powdercell = (
target_symmetry_primitive.unit_cell().parameters()
)
self._params.small_cell.spacegroup = str(
target_symmetry_primitive.space_group_info()
)

because it fails to pass along any other small_cell params that may have been provided. I tentatively think this means deprecating small_cell.powdercell and small_cell.spacegroup (through a cctbx PR) and instead using the values in indexing.known_symmetry. We'd like to use some slight thoughtfulness about it to avoid breaking our old phil files if possible. (I.e. if params.small_cell.powdercell is not None: assert params.indexing.known_symmetry.unit_cell is None etc).

I know your month is ending in a couple hours; while I haven't discussed it with @phyy-nx, I don't necessarily think the above needs to prevent this PR from getting merged.

@dagewa
Copy link
Member Author

dagewa commented Aug 11, 2023

Hi @dwpaley agreed sounds sensible. I think this PR shouldn't go in until the xfel import can be replaced with a serialtbx import. Another thing it is missing is some tests. But if those things happen to be completed in the next few weeks in my absence, then I'm totally happy for it to be merged.

@dwpaley
Copy link
Contributor

dwpaley commented Sep 12, 2023

Just wanted to mention that for whatever reason (not yet investigated) this branch provided 4x indexing rate compared to cctbx.small_cell indexing for a small-molecule SSX data set. We continue to be very excited about this :)

@ndevenish
Copy link
Member

@dagewa do you know what the state of this is?

@dagewa
Copy link
Member Author

dagewa commented Nov 30, 2023

It is waiting for input from @phyy-nx and @dwpaley to

  • understand the failure cases compared to the original implementation
  • move small_cell code to serialtbx, so that this PR won't reintroduce a dependency on xfel

@dagewa
Copy link
Member Author

dagewa commented May 15, 2024

@phyy-nx @dwpaley Where did we get to on this? I remember reaching some kind of conclusion when we met during your visit here, but I don't have notes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants