
WIP: refactor Abins, replace SData with Euphonic data structures #37350

Draft
wants to merge 31 commits into main

ajjackson (Contributor) commented May 13, 2024

Description of work

Summary of work

Remove abins.sdata.SData, use classes from euphonic.spectra instead

Purpose of work

Abins currently manages collections of simulated spectra with an "SData" class. This assumes that the data will be structured as a tree with the hierarchy atom -> quantum order. This is not flexible enough to deal with coherent scattering, which might not be decomposable by atom.

Euphonic is an existing dependency and uses Spectrum1D, Spectrum2D and Spectrum1DCollection classes to manage spectra including arbitrary metadata. Effectively this is a flat table structure, amenable to "query" or "pipeline" workflows. This seems to be an appropriate tool, so here we extend that to add the missing Spectrum2DCollection and work towards using those natively in Abins.
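To illustrate the flat-table idea, here is a minimal toy sketch (hypothetical `Row` and `select` names, not the actual Euphonic API): each spectrum is one row carrying arbitrary metadata, and a "query" is just a filter over that metadata.

```python
# Toy sketch of the flat-table spectrum model: each row holds intensity
# data plus arbitrary metadata, and queries filter on that metadata.
from dataclasses import dataclass, field

@dataclass
class Row:
    y_data: list          # intensities on a shared x axis
    metadata: dict = field(default_factory=dict)

def select(rows, **criteria):
    """Keep rows whose metadata matches every key=value criterion."""
    return [r for r in rows
            if all(r.metadata.get(k) == v for k, v in criteria.items())]

rows = [Row([1.0], {"atom_index": 0, "quantum_order": 1}),
        Row([2.0], {"atom_index": 0, "quantum_order": 2}),
        Row([3.0], {"atom_index": 1, "quantum_order": 1})]

fundamentals = select(rows, quantum_order=1)
assert [r.metadata["atom_index"] for r in fundamentals] == [0, 1]
```

Any grouping (per atom, per quantum order, per scattering angle) then becomes a query over the same flat table rather than a fixed tree shape.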

Further detail of work

  • Add a method to SData to produce an equivalent SpectrumNDCollection object
  • Update abins to use these objects for all "output" management (i.e. summing relevant subsets and creating workspaces)
  • Update abins to simulate data directly into a SpectrumNDCollection object
  • Delete SData class and related unit tests
  • Move patches and extensions to Euphonic classes upstream into a new euphonic release
  • Remove Euphonic subclasses from mantid, update Euphonic dependency version

To test:


Reviewer

Please comment on the points listed below (full description).
Your comments will be used as part of the gatekeeper process, so please comment clearly on what you have checked during your review. If changes are made to the PR during the review process then your final comment will be the most important for gatekeepers. In this comment you should make it clear why any earlier review is still valid, or confirm that all requested changes have been addressed.

Code Review

  • Is the code of an acceptable quality?
  • Does the code conform to the coding standards?
  • Are the unit tests small and test the class in isolation?
  • If there is GUI work does it follow the GUI standards?
  • If there are changes in the release notes then do they describe the changes appropriately?
  • Do the release notes conform to the release notes guide?

Functional Tests

  • Do changes function as described? Add comments below describing the tests performed.
  • Do the changes handle unexpected situations, e.g. bad input?
  • Has the relevant (user and developer) documentation been added/updated?

Does everything look good? Mark the review as Approve. A member of @mantidproject/gatekeepers will take care of it.

Gatekeeper

If you need to request changes to a PR then please add a comment and set the review status to "Request changes". This will stop the PR from showing up in the list for other gatekeepers.

Abins unit tests of calculation over a few quantum orders were set up
incorrectly and instead calculate the largest case three
times. Correct the "default" value so this unit test provides more
meaningful coverage over logic paths.

This does not raise any accuracy concerns; calculations with various numbers of quantum events are also covered by integration tests, and they should be unaffected.

The timing impact is pretty negligible on main, but makes a big
difference on a WIP branch where (currently!) autoconvolution is very
slow.
To begin migration from SData to euphonic data types, allow existing
SData to produce a Spectrum1DCollection. Then we can start
modifying code that _uses_ SData to use this datatype instead.

So far we modify the _atom_number_s method which creates output
workspaces from the Abins calculation.

Spectrum2DCollection is not implemented yet, so initially we leave the
old implementation in place for 2D cases.
- Although Spectrum1DCollection provides a .select() function, we
  ended up using a chain of filters here because it can't handle
  numerical comparisons. Perhaps this would be a bit cleaner if
  the select() method accepted our filter functions.
- The Spectrum1DCollection group_by() method throws an error if the
  key is in the top level of metadata (i.e. because it applies to all
  rows). It would be good to clean that up upstream too.
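The "chain of filters" approach mentioned above can be sketched as follows (illustrative `filter_by` helper, not the Euphonic API): composing arbitrary predicates over per-spectrum metadata also supports the numerical comparisons that an equality-based select() cannot express.

```python
# Hypothetical sketch: compose predicate filters over per-spectrum
# metadata instead of key=value equality matching.
def filter_by(spectra, predicate):
    return [s for s in spectra if predicate(s["metadata"])]

spectra = [{"metadata": {"quantum_order": n, "atom_index": i}}
           for n in (1, 2, 3) for i in (0, 1)]

# Chain: a numerical comparison first, then an equality test.
result = filter_by(spectra, lambda md: md["quantum_order"] <= 2)
result = filter_by(result, lambda md: md["atom_index"] == 0)
assert [s["metadata"]["quantum_order"] for s in result] == [1, 2]
```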
This is not very thoroughly tested here, and should be moved upstream
once everything looks happy on the Abins side.

- Wrap Spectrum1DCollection to deal with group_by() issue where
  metadata is the same for all items (and so not in line_data)
- Implement Spectrum2DCollection
- Have SData.get_spectrum_collection() dispatch to 1D or 2D as
  appropriate
- Tweak workspace writers to use either SpectrumNDCollection class
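The dispatch described above might look something like this sketch, choosing the collection type from the dimensionality of the stored data (the class bodies are placeholders; only the names follow the PR):

```python
# Sketch of 1-D vs 2-D dispatch based on array dimensionality.
# The classes here are stand-ins, not the real implementations.
import numpy as np

class Spectrum1DCollection:
    def __init__(self, data):
        self.data = data

class Spectrum2DCollection:
    def __init__(self, data):
        self.data = data

def get_spectrum_collection(data):
    data = np.asarray(data)
    if data.ndim == 2:    # (n_spectra, n_energy) -> 1-D spectra
        return Spectrum1DCollection(data)
    elif data.ndim == 3:  # (n_spectra, n_q, n_energy) -> 2-D maps
        return Spectrum2DCollection(data)
    raise ValueError(f"Cannot interpret data with {data.ndim} dimensions")
```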
As the relevant information is available in spectra, this allows us to
drop a redundant parameter and reduce the number of objects passed around
- Now the Algorithm classes never deal with SData at all
- SData still used internally and cached
Move the threshold-check logic to module-level functions, and leave
existing method as a wrapper. Then we should be able to delete the
class later...

Also remove some useless parameters/logic that make the return value
optional. We can just ignore the return value if we don't care about
it.
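The pattern is roughly this (names are illustrative, not the actual Abins API): the logic lives in a module-level function, and the class method becomes a thin wrapper that can be deleted along with the class.

```python
# Sketch of the refactoring pattern: logic at module level, method kept
# as a compatibility wrapper during the migration.
def check_thresholds(y_data, threshold=1e-7):
    """Return indices of spectra whose total intensity is below threshold."""
    return [i for i, y in enumerate(y_data) if sum(y) < threshold]

class SData:
    def __init__(self, y_data):
        self._y_data = y_data

    def check_thresholds(self, threshold=1e-7):
        # Thin wrapper kept only until SData is removed.
        return check_thresholds(self._y_data, threshold=threshold)

assert SData([[1.0, 2.0], [1e-9, 0.0]]).check_thresholds() == [1]
```

The return value is always produced; callers that do not care about it can simply ignore it, which is what removes the need for the optional-return parameters.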
Bin checks on Spectrum2DCollection need adapting to interpret the 3-D
z_data array correctly. This implementation is a bit inefficient as it
needlessly instantiates a Spectrum2D; leave a more verbose/efficient
implementation for upstream.
This functionality is also available in Euphonic, and it would make
sense to remove this entirely from Abins and use it there.

However
- this will need a bit of tinkering to work with Spectrum2DCollection,
  and doing it upstream would avoid a lot of duplication
- it would be better to apply kinematic constraints (and instrument
  broadening) after spectra have been filtered and summed into output
  groups

So let's defer that for now
- Push SData a bit deeper in the chain of functions

- Stop using SDataByAngle, instead use angle metadata in
  SpectrumCollection objects

- Patch SpectrumCollection so that group_by() and select() work as
  expected (i.e. run successfully) when some keys are at the top level
  of metadata (i.e. common to all lines).

- Apply broadening to SpectrumCollection objects. For now we keep
  using Abins implementation of broadening so that tests pass; this
  should be replaced with Euphonic implementation later.
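The patch idea for the metadata issue can be sketched like this (the `line_data` layout follows Euphonic's convention of storing per-line entries under that key, but the helper name is hypothetical): merge the collection-level keys into each line's own metadata before filtering or grouping, so every key is visible to select()/group_by().

```python
# Sketch: flatten collection-level metadata into each line's metadata
# so that keys common to all lines are still seen by select()/group_by().
def effective_line_metadata(metadata):
    top_level = {k: v for k, v in metadata.items() if k != "line_data"}
    return [dict(top_level, **line) for line in metadata.get("line_data", [])]

metadata = {"angle": 45.0,
            "line_data": [{"atom_index": 0}, {"atom_index": 1}]}
lines = effective_line_metadata(metadata)
assert all(line["angle"] == 45.0 for line in lines)
```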
This is a significant step towards removing SData entirely.

For compatibility at this point, Spectrum objects mostly store the bin
centres rather than bin edges. When SData is gone it would be nicer to
move to storing bin edges. (Centres can be computed from edges, but
edges can only be guessed from centres...)
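The asymmetry is easy to demonstrate: centres follow exactly from edges, while recovering edges from centres requires assuming a bin convention (e.g. uniform width), which fails for non-uniform bins.

```python
# N + 1 edges, deliberately non-uniform in the last bin.
edges = [0.0, 1.0, 2.0, 4.0]

# Exact: each centre is the midpoint of its bin.
centres = [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]
assert centres == [0.5, 1.5, 3.0]

# Guessing edges back assumes uniform width, which is wrong here:
width = centres[1] - centres[0]
guessed = [c - width / 2 for c in centres] + [centres[-1] + width / 2]
assert guessed != edges   # [0.0, 1.0, 2.5, 3.5] vs [0.0, 1.0, 2.0, 4.0]
```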
Officially the Spectrum metadata should not include floats, because they
cannot be reliably compared for equality.
Continue dissolving SData: apply DW to output spectra, drop now-unused
method from class
- create new autoconvolution routine that works with spectrum
  collection

- integrate downsampling into autoconvolution loop over atoms

  - this could save a bit of memory, or benefit from parallelism over
    atoms

  - not clear if benefit is large at this point, but it doesn't add
    much complexity either. YAGNI?

- apply q2 order corrections, converting 1D initial spectra to 2D maps
  for quantum orders with isotropic Debye-Waller correction (i.e. all
  orders above fundamental)

- drop _broaden_sdata as this is no longer needed
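The core of the autoconvolution step above can be sketched as follows (a minimal illustration; the real Abins routine also handles binning, downsampling and the Debye-Waller corrections, and normalisation factors are omitted here): the order-n spectrum is built by convolving the order-(n-1) spectrum with the fundamental.

```python
# Minimal autoconvolution sketch over quantum orders, using numpy.
import numpy as np

def autoconvolve(fundamental, max_order):
    spectra = {1: np.asarray(fundamental)}
    for order in range(2, max_order + 1):
        full = np.convolve(spectra[order - 1], fundamental)
        # Truncate back to the original energy range; normalisation and
        # bin-width factors are omitted in this sketch.
        spectra[order] = full[:len(fundamental)]
    return spectra

fundamental = np.array([0.0, 1.0, 0.5, 0.0])
spectra = autoconvolve(fundamental, max_order=3)
assert set(spectra) == {1, 2, 3}
```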
The Spectrum1DCollection/Spectrum2DCollection classes allow arbitrary
metadata, so there is no need for a technical distinction between the
class that collects over atoms and quantum orders and one that
collects over angles.
For now, fall back via SData if there is no isotropic calculation
This test is running very slowly due to extra autoconvolution
calculations, which are not required by the equivalent tests in
AbinsBasicTest. The reason is that the default value of
QuantumOrderEventsNumber has changed.

This was a useful indication that the autoconvolution needs more
optimisation in the ongoing SData->SpectrumCollection refactor, but
still not a lot of fun to figure out! Due to caching, the first of
these three tests to run is slow and then the other two are fast...
- Rather than go through the data transformation on arrays of zeros,
  we can return an almost-empty SpectrumCollection immediately.
- That finally eliminates SData from this module! There is definitely
  room for optimisation as autoconvolution has become more expensive,
  but the order-1 stuff seems to be running in similar time. It would
  be a good idea to do some profiling later...
Finally, the cathartic part of the refactor where I get to delete 500
lines of code at once
This was breaking a doctest
Remove innermost loop of Spectrum creation by assembling y_data array
directly.
The .from_spectra() constructor has a lot of overhead as it checks the
types and numerical agreement of axes; if we are comparing Collections
this should only be checked once.

If this tweak works well it should move upstream to euphonic. Here we
refer to a private method on the 1D collection from the 2D collection:
really this can live on a parent class.
Not seeing a big impact on test timings here, but this _should_ scale
better as we add more atoms.
- The euphonic SpectrumCollection select() method doesn't work when all
  lines in the collection match the criterion, because such metadata is
  moved into the top-level metadata rather than the per-line section
  which is checked. The key is therefore treated as missing.

- Compounding this, the data container cannot be entirely empty, so an
  error is raised if no matches are found.

- More reasonably, the `from_spectra()` constructor cannot operate on
  an empty list, as this means no (common) axis information would be
  available. In this commit we avoid calling it when no data is available.
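The guard can be sketched like this (hypothetical helper names; the returned dict stands in for a `from_spectra()` call): the constructor is only invoked when the filtered list is non-empty, and an empty selection is signalled explicitly so the caller can skip that group.

```python
# Sketch: avoid constructing a collection from an empty selection.
def collect_matches(spectra, **criteria):
    matches = [s for s in spectra
               if all(s["metadata"].get(k) == v for k, v in criteria.items())]
    if not matches:
        return None   # caller skips this group entirely
    return {"spectra": matches}   # stands in for from_spectra(matches)

spectra = [{"metadata": {"atom_index": 0}}]
assert collect_matches(spectra, atom_index=1) is None
assert collect_matches(spectra, atom_index=0)["spectra"] == spectra
```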