expose some functions under bruker.api #121

sem-geologist · 2023-05-19T14:51:42Z

Description of the change

This PR aims to modernise and extend the bruker._api.py:

expose middle layer functions and classes under bruker.api
review, update and extend the docstrings, in particular those of the functions and classes which (or whose children) are going to be exposed through .api
review the code for possible cleanup, scope separation and streamlining for public exposure in .api

Progress of the PR

expose SfsReader
expose xml_to_spectrum
expose xml_to_image
update docstring (if appropriate),
update user guide (if appropriate),
add a changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
check formatting of the changelog entry in the readthedocs doc build of this PR (link in github checks),
add tests,
ready for review.

???
Are xml_to_spectrum and xml_to_image good names for these functions? The functions require particular etree nodes, so maybe et_node_to_spectrum and et_node_to_image would be better names?
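To illustrate the naming question, here is a minimal sketch of a function that accepts an already-parsed ElementTree node rather than raw XML text. The XML layout, the tag names and the helper `et_node_to_image_size` are hypothetical and much simpler than the real Bruker format; they are not the functions from this PR.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified XML; real Bruker files are far richer.
XML = """
<Root>
  <Image><Width>4</Width><Height>2</Height></Image>
</Root>
"""


def et_node_to_image_size(node):
    """Hypothetical helper: read (height, width) from an already-parsed node."""
    if not isinstance(node, ET.Element):
        # Raw XML text is rejected: the caller must locate the node first.
        raise TypeError("expected an xml.etree.ElementTree.Element, not raw XML")
    return int(node.findtext("Height")), int(node.findtext("Width"))


# The caller parses the document and selects the particular node itself.
image_node = ET.fromstring(XML).find("./Image")
size = et_node_to_image_size(image_node)
```

A name starting with `et_node_` makes it visible at the call site that the argument is a node selected by the caller, not a file path or an XML string.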

Minimal example of the bug fix or the new feature

from rsciio.bruker import api as b_api

b_api.SFSReader('somefile.pan')  # *.pan are particle analysis files using same container as bcf


codecov · 2023-05-19T14:57:35Z

Codecov Report

Patch coverage: 96.62% and project coverage change: +0.13 🎉

Comparison is base (5f9e746) 85.16% compared to head (a57fd72) 85.29%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
+ Coverage   85.16%   85.29%   +0.13%     
==========================================
  Files          73       74       +1     
  Lines        9030     9042      +12     
  Branches     1932     2045     +113     
==========================================
+ Hits         7690     7712      +22     
+ Misses        873      870       -3     
+ Partials      467      460       -7

Impacted Files	Coverage Δ
rsciio/bruker/_api.py	`88.74% <96.59%> (+0.95%)`	⬆️
rsciio/bruker/api.py	`100.00% <100.00%> (ø)`

... and 5 files with indirect coverage changes



sem-geologist · 2023-05-29T14:37:52Z

@ericpre, @jlaehne I have a question. While doing a fairly extensive review of the bruker _api.py, so that it can be more streamlined, I came back to the question of how to handle images. A Bruker XML image node can hold several planes (they are called "planes" within the XML). These are basically different video channels, which register signal simultaneously while the beam is rastered over the sample ROI (in some sense this is very similar to spectroscopy, to be honest). So most of the metadata is the same (in the XML it is physically shared), except for the description string of each channel.

Hitherto, all metadata was duplicated in this case, and every plane was returned as a separate, independent set with its own data and metadata. But is this the right and most efficient way? Would it not be better to stack those planes along an axis called "channels"? Can axes have a labeled scale (that is, channel descriptions instead of numbers)? Does this make any sense at all? To me it makes a lot of sense, as those planes are different representations of exactly the same beam-material interaction: for example, 4 channels such as BF, DF, DF4 and HAADF all represent electrons that interacted differently with matter.
Even with 2 channels (the most common setup for an SEM with Bruker EDS), BSE and SE images represent electrons of different energies, and in some sense combining them forms a kind of spectral image.

Such a change is probably going to break some established workflows. But I think it would be the right thing to do.
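A rough sketch of the stacking idea, assuming two same-sized planes that share one metadata dictionary. The arrays, the `pixel_size_um` key and the `labels` entry in the axis description are hypothetical placeholders; they are not the current RosettaSciIO axis format.

```python
import numpy as np

# Hypothetical stand-ins for two simultaneously acquired planes.
shared_metadata = {"pixel_size_um": 0.5}  # hypothetical key, stored only once
planes = {
    "BSE": np.zeros((3, 4), dtype=np.uint16),
    "SE": np.ones((3, 4), dtype=np.uint16),
}

# The channel descriptions become the labels of the new leading axis.
channel_labels = list(planes)
stacked = np.stack([planes[name] for name in channel_labels], axis=0)

# Hypothetical axis description: a labeled "channels" axis plus image axes.
axes = [
    {"name": "channels", "size": stacked.shape[0], "labels": channel_labels},
    {"name": "height", "size": stacked.shape[1]},
    {"name": "width", "size": stacked.shape[2]},
]
```

With this layout, `stacked[channel_labels.index("SE")]` recovers a single plane, and the shared metadata is no longer duplicated per channel.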

jlaehne · 2023-05-29T17:14:29Z

We had a similar topic brought up for LumiSpy (CL spectral image + corresponding SE image) by @jordiferrero, which has not been implemented so far: LumiSpy/lumispy#73
Indeed, it would be nice to have an upstream solution for that in HyperSpy!

Concerning the 'labeled' axes for stacked data, @CSSFrancis was working on a PR.

CSSFrancis · 2023-05-29T17:39:29Z

So adding labeled axes wouldn't be terribly hard in hyperspy. I have done most of the work in hyperspy/hyperspy#3031. Right now there is a little bit of hesitation about this change, but I don't think there is really anything stopping it from happening. This might be something worth voting on, or having a larger discussion about how these signals are handled.

Handling multiple signals is a little bit tricky and becomes more tricky if they have different numbers of pixels, but even that can be handled by hyperspy.

There are a couple of things:

1. If all of the signals have the same dimensionality and the same size --> pass as labeled axes and create a stacked array
2. If all of the signals have the same dimensionality and different sizes --> pass as labeled axes and create a stacked ragged array
3. If one of the signals has a higher dimensionality (i.e. HAADF and 4-D STEM) --> pass one as navigator and one as array

For 3 I also think that in the case where you have multiple different signals you should be able to pass all of them. In that case you could have multiple different navigators. For example with a 4-D STEM you could have brightfield and darkfield images.
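The three cases can be sketched as a small decision helper that looks only at array shapes. The function `choose_layout` and its return strings are hypothetical and only illustrate the classification; they are not part of hyperspy or RosettaSciIO.

```python
def choose_layout(shapes):
    """Classify a group of signals by their array shapes (hypothetical helper)."""
    shapes = [tuple(shape) for shape in shapes]
    dimensionalities = {len(shape) for shape in shapes}
    if len(dimensionalities) > 1:
        # Case 3: mixed dimensionality, e.g. a 2-D HAADF image and 4-D STEM data.
        return "navigator + signal"
    if len(set(shapes)) == 1:
        # Case 1: identical shapes can be stacked along a labeled axis.
        return "stacked array"
    # Case 2: same dimensionality but different sizes.
    return "ragged array"


layout_same = choose_layout([(256, 256), (256, 256)])
layout_ragged = choose_layout([(256, 256), (128, 128)])
layout_mixed = choose_layout([(64, 64), (64, 64, 128, 128)])
```

Checking dimensionality first keeps case 3 from being mistaken for case 2, since shapes of different length are also unequal.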

sem-geologist · 2023-05-29T21:24:09Z

Thanks @CSSFrancis, your 3-point division makes it pretty clear. So the "planes" used in the Bruker XML image not only fulfil point 1 (same dimensionality and resolution/size), but are even more closely related: they share exactly the same column conditions and are generated pixel-to-pixel with exactly the same beam simultaneously, which is all the more reason to stack them. Hopefully your PR hyperspy/hyperspy#3031 will get accepted. For now I will probably stack images with unitless axes.

Hyperspy is not even my main/only aim with this streamlining attempt. I want this to stay just as useful for Hyperspy 2.0 (actually to be more useful for extensions, with EBSD, XRF... in mind, which I plan to address in some following PRs), but also to pave an easier road for my own software (HussariX).

CSSFrancis · 2023-05-29T22:10:52Z

> Hyperspy is not even my main/only aim with this streamlining attempt. I want this to stay just as useful for Hyperspy 2.0 (actually to be more useful for extensions, with EBSD, XRF... in mind, which I plan to address in some following PRs), but also to pave an easier road for my own software (HussariX).

In these types of situations "what is good for the goose is good for the gander", i.e. both software packages will probably benefit from a consistent approach to these low-level problems :) It also helps with interoperability, which can only be a good thing.

To be honest, case #2 above is quite difficult from the perspective of hyperspy. There is nothing that says hyperspy cannot have a signal of signals, similar to how numpy can have a ragged array of arrays. I considered something like that here, but I am not sure if there is a consistent way to approach it. So far there hasn't been a real need for it, so I haven't paid it much attention.

ericpre · 2023-05-30T09:16:37Z

Just to make sure that I don't misunderstand, @sem-geologist: what you are asking for is to have one signal which contains the BSE and SE images stacked together, and another one which contains the EBSD or EDS data?
Could the data easily be stacked after loading, by the library itself, at a stage where it is known how the data needs to be structured/handled? I am not sure that it is the responsibility of RosettaSciIO to define this.

Currently, in hyperspy, these are treated as different datasets and they are expected to be handled at the workflow level (script, notebook, library, etc.). Maybe what RosettaSciIO could do is assign each dataset to a specific category: "main", "auxiliary" (data acquired simultaneously but not "main", or already processed data, i.e. a map from a spectrum image), "survey", etc.
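As a sketch of the category idea, assuming each loaded dataset is a plain dictionary: the `category` key and the helpers `tag_dataset` and `group_by_category` are hypothetical and do not exist in RosettaSciIO; only the three category names come from the suggestion above.

```python
VALID_CATEGORIES = ("main", "auxiliary", "survey")


def tag_dataset(dataset, category):
    """Return a copy of a dataset dictionary with a hypothetical 'category' key."""
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    tagged = dict(dataset)  # shallow copy, the input is left untouched
    tagged["category"] = category
    return tagged


def group_by_category(datasets):
    """Group tagged datasets so a workflow can decide how to combine them."""
    groups = {name: [] for name in VALID_CATEGORIES}
    for dataset in datasets:
        groups[dataset["category"]].append(dataset)
    return groups


loaded = [
    tag_dataset({"title": "EDS"}, "main"),
    tag_dataset({"title": "BSE"}, "auxiliary"),
    tag_dataset({"title": "SE"}, "auxiliary"),
]
groups = group_by_category(loaded)
```

The reader would only label the datasets; stacking the "auxiliary" images (or not) stays a decision of the downstream script or library.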

sem-geologist added 2 commits May 18, 2023 15:30

rename class SFS_reader into SfsReader

98397f6

to follow python class naming convention

cleanup bruker _api and expose SfsReader under api

8fb6490

sem-geologist marked this pull request as draft May 19, 2023 14:51

dont expose .api within __init__

ef337cc

jlaehne reviewed May 27, 2023

rsciio/bruker/_api.py (outdated)

jlaehne reviewed May 27, 2023

rsciio/bruker/_api.py (outdated)

sem-geologist added 4 commits May 28, 2023 13:59

rename classes to comply with PEP8:

efac6eb

SfsReader->SFSReader BCF_reader -> BCFReader update docstrings

fix_test

4d48681

expose spectra_from_xml, other polishing

63979e4

missed return fix

a57fd72
