-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate plot and set_xlog handling #1516
Consolidate plot and set_xlog handling #1516
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1516 +/- ##
==========================================
- Coverage 78.96% 78.93% -0.04%
==========================================
Files 72 72
Lines 26723 26661 -62
Branches 4493 4493
==========================================
- Hits 21101 21044 -57
+ Misses 5426 5417 -9
- Partials 196 200 +4
Continue to review full report at Codecov.
|
290676f
to
20fb833
Compare
dc78abd
to
0151dcc
Compare
1288f3e
to
273ce7d
Compare
eb752ff
to
2b22cff
Compare
3826dad
to
9868dae
Compare
Slightly-better description of the return value from get_data, as well as explaining when the IdentifierErr is raised (it must have been a copy-and-paste error).
14dd50e
to
b66ce10
Compare
Try to improve the documentation around the _plot/contour_types and _plot/contour_type_names dictionaries. This includes making it explicit the relationship between the two. The relationship is simple for the sherpa/ui code, but it is more-complex for the astro case. Add a comment to note that the "astrocompmodel" keyword maps to "model_componentl" and not "model_component". This typo was added in sherpa#906 and is not addressed here.
b66ce10
to
627cae5
Compare
Several changes: - prefere data rather than d for the variable storing the current data object. - if recalc is False in a get_fit_plot/contour call then we do not need to access the data and model plot/contour objects; this is a minor saving at best, and is more-useful pointing out the information flow. - the data-access calls for the get_xxx_plot calls is a bit messy because we want to allow the call to succeed when no data has been loaded (in this case, using the "default" type for the plot, for those that support different plot objects for different inputs). In this commit we make the code path a bit-cleaner by 1. explicitly separating out the recalc=False from True code paths (the latter forces the code to error out if there is no data set); 2. move some of the code into a method; we don't need to sub-class it, and the code is not a lot, but it does make things a bit cleaner. There are also several calls in astro that have been re-worked to follow the same pattern just so that they all follow the same scheme (where appropriate).
This stops the ability of calling set_ylog("energy") set_ylinear("photon") oto change the plots created by plot_energy_flux() / get_energy_flux_hist() plot_photon_flux() / get_photon_flux_hist() This was never documented (at least in the Sherpa docstrings) so is unlikely to be missed. The one case I can thinkg of would be a user who has used either set_ylog() set_ylog("all") and is then surprised the plot no-longer changes. The current work-around for these users is to call the appropriate get_xxx_flux_hist call and change the preference directly. For example get_energy_flux_hist(recalc=False).histo_prefs["ylog"] = True This may seem a minor change with no-obvious improvement for the user, but - the current behavior is not documented - future plans to consolidate the plot() and set_xlog() set of commands are actually hard to do with the "energy" and "flux" labels as they do not match the rest of the system (e.g. there is no get_energy_plot() command) It is better to make the energy and photon plots be handled like we do the other "special" commands like plot_scatter() and plot_trace().
Rather than use short names like "compsource", use the full name, like "source_component" for the _plot_types dictionary. Since this dictionary is used by set_xlog/... to select which plot objects get changed, support is added for the "shortened" names, by means of the new _plot_types_alias dictionary, which just maps from the shortened to new names. If someone uses the old name then a warning message is created, telling them the correct label to use. For example: >>> set_xlog("bkgfit") WARNING: The argument 'bkgfit' is deprecated and 'bkg_fit' should be used instead This is done as a warning rather than a Python deprecated message to make sure users see it. This does not change the set of names that can not be used as a dataset identifier. This means that compsource source_component bkgfit are invalid identifier names, but bkg_fit is still a valid name for an identifier. Note that source_component and model_component were already supported, so the new labels added here are (only for the astro module): bkg_chisqr bkg_delchi bkg_fit bkg_model bkg_ratio bkg_resid bkg_source Note that the documentation for set_xlog and other commands still point to the plot function as the place that describes the labels that can be used. This is technically correct, but using the labels described in plot() will lead to a deprecated warning. The documentation is not changed here as the handling of plot is about to be changed (to use these new "longer" names), which will then make the set_xlog/... and plot documentation consistent.
Move the checks for whether a plot or contour label is valid to methods, and make use of them, in _set_plot_item and _multi_plot. This essentially makes the labels accepted by set_xlog/... and plot()/contour() - accept the same names - behave the same when a deprecated alias is used The behavior is actually technically "by luck" than "by design", and only happens because the dictionaries - _plot_types, _plot_type_names, and _plot_types_alias - end up having the same keys (the alias dictionary makes it a bit harder to see this). * Example The behavior is shown below - which includes seeing the new warning messages (both from set_xlog, which came in a previous commit, and the plot call, which is new): >>> from sherpa.astro.ui import * ... >>> import sys >>> sys.tracebacklimit = 0 ** CIAO 4.14 with bkgfit label >>> set_default_id("bkgfit") sherpa.utils.err.IdentifierErr: identifier 'bkgfit' is a reserved word >>> set_xlog("bkgfit") >>> plot("bkgfit") sherpa.utils.err.IdentifierErr: data set bkg_fit has not been set ** CIAO 4.14 with bkg_fit label >>> set_default_id("bkg_fit") >>> set_xlog("bkg_fit") sherpa.utils.err.PlotErr: Plot type 'bkg_fit' not found in ['data', 'model', 'source', 'fit', 'resid', 'ratio', 'delchi', 'chisqr', 'psf', 'kernel', 'compsource', 'compmodel', 'order', 'energy', 'photon', 'arf', 'bkg', 'bkgmodel', 'bkgfit', 'bkgsource', 'bkgratio', 'bkgresid', 'bkgdelchi', 'bkgchisqr'] >>> plot("bkg_fit") KeyError: 'bkg_fit' During handling of the above exception, another exception occurred: sherpa.utils.err.ArgumentErr: 'bkg_fit' is not a valid plot type ** This commit with bkgfit label Note that both uses create a WARNING message (not for set_default_id but that already errors out; we could think about saying something to indicate this but I do not think it is worth it): >>> set_default_id("bkgfit") sherpa.utils.err.IdentifierErr: identifier 'bkgfit' is a reserved word >>> set_xlog("bkgfit") WARNING: The argument 'bkgfit' is deprecated and 'bkg_fit' should be used instead >>> plot("bkgfit") WARNING: The argument 'bkgfit' is deprecated and 'bkg_fit' should be used instead sherpa.utils.err.IdentifierErr: data set 1 has not been set ** This commit with bkg_fit label >>> set_default_id("bkg_fit") sherpa.utils.err.IdentifierErr: identifier 'bkg_fit' is a reserved word >>> set_xlog("bkg_fit") >>> plot("bkg_fit") sherpa.utils.err.IdentifierErr: data set 1 has not been set
In sherpa#1177 access to image objects from the Session class was changed from direct access (e.g. self._xxx) to using a dictionary (self._image_types). This commit does the same thing for the contour objects: the objects are accessed from the _contour_types dictionary rather than having a set of _datacontour/... fields. For most users this will make no difference. It is only an issue for code that directly accesses the Session class. However, these are labelled private fields so they should be aware that things could change.
In sherpa#1177 access to image objects from the Session class was changed from direct access (e.g. self._xxx) to using a dictionary (self._image_types). This commit does the same thing for the plot objects: the objects are accessed from the _plot_types dictionary rather than having a set of _data/... fields. For most users this will make no difference. It is only an issue for code that directly accesses the Session class. However, these are labelled private fields so they should be aware that things could change. With this commit, images, contours, and plots now use the same scheme. However, there are four differences between the contour/image and plot handling: 1. we can have multiple objects associated with a plot type - at the moment the second element is for Data1DInt data and, for the astro version, the second element is for DataPHA data, and this makes the get_<label>_plot calls rather cumbersome; 2. the template-model handling is odd and doesn't quite follow this (see sherpa#1530); 3. the PHA "fit" plot also requires a different plot object to for the model than is provided by get_model_plot (for the fit we group the data but for the model we do not group it), and the access to these objects are currently via direct access to the session field (for the bkg fit plot), or just created on-the-fly for the "source" plot; 4. and some plot types - e.g. plot_scatter - still require direct access to a session field.
With the recent work to rationalize the labels used by plot()/contour() and set_xlog()/..., we do not need the _plot_type_names and _contour_type_names dictionaries. So we can remove them.
Note the new behavior of the plot command, and as set_xlog/... point to it, the behavior of those commands too.
# There is an argument to be made to have the "singleton" | ||
# plots, such as "fit", storing the object directly, and not | ||
# as a single-element list, but it was felt that having a | ||
# consistent access pattern was cleaner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least, I see the benefit of keeping a uniform representation here (always use a list), but this can be discussed.
Earlier attempts at fixing this actually created a class to use here (e.g. create an instance for the value rather than a list, and this class then had support for the "default" value and adding extra elements), but I never found it satisfactory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a full review, just some ideas based on what I'm seeing now; I want to make comments before it's all done an too late to change. So, please don't waste your time answering my suggestions if you already tried that, or think it's stupid and won't work. That said, here are some thoughts:
This entire approach was (and still is, although it's better now) pretty convoluted and requires careful maintenance. When new plot types are added, the _plot_types
need to be updated, there is the _plot_types
and _contour_types
and later possible _image_types
and then a few others that don't really fall in this scheme.
I am wondering if more of this information should live in the Plot classes themselves, e.g.:
class RatioPlot(...):
# would accept all Data1DInt and DataPA, but not generic Data1D
# I know that that's not what RatioPlot does, but giving and example
_accepted_data_types = (Data1DInt, DataPHA)
_plot_area = 'data' # one of: data, contour, image
_ plot_name = 'Ratio'
... real code here ...
(Obviously, all names are open to bike shedding.)
That way, information would not have to be maintained separately here, it would be intrinsic to the plot class. I think that's a better place to keep that information.
_get_plot_type
would then search through all known plot classes
def _get_plotorcountour_type(plotarea, plotname, data):
for p in plot_type_list:
if ((p._plot_area == plotarea) and (p._plot_name == plotname) and
any([insinstance(data, allowed_class) for allows_class in p._accepted_data_dypes])):
return p
finally:
raise Error
The implementation above could be refined to deal with cases were there is more than one match, e.g. one could select the "most specific" of all plot classes (if one of them accepts all Data1D
and another only DataPHA
, then one may wnat to select the latter, while my implementation would pick whatever is first in the list).
How do we get the plot_type_list
? Either that's hand-curated, but a better approach would be to use a metaclass that puts stuff in a registry when it's imported like I do for the plotting backends in #1382. That's just a few lines of code in the Plot
and Histogram
base classes (could be a new class in the tree PLotHistroImageBase
from which PLot
and Histrogram
inherit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just occurs to me that in practice
_accepted_data_types = (Data1DInt, DataPHA)
_plot_area = 'data' # one of: data, contour, image, other
would not be set for every plot type, but at one point in the inheritance tree and then apply to all derived classes with no further code change. For example Plot
would get the _plot_area = 'data'
and all derived classes would have that property and Contour
would get _plot_area = 'contour'
and we only need to pay attention to special cases with multiple inheritance.
I've overlooked something that needs to be special cased in the current scheme, then maybe it also has to be special-cased in the proposed scheme and we need an area "other".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been trying to get this change merged for several years, so I am somewhat opposed to making more large architectural fixes. This PR is itself a rather cut-down version of the latest approach where I had added extra logic.
Currently I do not believe there is any "nice and simple" way to indicate the mapping between data type and object to use - for example
- source plot for DataPHA accepts lo/hi arguments unlike every-other plot [maybe we should add it to all plot types, but that's a separate discussion/issue/PR]
- plot_fit/plot_bkg_fit does not use the underlying "model" case, again only for DataPHA
- whatever is going on with template models
I've spent a lot of time trying to simplify this code and it has never ended in success: one conclusion is in fact automating the mapping between data-type and plot object logic doesn't really help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe our plots are just too specific for be generalized this way, but I wanted to at least ask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the original question in the comment: I fully agree that a consistent form (here: always a list) is a good choice. I don't think alternatives have to be discussed in the comment - this is the interface we picked. There are many other options we did not pick and that's OK.
raise PlotErr("wrongtype", plottype, str(allowed)) | ||
|
||
def _check_contourtype(self, plottype): | ||
"""Is this a valid contour type?""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two contour routines could just be done in-place, but they are used by _multi_plot
so we need them (i.e. the design wants plot
and contour
versions of the _check_xxx
and _get_xxx
routines),
|
||
""" | ||
|
||
return self._data.get(self._fix_id(id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very-simple routine, and I feel a bit bad about adding it as a method, but it does (I believe) make the downstream code a bit cleared - i.e. we call self.get_data
if we require the data to exist and self._get_data
if the data need not exist.
|
||
# This is an internal routine so it is not expected to be | ||
# called incorrectly, but make sure we catch any such problem. | ||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is added to mainly point out why the two raise ArgumentxxxErr
calls are not covered by tests.
# of valid identifiers. unlike the plot and contour cases, as | ||
# there is | ||
# | ||
# - no image() call that acts like plot() or contour(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but there will be - so maybe we should include it already to make sure nobody writes code that uses kays we won't allow later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment the keys for contour and image are already included in the plot table, so it turns out that adding the contour keys into the "what is a valid identifier" doesn't currently add anything, but is done partly for documentation and just-in-case something changes.
# There is an argument to be made to have the "singleton" | ||
# plots, such as "fit", storing the object directly, and not | ||
# as a single-element list, but it was felt that having a | ||
# consistent access pattern was cleaner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a full review, just some ideas based on what I'm seeing now; I want to make comments before it's all done an too late to change. So, please don't waste your time answering my suggestions if you already tried that, or think it's stupid and won't work. That said, here are some thoughts:
This entire approach was (and still is, although it's better now) pretty convoluted and requires careful maintenance. When new plot types are added, the _plot_types
need to be updated, there is the _plot_types
and _contour_types
and later possible _image_types
and then a few others that don't really fall in this scheme.
I am wondering if more of this information should live in the Plot classes themselves, e.g.:
class RatioPlot(...):
# would accept all Data1DInt and DataPA, but not generic Data1D
# I know that that's not what RatioPlot does, but giving and example
_accepted_data_types = (Data1DInt, DataPHA)
_plot_area = 'data' # one of: data, contour, image
_ plot_name = 'Ratio'
... real code here ...
(Obviously, all names are open to bike shedding.)
That way, information would not have to be maintained separately here, it would be intrinsic to the plot class. I think that's a better place to keep that information.
_get_plot_type
would then search through all known plot classes
def _get_plotorcountour_type(plotarea, plotname, data):
for p in plot_type_list:
if ((p._plot_area == plotarea) and (p._plot_name == plotname) and
any([insinstance(data, allowed_class) for allows_class in p._accepted_data_dypes])):
return p
finally:
raise Error
The implementation above could be refined to deal with cases were there is more than one match, e.g. one could select the "most specific" of all plot classes (if one of them accepts all Data1D
and another only DataPHA
, then one may wnat to select the latter, while my implementation would pick whatever is first in the list).
How do we get the plot_type_list
? Either that's hand-curated, but a better approach would be to use a metaclass that puts stuff in a registry when it's imported like I do for the plotting backends in #1382. That's just a few lines of code in the Plot
and Histogram
base classes (could be a new class in the tree PLotHistroImageBase
from which PLot
and Histrogram
inherit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is making things better. Is it making things perfect? No. For that we might have to abandon the UI altogether... I think this is mostly ready, but I have a few small comments; "I thought about this and decided on the current version because of X" is good enough for an answer in most cases - I just wanted to make sure these are intentional decisions and not an oversight.
mdl = s.create_model_component("const1d", "mdl") | ||
s.set_bkg_source(mdl) | ||
|
||
s.plot(f"bkg_{label}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a side note: All those tests can be improved when we have the bokeh backend. Since bokeh serialized for json, it's very easy to do assert expected_label in plot_in_json_format
, while checking e.g. matplotlib pngs on the pixel level is a bit of a pain (though other packages do it), so I expect that we can revise or add a number of tests to be more stringent and not just check that code executes without and error, but that the labels also look what we want - but not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did ask about perceptual hashes in pytest-mpl
back in the day - matplotlib/pytest-mpl#21 - there's some discussion in matplotlib/pytest-mpl#146 which suggests they may have an implementation soon (matplotlib/pytest-mpl#150)
# There is an argument to be made to have the "singleton" | ||
# plots, such as "fit", storing the object directly, and not | ||
# as a single-element list, but it was felt that having a | ||
# consistent access pattern was cleaner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the original question in the comment: I fully agree that a consistent form (here: always a list) is a good choice. I don't think alternatives have to be discussed in the comment - this is the interface we picked. There are many other options we did not pick and that's OK.
raise ArgumentTypeErr("intstr") | ||
|
||
if _is_integer(id): | ||
return id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do integers have to be special-cased? The checks use "in" with checks the keys in a dict. Dicts can have integer keys, so the "in" should work and (since all keys here are strings) tell us that it's an allowed name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mainly a style thing. Before this check we have id : Union[int, str]
but after it we have id : str
.
This means that I can theoretically say
def _check_plottype(self, id : str) -> nool:
but I needed to add explicit an assert isinstance(id, str)
statement when using mypy
in #1525 . However, this is experimental code that isn't going to get merged, and I may have been mis-understanding quite how typeguards work there
Each group consists of the name of the plot/contour, followed | ||
by optional arguments. So | ||
|
||
self._multi_plot(["data", "data", "org"], plotmeth="plot") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly dislike this design, but I admit that this PR is not the place to change it. Every problem in computer science can be solved with one more level of indirection, they say.
Here self._multi_plot([("data", ), ("data", "org")], plotmeth="plot")
would work wonders - no more guessing if something if a plot command name or the name of a dataset identifier, strong encouragement not to reply on the default (although it can be done with just a tuple of length 1, i.e. by typing an extra pair of () and a comma), no need ot maintain a list of forbidden identifiers etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note: this is a very-thin wrapper around plot
and contour
. You would have to change how those are called - i.e.
plot("data", "data", 2, ylog=True)
would need to be changed to whatever you would come up with. The _multi_plot
call is basically converting the user input into this design - i.e. a sequence of "command name" followed by arguments - so if plot
and contour
are changed then we can simplify this. To be honest the saving isn't going to be huge in terms of the number of lines, but there are things you could think of doing to make it easier to name arguments to send - e.g.
plot(["data", "id", 1], ["data", "id", 2])
but
a) this is not the place
b) I'm not sure it's worth it.
Multiple arguments can be sent, but they rely on positional | ||
ordering only, so | ||
|
||
args = ["model_component", mdl, "model_component", 2, mdl] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, following my comment above, this would be [("model_component", mdl), ("model_component", 2, mdl)]
with a clear separation which argument go into which call and leaving it to that call do do with the arguments whatever that call wants to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned above, this routine is parsing the plot
and contour
commands, so you can not make this change without breaking existing code.
if recalc: | ||
data = self.get_data(id) | ||
else: | ||
data = self._get_data(id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about the 15th time that I see those four lines of code. Does it make sense to pull them out as a separate function data = self._get_data_recalc(id, recalc)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but
a) as mentioned, I have pared this PR back to it's essentials, and have had about 4 different ways of trying to reduce the number of lines and balancing that against making methods that do too much. I quite like the separation between get_data
and _get_data
, but moving the recalc
logic into the check starts to worry me (nut would allow me to remove the _get_data call`
b) I haven't checked, but I don't think the number of times we do this is that huge, so is the space saving worth it (i.e. 4 times vs 40)?
Summary
Use the same labels for the plot and set_xlog/set_ylog/set_xlinear/set_ylinear functions. As a result, a number of names can no-longer be used as an identifier for a dataset (bkg_chisqr, bkg_delchi, bkg_fit, bkg_model, bkg_ratio, bkg_resid, bkg_source). A number of labels are now considered deprecated and their use will trigger a warning message, displaying the label that should be used. The "energy" and "photon" labels can no-longer be used with the set_x/yxxx functions: instead the get_energy_flux_hist and get_photon_flux_hist calls can be used, such as
get_energy_flux_hist(recalc=False).histo_prefs["ylog"] = True
.Details
Background
This replaces #1089 and #1175. One of the discussion issues on #1175 was a comment I'd made about some code being in a transitional state that would be finished off in a later commit: this addresses this by making the image, contour, and plot commands all use the
_<label>_type
dictionary, and to try and unify the meaning of this (between the plot/contour and set_xlog/... set of commands). The image() command is a little bit different, and set_xlog/.. only is only relevant for the plot commands, so we don't have a clean design for what these structures represent (in the sense of "what do the keys mean"), so having a unified design is a bit hard here. The data-specific plot support added in #906 (or, rather, the more-obviously-designed support for data-specific plots, as there was always some level of support for this) is another reason why there's difference between plots, contours, and images.Overall this has the same "shape" as the previous attempts, but it has teased out the differences between how set_xlog/... and the plot commands work, and unifies them. It is also less expressive than previous attempts, in that it doesn't try to do much follow-on work to take advantage of the plot changes, as it turns out to be not-quite as effective in reducing the code base as I'd hoped.
In the following I describe the changes made to the code, skipping the intermediate steps. I have added fairly extensive notes to the main commits to describe the per-commit changes so you can follow those, and made sure each "important" step ran through CI to show it works.
The first actual code change (the third commit) is technically a separate bug fix for the handling of calls to
source_component
andmodel_component
when usingData1DInt
data, but it is not a significant-enough fix to be pulled out to a separate commit and it fits in well with the changes in this PR (i.e. it needs to be done to make this PR make sense anyway).The original code
The first commit of Sherpa to git is 9ccf0e7 and contains the Sesion
_plot_types
and_contour_types
dictionaries, with keys being labels like"data"
and"compsource"
, and the values are the plot objects stored as fields, such asself._dataplot
. Theget_data_plot
call was implemented asand the
_prepare_plotobj
call started with (this is the sherpa/ui version; the sherpa/astro/ui version had it's own copy rather than falling through to this logic):A similar scheme was used for contours (you can see that
_prepare_plotobj
deals with both plot and contour classes) and images (this had a separate_prepare_imageobj
call), although there was no_image_types
structure. The astro session code over-rode the_prepare_plotobj
call.Recent changes
In trying to fix plot issues I found that I could never work out what was being done, as it ended up going through
_prepare_plotobj
and this hid subtle (or not-so-subtle) issues.In #906 (Oct 2020) I re-worked the plot and contour code so that access to the
plotobj
is now controlled by the relevantget_xxx_plot
orget_xxx_contour
call, rather than going via_prepare_plotobj
. This lead toget_data_plot
becomingalthough this is a simple case. We have more-complex code - in particular, as pointed out below, the above code has changed, but the point is that it is now in the relevant call rather than hidden away in code that can't be easily over-ridden. This PR added the
_plot_type_names
and_contour_type_names
dictionaries which I now want to remove, as I have a better understanding of how the various pieces fit together.in #1177 (June 2021) I added changes to the image code to copy the #906 changes and to extend the logic, so that we do not access fields directly from the session object but do so via an "API" (in this case the API is just a dictionary). I note that the image and contour cases are easier than the plot case as we don't have any extra logic: we always use a particular class for a particular image/contour call. In the plot case we care about the data type (Data1DInt or DataPHA) and there are other places where we appear to care about the model type (this is an area I note below as needing more follow-up, e.g. #1530 but it is not to be fixed here).
The
get_data_plot
code (for the ui case) has changed to become (this was the main branch at one point of the recent development of the PR):sherpa/sherpa/ui/utils.py
Lines 10693 to 10713 in feffe35
Changes in this PR
The basic idea of this PR is to follow #1177 and remove the direct access to fields like
self._dataplot
and access them from a dictionary. This would be for both the plot and contour commands. There main initial complication is that the obvious thing to do - which is re-use the existing_plot_types
and_component_types
dictionaries - ends up causing issues with theset_xlog
/../set_ylinear
set of commands. This is because the keys of_plot_types
and_component_types
are used to indicate the set of valid dataset identifiers and what labels are valid as "plot/contour" names in theplot()
andcontour()
calls. I note that the contour code does not care about theset_xlog
/... family of commands but it does about the second plot, and that the image code in #1177 didn't have to worry about either case (although I note that we could add animage
command in #1517 but that is irrelevant here).So, there is some work needed to be done to rationalize the
set_xlog
/... andplot()
commands so that they use the same labels and underlying data structures. This leads to changes to the set of valid identifiers: at the moment we allow old and new names but the idea is that the old names are deprecated and will be removed in a future release. The user is noted by a screen message whenever they use an old name.So, the idea is that we now have in
sherpa.ui.utils.Session
and we access the data with
For the
sherpa.astro.ui.utils.Session
class this gets augmented withThe logic for matching the data class to the plot object is intentionally low-tech: it's just a set of
isinstance
calls and we assume that those plots that care about this have the_plot_types
dictionary set up correctly (that is, the value is a list with 2 or 3 elements in the correct order, depending on which module you are in). I had tried fancier approaches but they didn't make the code simpler, and would make it harder to extend in the future (e.g. for #347).NOTES
not all plot types care about the data they are sent (perhaps they should, to better catch things like trying to plot a
DataIMG
dataset, but that's a separate issue and it also depends on what we want to do with add plot support for image data sets #347); in this case theget_xxx_plot
call is simpler and doesn't need to be over-ridden in the astro module;the
fit
andbkg_fit
plots complicate things since, for PHA data, the model part of the plot behaves differently to amodel
plot (the fit version groups the model, the model version shows ungrouped data) and the way that this is done is subtly-different between thefit
andbkg_fit
versions - I do not try to address this here;the
source
plot has some strange behavior with template models, and this behavior depends on whether you are using PHA data or not - I do not try to address this here but instead raised issue template-model plotting/handling needs a proper review #1530 to track it.Interaction with
set_xlog
et al andplot
The set_xlog/ylog/xlinear/ylinear commands are affected by these changes because there is an undocumented relationship between dictionary keys and the labels you can use with these commands. This PR attempts to clarify and clean this up. These commands take an optional string (defaulting to
"all"
). When it is not"all"
it must match a key from either the_plot_types
dictionary, or the_plot_types_alias
dictionary (new in this PR). The idea is that the_plot_types
key must match a correspondingget_<key>_plot
method, which makes it easier to see what the relationship is betweenThis last call is new in this PR. Previously you would have had to know to say
The code now labels these "special" labels as an alias, so that they still work but the user is warned to change their code, and we can hopefully remove them at some point in the future. We do not use the python deprecation warning system because users of the UI layer are unlikely to see these messages (since they helpfully get hidden by default), so we use the warning logger. The new behavior is:
I honestly don't know what commands like "astrodata" meant - I think it was a way you could change only the PHA plot command, but this was never documented and thanks to changes I've made to the plotting code they no-longer have a meaning.
Since the
plot
command now uses the same labels, we get the same behavior - e.g.NOTE
Related changes because of this are
plot
command now takes the "new" labels, soplot("bkg_fit")
mean "do the plot_bkg_fit plot" when before you needed to know thatplot("bkgfit")
was the magicplot("model_component")
as well as the "secret"plot("compsource")
. I'm not sure if this is from something I did in earlier plot changes or was something that was always there. This has not been changed, but I just wanted to point it outid
argument in calls, is actually based around the labels that theplot
andcontour
calls accepts; this makes sense, as you need to know when parsingplot(arg1, arg2, arg3, ...)
whetherarg2
is an identifier or a new plot. This means that changes to the values thatset_xlog
take changes this. So we have some changes.As an example of the last point, in CIAO 4.14:
The current behavior is to make both the aliases and the actual commands be reserved:
It turns out that it should only be the new bkg_xxx labels that are added to the set of invalid labels - that is bkg_chisqr, bkg_delchi, bkg_fit, bkg_model, bkg_ratio, bkg_resid, bkg_source.
Consequences
One consequence of these changes is that we now have a consistent set of names that are used
Some special plot commands: energy and photon
Note: these are only present in the astro version.
As part of this clean up, the
"energy"
and"photon"
labels have been removed from the_plot_types
dictionary, which means they can no longer be used withset_xlog
/..., as shown below. This is a regression in functionality, but the meaning of these labels is unclear, and theplot_energy_flux
/plot_photon_flux
commands are a bit different to the other calls.I note that we do not document the behavior of "energy" and "photon", so it is not like we are changing behavior we told users they could rely on.
This is most-likely to cause a problem for users who have been using the
"all"
label - e.g.set_ylog()
orset_ylog("all")
- without knowing it.To change these plot preferences users will have to call the relevant
get
call and change the setting manually:This is rather ungainly, but I don't see it as a remotely-high-priority to fix.
Some special plot commands: pdf, cdf, trace, scatter, pvalue
There are other plot_xxx calls that do not use the
_plot_types
dictionary - e.g. plot_pdf - that are present in both versions (unlike energy/photon which are only in the astro session). These calls generally require multiple arguments to call - rather than something likeplot_data(2)
orplot("data", 2)
- and so it is no clear how best to go forward with these.This means that there are several "plot objects" that are still left as direct attributes of the Session class. However, this is always likely the case because we have things like
self._splitplot
which is used for setting up multiple plots, andself._intproj
which is used by theint_proj
command, that are not directly related toplot
orset_xlog
.Left over fields of the Session class
I will mention two that are left in that are used by the
plot
code, for reference and because they have been mentioned above:self._comptmplsrcplot = sherpa.plot.ComponentTemplateSourcePlot()
I do not understand what the template plot code is meant to be doing. It looks half-finished to me. This PR changes (I believe) how the template model display works for DataPHA data as I think the logic has now changed. We don't have tests to check this. We also do not seem to handle the source and model cases the same for the template models (ie there's no
ComponentTemplateModelPlot
). I really don't want to worry about this code in this PR as I believe it can be looked at after this has been merged. I argue that the logic of what is going on is more-obvious after this PR.Since writing this I created #1530 as I feel it's not something to address in this PR
self._bkgmodelplot = sherpa.astro.plot.BkgModelPHAHistogram()
PHA plots behave "strangely" in calls to
plot_fit
andplot_bkg_fit
which took me a long time to understand, even after looking at the code. We want the model to be displayed using the grouped data here, whereasplot_model
andplot_bkg_model
display the data ungrouped. This has caused me no-end of pain, and it would be nice to have a "better" way of doing this, but this is existing functionality and this PR is not the place to try and address this.Saying that, we do have a difference to how
plot_fit
andplot_bkg_fit
do things - the former generates the plot object each time whereas the latter uses theself._bkgmodelplot
. I think we should remove this (so makeplot_bkg_fit
behave likeplot_fit
), but I haven't made that change yet.For reference, the
plot_fit
code is (here using themain
branch at the time of writing)sherpa/sherpa/astro/ui/utils.py
Lines 10440 to 10468 in feffe35
and
plot_bkg_fit
usessherpa/sherpa/astro/ui/utils.py
Lines 10788 to 10803 in feffe35
which shows the different behavior in how the
modelobj
variable is set.