Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate plot and set_xlog handling #1516

Merged
merged 13 commits into from
Jul 6, 2022

Conversation

DougBurke
Copy link
Contributor

@DougBurke DougBurke commented May 21, 2022

Summary

Use the same labels for the plot and set_xlog/set_ylog/set_xlinear/set_ylinear functions. As a result, a number of names can no-longer be used as an identifier for a dataset (bkg_chisqr, bkg_delchi, bkg_fit, bkg_model, bkg_ratio, bkg_resid, bkg_source). A number of labels are now considered deprecated and their use will trigger a warning message, displaying the label that should be used. The "energy" and "photon" labels can no-longer be used with the set_x/yxxx functions: instead the get_energy_flux_hist and get_photon_flux_hist calls can be used, such as get_energy_flux_hist(recalc=False).histo_prefs["ylog"] = True.

Details

Background

This replaces #1089 and #1175. One of the discussion issues on #1175 was a comment I'd made about some code being in a transitional state that would be finished off in a later commit: this addresses this by making the image, contour, and plot commands all use the _<label>_type dictionary, and to try and unify the meaning of this (between the plot/contour and set_xlog/... set of commands). The image() command is a little bit different, and set_xlog/.. only is only relevant for the plot commands, so we don't have a clean design for what these structures represent (in the sense of "what do the keys mean"), so having a unified design is a bit hard here. The data-specific plot support added in #906 (or, rather, the more-obviously-designed support for data-specific plots, as there was always some level of support for this) is another reason why there's difference between plots, contours, and images.

Overall this has the same "shape" as the previous attempts, but it has teased out the differences between how set_xlog/... and the plot commands work, and unifies them. It is also less expressive than previous attempts, in that it doesn't try to do much follow-on work to take advantage of the plot changes, as it turns out to be not-quite as effective in reducing the code base as I'd hoped.

In the following I describe the changes made to the code, skipping the intermediate steps. I have added fairly extensive notes to the main commits to describe the per-commit changes so you can follow those, and made sure each "important" step ran through CI to show it works.

The first actual code change (the third commit) is technically a separate bug fix for the handling of calls to source_component and model_component when using Data1DInt data, but it is not a significant-enough fix to be pulled out to a separate commit and it fits in well with the changes in this PR (i.e. it needs to be done to make this PR make sense anyway).

The original code

The first commit of Sherpa to git is 9ccf0e7 and contains the Sesion _plot_types and _contour_types dictionaries, with keys being labels like "data" and "compsource", and the values are the plot objects stored as fields, such as self._dataplot. The get_data_plot call was implemented as

        self._prepare_plotobj(id, self._dataplot)
        return self._dataplot

and the _prepare_plotobj call started with (this is the sherpa/ui version; the sherpa/astro/ui version had it's own copy rather than falling through to this logic):

    def _prepare_plotobj(self, id, plotobj, model=None):
        id = self._fix_id(id)
        if isinstance(plotobj, sherpa.plot.FitPlot):
            plotobj.prepare(self._prepare_plotobj(id, self._dataplot),
                            self._prepare_plotobj(id, self._modelplot))
        elif isinstance(plotobj, sherpa.plot.FitContour):
            plotobj.prepare(self._prepare_plotobj(id, self._datacontour),
                            self._prepare_plotobj(id, self._modelcontour))
        else:
            if(isinstance(plotobj, sherpa.plot.PSFPlot) or
               isinstance(plotobj, sherpa.plot.PSFContour) or
               isinstance(plotobj, sherpa.plot.PSFKernelPlot) or 
               isinstance(plotobj, sherpa.plot.PSFKernelContour)):
                plotobj.prepare(self.get_psf(id), self.get_data(id))
...

A similar scheme was used for contours (you can see that _prepare_plotobj deals with both plot and contour classes) and images (this had a separate _prepare_imageobj call), although there was no _image_types structure. The astro session code over-rode the _prepare_plotobj call.

Recent changes

In trying to fix plot issues I found that I could never work out what was being done, as it ended up going through _prepare_plotobj and this hid subtle (or not-so-subtle) issues.

In #906 (Oct 2020) I re-worked the plot and contour code so that access to the plotobj is now controlled by the relevant get_xxx_plot or get_xxx_contour call, rather than going via _prepare_plotobj. This lead to get_data_plot becoming

        plotobj = self._dataplot
        if recalc:
            plotobj.prepare(self.get_data(id), self.get_stat())
        return plotobj

although this is a simple case. We have more-complex code - in particular, as pointed out below, the above code has changed, but the point is that it is now in the relevant call rather than hidden away in code that can't be easily over-ridden. This PR added the _plot_type_names and _contour_type_names dictionaries which I now want to remove, as I have a better understanding of how the various pieces fit together.

in #1177 (June 2021) I added changes to the image code to copy the #906 changes and to extend the logic, so that we do not access fields directly from the session object but do so via an "API" (in this case the API is just a dictionary). I note that the image and contour cases are easier than the plot case as we don't have any extra logic: we always use a particular class for a particular image/contour call. In the plot case we care about the data type (Data1DInt or DataPHA) and there are other places where we appear to care about the model type (this is an area I note below as needing more follow-up, e.g. #1530 but it is not to be fixed here).

The get_data_plot code (for the ui case) has changed to become (this was the main branch at one point of the recent development of the PR):

sherpa/sherpa/ui/utils.py

Lines 10693 to 10713 in feffe35

# Allow an answer to be returned if recalc is False and no
# data has been loaded. However, it's not obvious what the
# answer should be if recalc=False and the dataset has
# changed type since get_data_plot was last called.
#
try:
is_int = isinstance(self.get_data(id), sherpa.data.Data1DInt)
except IdentifierErr as ie:
if recalc:
raise ie
is_int = False
if is_int:
plotobj = self._datahistplot
else:
plotobj = self._dataplot
if recalc:
plotobj.prepare(self.get_data(id), self.get_stat())
return plotobj

Changes in this PR

The basic idea of this PR is to follow #1177 and remove the direct access to fields like self._dataplot and access them from a dictionary. This would be for both the plot and contour commands. There main initial complication is that the obvious thing to do - which is re-use the existing _plot_types and _component_types dictionaries - ends up causing issues with the set_xlog/../set_ylinear set of commands. This is because the keys of _plot_types and _component_types are used to indicate the set of valid dataset identifiers and what labels are valid as "plot/contour" names in the plot() and contour() calls. I note that the contour code does not care about the set_xlog/... family of commands but it does about the second plot, and that the image code in #1177 didn't have to worry about either case (although I note that we could add an image command in #1517 but that is irrelevant here).

So, there is some work needed to be done to rationalize the set_xlog/... and plot() commands so that they use the same labels and underlying data structures. This leads to changes to the set of valid identifiers: at the moment we allow old and new names but the idea is that the old names are deprecated and will be removed in a future release. The user is noted by a screen message whenever they use an old name.

So, the idea is that we now have in sherpa.ui.utils.Session

        self._plot_types = {
            'data': [sherpa.plot.DataPlot(), sherpa.plot.DataHistogramPlot()],
            'model': [sherpa.plot.ModelPlot(), sherpa.plot.ModelHistogramPlot()],
            'model_component': [sherpa.plot.ComponentModelPlot(), sherpa.plot.ComponentModelHistogramPlot()],
            'source': [sherpa.plot.SourcePlot(), sherpa.plot.SourceHistogramPlot()],
            'source_component': [sherpa.plot.ComponentSourcePlot(), sherpa.plot.ComponentSourceHistogramPlot()],
            'fit': [sherpa.plot.FitPlot()],
            'resid': [sherpa.plot.ResidPlot()],
            'ratio': [sherpa.plot.RatioPlot()],
            'delchi': [sherpa.plot.DelchiPlot()],
            'chisqr': [sherpa.plot.ChisqrPlot()],
            'psf': [sherpa.plot.PSFPlot()],
            'kernel': [sherpa.plot.PSFKernelPlot()]
        }

and we access the data with

    def get_data_plot(self, id=None, recalc=True):
...
        # Allow an answer to be returned if recalc is False and no
        # data has been loaded. However, it's not obvious what the
        # answer should be if recalc=False and the dataset has
        # changed type since get_data_plot was last called.
        #
        if recalc:
            data = self.get_data(id)
        else:
            data = self._get_data(id)

        # This uses the implicit conversion of bool to 0 or 1.
        #
        idx = isinstance(data, sherpa.data.Data1DInt)
        plotobj = self._plot_types["data"][idx]
        if recalc:
            plotobj.prepare(data, self.get_stat())

        return plotobj

For the sherpa.astro.ui.utils.Session class this gets augmented with

        self._plot_types['data'].append(sherpa.astro.plot.DataPHAPlot())

...

    def get_data_plot(self, id=None, recalc=True):
        if recalc:
            data = self.get_data(id)
        else:
            data = self._get_data(id)

        if isinstance(data, sherpa.astro.data.DataPHA):
            plotobj = self._plot_types["data"][2]
            if recalc:
                plotobj.prepare(data, self.get_stat())

            return plotobj

        return super().get_data_plot(id, recalc=recalc)

The logic for matching the data class to the plot object is intentionally low-tech: it's just a set of isinstance calls and we assume that those plots that care about this have the _plot_types dictionary set up correctly (that is, the value is a list with 2 or 3 elements in the correct order, depending on which module you are in). I had tried fancier approaches but they didn't make the code simpler, and would make it harder to extend in the future (e.g. for #347).

NOTES

  1. not all plot types care about the data they are sent (perhaps they should, to better catch things like trying to plot a DataIMG dataset, but that's a separate issue and it also depends on what we want to do with add plot support for image data sets #347); in this case the get_xxx_plot call is simpler and doesn't need to be over-ridden in the astro module;

  2. the fit and bkg_fit plots complicate things since, for PHA data, the model part of the plot behaves differently to a model plot (the fit version groups the model, the model version shows ungrouped data) and the way that this is done is subtly-different between the fit and bkg_fit versions - I do not try to address this here;

  3. the source plot has some strange behavior with template models, and this behavior depends on whether you are using PHA data or not - I do not try to address this here but instead raised issue template-model plotting/handling needs a proper review #1530 to track it.

Interaction with set_xlog et al and plot

The set_xlog/ylog/xlinear/ylinear commands are affected by these changes because there is an undocumented relationship between dictionary keys and the labels you can use with these commands. This PR attempts to clarify and clean this up. These commands take an optional string (defaulting to "all"). When it is not "all" it must match a key from either the _plot_types dictionary, or the _plot_types_alias dictionary (new in this PR). The idea is that the _plot_types key must match a corresponding get_<key>_plot method, which makes it easier to see what the relationship is between

set_xlog("data")
set_ylinear("resid")
set_xlog("model_component")

This last call is new in this PR. Previously you would have had to know to say

set_xlog("compmodel")

The code now labels these "special" labels as an alias, so that they still work but the user is warned to change their code, and we can hopefully remove them at some point in the future. We do not use the python deprecation warning system because users of the UI layer are unlikely to see these messages (since they helpfully get hidden by default), so we use the warning logger. The new behavior is:

>>> from sherpa.astro.ui import *
WARNING: imaging routines will not be available, 
failed to import sherpa.image.ds9_backend due to 
'RuntimeErr: DS9Win unusable: Could not find ds9 on your PATH'
>>> set_xlog("compsource")
WARNING: The argument 'compsource' is deprecated and 'source_component' should be used instead
>>> set_xlog("compmodel")
WARNING: The argument 'compmodel' is deprecated and 'model_component' should be used instead
>>> set_xlog("compmodel")
WARNING: The argument 'compmodel' is deprecated and 'model_component' should be used instead
>>> set_xlog("astrodata")
WARNING: The argument 'astrodata' is deprecated and 'data' should be used instead
>>> set_xlog("astromodel")
WARNING: The argument 'astromodel' is deprecated and 'model' should be used instead
>>> set_xlog("bkgfit")
WARNING: The argument 'bkgfit' is deprecated and 'bkg_fit' should be used instead

I honestly don't know what commands like "astrodata" meant - I think it was a way you could change only the PHA plot command, but this was never documented and thanks to changes I've made to the plotting code they no-longer have a meaning.

Since the plot command now uses the same labels, we get the same behavior - e.g.

>>> plot("compsource", mdl)
WARNING: The argument 'compsource' is deprecated and 'source_component' should be used instead

NOTE

  • the message gets repeated, rather than only shown once; we could add an internal cache so that the message is only displayed once per session (well, per "clean" call)

Related changes because of this are

  1. the plot command now takes the "new" labels, so plot("bkg_fit") mean "do the plot_bkg_fit plot" when before you needed to know that plot("bkgfit") was the magic
  2. the old names are still supported (with the warning message)
  3. the old code did support some of these aliases, in particular you used to be able to say plot("model_component") as well as the "secret" plot("compsource"). I'm not sure if this is from something I did in earlier plot changes or was something that was always there. This has not been changed, but I just wanted to point it out
  4. the identifiers that are marked as "forbidden" - that is the strings you can not use as the id argument in calls, is actually based around the labels that the plot and contour calls accepts; this makes sense, as you need to know when parsing plot(arg1, arg2, arg3, ...) whether arg2 is an identifier or a new plot. This means that changes to the values that set_xlog take changes this. So we have some changes.

As an example of the last point, in CIAO 4.14:

sherpa-4.14.0> set_default_id("astrodata")
IdentifierErr: identifier 'astrodata' is a reserved word

sherpa-4.14.0> set_default_id("data")
IdentifierErr: identifier 'data' is a reserved word

sherpa-4.14.0> set_default_id("bkgfit")
IdentifierErr: identifier 'bkgfit' is a reserved word

sherpa-4.14.0> set_default_id("bkg_fit")

sherpa-4.14.0> set_default_id("compsource")
IdentifierErr: identifier 'compsource' is a reserved word

sherpa-4.14.0> set_default_id("source_component")
IdentifierErr: identifier 'source_component' is a reserved word

sherpa-4.14.0> set_default_id("energy")

sherpa-4.14.0> set_default_id("flux")

The current behavior is to make both the aliases and the actual commands be reserved:

>>> import sys
>>> sys.tracebacklimit = 0
>>> set_default_id("data")
sherpa.utils.err.IdentifierErr: identifier 'data' is a reserved word
>>> set_default_id("astrodata")
sherpa.utils.err.IdentifierErr: identifier 'astrodata' is a reserved word
>>> set_default_id("bkgfit")
sherpa.utils.err.IdentifierErr: identifier 'bkgfit' is a reserved word
>>> set_default_id("bkg_fit")
sherpa.utils.err.IdentifierErr: identifier 'bkg_fit' is a reserved word
>>> set_default_id("compsource")
sherpa.utils.err.IdentifierErr: identifier 'compsource' is a reserved word
>>> set_default_id("source_component")
sherpa.utils.err.IdentifierErr: identifier 'source_component' is a reserved word
>>> set_default_id("energy")
>>> set_default_id("flux")
>>> 

It turns out that it should only be the new bkg_xxx labels that are added to the set of invalid labels - that is bkg_chisqr, bkg_delchi, bkg_fit, bkg_model, bkg_ratio, bkg_resid, bkg_source.

Consequences

One consequence of these changes is that we now have a consistent set of names that are used

  • by the set_xlog/... family of commands;
  • plot() and contour() to refer to the plot/contour to display;
  • and to determine what labels are not allowed to be used as a dataset identifier (because of the confusion with the plot/contour calls if they are used).

Some special plot commands: energy and photon

Note: these are only present in the astro version.

As part of this clean up, the "energy" and "photon" labels have been removed from the _plot_types dictionary, which means they can no longer be used with set_xlog/..., as shown below. This is a regression in functionality, but the meaning of these labels is unclear, and the plot_energy_flux/plot_photon_flux commands are a bit different to the other calls.

>>> from sherpa.astro.ui import *
WARNING: imaging routines will not be available, 
failed to import sherpa.image.ds9_backend due to 
'RuntimeErr: DS9Win unusable: Could not find ds9 on your PATH'
>>> import sys
>>> sys.tracebacklimit = 0
>>> set_xlog("energy")
sherpa.utils.err.PlotErr: Plot type 'energy' not found in ['data', 'model', 'source', 'fit', 'resid', 'ratio', 'delchi', 'chisqr', 'psf', 'kernel', 'source_component', 'model_component', 'order', 'arf', 'bkg', 'bkg_model', 'bkg_fit', 'bkg_source', 'bkg_ratio', 'bkg_resid', 'bkg_delchi', 'bkg_chisqr']

I note that we do not document the behavior of "energy" and "photon", so it is not like we are changing behavior we told users they could rely on.

This is most-likely to cause a problem for users who have been using the "all" label - e.g. set_ylog() or set_ylog("all") - without knowing it.

To change these plot preferences users will have to call the relevant get call and change the setting manually:

get_energy_flux_hist(recalc=False).histo_prefs["ylog"] = True

This is rather ungainly, but I don't see it as a remotely-high-priority to fix.

Some special plot commands: pdf, cdf, trace, scatter, pvalue

There are other plot_xxx calls that do not use the _plot_types dictionary - e.g. plot_pdf - that are present in both versions (unlike energy/photon which are only in the astro session). These calls generally require multiple arguments to call - rather than something like plot_data(2) or plot("data", 2) - and so it is no clear how best to go forward with these.

This means that there are several "plot objects" that are still left as direct attributes of the Session class. However, this is always likely the case because we have things like self._splitplot which is used for setting up multiple plots, and self._intproj which is used by the int_proj command, that are not directly related to plot or set_xlog.

Left over fields of the Session class

I will mention two that are left in that are used by the plot code, for reference and because they have been mentioned above:

  1. self._comptmplsrcplot = sherpa.plot.ComponentTemplateSourcePlot()

I do not understand what the template plot code is meant to be doing. It looks half-finished to me. This PR changes (I believe) how the template model display works for DataPHA data as I think the logic has now changed. We don't have tests to check this. We also do not seem to handle the source and model cases the same for the template models (ie there's no ComponentTemplateModelPlot). I really don't want to worry about this code in this PR as I believe it can be looked at after this has been merged. I argue that the logic of what is going on is more-obvious after this PR.

Since writing this I created #1530 as I feel it's not something to address in this PR

  1. self._bkgmodelplot = sherpa.astro.plot.BkgModelPHAHistogram()

PHA plots behave "strangely" in calls to plot_fit and plot_bkg_fit which took me a long time to understand, even after looking at the code. We want the model to be displayed using the grouped data here, whereas plot_model and plot_bkg_model display the data ungrouped. This has caused me no-end of pain, and it would be nice to have a "better" way of doing this, but this is existing functionality and this PR is not the place to try and address this.

Saying that, we do have a difference to how plot_fit and plot_bkg_fit do things - the former generates the plot object each time whereas the latter uses the self._bkgmodelplot. I think we should remove this (so make plot_bkg_fit behave like plot_fit), but I haven't made that change yet.

For reference, the plot_fit code is (here using the main branch at the time of writing)

sherpa/sherpa/astro/ui/utils.py

Lines 10440 to 10468 in feffe35

def get_fit_plot(self, id=None, recalc=True):
plotobj = self._fitplot
if not recalc:
return plotobj
d = self.get_data(id)
if isinstance(d, sherpa.astro.data.DataPHA):
dataobj = self.get_data_plot(id, recalc=recalc)
# We don't use get_model_plot as that uses the ungrouped data
# modelobj = self.get_model_plot(id)
# but we do want to use a histogram plot, not _modelplot.
# modelobj = self._modelplot
# Should this object be stored in self? There's
# no way to get it by API (apart from get_fit_plot).
#
modelobj = sherpa.astro.plot.ModelPHAHistogram()
modelobj.prepare(d, self.get_model(id),
self.get_stat())
plotobj.prepare(dataobj, modelobj)
return plotobj
return super().get_fit_plot(id, recalc=recalc)
get_fit_plot.__doc__ = sherpa.ui.utils.Session.get_fit_plot.__doc__

and plot_bkg_fit uses

sherpa/sherpa/astro/ui/utils.py

Lines 10788 to 10803 in feffe35

plotobj = self._bkgfitplot
if not recalc:
return plotobj
dataobj = self.get_bkg_plot(id, bkg_id, recalc=recalc)
# We don't use get_bkg_model_plot as that uses the ungrouped data
# modelobj = self.get_bkg_model_plot(id, bkg_id, recalc=recalc)
modelobj = self._bkgmodelplot
modelobj.prepare(self.get_bkg(id, bkg_id),
self.get_bkg_model(id, bkg_id),
self.get_stat())
plotobj.prepare(dataobj, modelobj)
return plotobj

which shows the different behavior in how the modelobj variable is set.

@DougBurke DougBurke marked this pull request as draft May 21, 2022 21:38
@DougBurke DougBurke changed the title UI consolidate plotobj handling WIP UI consolidate plotobj handling May 21, 2022
@codecov
Copy link

codecov bot commented May 21, 2022

Codecov Report

Merging #1516 (f1bd9f8) into main (300add2) will decrease coverage by 0.03%.
The diff coverage is 97.60%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1516      +/-   ##
==========================================
- Coverage   78.96%   78.93%   -0.04%     
==========================================
  Files          72       72              
  Lines       26723    26661      -62     
  Branches     4493     4493              
==========================================
- Hits        21101    21044      -57     
+ Misses       5426     5417       -9     
- Partials      196      200       +4     
Impacted Files Coverage Δ
sherpa/ui/utils.py 90.75% <95.69%> (-0.14%) ⬇️
sherpa/astro/ui/utils.py 91.40% <100.00%> (-0.17%) ⬇️
sherpa/_version.py 31.42% <0.00%> (+1.42%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 300add2...f1bd9f8. Read the comment docs.

sherpa/astro/ui/utils.py Outdated Show resolved Hide resolved
sherpa/astro/ui/utils.py Outdated Show resolved Hide resolved
sherpa/astro/ui/utils.py Outdated Show resolved Hide resolved
sherpa/astro/ui/utils.py Outdated Show resolved Hide resolved
@DougBurke DougBurke force-pushed the ui-consolidate-plotobj-handling branch 3 times, most recently from eb752ff to 2b22cff Compare June 1, 2022 17:15
@DougBurke DougBurke mentioned this pull request Jun 3, 2022
@DougBurke DougBurke force-pushed the ui-consolidate-plotobj-handling branch 4 times, most recently from 3826dad to 9868dae Compare June 10, 2022 19:24
Slightly-better description of the return value from get_data,
as well as explaining when the IdentifierErr is raised (it must
have been a copy-and-paste error).
@DougBurke DougBurke force-pushed the ui-consolidate-plotobj-handling branch 2 times, most recently from 14dd50e to b66ce10 Compare June 13, 2022 16:43
Try to improve the documentation around the _plot/contour_types and
_plot/contour_type_names dictionaries. This includes making it
explicit the relationship between the two. The relationship is
simple for the sherpa/ui code, but it is more-complex for the astro
case.

Add a comment to note that the "astrocompmodel" keyword maps to
"model_componentl" and not "model_component". This typo was added
in sherpa#906 and is not addressed here.
@DougBurke DougBurke force-pushed the ui-consolidate-plotobj-handling branch from b66ce10 to 627cae5 Compare June 13, 2022 18:50
Several changes:

- prefere data rather than d for the variable storing the
  current data object.

- if recalc is False in a get_fit_plot/contour call then we do
  not need to access the data and model plot/contour objects;
  this is a minor saving at best, and is more-useful pointing
  out the information flow.

- the data-access calls for the get_xxx_plot calls is a bit
  messy because we want to allow the call to succeed when no data
  has been loaded (in this case, using the "default" type for the
  plot, for those that support different plot objects for different
  inputs).

  In this commit we make the code path a bit-cleaner by

  1. explicitly separating out the recalc=False from True code
     paths (the latter forces the code to error out if there is no
     data set);

  2. move some of the code into a method; we don't need to sub-class
     it, and the code is not a lot, but it does make things a bit
     cleaner.

  There are also several calls in astro that have been re-worked to
  follow the same pattern just so that they all follow the same
  scheme (where appropriate).
This stops the ability of calling

    set_ylog("energy")
    set_ylinear("photon")

oto change the plots created by

    plot_energy_flux() / get_energy_flux_hist()
    plot_photon_flux() / get_photon_flux_hist()

This was never documented (at least in the Sherpa docstrings)
so is unlikely to be missed. The one case I can thinkg of would
be a user who has used either

    set_ylog()
    set_ylog("all")

and is then surprised the plot no-longer changes.

The current work-around for these users is to call the appropriate
get_xxx_flux_hist call and change the preference directly. For
example

    get_energy_flux_hist(recalc=False).histo_prefs["ylog"] = True

This may seem a minor change with no-obvious improvement for the
user, but

- the current behavior is not documented
- future plans to consolidate the plot() and set_xlog() set of
  commands are actually hard to do with the "energy" and "flux"
  labels as they do not match the rest of the system (e.g.
  there is no get_energy_plot() command)

It is better to make the energy and photon plots be handled like
we do the other "special" commands like plot_scatter() and
plot_trace().
Rather than use short names like "compsource", use the full name,
like "source_component" for the _plot_types dictionary. Since
this dictionary is used by set_xlog/... to select which plot
objects get changed, support is added for the "shortened"
names, by means of the new _plot_types_alias dictionary, which
just maps from the shortened to new names.

If someone uses the old name then a warning message is
created, telling them the correct label to use. For example:

>>> set_xlog("bkgfit")
WARNING: The argument 'bkgfit' is deprecated and 'bkg_fit' should be used instead

This is done as a warning rather than a Python deprecated message
to make sure users see it.

This does not change the set of names that can not be used as
a dataset identifier. This means that

    compsource
    source_component
    bkgfit

are invalid identifier names, but

    bkg_fit

is still a valid name for an identifier.

Note that source_component and model_component were already
supported, so the new labels added here are (only for the astro
module):

    bkg_chisqr
    bkg_delchi
    bkg_fit
    bkg_model
    bkg_ratio
    bkg_resid
    bkg_source

Note that the documentation for set_xlog and other commands still
point to the plot function as the place that describes the labels
that can be used. This is technically correct, but using the
labels described in plot() will lead to a deprecated warning. The
documentation is not changed here as the handling of plot is about
to be changed (to use these new "longer" names), which will then
make the set_xlog/... and plot documentation consistent.
Move the checks for whether a plot or contour label is valid to
methods, and make use of them, in _set_plot_item and _multi_plot.

This essentially makes the labels accepted by set_xlog/... and
plot()/contour()

- accept the same names
- behave the same when a deprecated alias is used

The behavior is actually technically "by luck" than "by design",
and only happens because the dictionaries - _plot_types,
_plot_type_names, and _plot_types_alias - end up having the same
keys (the alias dictionary makes it a bit harder to see this).

* Example

The behavior is shown below - which includes seeing the new warning
messages (both from set_xlog, which came in a previous commit, and
the plot call, which is new):

>>> from sherpa.astro.ui import *
...
>>> import sys
>>> sys.tracebacklimit = 0

** CIAO 4.14 with bkgfit label

>>> set_default_id("bkgfit")
sherpa.utils.err.IdentifierErr: identifier 'bkgfit' is a reserved word
>>> set_xlog("bkgfit")
>>> plot("bkgfit")
sherpa.utils.err.IdentifierErr: data set bkg_fit has not been set

** CIAO 4.14 with bkg_fit label

>>> set_default_id("bkg_fit")
>>> set_xlog("bkg_fit")
sherpa.utils.err.PlotErr: Plot type 'bkg_fit' not found in ['data', 'model', 'source', 'fit', 'resid', 'ratio', 'delchi', 'chisqr', 'psf', 'kernel', 'compsource', 'compmodel', 'order', 'energy', 'photon', 'arf', 'bkg', 'bkgmodel', 'bkgfit', 'bkgsource', 'bkgratio', 'bkgresid', 'bkgdelchi', 'bkgchisqr']
>>> plot("bkg_fit")
KeyError: 'bkg_fit'

During handling of the above exception, another exception occurred:

sherpa.utils.err.ArgumentErr: 'bkg_fit' is not a valid plot type

** This commit with bkgfit label

Note that both uses create a WARNING message (not for set_default_id
but that already errors out; we could think about saying something
to indicate this but I do not think it is worth it):

>>> set_default_id("bkgfit")
sherpa.utils.err.IdentifierErr: identifier 'bkgfit' is a reserved word
>>> set_xlog("bkgfit")
WARNING: The argument 'bkgfit' is deprecated and 'bkg_fit' should be used instead
>>> plot("bkgfit")
WARNING: The argument 'bkgfit' is deprecated and 'bkg_fit' should be used instead
sherpa.utils.err.IdentifierErr: data set 1 has not been set

** This commit with bkg_fit label

>>> set_default_id("bkg_fit")
sherpa.utils.err.IdentifierErr: identifier 'bkg_fit' is a reserved word
>>> set_xlog("bkg_fit")
>>> plot("bkg_fit")
sherpa.utils.err.IdentifierErr: data set 1 has not been set
In sherpa#1177 access to image objects from the Session class was changed
from direct access (e.g. self._xxx) to using a dictionary
(self._image_types). This commit does the same thing for the
contour objects: the objects are accessed from the _contour_types
dictionary rather than having a set of _datacontour/... fields.

For most users this will make no difference. It is only an issue
for code that directly accesses the Session class. However, these
are labelled private fields so they should be aware that things
could change.
In sherpa#1177 access to image objects from the Session class was changed
from direct access (e.g. self._xxx) to using a dictionary
(self._image_types). This commit does the same thing for the
plot objects: the objects are accessed from the _plot_types
dictionary rather than having a set of _data/... fields.

For most users this will make no difference. It is only an issue
for code that directly accesses the Session class. However, these
are labelled private fields so they should be aware that things
could change.

With this commit, images, contours, and plots now use the same
scheme. However, there are four differences between the
contour/image and plot handling:

1. we can have multiple objects associated with a plot type - at
   the moment the second element is for Data1DInt data and,
   for the astro version, the second element is for DataPHA data,
   and this makes the get_<label>_plot calls rather cumbersome;

2. the template-model handling is odd and doesn't quite follow
   this (see sherpa#1530);

3. the PHA "fit" plot also requires a different plot object to for
   the model than is provided by get_model_plot (for the fit we
   group the data but for the model we do not group it), and the
   access to these objects are currently via direct access to
   the session field (for the bkg fit plot), or just created
   on-the-fly for the "source" plot;

4. and some plot types - e.g. plot_scatter - still require direct
   access to a session field.
With the recent work to rationalize the labels used by
plot()/contour() and set_xlog()/..., we do not need the
_plot_type_names and _contour_type_names dictionaries. So
we can remove them.
Note the new behavior of the plot command, and as set_xlog/...
point to it, the behavior of those commands too.
# There is an argument to be made to have the "singleton"
# plots, such as "fit", storing the object directly, and not
# as a single-element list, but it was felt that having a
# consistent access pattern was cleaner.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, I see the benefit of keeping a uniform representation here (always use a list), but this can be discussed.

Earlier attempts at fixing this actually created a class to use here (e.g. create an instance for the value rather than a list, and this class then had support for the "default" value and adding extra elements), but I never found it satisfactory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a full review, just some ideas based on what I'm seeing now; I want to make comments before it's all done an too late to change. So, please don't waste your time answering my suggestions if you already tried that, or think it's stupid and won't work. That said, here are some thoughts:

This entire approach was (and still is, although it's better now) pretty convoluted and requires careful maintenance. When new plot types are added, the _plot_types need to be updated, there is the _plot_types and _contour_types and later possible _image_types and then a few others that don't really fall in this scheme.

I am wondering if more of this information should live in the Plot classes themselves, e.g.:

class RatioPlot(...):
    # would accept all Data1DInt and DataPA, but not generic Data1D
    # I know that that's not what RatioPlot does, but giving and example
    _accepted_data_types = (Data1DInt, DataPHA)  
    _plot_area = 'data'  # one of: data, contour, image
   _ plot_name = 'Ratio'

    ... real code here ...

(Obviously, all names are open to bike shedding.)

That way, information would not have to be maintained separately here, it would be intrinsic to the plot class. I think that's a better place to keep that information.

_get_plot_type would then search through all known plot classes

def _get_plotorcountour_type(plotarea, plotname, data):
    for p in plot_type_list:
        if ((p._plot_area == plotarea) and (p._plot_name == plotname) and 
           any([insinstance(data, allowed_class) for allows_class in p._accepted_data_dypes])):
           return p
    finally:
        raise Error

The implementation above could be refined to deal with cases were there is more than one match, e.g. one could select the "most specific" of all plot classes (if one of them accepts all Data1D and another only DataPHA, then one may wnat to select the latter, while my implementation would pick whatever is first in the list).

How do we get the plot_type_list? Either that's hand-curated, but a better approach would be to use a metaclass that puts stuff in a registry when it's imported like I do for the plotting backends in #1382. That's just a few lines of code in the Plot and Histogram base classes (could be a new class in the tree PLotHistroImageBase from which PLot and Histrogram inherit).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just occurs to me that in practice

    _accepted_data_types = (Data1DInt, DataPHA)  
    _plot_area = 'data'  # one of: data, contour, image, other

would not be set for every plot type, but at one point in the inheritance tree and then apply to all derived classes with no further code change. For example Plot would get the _plot_area = 'data' and all derived classes would have that property and Contour would get _plot_area = 'contour' and we only need to pay attention to special cases with multiple inheritance.

I've overlooked something that needs to be special cased in the current scheme, then maybe it also has to be special-cased in the proposed scheme and we need an area "other".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been trying to get this change merged for several years, so I am somewhat opposed to making more large architectural fixes. This PR is itself a rather cut-down version of the latest approach where I had added extra logic.

Currently I do not believe there is any "nice and simple" way to indicate the mapping between data type and object to use - for example

  • source plot for DataPHA accepts lo/hi arguments unlike every-other plot [maybe we should add it to all plot types, but that's a separate discussion/issue/PR]
  • plot_fit/plot_bkg_fit does not use the underlying "model" case, again only for DataPHA
  • whatever is going on with template models

I've spent a lot of time trying to simplify this code and it has never ended in success: one conclusion is in fact automating the mapping between data-type and plot object logic doesn't really help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe our plots are just too specific for be generalized this way, but I wanted to at least ask.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the original question in the comment: I fully agree that a consistent form (here: always a list) is a good choice. I don't think alternatives have to be discussed in the comment - this is the interface we picked. There are many other options we did not pick and that's OK.

raise PlotErr("wrongtype", plottype, str(allowed))

def _check_contourtype(self, plottype):
"""Is this a valid contour type?"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two contour routines could just be done in-place, but they are used by _multi_plot so we need them (i.e. the design wants plot and contour versions of the _check_xxx and _get_xxx routines),


"""

return self._data.get(self._fix_id(id))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very-simple routine, and I feel a bit bad about adding it as a method, but it does (I believe) make the downstream code a bit cleared - i.e. we call self.get_data if we require the data to exist and self._get_data if the data need not exist.


# This is an internal routine so it is not expected to be
# called incorrectly, but make sure we catch any such problem.
#
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is added to mainly point out why the two raise ArgumentxxxErr calls are not covered by tests.

@DougBurke DougBurke changed the title WIP UI consolidate plotobj handling Consolidate plot and set_xlog handling Jun 15, 2022
@DougBurke DougBurke marked this pull request as ready for review June 15, 2022 14:42
# of valid identifiers. unlike the plot and contour cases, as
# there is
#
# - no image() call that acts like plot() or contour();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but there will be - so maybe we should include it already to make sure nobody writes code that uses kays we won't allow later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment the keys for contour and image are already included in the plot table, so it turns out that adding the contour keys into the "what is a valid identifier" doesn't currently add anything, but is done partly for documentation and just-in-case something changes.

# There is an argument to be made to have the "singleton"
# plots, such as "fit", storing the object directly, and not
# as a single-element list, but it was felt that having a
# consistent access pattern was cleaner.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a full review, just some ideas based on what I'm seeing now; I want to make comments before it's all done an too late to change. So, please don't waste your time answering my suggestions if you already tried that, or think it's stupid and won't work. That said, here are some thoughts:

This entire approach was (and still is, although it's better now) pretty convoluted and requires careful maintenance. When new plot types are added, the _plot_types need to be updated, there is the _plot_types and _contour_types and later possible _image_types and then a few others that don't really fall in this scheme.

I am wondering if more of this information should live in the Plot classes themselves, e.g.:

class RatioPlot(...):
    # would accept all Data1DInt and DataPA, but not generic Data1D
    # I know that that's not what RatioPlot does, but giving and example
    _accepted_data_types = (Data1DInt, DataPHA)  
    _plot_area = 'data'  # one of: data, contour, image
   _ plot_name = 'Ratio'

    ... real code here ...

(Obviously, all names are open to bike shedding.)

That way, information would not have to be maintained separately here, it would be intrinsic to the plot class. I think that's a better place to keep that information.

_get_plot_type would then search through all known plot classes

def _get_plotorcountour_type(plotarea, plotname, data):
    for p in plot_type_list:
        if ((p._plot_area == plotarea) and (p._plot_name == plotname) and 
           any([insinstance(data, allowed_class) for allows_class in p._accepted_data_dypes])):
           return p
    finally:
        raise Error

The implementation above could be refined to deal with cases were there is more than one match, e.g. one could select the "most specific" of all plot classes (if one of them accepts all Data1D and another only DataPHA, then one may wnat to select the latter, while my implementation would pick whatever is first in the list).

How do we get the plot_type_list? Either that's hand-curated, but a better approach would be to use a metaclass that puts stuff in a registry when it's imported like I do for the plotting backends in #1382. That's just a few lines of code in the Plot and Histogram base classes (could be a new class in the tree PLotHistroImageBase from which PLot and Histrogram inherit).

Copy link
Contributor

@hamogu hamogu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is making things better. Is it making things perfect? No. For that we might have to abandon the UI altogether... I think this is mostly ready, but I have a few small comments; "I thought about this and decided on the current version because of X" is good enough for an answer in most cases - I just wanted to make sure these are intentional decisions and not an oversight.

mdl = s.create_model_component("const1d", "mdl")
s.set_bkg_source(mdl)

s.plot(f"bkg_{label}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a side note: All those tests can be improved when we have the bokeh backend. Since bokeh serialized for json, it's very easy to do assert expected_label in plot_in_json_format, while checking e.g. matplotlib pngs on the pixel level is a bit of a pain (though other packages do it), so I expect that we can revise or add a number of tests to be more stringent and not just check that code executes without and error, but that the labels also look what we want - but not here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did ask about perceptual hashes in pytest-mpl back in the day - matplotlib/pytest-mpl#21 - there's some discussion in matplotlib/pytest-mpl#146 which suggests they may have an implementation soon (matplotlib/pytest-mpl#150)

# There is an argument to be made to have the "singleton"
# plots, such as "fit", storing the object directly, and not
# as a single-element list, but it was felt that having a
# consistent access pattern was cleaner.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the original question in the comment: I fully agree that a consistent form (here: always a list) is a good choice. I don't think alternatives have to be discussed in the comment - this is the interface we picked. There are many other options we did not pick and that's OK.

raise ArgumentTypeErr("intstr")

if _is_integer(id):
return id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do integers have to be special-cased? The checks use "in" with checks the keys in a dict. Dicts can have integer keys, so the "in" should work and (since all keys here are strings) tell us that it's an allowed name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly a style thing. Before this check we have id : Union[int, str] but after it we have id : str.

This means that I can theoretically say

def _check_plottype(self, id : str) -> nool:

but I needed to add explicit an assert isinstance(id, str) statement when using mypy in #1525 . However, this is experimental code that isn't going to get merged, and I may have been mis-understanding quite how typeguards work there

Each group consists of the name of the plot/contour, followed
by optional arguments. So

self._multi_plot(["data", "data", "org"], plotmeth="plot")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly dislike this design, but I admit that this PR is not the place to change it. Every problem in computer science can be solved with one more level of indirection, they say.
Here self._multi_plot([("data", ), ("data", "org")], plotmeth="plot") would work wonders - no more guessing if something if a plot command name or the name of a dataset identifier, strong encouragement not to reply on the default (although it can be done with just a tuple of length 1, i.e. by typing an extra pair of () and a comma), no need ot maintain a list of forbidden identifiers etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note: this is a very-thin wrapper around plot and contour. You would have to change how those are called - i.e.

plot("data", "data", 2, ylog=True)

would need to be changed to whatever you would come up with. The _multi_plot call is basically converting the user input into this design - i.e. a sequence of "command name" followed by arguments - so if plot and contour are changed then we can simplify this. To be honest the saving isn't going to be huge in terms of the number of lines, but there are things you could think of doing to make it easier to name arguments to send - e.g.

plot(["data", "id", 1], ["data", "id", 2])

but

a) this is not the place
b) I'm not sure it's worth it.

Multiple arguments can be sent, but they rely on positional
ordering only, so

args = ["model_component", mdl, "model_component", 2, mdl]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, following my comment above, this would be [("model_component", mdl), ("model_component", 2, mdl)] with a clear separation which argument go into which call and leaving it to that call do do with the arguments whatever that call wants to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above, this routine is parsing the plot and contour commands, so you can not make this change without breaking existing code.

Comment on lines +10567 to +10570
if recalc:
data = self.get_data(id)
else:
data = self._get_data(id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about the 15th time that I see those four lines of code. Does it make sense to pull them out as a separate function data = self._get_data_recalc(id, recalc)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but

a) as mentioned, I have pared this PR back to it's essentials, and have had about 4 different ways of trying to reduce the number of lines and balancing that against making methods that do too much. I quite like the separation between get_data and _get_data, but moving the recalc logic into the check starts to worry me (nut would allow me to remove the _get_data call`

b) I haven't checked, but I don't think the number of times we do this is that huge, so is the space saving worth it (i.e. 4 times vs 40)?

@wmclaugh wmclaugh merged commit 3eeafca into sherpa:main Jul 6, 2022
@DougBurke DougBurke deleted the ui-consolidate-plotobj-handling branch July 6, 2022 14:34
@DougBurke DougBurke added this to Done in plot (UI) Jan 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants