Tests for DESCQA using the DM products #127

Open · fjaviersanchez opened this issue Jul 27, 2018 · 19 comments

@fjaviersanchez commented Jul 27, 2018

DC2 validation test brainstorming by @rmjarvis and @fjaviersanchez:

Image level (@rmjarvis):

  • Check that the images contain some pixels above the 10-sigma level (see the sketch after this list).

  • Calculate gain and read noise and compare with prediction.

  • Check masked (saturated) bits of the images.

  • Check masked (bad/dead) pixels -> PhoSim.
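A minimal sketch of the first check (pixels above the 10-sigma level), assuming the sensor image is available as a plain FITS file; the file layout and the sigma-clipped background estimate are placeholders, and the DM stack's own background model could be substituted:

```python
import numpy as np
from astropy.io import fits
from astropy.stats import sigma_clipped_stats

def has_bright_pixels(image_file, nsigma=10.0):
    """Return True if the image contains pixels more than `nsigma`
    above the sigma-clipped background level."""
    with fits.open(image_file) as hdus:
        pixels = hdus[0].data.astype(float)  # assumes the image is in the primary HDU
    _, median, std = sigma_clipped_stats(pixels, sigma=3.0)
    return np.any(pixels > median + nsigma * std)
```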

Catalog (visit level):

  • Use stars and their PSF magnitudes (PSFmag) to compute the CheckAstroPhoto test (using the standalone test check in DC2-production #259). Update 09/09/18: done in standalone code.

  • Star size vs. magnitude at different epochs should be flat (use HSM size/sdssShape); use a scatter plot for every single star (see the sketch after this list). Update 09/09/18: done in standalone code.

  • Given a calexp, select a clean stellar sample, evaluate the PSF at each star's position, and check the stacked difference (low priority).

  • Select a set of calexps and check that the input seeing is correlated with the size of the stars appearing in them. Update 09/10/18: done in standalone code.

  • DCR test: transform the shape of each star to the shape in the zenith direction for a sample of good stars, separately per band, and check this as a function of airmass.

  • DCR test: repeat the above, splitting the sample into redder and bluer stars.
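A minimal sketch of the size-vs-magnitude flatness check referenced above, assuming a clean stellar sample from one visit has already been selected; the array names, bin edges, and tolerance are placeholders to be tuned:

```python
import numpy as np

def check_star_size_flatness(psf_mag, hsm_size,
                             mag_bins=np.arange(16.0, 22.0, 0.5),
                             tolerance=0.02):
    """Check that the median star size is flat with PSF magnitude.

    Returns True if the median size in every populated magnitude bin
    deviates from the overall median by less than `tolerance` (fractional).
    """
    psf_mag = np.asarray(psf_mag)
    hsm_size = np.asarray(hsm_size)
    overall = np.median(hsm_size)
    for lo, hi in zip(mag_bins[:-1], mag_bins[1:]):
        in_bin = (psf_mag >= lo) & (psf_mag < hi)
        if in_bin.sum() < 10:  # skip sparsely populated bins
            continue
        if abs(np.median(hsm_size[in_bin]) / overall - 1.0) > tolerance:
            return False
    return True
```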

Catalog (coadd level):

@rmandelb

@fjaviersanchez @rmjarvis - this is a great start! I have a few questions and other suggestions:

  • Is there some metadata about the expected sky level that could/should be compared with the actual sky level? (Possibly an extension of your "Calculate gain and read noise and compare with prediction.")

  • For the coadd catalog, might it make sense to do some basic sanity tests of the galaxy ellipticities? (p(|e|) and the e1 vs. e2 plot should not have any really unusual features.) Also, should we do something like the HSC comparison in this notebook? These will not uniquely identify a problem, but they seem like basic sanity checks that might uncover a whole range of problems, and we already have code for them.

  • "add the input N(m) and the output N(m), check ratio and see when they start to separate from each other" -> separately for both stars and galaxies?

  • For tests involving magnitudes, it seems like we will need a way to account for the extinction bug in Run 1.2p to enable fair comparison with inputs and with Run 1.2i? We know exactly how much extinction should have been applied in 1.2p but wasn't, so it seems like this should be doable, but I just wanted to flag the problem. (Or if we don't do it then we might have some more difficulty interpreting results, and can probably only use a subset of these tests.)

@yymao commented Jul 30, 2018

Also note that we have one tract of HSC XMM PDR1 that is available in the same format as the Run 1.1p coadd catalog via GCR, which means we can run a DESCQA test on both the Run 1.1p coadd and HSC XMM and see a side-by-side comparison. This can also be useful for diagnosis/validation.
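A minimal sketch of what such a side-by-side comparison could look like via GCR; the catalog names and quantity labels below are placeholders (the registered names can be listed with GCRCatalogs.get_available_catalogs()):

```python
import numpy as np
import GCRCatalogs

# Placeholder catalog names: substitute the actual registered names.
run11p = GCRCatalogs.load_catalog('dc2_coadd_run1.1p')
hsc_xmm = GCRCatalogs.load_catalog('hsc_pdr1_xmm')

# Placeholder quantity labels: substitute the actual GCR quantity names.
quantities = ['mag_i', 'ra', 'dec']
data_dc2 = run11p.get_quantities(quantities)
data_hsc = hsc_xmm.get_quantities(quantities)

# Example side-by-side diagnostic: i-band number counts in both catalogs.
bins = np.arange(18.0, 27.0, 0.25)
n_dc2, _ = np.histogram(data_dc2['mag_i'], bins=bins)
n_hsc, _ = np.histogram(data_hsc['mag_i'], bins=bins)
```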

@rmandelb

Nice. If we are comparing quantities that depend on PSF size, then we would have to restrict the Run 1.1p data to a seeing range similar to the HSC XMM field, but as long as we do that, this could be interesting.

@fjaviersanchez commented Jul 30, 2018

Is there some metadata about the expected sky level that could/should be compared with the actual sky level? (Possibly an extension of your "Calculate gain and read noise and compare with prediction.")

There's already a test in a PR that computes the median background level and can include the prediction by OpSim.

for the coadd catalog, might it make sense to do some basic sanity test of the galaxy ellipticities? (p(|e|) and the e1 vs e2 plot should not have any really unusual features) Also should we do something like the HSC comparison in this notebook? These will not uniquely identify a problem, but seem like basic sanity checks that might uncover a whole range of problems, and we already have code for them.

Sounds good!

"add the input N(m) and the output N(m), check ratio and see when they start to separate from each other" -> separately for both stars and galaxies?

In principle we weren't thinking about splitting the sample, but I think that's a good idea. Thanks!

For tests involving magnitudes, it seems like we will need a way to account for the extinction bug in Run 1.2p to enable fair comparison with inputs and with Run 1.2i? We know exactly how much extinction should have been applied in 1.2p but wasn't, so it seems like this should be doable, but I just wanted to flag the problem. (Or if we don't do it then we might have some more difficulty interpreting results, and can probably only use a subset of these tests.)

Thanks! @danielsf @yymao, does the 1.2 reference catalog include the unextincted magnitudes or the extincted ones?

One option would be to generate two truth catalogs (one for 1.2i and the other for 1.2p). Another would be to generate a single catalog that includes one column with the unextincted magnitudes and another with the correct extinction applied. The third option is to just use the extincted magnitudes and check that the PhoSim outputs are brighter than the inputs. The latter approach can, however, mask other problems...

@yymao commented Jul 30, 2018

@fjaviersanchez you mean the truth catalog, right? The magnitudes in the truth catalog do not include extinction.

@fjaviersanchez

Thanks @yymao! I meant the 1.2 reference catalog because I thought the truth catalog for 1.2 was not in place yet, is it? (I can only see the 1.1 truth catalog and the 1.2 reference catalog.)

@yymao commented Jul 30, 2018

Ah, ok. I am not sure about the reference catalog. I would guess its magnitudes do not have extinction, but @danielsf can confirm. However, I think we should generate a truth catalog for Run 1.2 rather than use the reference catalog for validation.

@rmjarvis

Sorry, what is the distinction between reference and truth? I was thinking of the reference catalog as equivalent to a truth catalog.

@rmjarvis commented Jul 30, 2018

for the coadd catalog, might it make sense to do some basic sanity test of the galaxy ellipticities?

@rmandelb, we had intentionally avoided doing any tests of the galaxy shapes, since the PSF will complicate the interpretation, and I thought weird sub-populations (e.g. an excess at |e|=1) would more likely be a failure of the measurement code than a failure of the image simulations. So I was deferring careful tests of shapes to the WL group.

However, you are quite right that we should at least plot some very basic things like p(e) to make sure there isn't something very badly wrong with the shapes. It's just that we probably won't be able to turn any of them into proper null tests (my goal for as many of these as possible).

@yymao commented Jul 30, 2018

@rmjarvis The reference catalog contains simulated photometry and astrometry noise that is not present in the truth catalog. Also, the reference catalog only goes down to a certain depth (e.g. Gaia depth). (See https://confluence.slac.stanford.edu/x/oJgHDg)

@rmandelb commented Aug 1, 2018

@rmjarvis - I dithered over the question of p(|e|) or an e1 vs. e2 histogram (to look for weird orientation effects) for the same reason you mentioned, but I do think there are some useful sanity checks there. For example, we know that re-Gaussianization doesn't have a failure mode that should lead to a pileup at |e|=1; it should be a reasonably smooth distribution across that boundary (unphysical values can result from dividing two noisy quantities). Pileups at values like 0 or 1, or just plain crazy shapes, or a strong coherent direction in the e1 vs. e2 histogram, could actually tell us something about the sims.
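A minimal sketch of these basic shape sanity plots (p(|e|) and the e1 vs. e2 distribution), assuming per-object e1/e2 arrays have already been pulled from the coadd catalog; the quantity names and binning are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt

def ellipticity_sanity_plots(e1, e2, outfile='ellipticity_checks.png'):
    """Plot p(|e|) and the e1 vs. e2 distribution for a quick visual check
    for pileups at |e| = 0 or 1 and coherent preferred directions."""
    e1, e2 = np.asarray(e1), np.asarray(e2)
    e = np.hypot(e1, e2)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # p(|e|): look for pileups at 0 or 1, or other sharp features.
    ax1.hist(e, bins=np.linspace(0, 1.2, 61), histtype='step', density=True)
    ax1.set_xlabel('|e|')
    ax1.set_ylabel('p(|e|)')

    # e1 vs. e2: look for a strong coherent direction.
    ax2.hist2d(e1, e2, bins=np.linspace(-1, 1, 81))
    ax2.set_xlabel('e1')
    ax2.set_ylabel('e2')
    ax2.set_aspect('equal')

    fig.savefig(outfile)
```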

@sethdigel

Is there some metadata about the expected sky level that could/should be compared with the actual sky level? (Possibly an extension of your "Calculate gain and read noise and compare with prediction.")

There's already a test in a PR that computes the median background level and can include the prediction by OpSim.

Regarding get_predicted_bkg, the predicted sky brightness from OpSim is interesting to have, but phoSim has its own sky brightness model, so the agreement won't be perfect. That is, phoSim evaluates the sky brightness (as a function of wavelength) based on other OpSim metadata, like elevation of the observing direction, altitude of the Sun, etc. instead of somehow inferring it from the OpSim sky brightness. The phoSim and OpSim sky brightness certainly should be correlated at whatever wavelength or band the OpSim brightness corresponds to, but again, the agreement won't be perfect.

@fjaviersanchez

Thanks @sethdigel. Yes, that's a problem, and I believe the trick will be to set reasonable validation criteria (how different should we expect them to be? 20%? 30%?).

@sethdigel

Good question. I'm not sure how to answer, but the scatter seems quite large (and it is probably dependent on band). In April I put together Run 1.2p OpSim metadata with basic information from the log files for phoSim r-band runs, including the number of photons that phoSim reported generating; this count is dominated by the sky brightness. Here is a quick plot (sorry it is not Python; I love Python, really, but pandas still seems user hostile to me).

[Plot: phoSim photon counts vs. OpSim vSkyBright for Run 1.2p r-band visits (vskybright_photons_1p2r)]

These were early runs and phoSim could have changed in some way relevant to sky brightness since then, but I was not finding vSkyBright (or filtSkyBrightness) to be a good predictor of how long a phoSim run would take.

A csv file with the run metadata and phoSim photon counts is here:
http://www.slac.stanford.edu/~digel/lsst/visit_params_1p2r_v2.csv
Entries for which the phoSim run finished have success = 1.
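For reference, a hedged pandas version of the plot above using that csv; only the success column is confirmed by the description, so the other column names (vSkyBright and the photon-count column) are assumptions to be checked against the file header:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Column names other than `success` are assumptions; check the csv header.
df = pd.read_csv('http://www.slac.stanford.edu/~digel/lsst/visit_params_1p2r_v2.csv')
ok = df[df['success'] == 1]  # keep only runs that finished

plt.scatter(ok['vSkyBright'], ok['photons'], s=5)
plt.xlabel('OpSim vSkyBright [mag/arcsec^2]')
plt.ylabel('phoSim photons generated')
plt.yscale('log')
plt.savefig('vskybright_photons_1p2r.png')
```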

@fjaviersanchez commented Aug 3, 2018

Thanks for the plot and the data, @sethdigel! Yes, the correlation is there, but there are really big outliers (I wonder if those were exposures with only stellar sources?). Since we have a way to compute the median sky level, we can try to convert OpSim's values to counts (or the counts to magnitudes) and set an arbitrary but restrictive tolerance that we can fine-tune once we get more experience (see the sketch below). At the end of the day, the test can flag the exposures that don't comply with the criteria, and we can inspect them; however, we don't want to have to inspect all of them.

Does this sound reasonable?
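A minimal sketch of the proposed flagging, assuming the OpSim sky brightness is given in mag/arcsec^2 and that a photometric zero point for the counts is available; the zero point, pixel scale, and tolerance are placeholders to be tuned:

```python
import numpy as np

def expected_sky_counts(sky_mag_per_arcsec2, zero_point, pixel_scale=0.2):
    """Convert a sky brightness in mag/arcsec^2 to expected counts per pixel.

    `zero_point` is the magnitude corresponding to 1 count, and
    `pixel_scale` is in arcsec/pixel (both are assumptions for illustration).
    """
    pixel_area = pixel_scale ** 2  # arcsec^2 per pixel
    return pixel_area * 10.0 ** (-0.4 * (sky_mag_per_arcsec2 - zero_point))

def flag_sky_outliers(measured_median, sky_mag, zero_point, tolerance=0.3):
    """Flag sensor visits whose measured median sky level differs from the
    OpSim-based expectation by more than `tolerance` (fractional)."""
    predicted = expected_sky_counts(np.asarray(sky_mag), zero_point)
    frac_diff = np.abs(np.asarray(measured_median) / predicted - 1.0)
    return frac_diff > tolerance
```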

@sethdigel commented Aug 3, 2018

Yes that sounds reasonable; working in terms of the medians of the e-images (if that is what you have in mind) sounds sensible for figuring out whether a given sensor visit is way off what you'd expect from the OpSim metadata.

Regarding the extreme outliers in the plot, I don't have an explanation, but the sensor visits clearly did have sky emission (Mie and Rayleigh scattering from the Moon, plus airglow, and also zodiacal light). The log file for one of these is here: http://srs.slac.stanford.edu/Pipeline-II/exp/SRS/log.jsp?pi=50705671
PhoSim reports how many photons are due to which components; in this sensor visit only ~0.1% are attributed to 'Astro Objects'. The run was made with phoSim v3.7.9.

@danielsf commented Aug 6, 2018

I'm not sure if this is still relevant (sorry; I was on vacation last week), but @fjaviersanchez asked if the reference catalog contained dust extinction:

Yes, it does. I created the reference catalog before we diagnosed the dust problem in PhoSim.

I will generate a truth catalog for Run 1.2 in the next few days.

@rmjarvis commented Aug 6, 2018

Great. Thanks, Scott!

@cwwalter

I think this is the place to suggest sensor-level tests?

There is a wide range of tree-ring amplitudes visible across sensors. See the work from @karpov-sv et al. here:

https://github.com/LSSTDESC/imSim/wiki/tree_ring_validation

and I have seen this in the exposure checker.

@karpov-sv when you say "The simulated data have been generated for all 189 different imSim sensor configurations using analytic formulae for pixel area variations shown above." Did you actually run a full focal plane using imSim? Or does this mean you used the formula?

I think it would be a nice check of the actual 1.2 imSim output to see that the maximum amplitude, etc., are reasonable. I think Serge could help with this.
