Tests for DESCQA using the DM products #127

Open · fjaviersanchez opened this issue Jul 27, 2018 · 19 comments

@fjaviersanchez commented Jul 27, 2018

DC2 validation test brainstorming by @rmjarvis and @fjaviersanchez:

Image level (@rmjarvis):

  • Check that the images contain some pixels above the 10-sigma level (see the sketch after this list).

  • Calculate gain and read noise and compare with prediction.

  • Check masked (saturated) bits of the images.

  • Check masked (bad/dead) pixels -> PhoSim.
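A minimal sketch of the first check (pixels above the 10-sigma level), assuming the sensor image is available as a plain FITS file; the file layout and the sigma-clipped background estimate are placeholders, and the DM stack's own background model could be substituted:

```python
import numpy as np
from astropy.io import fits
from astropy.stats import sigma_clipped_stats

def has_bright_pixels(image_file, nsigma=10.0):
    """Return True if the image contains pixels more than `nsigma`
    above the sigma-clipped background level."""
    with fits.open(image_file) as hdus:
        pixels = hdus[0].data.astype(float)  # assumes the image is in the primary HDU
    _, median, std = sigma_clipped_stats(pixels, sigma=3.0)
    return np.any(pixels > median + nsigma * std)
```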

Catalog (visit level):

  • Use stars and their PSF magnitudes (PSFmag) to compute the CheckAstroPhoto test (using the standalone test check in DC2-production #259). Update 09/09/18: done in standalone code.

  • Star size vs. magnitude at different epochs should be flat (use HSM size/sdssShape); use a scatter plot for every single star (see the sketch after this list). Update 09/09/18: done in standalone code.

  • Given a calexp, select a clean stellar sample, evaluate the PSF at each star's position, and check the stacked difference (low priority).

  • Select a set of calexps and check that the input seeing is correlated with the size of the stars appearing in them. Update 09/10/18: done in standalone code.

  • DCR test: transform the shape of each star to the shape in the zenith direction for a sample of good stars, separately per band, and check this as a function of airmass.

  • DCR test: repeat the above, splitting the sample into redder and bluer stars.
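A minimal sketch of the size-vs-magnitude flatness check referenced above, assuming a clean stellar sample from one visit has already been selected; the array names, bin edges, and tolerance are placeholders to be tuned:

```python
import numpy as np

def check_star_size_flatness(psf_mag, hsm_size,
                             mag_bins=np.arange(16.0, 22.0, 0.5),
                             tolerance=0.02):
    """Check that the median star size is flat with PSF magnitude.

    Returns True if the median size in every populated magnitude bin
    deviates from the overall median by less than `tolerance` (fractional).
    """
    psf_mag = np.asarray(psf_mag)
    hsm_size = np.asarray(hsm_size)
    overall = np.median(hsm_size)
    for lo, hi in zip(mag_bins[:-1], mag_bins[1:]):
        in_bin = (psf_mag >= lo) & (psf_mag < hi)
        if in_bin.sum() < 10:  # skip sparsely populated bins
            continue
        if abs(np.median(hsm_size[in_bin]) / overall - 1.0) > tolerance:
            return False
    return True
```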

Catalog (coadd level):

@rmandelb

@fjaviersanchez @rmjarvis - this is a great start! I have a few questions and other suggestions:

  • Is there some metadata about the expected sky level that could/should be compared with the actual sky level? (Possibly an extension of your "Calculate gain and read noise and compare with prediction.")

  • For the coadd catalog, might it make sense to do some basic sanity tests of the galaxy ellipticities? (p(|e|) and the e1 vs. e2 plot should not have any really unusual features.) Also, should we do something like the HSC comparison in this notebook? These will not uniquely identify a problem, but they seem like basic sanity checks that might uncover a whole range of problems, and we already have code for them.

  • "add the input N(m) and the output N(m), check ratio and see when they start to separate from each other" -> separately for both stars and galaxies?

  • For tests involving magnitudes, it seems like we will need a way to account for the extinction bug in Run 1.2p to enable fair comparison with inputs and with Run 1.2i? We know exactly how much extinction should have been applied in 1.2p but wasn't, so it seems like this should be doable, but I just wanted to flag the problem. (Or if we don't do it then we might have some more difficulty interpreting results, and can probably only use a subset of these tests.)

@yymao commented Jul 30, 2018

Also note that we have one tract of HSC XMM PDR1 that is available in the same format as the Run 1.1p coadd catalog via GCR, which means we can run a DESCQA test on both the Run 1.1p coadd and HSC XMM and see a side-by-side comparison. This can also be useful for diagnosis/validation.
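A minimal sketch of what such a side-by-side comparison could look like via GCR; the catalog names and quantity labels below are placeholders (the registered names can be listed with GCRCatalogs.get_available_catalogs()):

```python
import numpy as np
import GCRCatalogs

# Placeholder catalog names: substitute the actual registered names.
run11p = GCRCatalogs.load_catalog('dc2_coadd_run1.1p')
hsc_xmm = GCRCatalogs.load_catalog('hsc_pdr1_xmm')

# Placeholder quantity labels: substitute the actual GCR quantity names.
quantities = ['mag_i', 'ra', 'dec']
data_dc2 = run11p.get_quantities(quantities)
data_hsc = hsc_xmm.get_quantities(quantities)

# Example side-by-side diagnostic: i-band number counts in both catalogs.
bins = np.arange(18.0, 27.0, 0.25)
n_dc2, _ = np.histogram(data_dc2['mag_i'], bins=bins)
n_hsc, _ = np.histogram(data_hsc['mag_i'], bins=bins)
```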

@rmandelb

Nice. If we are comparing quantities that depend on PSF size, then we would have to restrict the Run 1.1p data to a seeing range similar to the HSC XMM field, but as long as we do that, this could be interesting.

@fjaviersanchez commented Jul 30, 2018

Is there some metadata about the expected sky level that could/should be compared with the actual sky level? (Possibly an extension of your "Calculate gain and read noise and compare with prediction.")

There's already a test in a PR that computes the median background level and can include the prediction by OpSim.

for the coadd catalog, might it make sense to do some basic sanity test of the galaxy ellipticities? (p(|e|) and the e1 vs e2 plot should not have any really unusual features) Also should we do something like the HSC comparison in this notebook? These will not uniquely identify a problem, but seem like basic sanity checks that might uncover a whole range of problems, and we already have code for them.

Sounds good!

"add the input N(m) and the output N(m), check ratio and see when they start to separate from each other" -> separately for both stars and galaxies?

In principle we weren't thinking about splitting the sample, but I think that's a good idea. Thanks!

For tests involving magnitudes, it seems like we will need a way to account for the extinction bug in Run 1.2p to enable fair comparison with inputs and with Run 1.2i? We know exactly how much extinction should have been applied in 1.2p but wasn't, so it seems like this should be doable, but I just wanted to flag the problem. (Or if we don't do it then we might have some more difficulty interpreting results, and can probably only use a subset of these tests.)

Thanks! @danielsf @yymao, does the 1.2 reference catalog include the unextincted magnitudes or the extincted ones?

One option would be to generate two truth catalogs (one for 1.2i and the other for 1.2p). Another would be to generate a single catalog that includes one column with the unextincted magnitudes and another with the correct extinction applied. The third option is to just use the extincted magnitudes and check that the PhoSim outputs are brighter than the inputs. The latter approach can, however, mask other problems...

@yymao commented Jul 30, 2018

@fjaviersanchez you mean the truth catalog, right? The magnitudes in the truth catalog do not include extinction.

@fjaviersanchez

Thanks @yymao! I meant the 1.2 reference catalog because I thought the truth catalog for 1.2 was not in place yet, is it? (I can only see the 1.1 truth catalog and the 1.2 reference catalog.)

@yymao commented Jul 30, 2018

Ah, ok. I am not sure about the reference catalog. I would guess its magnitudes do not have extinction, but @danielsf can confirm. However, I think we should generate a truth catalog for Run 1.2 rather than use the reference catalog for validation.

@rmjarvis

Sorry, what is the distinction between reference and truth? I was thinking of the reference catalog as equivalent to a truth catalog.

@rmjarvis commented Jul 30, 2018

for the coadd catalog, might it make sense to do some basic sanity test of the galaxy ellipticities?

@rmandelb, we had intentionally avoided doing any tests of the galaxy shapes, since the PSF will complicate the interpretation, and I thought weird sub-populations (e.g. an excess at |e|=1) would more likely be a failure of the measurement code than a failure of the image simulations. So I was deferring careful tests of shapes to the WL group.

However, you are quite right that we should at least plot some very basic things like p(e) to make sure there isn't something very badly wrong with the shapes. It's just that we probably won't be able to turn any of them into proper null tests (my goal for as many of these as possible).

@yymao commented Jul 30, 2018

@rmjarvis The reference catalog contains simulated photometry and astrometry noise that is not present in the truth catalog. Also, the reference catalog only goes down to a certain depth (e.g. Gaia depth). (See https://confluence.slac.stanford.edu/x/oJgHDg)

@rmandelb commented Aug 1, 2018

@rmjarvis - I dithered over the question of p(|e|) or an e1 vs. e2 histogram (to look for weird orientation effects) for the same reason you mentioned, but I do think there are some useful sanity checks there. For example, we know that re-Gaussianization doesn't have a failure mode that should lead to a pileup at |e|=1; it should be a reasonably smooth distribution across that boundary (unphysical values can result from dividing two noisy quantities). Pileups at values like 0 or 1, or just plain crazy shapes, or a strong coherent direction in the e1 vs. e2 histogram, could actually tell us something about the sims.
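A minimal sketch of these basic shape sanity plots (p(|e|) and the e1 vs. e2 distribution), assuming per-object e1/e2 arrays have already been pulled from the coadd catalog; the quantity names and binning are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt

def ellipticity_sanity_plots(e1, e2, outfile='ellipticity_checks.png'):
    """Plot p(|e|) and the e1 vs. e2 distribution for a quick visual check
    for pileups at |e| = 0 or 1 and coherent preferred directions."""
    e1, e2 = np.asarray(e1), np.asarray(e2)
    e = np.hypot(e1, e2)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # p(|e|): look for pileups at 0 or 1, or other sharp features.
    ax1.hist(e, bins=np.linspace(0, 1.2, 61), histtype='step', density=True)
    ax1.set_xlabel('|e|')
    ax1.set_ylabel('p(|e|)')

    # e1 vs. e2: look for a strong coherent direction.
    ax2.hist2d(e1, e2, bins=np.linspace(-1, 1, 81))
    ax2.set_xlabel('e1')
    ax2.set_ylabel('e2')
    ax2.set_aspect('equal')

    fig.savefig(outfile)
```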

@sethdigel

Is there some metadata about the expected sky level that could/should be compared with the actual sky level? (Possibly an extension of your "Calculate gain and read noise and compare with prediction.")

There's already a test in a PR that computes the median background level and can include the prediction by OpSim.

Regarding get_predicted_bkg, the predicted sky brightness from OpSim is interesting to have, but phoSim has its own sky brightness model, so the agreement won't be perfect. That is, phoSim evaluates the sky brightness (as a function of wavelength) based on other OpSim metadata, like elevation of the observing direction, altitude of the Sun, etc. instead of somehow inferring it from the OpSim sky brightness. The phoSim and OpSim sky brightness certainly should be correlated at whatever wavelength or band the OpSim brightness corresponds to, but again, the agreement won't be perfect.

@fjaviersanchez

Thanks @sethdigel. Yes, that's a problem, and I believe the trick will be to set reasonable validation criteria (how different should we expect them to be? 20%? 30%?).

@sethdigel

Good question. I'm not sure how to answer, but the scatter seems quite large (and it is probably dependent on band). In April I put together Run 1.2p OpSim metadata with basic information from the log files for phoSim r-band runs, including the number of photons that phoSim reported generating; this count is dominated by the sky brightness. Here is a quick plot (sorry it is not Python; I love Python, really, but pandas still seems user hostile to me).

[Plot: phoSim photon counts vs. OpSim vSkyBright for Run 1.2p r-band visits (vskybright_photons_1p2r)]

These were early runs and phoSim could have changed in some way relevant to sky brightness since then, but I was not finding vSkyBright (or filtSkyBrightness) to be a good predictor of how long a phoSim run would take.

A csv file with the run metadata and phoSim photon counts is here:
http://www.slac.stanford.edu/~digel/lsst/visit_params_1p2r_v2.csv
Entries for which the phoSim run finished have success = 1.
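For reference, a hedged pandas version of the plot above using that csv; only the success column is confirmed by the description, so the other column names (vSkyBright and the photon-count column) are assumptions to be checked against the file header:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Column names other than `success` are assumptions; check the csv header.
df = pd.read_csv('http://www.slac.stanford.edu/~digel/lsst/visit_params_1p2r_v2.csv')
ok = df[df['success'] == 1]  # keep only runs that finished

plt.scatter(ok['vSkyBright'], ok['photons'], s=5)
plt.xlabel('OpSim vSkyBright [mag/arcsec^2]')
plt.ylabel('phoSim photons generated')
plt.yscale('log')
plt.savefig('vskybright_photons_1p2r.png')
```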

@fjaviersanchez commented Aug 3, 2018

Thanks for the plot and the data, @sethdigel! Yes, the correlation is there, but there are really big outliers (I wonder if those were exposures with only stellar sources?). Since we have a way to compute the median sky level, we can try to convert OpSim's values to counts (or the counts to magnitudes) and set an arbitrary but restrictive tolerance that we can fine-tune once we get more experience (see the sketch below). At the end of the day, the test can flag the exposures that don't comply with the criteria, and we can inspect them; however, we don't want to have to inspect all of them.

Does this sound reasonable?
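A minimal sketch of the proposed flagging, assuming the OpSim sky brightness is given in mag/arcsec^2 and that a photometric zero point for the counts is available; the zero point, pixel scale, and tolerance are placeholders to be tuned:

```python
import numpy as np

def expected_sky_counts(sky_mag_per_arcsec2, zero_point, pixel_scale=0.2):
    """Convert a sky brightness in mag/arcsec^2 to expected counts per pixel.

    `zero_point` is the magnitude corresponding to 1 count, and
    `pixel_scale` is in arcsec/pixel (both are assumptions for illustration).
    """
    pixel_area = pixel_scale ** 2  # arcsec^2 per pixel
    return pixel_area * 10.0 ** (-0.4 * (sky_mag_per_arcsec2 - zero_point))

def flag_sky_outliers(measured_median, sky_mag, zero_point, tolerance=0.3):
    """Flag sensor visits whose measured median sky level differs from the
    OpSim-based expectation by more than `tolerance` (fractional)."""
    predicted = expected_sky_counts(np.asarray(sky_mag), zero_point)
    frac_diff = np.abs(np.asarray(measured_median) / predicted - 1.0)
    return frac_diff > tolerance
```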

@sethdigel commented Aug 3, 2018

Yes that sounds reasonable; working in terms of the medians of the e-images (if that is what you have in mind) sounds sensible for figuring out whether a given sensor visit is way off what you'd expect from the OpSim metadata.

Regarding the extreme outliers in the plot, I don't have an explanation, but the sensor visits clearly did have sky emission (Mie and Rayleigh scattering from the Moon, plus airglow, and also zodiacal light). The log file for one of these is here: http://srs.slac.stanford.edu/Pipeline-II/exp/SRS/log.jsp?pi=50705671
PhoSim reports how many photons are due to which components; in this sensor visit only ~0.1% are attributed to 'Astro Objects'. The run was made with phoSim v3.7.9.

@danielsf commented Aug 6, 2018

I'm not sure if this is still relevant (sorry; I was on vacation last week), but @fjaviersanchez asked if the reference catalog contained dust extinction:

Yes, it does. I created the reference catalog before we diagnosed the dust problem in PhoSim.

I will generate a truth catalog for Run 1.2 in the next few days.

@rmjarvis commented Aug 6, 2018

Great. Thanks, Scott!

@cwwalter

I think this is the place to suggest sensor-level tests?

There is a wide range of tree-ring amplitudes visible across sensors. See the work from @karpov-sv et al. here:

https://github.com/LSSTDESC/imSim/wiki/tree_ring_validation

and I have seen this in the exposure checker.

@karpov-sv when you say "The simulated data have been generated for all 189 different imSim sensor configurations using analytic formulae for pixel area variations shown above." Did you actually run a full focal plane using imSim? Or does this mean you used the formula?

I think it would be a nice check of the actual 1.2 imSim output to see that the maximum amplitude, etc., are reasonable. I think Serge could help with this.
