Use number of clusters as an RFI detector #49

caseyjlaw · 2018-12-13T17:37:56Z

We sometimes find that RFI triggers many detections that do not cluster well. They tend to be spread over many different values of (l, m, DM, dt).
Could we use the number of clusters in a segment as a trigger for rejecting RFI? For example, we could set a parameter in the preferences (e.g., max_clusters) that is tested after clustering. If more than that many are found, then reject all candidates in the segment. Or potentially, one could just use that to trigger the generation of a single candidate plot, rather than all in the segment.
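
A minimal sketch of that check, assuming the cluster labels follow the HDBSCAN convention in which -1 marks unclustered noise. `max_clusters` is the preference proposed above; the function names `count_clusters` and `filter_segment` are hypothetical and not existing pipeline code.

```python
def count_clusters(labels):
    """Count distinct clusters, ignoring the noise label -1."""
    return len(set(labels) - {-1})


def filter_segment(candidates, labels, max_clusters):
    """Return the candidates to keep for one segment.

    If clustering produced more than max_clusters clusters, treat the
    segment as RFI-dominated and reject every candidate in it.
    """
    if count_clusters(labels) > max_clusters:
        return []
    return list(candidates)


# Toy example: four candidates in two clusters, plus one noise point.
cands = ["c0", "c1", "c2", "c3", "c4"]
labels = [0, 0, 1, 1, -1]
kept = filter_segment(cands, labels, max_clusters=3)      # under the limit
rejected = filter_segment(cands, labels, max_clusters=1)  # over the limit
```

The same test could instead return a single representative candidate for plotting rather than an empty list.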


caseyjlaw · 2018-12-13T17:38:34Z

@KshitijAggarwal Do you have an opinion on what would be a good value for a parameter like this?

KshitijAggarwal · 2018-12-18T13:14:06Z

Yes, such a parameter could be useful, provided we are confident that the clustering parameters are appropriate for that image size.

A value of a few hundred clusters should be high enough for RFI. I would still be a little cautious, as I have seen cases in which a particular combination of preferences would cause an injected transient to trigger hundreds of clusters, which could then be solved by increasing the clustering parameters.

I think it would be better to set such a max_clusters parameter to a high number, say 500, and then to recluster the candidates at a higher value of min_cluster_size (repeating the process until the number of candidates falls below that threshold). This way one can limit the number of plots generated from a single segment.
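
The loop described above could be sketched roughly as follows. This is an illustration under assumptions, not pipeline code: `recluster_until_below`, `step`, and `max_tries` are hypothetical names, and `cluster_fn` stands in for whatever clustering call is actually used. With the hdbscan library it could be something like `lambda X, m: hdbscan.HDBSCAN(min_cluster_size=m).fit_predict(X)`. The toy clusterer below exists only to make the example runnable.

```python
from collections import Counter


def count_clusters(labels):
    """Count distinct clusters, ignoring the noise label -1."""
    return len(set(labels) - {-1})


def recluster_until_below(points, cluster_fn, max_clusters=500,
                          min_cluster_size=2, step=1, max_tries=20):
    """Recluster with a growing min_cluster_size until few enough clusters.

    cluster_fn(points, min_cluster_size) must return one label per point,
    with -1 for noise. Returns (labels, min_cluster_size, converged).
    """
    labels = cluster_fn(points, min_cluster_size)
    for _ in range(max_tries):
        if count_clusters(labels) <= max_clusters:
            return labels, min_cluster_size, True
        min_cluster_size += step
        labels = cluster_fn(points, min_cluster_size)
    return labels, min_cluster_size, count_clusters(labels) <= max_clusters


def toy_cluster_fn(points, min_cluster_size):
    """Stand-in clusterer: equal values form a cluster if numerous enough."""
    sizes = Counter(points)
    return [p if sizes[p] >= min_cluster_size else -1 for p in points]


# Three groups of size 2, 3 and 4; ask for at most one cluster.
points = [0, 0, 1, 1, 1, 2, 2, 2, 2]
labels, final_size, converged = recluster_until_below(
    points, toy_cluster_fn, max_clusters=1)
```

The `max_tries` cap keeps the loop from running indefinitely on a segment that never drops below the threshold; the caller can then decide whether to reject the segment.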

KshitijAggarwal · 2018-12-18T13:16:06Z

Typically one can observe a "knee" in the plot of number of clusters vs min_cluster_size: beyond a certain value of min_cluster_size, the number of clusters decreases sharply. That said, my understanding is that HDBSCAN is relatively robust compared to other clustering algorithms like knn or fof.
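
One simple way to locate such a knee from a parameter scan is to take the largest single-step drop in the cluster count. This is a crude heuristic chosen for illustration, not a formal knee-detection method; `find_knee` is a hypothetical helper and the numbers are synthetic, not measured.

```python
def find_knee(sizes, n_clusters):
    """Return the min_cluster_size at which the cluster count drops most.

    sizes and n_clusters are parallel sequences from a parameter scan,
    with sizes in increasing order.
    """
    if len(sizes) != len(n_clusters) or len(sizes) < 2:
        raise ValueError("need at least two matching scan points")
    # Decrease in cluster count between consecutive scan points.
    drops = [n_clusters[i] - n_clusters[i + 1] for i in range(len(sizes) - 1)]
    i_max = max(range(len(drops)), key=drops.__getitem__)
    return sizes[i_max + 1]


# Synthetic scan for illustration only.
sizes = [2, 3, 4, 5, 6, 7]
counts = [400, 380, 350, 60, 50, 45]
knee = find_knee(sizes, counts)
```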

caseyjlaw · 2018-12-18T18:55:21Z

Interesting.
So you suggest we cluster once and see how many clusters are found. If too many are found (e.g., a few hundred), then we cluster again with a larger min_cluster_size?
What do we learn if there are fewer clusters the second time? Does that help us understand if it is RFI or not?

KshitijAggarwal · 2018-12-19T14:24:56Z

Fewer clusters would demonstrate that the clustering has been done properly, at least on the real events, if any, in that data. This is primarily to avoid rejecting real events that generated lots of clusters due to non-optimal clustering parameters.
Also, I have noticed that once HDBSCAN has identified all the obvious clusters, it won't over-cluster the candidates if min_cluster_size is increased a little, so it is inherently somewhat robust to that.

caseyjlaw · 2018-12-19T14:40:30Z

If you can suggest the steps we can use to identify good/bad clustering, I could try it. So far, I am not too worried about this issue, since we are doing pretty well clustering RFI and good transients in the newest observations.

One important thing to remember is that we want to be sensitive to clusters with only a few candidates (e.g., 2 or 3). Be sure to include those kinds of simulated transients in your tests.
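
A test of that kind might check whether an injected transient's candidates end up together in one real cluster. The sketch below assumes labels with -1 for noise and a known list of indices for the injected candidates; `injection_recovered` is a hypothetical helper, and the label lists are made up for illustration.

```python
def injection_recovered(labels, injected_idx):
    """True if all injected candidates share one non-noise cluster label."""
    found = {labels[i] for i in injected_idx}
    return len(found) == 1 and -1 not in found


# Injected pair at indices 3 and 4.
labels_small = [0, 0, 0, 1, 1, -1]    # pair kept as its own cluster
labels_large = [0, 0, 0, -1, -1, -1]  # pair discarded as noise
kept = injection_recovered(labels_small, [3, 4])
lost = injection_recovered(labels_large, [3, 4])
```

Running this at each trial value of min_cluster_size would show where clusters of 2 or 3 candidates start to be lost.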

KshitijAggarwal · 2018-12-19T14:47:40Z

I will do some tests, and will let you know soon.
