Feature request: goodness-of-fit tests for copulas #376

extabl · 2018-10-07T08:21:37Z

It would be very nice to have this feature, something like what the R copula package (or the gofCopula package) offers, but with much better performance, capable of handling inputs of 500k+ observations.
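For context on what such a test computes: a minimal sketch, in plain Python, of the Cramér-von Mises statistic S_n, which compares the empirical copula with a fitted parametric copula at the pseudo-observations. The Clayton family, the fit by inverting Kendall's tau, and all function names are assumptions for illustration. This is not the implementation used by vinecopulib or by the R packages, and it does not compute a p-value.

```python
def pseudo_obs(xs):
    """Map a sample to (0, 1) via ranks: rank / (n + 1). Ties are broken by position."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / (n + 1) for r in ranks]


def kendall_tau(u, v):
    """Naive O(n^2) Kendall's tau without tie correction."""
    n = len(u)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (u[i] - u[j]) * (v[i] - v[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def cvm_statistic_clayton(x, y):
    """Return (S_n, theta_hat) with S_n = sum_i (C_n(U_i) - C_theta(U_i))^2."""
    u, v = pseudo_obs(x), pseudo_obs(y)
    tau = kendall_tau(u, v)
    if not 0.0 < tau < 1.0:
        raise ValueError("this sketch only handles 0 < tau < 1")
    theta = 2.0 * tau / (1.0 - tau)  # invert tau = theta / (theta + 2)
    n = len(u)
    s_n = 0.0
    for i in range(n):
        # empirical copula evaluated at the i-th pseudo-observation
        c_emp = sum(1 for j in range(n) if u[j] <= u[i] and v[j] <= v[i]) / n
        s_n += (c_emp - clayton_cdf(u[i], v[i], theta)) ** 2
    return s_n, theta
```

Both the empirical copula and Kendall's tau are evaluated naively in O(n^2) here. For 500k observations that is on the order of 10^11 pairwise comparisons, which is why performance at this scale needs better algorithms (Kendall's tau, for instance, can be computed in O(n log n)).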

By the way, good work, guys!

tnagler · 2018-10-07T16:30:06Z

Not a bad idea, considering their popularity (although I personally don't like GoF tests).

In any case, before we do this, we will first need derivatives in the library, which is a larger project. We considered adding derivatives from the very beginning but had other priorities and couldn't come up with a satisfying proof of concept. I'm quite sure this feature will come at some point, and I opened an issue for further discussion (#377). However, I don't think this is going to happen soon.
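To make the dependency on derivatives concrete: multiplier-bootstrap variants of copula GoF tests need, among other things, the derivative of the copula C_theta(u, v) with respect to its parameter theta. A minimal sketch for the Clayton family, C_theta(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), with a central finite difference as a cross-check. The family, the function names, and the step size are illustrative assumptions and say nothing about how derivatives would be designed in vinecopulib.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_dcdtheta(u, v, theta):
    """Analytic dC/dtheta of the Clayton copula (theta > 0)."""
    a, b = u ** -theta, v ** -theta
    s = a + b - 1.0
    c = s ** (-1.0 / theta)
    # log C = -log(s) / theta, and ds/dtheta = -(a * log(u) + b * log(v)), so
    # dC/dtheta = C * (log(s) / theta^2 + (a * log(u) + b * log(v)) / (theta * s))
    return c * (math.log(s) / theta ** 2
                + (a * math.log(u) + b * math.log(v)) / (theta * s))


def central_difference(f, x, h=1e-6):
    """Numerical derivative (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

The finite difference is quick to write, but its accuracy depends on the step size and on the family, which is one argument for analytic or automatic derivatives inside a library.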

extabl · 2018-10-07T17:19:21Z

Yes, sure. First things first; there's already a lot to come in this project. Keep going!

However, I wonder why you don't like GoF tests. I've always thought that they are indispensable in the statistical analysis of copulas.

tnagler · 2018-10-07T17:44:03Z

The reason is somewhat philosophical. First of all, null-hypothesis significance testing (NHST) has many flaws, as summarized here or here. There are many recent papers on this subject; the American Statistical Association even issued a statement. I am not as strongly opposed as many of the prominent critics and use significance tests myself every now and then.

Regarding GoF-tests specifically, I think they are widely misused to solve problems they weren't designed for (especially model selection).

There is a valid use of GoF-tests though: If you really have the hypothesis that your parametric model is the one and only truth and you want to test for that. However, I have never seen GoF-tests used like this in the copula world and it's a rather odd hypothesis to have. Essentially, we already know that any parametric hypothesis is false. Then the GoF-test only tells you whether the sample size was large enough to detect this. I believe that this is rarely what people want to know.
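A toy illustration of the sample-size point, deliberately not a copula test: take the null hypothesis "the mean is 0" when the true mean is off by a tiny delta = 0.01 (known unit variance), and compute the two-sided z-test p-value that a typical sample of size n produces, i.e. with the sample mean replaced by the true mean. The setup and the function name are assumptions chosen only to show the effect.

```python
from statistics import NormalDist


def expected_p_value(n, delta, sigma=1.0):
    """Two-sided z-test p-value for H0: mean == 0 when the sample mean equals delta."""
    z = abs(delta) * n ** 0.5 / sigma
    return 2.0 * NormalDist().cdf(-z)  # lower tail avoids cancellation in 1 - cdf(z)
```

With delta = 0.01, n = 100 gives about 0.92, n = 10,000 about 0.32, and n = 500,000 (the sample size mentioned in this thread) gives a p-value far below 1e-9, although the misspecification is the same in all three cases.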

Usually, the end goal is entirely different, like prediction or simulation. Predictions should be as accurate as possible and simulations as realistic as possible. For these goals it's pretty much irrelevant whether the selected copula is the one and only truth or not.
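Since simulation is named as a typical end goal: a minimal sketch of simulating from a bivariate Clayton copula by conditional inversion (solving C(v | u) = w for v), plus a crude realism check via Kendall's tau. The Clayton family, the function names, and the check are illustrative assumptions, not vinecopulib's API.

```python
import random


def simulate_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula (theta > 0) by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids 0 ** -theta
        w = 1.0 - rng.random()
        # closed-form inverse of the conditional distribution C(v | u) at level w
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def kendall_tau(pairs):
    """Naive O(n^2) Kendall's tau of a list of (x, y) pairs, no tie correction."""
    n = len(pairs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))
```

For theta = 2, the Kendall's tau of a large simulated sample should be close to the theoretical value theta / (theta + 2) = 0.5. Comparing such summaries between simulated and observed data is one way to judge how realistic the simulations are.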

extabl · 2018-10-07T19:43:12Z

You certainly have a valid point regarding the questionable usefulness of GoF-tests in the context of copulas.

I was not aware of these aspects, but that is good news, as I can allocate more resources to other parts of the research (e.g. survival copulas). No need to worry anymore about the computational performance of GoF tests, which took about 20 hours to calculate p-values for a sample bivariate data set of 500k observations (raw data set in the range of 25-36 m). The code in the R copula package is evidently not parallelized yet, as CPU utilization was only about 6-7%. But as you wrote, for simulations, goodness of fit is not the main point.
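Regarding the runtime: p-values for copula GoF tests are commonly obtained by a parametric bootstrap, in which every replicate simulates from the fitted copula, refits it, and recomputes the statistic. The replicates are independent, so the loop is embarrassingly parallel. This sketch keeps each replicate a pure function of its seed so that any map-like backend can run them. The generic function, the toy replicate, and the (k + 1) / (B + 1) convention are assumptions for illustration; it says nothing about how the R copula package is implemented internally.

```python
import random
from functools import partial


def bootstrap_p_value(observed_stat, replicate, n_boot, map_fn=map):
    """Monte Carlo p-value (k + 1) / (n_boot + 1) from a parametric bootstrap.

    replicate(seed) should simulate one data set from the fitted null model,
    refit the model, and return the test statistic. Because each call depends
    only on its seed, map_fn can be any map-like (possibly parallel) backend.
    """
    stats = list(map_fn(replicate, range(n_boot)))
    k = sum(1 for s in stats if s >= observed_stat)
    return (k + 1) / (n_boot + 1)


def toy_replicate(n, seed):
    """Hypothetical stand-in for a copula replicate: |mean| of n N(0, 1) draws."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return abs(sum(xs) / n)


# Sequential run; a parallel backend can be swapped in through map_fn.
p = bootstrap_p_value(0.2, partial(toy_replicate, 50), n_boot=99)
```

To use the otherwise idle cores, the `map` method of a `concurrent.futures.ProcessPoolExecutor` can be passed as `map_fn`. The replicate then has to be picklable, which is why the toy replicate is a top-level function wrapped with `functools.partial` rather than a lambda.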

Thanks again for the clarification above. Many thanks!

tnagler added the enhancement label Oct 7, 2018

tnagler mentioned this issue Jun 9, 2022

Different results between pyvinecopulib and rvinecopulib (based on VineCopula) vinecopulib/pyvinecopulib#94

Closed
