"gradient" effect on DIVAnd maps[🐞] #129

Open
qcazenave opened this issue Apr 14, 2023 · 20 comments

Comments

@qcazenave

Describe the bug

[4 images attached to the original issue]

**diva3d call**

    @time info = diva3d(
        (lonr, latr, depthr, TS),
        (obslon, obslat, obsdepth, obstime),
        obsval,
        len,
        epsilon2,  # epsilon2 = 1 (profile) or 10 (timeseries)
        outputfile[s],
        varname,
        bathname = bathfile,
        mask = mask,
        background = bkgd,  # seasonal profile
        # plotres = plotres,
        transform = Anam.loglin(maxval),
        ncvarattrib = ncvarattrib,
        ncglobalattrib = metadata[s],
        surfextend = true,
        coeff_derivative2 = param_d2,  # param_2d = [0,0,0]
        memtofit = 50
    );

@jmbeckers
Member

jmbeckers commented Apr 14, 2023

I do not think it is a bug. Both situations arise in regions where you basically extrapolate and where a gradient in the nearest data can lead the extrapolation into unrealistic values.

In theory, the error field should make it possible to identify such regions; in particular, the "exact" error field is generally higher in such regions than the approximations used in practice (but it is expensive to compute).

Did you try the logit transformation instead of loglin? The latter might be responsible for allowing larger values than expected.

Just a stupid question: is the value you use in DIVAnd for 2 µmole 2 or 2E-6? (That could play a role when the data are transformed.)

Related note or question to developers: in loglin is it implicitly assumed that the data have values around 1 so that we have anomalies around zero? Maybe we need to add a scaling parameter too ?

@qcazenave
Author

Hi @jmbeckers ,
I agree, it is not really a bug but I was not sure about the label to use...
So, I haven't tried the logit transformation and the value I use for 2 umole is 2.

@qcazenave
Author

Update : logit instead of loglin = no change at all.
Use of logit : transform = Anam.logit(min=minval,max=maxval)

@jmbeckers
Member

That is strange. What are the values of minval and maxval?

@qcazenave
Author

minval = 0
maxval = 800 for DIN and 400 for silicate (in umole/l)

@jmbeckers
Member

Isn't that very large ? What are the obs values in the deep region ?

@qcazenave
Author

qcazenave commented Apr 18, 2023

Yes, it is very large; I believe such high values can only be found in coastal areas, near river mouths.
In the deep region, obs values remain below 100 umol/l.

@qcazenave
Author

Is it too large ? Should I keep the boundaries to the values found in the deep region ?

@ctroupin
Member

ctroupin commented May 3, 2023

I would say that ideally one should not have to restrict the values of the input data, except if they are obviously wrong, which doesn't seem to be the case here.

What can be done is to use the residuals (the difference between observation and analysis at the location of each observation) to discard some of the data points when the residual value is too large.
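A minimal sketch of that residual-based screening, in plain Julia. The numbers, the `analysis_at_obs` vector and the `threshold` are all made up for illustration; how the analysis is evaluated at the observation locations is left out here.

```julia
# Made-up numbers for illustration; in practice the analysis would be
# evaluated at the observation locations.
obsval          = [1.0, 2.0, 50.0, 1.5]
analysis_at_obs = [1.1, 1.8,  5.0, 1.6]

# Residual = observation minus analysis at the same location
residuals = obsval .- analysis_at_obs

# Hypothetical cut-off, in the same units as the data
threshold = 10.0

# Keep only the observations whose residual is not too large
keep = abs.(residuals) .<= threshold
obsval_filtered = obsval[keep]

println("discarded ", count(.!keep), " of ", length(obsval), " observations")
```

The same `keep` mask would also have to be applied to the coordinate vectors (longitude, latitude, depth, time) before running the analysis again.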

@qcazenave
Author

Yes, OK, but I don't understand how this can help with the effects presented above ?

@ctroupin
Member

ctroupin commented May 4, 2023

Maybe by removing such data points you avoid the large gradients mentioned by @jmbeckers, so the extrapolation doesn't give such dramatic results.

@jmbeckers
Member

> I would say that ideally one should not have to restrict the values of the input data, except if they are obviously wrong, which doesn't seem to be the case here.
>
> What can be done is to use the residuals (diff. between observation and analysis at the location of the obs.) to discard some of the data points when the residual value is too large.

I did not mean to "cut" the input values. The logit transformation constrains the analysis to remain between two bounds.
So if you know that there are never values larger than 100, logit with an upper value of 100 will make the analysis always fall below 100 (and also will probably reduce some gradients and hence avoid some larger extrapolations).
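To illustrate why the analysis stays between the two bounds, here is a minimal sketch of a scaled logit transform and its inverse. The function names `logit_forward` and `logit_inverse` and the keywords `lo` and `hi` are hypothetical; this is not taken from the DIVAnd source, only the usual logit formula rescaled to an interval.

```julia
# Hypothetical helpers: a scaled logit and its inverse on the interval (lo, hi).
function logit_forward(x; lo = 0.0, hi = 1.0)
    xs = (x - lo) / (hi - lo)        # rescale to (0, 1)
    return log(xs) - log(1 - xs)     # maps (0, 1) onto the whole real line
end

function logit_inverse(y; lo = 0.0, hi = 1.0)
    return lo + (hi - lo) / (1 + exp(-y))   # always lands between lo and hi
end

minval, maxval = 0.0, 100.0

# Whatever value is obtained in the transformed space, even a strongly
# extrapolated one, maps back to something between minval and maxval.
for y in (-50.0, -1.0, 0.0, 1.0, 50.0)
    x = logit_inverse(y; lo = minval, hi = maxval)
    println("transformed value ", y, " maps back to ", x)
end
```

Since the interpolation is done on the transformed values, any overshoot happens in the unbounded space and is squeezed back into the interval by the inverse.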

@qcazenave
Author

Thank you for your answer @jmbeckers. Indeed, it was not clear to me but I understand now.

@qcazenave
Author

Hi, I would like to get back to the constraint that the logit transformation places on the values through its "min" and "max" parameters: the loglin transformation also has a "max" parameter. I assumed it referred to the maximum value that would be allowed in the retrieved field, but the result of the analysis does not confirm my assumption. So my question is: what does this "max" parameter in the loglin transformation refer to?

@jmbeckers
Member

Loglin:

Provide the following transform log(x + epsilon) (for x < t) and its inverse.
Beyond the threshold t (x ≥ t), the function is extended linearly in a
continuous way.

So for loglin, a linear transformation is used beyond the max value; it does not cut or clip the values (for that, logit is needed).
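A minimal sketch of a transform matching the description above, to show that nothing is clipped at the threshold. The names `loglin_forward` and `loglin_inverse` are hypothetical, and the slope of the linear part (chosen here so that the derivative is also continuous at `t`) is an assumption: the docstring only says the extension is continuous.

```julia
# Hypothetical helpers following the docstring: log(x + epsilon) below the
# threshold t, linear extension above it.
# Assumption: the linear part uses the slope of the log at t.
function loglin_forward(x, t; epsilon = 0.0)
    if x < t
        return log(x + epsilon)
    else
        return log(t + epsilon) + (x - t) / (t + epsilon)
    end
end

function loglin_inverse(y, t; epsilon = 0.0)
    if y < log(t + epsilon)
        return exp(y) - epsilon
    else
        return t + (y - log(t + epsilon)) * (t + epsilon)
    end
end

t = 5.0

# A transformed value above the one corresponding to t maps back to a value
# above t: the threshold is not an upper bound on the result.
y_high = loglin_forward(t, t) + 2.0
println("back-transformed value: ", loglin_inverse(y_high, t))
```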

@qcazenave
Author

OK, my mistake, I forgot to re-check that.
Thank you for your answer.

@qcazenave
Author

With logit instead of loglin, I tried on chlorophyll-a data, using min=0mg/m3 and max=5mg/m3, and got the following error:
'''
ERROR: LoadError: DomainError with -2.0224948157370686:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).
'''
On the same data, with loglin (using 5mg/m3 as threshold for the linear extension), no error.

@jmbeckers
Member

It probably means you have data outside of that range (there is no sanity check in the function before doing the anamorphosis). Can you check that the input data are within the range?
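A quick way to check this, sketched in plain Julia with made-up data. The `logit_forward` helper is a hypothetical stand-in for a scaled logit (not the DIVAnd code); it is only there to show that a value above the upper bound leads to the logarithm of a negative number.

```julia
# Hypothetical stand-in for a scaled logit on (lo, hi); no range check.
function logit_forward(x; lo = 0.0, hi = 1.0)
    xs = (x - lo) / (hi - lo)
    return log(xs) - log(1 - xs)
end

obsval = [0.2, 1.5, 4.9, 15.1]   # made-up values, the last one is above the max
minval, maxval = 0.0, 5.0

# Indices of the values that are not strictly inside (minval, maxval)
outside = findall(v -> !(minval < v < maxval), obsval)
println("indices of out-of-range values: ", outside)

# Transforming all values without screening fails on the out-of-range one
err = try
    logit_forward.(obsval; lo = minval, hi = maxval)
    nothing
catch e
    e
end
println("transforming everything raised: ", typeof(err))
```

In this sketch, values exactly equal to one of the bounds do not raise an error but give an infinite transformed value, which is why the check uses strict inequalities.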

@qcazenave
Author

qcazenave commented May 12, 2023

Yes, I most likely have data outside of that range. I thought it would work nevertheless, taking into account only the data within the range. I thought the logit transformation would only be applied to the analysis.

@jmbeckers
Member

I think it is sounder that you decide yourself which data to retain and what to do with the data outside your desired range (you can either discard them or clip them).

Logit works on the analysis, but to do so, all data are transformed into the new domain, so if data fall outside, the transformation is not valid.
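The two options (discard or clip) can be sketched as follows in plain Julia. The data are made up, and the small `margin` used to keep clipped values strictly inside the interval is an assumption of this sketch, not something prescribed by DIVAnd.

```julia
obsval = [0.2, 1.5, 4.9, 15.1, 0.0]   # made-up values
minval, maxval = 0.0, 5.0

# Option 1: discard the observations that are not strictly inside the range.
inrange = (obsval .> minval) .& (obsval .< maxval)
obsval_kept = obsval[inrange]

# Option 2: clip the observations to the range.
# Assumption: a small margin keeps the values away from the bounds themselves,
# where a logit transform would become infinite.
margin = 1e-6 * (maxval - minval)
obsval_clipped = clamp.(obsval, minval + margin, maxval - margin)

println("kept ", length(obsval_kept), " of ", length(obsval), " observations")
println("clipped values: ", obsval_clipped)
```

With option 1, the `inrange` mask must also be applied to the coordinate vectors (`obslon`, `obslat`, `obsdepth`, `obstime` in the call above) so that all inputs keep the same length.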
