Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for clipping and rounding issues #267

Open
wants to merge 4 commits into
base: 8.x
Choose a base branch
from
Open

Conversation

heptaflar
Copy link
Contributor

@heptaflar heptaflar commented Feb 25, 2021

This pull request fixes (some) clipping and rounding issues due to (in my opinion) wrong implementations of setSample(int x, int y, double / float / int sample) in TileImpl. The original implementation did not

  • Clip the argument (or inverse-scaled argument) to the value range of the underlying data type as stated in the API documentation
  • Round floating point values to the nearest integer value

The first point leads to unpredictible behaviour (may be predictable for a Java software developer, but not for a scientist who wants to code an operator). The second points leads to a systematic bias when written data are read again.

For example, as a consequence of these issues, the collocation operator produces extremely high (and wrong) signals from almost zero signal for short data types when higher than bilinear interpolation (like bicubic interpolation) is requested.

Please review my changes. Note that they highlight the problems and are merely suggestions how to possibly solve them (in part).

@@ -121,6 +121,17 @@ public double toGeoPhysical(double sample) {
return rasterDataNode.scale(sample);
}

private int toRaw(int sample) {
final double rawSample = rasterDataNode.scaleInverse(sample);
if (rawSample < -2147483648.0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which miracle Number ist this?
Why not
if (rawSample < Integer.MIN_VALUE)

if (rawSample < -2147483648.0) {
return Integer.MIN_VALUE;
}
if (rawSample > 2147483647.0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same as above

}
writableRaster.setSample(x, y, 0, sample);
}

private int clipOrRound(int sample, int lowerBound, int upperBound) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only a clip without round

@heptaflar
Copy link
Contributor Author

Feel free to rename methods and extract constants as you like. I regard it as good programming style to compare variables of the same type only, but you may care about this or not. My only concern here is clipping and rounding.

@@ -551,9 +605,33 @@ public void setSample(int x, int y, double sample) {
if (scaled) {
sample = toRaw(sample);
}
switch (rasterDataNode.getDataType()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If raw data run into a clipping there must be something wrong with geopysical data because they run outside the covered valid value range, defined by raw data type and scaling. In this case clipping the raw data value then turns a wrong geophysical value into a valid value.
This can not be right.

Clipping or other types of handling values outside the range should be done at the place where the TileImpl is used, because only the context knows how to handle values outside the range.

Clipping then should be done that way:
First calculate the geopysical clipping bound by scaling the raw min and raw max value (take care if signed or unsigned), then clip the geopysical.
Up to here al computations outside of TileImpl.
Then transform to raw and round to prevent the cast error.

An integer cast is neither a floor nor a round nor a cail operation.
A cast on a positive double value is equal to a floor operation.
But on negative values it is equal to a cail operation.
It changes the direction of the value change for negative numbers.

Copy link
Contributor Author

@heptaflar heptaflar Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree.

  1. A geophysical value that is outside the range of the raw data type is not invalid. It is just too large to be stored. It is better to clip it than to pass it uncontrolled.
  2. Casting of a double or float to a short or byte yields rather weird results. Cast of a positive double or float to a short or byte is not equal to a floor operation. Test it yourself.
  3. A scientist developing an operator probably does not want to bother with clipping and data types. Often the data type is predetermined (from external data) and not always the same.
  4. The API documentation of setSample states that the values are clipped, which they are not.
  5. A scientist's expectation probably is: he ensures that numbers (double or float) are calculated correctly. How these values are encoded into an image or a file and decoded again is business of the software framework. Encoding may not be lossless. But if something cannot be encoded without loss, the software framework should handle it in a smart and easily predictible way that is as good an approximation to the original information as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. A geophysical value that is outside the range of the raw data type is not invalid. It is just too large to be stored. It is better to clip it than to pass it uncontrolled.

Thats exactly what I meant.
Values are invalid from the point of view of "which data can be saved" not from the Point of view, does the value make sence from the scientific poit of view of valid data range.

If the value is in a scientific valid range, but can not be saved, then the configuration of the store must be wrong. Then either the raw data type or the scaling must be wrong.

If a value is a scientific valid value, a raw clipping changes this scientific valid value. Does this make sence?

Ideally a raw date store should not manipulate data. Also a raw data store shall not know anything of the valid scientific value range. A raw data store shall only be able to tore store all the scientivic valid data of a cerain use case. For this a raw data store has to be configured the right way. Then scientific valid data can be stored using the scaling but without manipulation.

Copy link
Contributor

@SabineEmbacher SabineEmbacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comments

@SabineEmbacher
Copy link
Contributor

Clipping or other types of handling geophysical values outside the defined geopysical value range should be done at the place where the TileImpl is used, because only the context knows how to handle values outside the range.

TileImpl is not only used in CollocationOp but also in many other contexts.

@SabineEmbacher
Copy link
Contributor

If new values are to be expected only slightly outside the valid range (defined by data type and scaling factors) because, for example, one has decided to use an interpolation type that is not a linear interpolation for a resampling, then the problem can also be circumvented by slightly reducing the scaling factor in the target raster. Then overshoots from the interpolation fit back into the raw data area.

@heptaflar
Copy link
Contributor Author

heptaflar commented Feb 26, 2021

Well, the API documentation states that values are clipped. But they are not. The purpose of setSample(x, y, double value) is to encode the value into the underlying data type, which may be different from double. Encoding involves clipping. So the API documentation makes complete sense to me.

Your suggestions work technically but do not solve the encoding / decoding problem for raster data types, which are given and whose properties must not be changed.

@heptaflar
Copy link
Contributor Author

heptaflar commented Feb 26, 2021

@Test
public void testCasts() {
    // all these tests fail, prior clipping is needed
    Assert.assertEquals(Short.MAX_VALUE, (short) 1.0E6);
    Assert.assertEquals(Short.MIN_VALUE, (short) -1.0E6);
    Assert.assertEquals(Byte.MAX_VALUE, (byte) 1.0E3);
    Assert.assertEquals(Byte.MIN_VALUE, (byte) -1.0E3);

    // these tests do not fail, no clipping is needed
    Assert.assertEquals(Integer.MAX_VALUE, (int) 1.0E27);
    Assert.assertEquals(Integer.MIN_VALUE, (int) -1.0E27);

    // this test fails, rounding is needed (otherwise encoding / decoding would produce a bias)
    Assert.assertEquals(2.0, (int) 1.999);
}

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants