Fix for clipping and rounding issues #267

heptaflar · 2021-02-25T17:26:32Z

This pull request fixes (some) clipping and rounding issues due to (in my opinion) wrong implementations of setSample(int x, int y, double / float / int sample) in TileImpl. The original implementation did not

Clip the argument (or inverse-scaled argument) to the value range of the underlying data type as stated in the API documentation
Round floating point values to the nearest integer value

The first point leads to unpredictible behaviour (may be predictable for a Java software developer, but not for a scientist who wants to code an operator). The second points leads to a systematic bias when written data are read again.

For example, as a consequence of these issues, the collocation operator produces extremely high (and wrong) signals from almost zero signal for short data types when higher than bilinear interpolation (like bicubic interpolation) is requested.

Please review my changes. Note that they highlight the problems and are merely suggestions how to possibly solve them (in part).

SabineEmbacher · 2021-02-26T09:16:37Z

snap-gpf/src/main/java/org/esa/snap/core/gpf/internal/TileImpl.java

@@ -121,6 +121,17 @@ public double toGeoPhysical(double sample) {
        return rasterDataNode.scale(sample);
    }

+    private int toRaw(int sample) {
+        final double rawSample = rasterDataNode.scaleInverse(sample);
+        if (rawSample < -2147483648.0) {


Which miracle Number ist this?
Why not
if (rawSample < Integer.MIN_VALUE)

SabineEmbacher · 2021-02-26T09:17:17Z

snap-gpf/src/main/java/org/esa/snap/core/gpf/internal/TileImpl.java

+        if (rawSample < -2147483648.0) {
+            return Integer.MIN_VALUE;
+        }
+        if (rawSample > 2147483647.0) {


the same as above

SabineEmbacher · 2021-02-26T09:20:19Z

snap-gpf/src/main/java/org/esa/snap/core/gpf/internal/TileImpl.java

        }
        writableRaster.setSample(x, y, 0, sample);
    }

+    private int clipOrRound(int sample, int lowerBound, int upperBound) {


This is only a clip without round

heptaflar · 2021-02-26T09:25:57Z

Feel free to rename methods and extract constants as you like. I regard it as good programming style to compare variables of the same type only, but you may care about this or not. My only concern here is clipping and rounding.

SabineEmbacher · 2021-02-26T10:34:04Z

snap-gpf/src/main/java/org/esa/snap/core/gpf/internal/TileImpl.java

@@ -551,9 +605,33 @@ public void setSample(int x, int y, double sample) {
        if (scaled) {
            sample = toRaw(sample);
        }
+        switch (rasterDataNode.getDataType()) {


If raw data run into a clipping there must be something wrong with geopysical data because they run outside the covered valid value range, defined by raw data type and scaling. In this case clipping the raw data value then turns a wrong geophysical value into a valid value.
This can not be right.

Clipping or other types of handling values outside the range should be done at the place where the TileImpl is used, because only the context knows how to handle values outside the range.

Clipping then should be done that way:
First calculate the geopysical clipping bound by scaling the raw min and raw max value (take care if signed or unsigned), then clip the geopysical.
Up to here al computations outside of TileImpl.
Then transform to raw and round to prevent the cast error.

An integer cast is neither a floor nor a round nor a cail operation.
A cast on a positive double value is equal to a floor operation.
But on negative values it is equal to a cail operation.
It changes the direction of the value change for negative numbers.

I disagree.

A geophysical value that is outside the range of the raw data type is not invalid. It is just too large to be stored. It is better to clip it than to pass it uncontrolled.

Casting of a double or float to a short or byte yields rather weird results. Cast of a positive double or float to a short or byte is not equal to a floor operation. Test it yourself.

A scientist developing an operator probably does not want to bother with clipping and data types. Often the data type is predetermined (from external data) and not always the same.

The API documentation of setSample states that the values are clipped, which they are not.

A scientist's expectation probably is: he ensures that numbers (double or float) are calculated correctly. How these values are encoded into an image or a file and decoded again is business of the software framework. Encoding may not be lossless. But if something cannot be encoded without loss, the software framework should handle it in a smart and easily predictible way that is as good an approximation to the original information as possible.

A geophysical value that is outside the range of the raw data type is not invalid. It is just too large to be stored. It is better to clip it than to pass it uncontrolled.

Thats exactly what I meant.
Values are invalid from the point of view of "which data can be saved" not from the Point of view, does the value make sence from the scientific poit of view of valid data range.

If the value is in a scientific valid range, but can not be saved, then the configuration of the store must be wrong. Then either the raw data type or the scaling must be wrong.

If a value is a scientific valid value, a raw clipping changes this scientific valid value. Does this make sence?

Ideally a raw date store should not manipulate data. Also a raw data store shall not know anything of the valid scientific value range. A raw data store shall only be able to tore store all the scientivic valid data of a cerain use case. For this a raw data store has to be configured the right way. Then scientific valid data can be stored using the scaling but without manipulation.

SabineEmbacher

see my comments

SabineEmbacher · 2021-02-26T10:50:52Z

Clipping or other types of handling geophysical values outside the defined geopysical value range should be done at the place where the TileImpl is used, because only the context knows how to handle values outside the range.

TileImpl is not only used in CollocationOp but also in many other contexts.

SabineEmbacher · 2021-02-26T10:59:35Z

If new values are to be expected only slightly outside the valid range (defined by data type and scaling factors) because, for example, one has decided to use an interpolation type that is not a linear interpolation for a resampling, then the problem can also be circumvented by slightly reducing the scaling factor in the target raster. Then overshoots from the interpolation fit back into the raw data area.

heptaflar · 2021-02-26T11:31:45Z

Well, the API documentation states that values are clipped. But they are not. The purpose of setSample(x, y, double value) is to encode the value into the underlying data type, which may be different from double. Encoding involves clipping. So the API documentation makes complete sense to me.

Your suggestions work technically but do not solve the encoding / decoding problem for raster data types, which are given and whose properties must not be changed.

heptaflar · 2021-02-26T12:45:06Z

@Test
public void testCasts() {
    // all these tests fail, prior clipping is needed
    Assert.assertEquals(Short.MAX_VALUE, (short) 1.0E6);
    Assert.assertEquals(Short.MIN_VALUE, (short) -1.0E6);
    Assert.assertEquals(Byte.MAX_VALUE, (byte) 1.0E3);
    Assert.assertEquals(Byte.MIN_VALUE, (byte) -1.0E3);

    // these tests do not fail, no clipping is needed
    Assert.assertEquals(Integer.MAX_VALUE, (int) 1.0E27);
    Assert.assertEquals(Integer.MIN_VALUE, (int) -1.0E27);

    // this test fails, rounding is needed (otherwise encoding / decoding would produce a bias)
    Assert.assertEquals(2.0, (int) 1.999);
}

CLAassistant · 2023-06-14T08:40:09Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

Fixes

03e2aea

heptaflar requested review from forman, marpet, SabineEmbacher, TomBlock and TonioF February 26, 2021 08:45

SabineEmbacher reviewed Feb 26, 2021

View reviewed changes

SabineEmbacher requested changes Feb 26, 2021

View reviewed changes

heptaflar added 3 commits March 2, 2021 11:25

New non-negative resampling type (decoration of existing types)

4120c32

Typo

5ba5a3f

Reverted module version numbers

4da85bd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for clipping and rounding issues #267

Fix for clipping and rounding issues #267

heptaflar commented Feb 25, 2021 •

edited

SabineEmbacher Feb 26, 2021

SabineEmbacher Feb 26, 2021

SabineEmbacher Feb 26, 2021

heptaflar commented Feb 26, 2021

SabineEmbacher Feb 26, 2021 •

edited

heptaflar Feb 26, 2021 •

edited

SabineEmbacher Feb 26, 2021

SabineEmbacher left a comment

SabineEmbacher commented Feb 26, 2021

SabineEmbacher commented Feb 26, 2021

heptaflar commented Feb 26, 2021 •

edited

heptaflar commented Feb 26, 2021 •

edited

CLAassistant commented Jun 14, 2023

Fix for clipping and rounding issues #267

Are you sure you want to change the base?

Fix for clipping and rounding issues #267

Conversation

heptaflar commented Feb 25, 2021 • edited

SabineEmbacher Feb 26, 2021

Choose a reason for hiding this comment

SabineEmbacher Feb 26, 2021

Choose a reason for hiding this comment

SabineEmbacher Feb 26, 2021

Choose a reason for hiding this comment

heptaflar commented Feb 26, 2021

SabineEmbacher Feb 26, 2021 • edited

Choose a reason for hiding this comment

heptaflar Feb 26, 2021 • edited

Choose a reason for hiding this comment

SabineEmbacher Feb 26, 2021

Choose a reason for hiding this comment

SabineEmbacher left a comment

Choose a reason for hiding this comment

SabineEmbacher commented Feb 26, 2021

SabineEmbacher commented Feb 26, 2021

heptaflar commented Feb 26, 2021 • edited

heptaflar commented Feb 26, 2021 • edited

CLAassistant commented Jun 14, 2023

heptaflar commented Feb 25, 2021 •

edited

SabineEmbacher Feb 26, 2021 •

edited

heptaflar Feb 26, 2021 •

edited

heptaflar commented Feb 26, 2021 •

edited

heptaflar commented Feb 26, 2021 •

edited