
f32 is not inferred on basic mult #1493

Open
FlorianDeconinck opened this issue Jan 2, 2024 · 3 comments

@FlorianDeconinck
Contributor

Consider this snippet:

import numpy as np
import dace


@dace.program
def dace_float_cast(A, B, timestep: dace.float32):
    dt2: dace.float32 = 2 * timestep
    A[:] = B[:] * dt2


A = np.zeros(10, dtype=np.float32)
B = np.zeros(10, dtype=np.float32)
t = np.float32(250.0)

dace_float_cast(A, B, t)

The SDFG and the generated code (below) show that dt2 is considered a 64-bit float, despite the cast.

...
double dt2;
{
    float __in2 = timestep;
    double __out;

    ///////////////////
    // Tasklet code (_Mult_)
    __out = (dace::float64(2) * dace::float64(__in2));
    ///////////////////

    dt2 = __out;
}
...

SDFG attached (not a real zip, just renamed to get around GitHub's attachment rules): program.sdfg.zip

@FlorianDeconinck
Contributor Author

A follow-up bug: even if dt2 is edited in the SDFG to be a float32 and the SDFG is properly typed throughout, the generated code still struggles to stay all-float. E.g.:

{
        float dt2;

        {
            float __in2 = timestep;
            double __out;

            ///////////////////
            // Tasklet code (_Mult_)
            __out = (dace::float64(2) * dace::float64(__in2));
            ///////////////////

            dt2 = __out;
        }
        {
            #pragma omp parallel for
            for (auto __i0 = 0; __i0 < 10; __i0 += 1) {
                {
                    float __in1 = B[__i0];
                    double __in2 = *(double *)(&dt2);
                    float __out;

                    ///////////////////
                    // Tasklet code (_Mult_)
                    __out = (__in1 * dace::float32(__in2));
                    ///////////////////

                    A[__i0] = __out;
                }
            }
        }

    }

@alexnick83
Contributor

The quick fix for anyone wanting to go ahead with a PR is to amend line 3289 in frontend/python/newast.py as follows:

                    elif (not result_data.transient or result in self.sdfg.constants_prop or
                          (dtype is not None and dtype != result_data.dtype)):

This will fix the "bug," but I expect it to be an unsatisfying solution, as it will generate an extraneous and seemingly "unremovable" (by the simplification pass) Tasklet that copies the float64 result of the operation 2 * timestep to the float32 variable dt2. However, TaskletFusion should do the trick (I didn't try it).
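A minimal sketch of that route (untested here, as noted; it assumes the dace_float_cast program and the arrays from the original snippet):

import dace
from dace.transformation.dataflow import TaskletFusion

# Build the SDFG from the @dace.program (passing sample arguments so the
# unannotated A and B are typed), simplify, then repeatedly fuse adjacent
# tasklets so the extra cast tasklet is merged into the multiplication.
sdfg = dace_float_cast.to_sdfg(A, B, t)
sdfg.simplify()
sdfg.apply_transformations_repeated(TaskletFusion)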

Fundamentally, there are two main issues. The first is that annotated assignments are not very compatible with the DaCe Python frontend. For simplicity, the frontend parses the RHS separately from the LHS. Therefore, the datatype of the RHS is decided by whatever Python/NumPy rules are coded in the frontend. If the datatype of the LHS is different, then we don't go back to change how the RHS is computed. Instead, we add another Tasklet.

The reason we don't go back is the second issue: whenever the datatypes of the operands do not agree, we do explicit conversion in the Tasklet code, which is a string and, therefore, not particularly easy to update. A very good example can be seen in the reported issue. In the multiplication 2 * timestep, the operands have types int64 and float32. NumPy tells us that the result is float64. To be sure that the result in the generated code will indeed be float64, and to avoid any incompatibilities (e.g., the issues that arise when mixing complex floats and doubles in C++), we convert the data in the Tasklet code: dace.float64(2) * dace.float64(__in2).
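For reference, that promotion rule is easy to confirm in plain NumPy:

import numpy as np

# int64 combined with float32 promotes to float64 under NumPy's rules,
# matching what the frontend computes for 2 * timestep.
print(np.result_type(np.int64, np.float32))                   # float64
print((np.ones(1, np.int64) * np.ones(1, np.float32)).dtype)  # float64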

It is not a quick fix, but we could amend the frontend to pass annotated assignment information to the visitor methods handling the RHS. The awkward conversions will still be there, but the compiler may be able to optimize them away. However, I think explicit conversion in the Tasklet code is a band-aid for the lack of a better design, as this is code-generation functionality. We should consider a better mechanism in the future.

Regarding the follow-up issue, I suspect the edited SDFG already had the connector datatypes set. In other words, the __in2 connector in the Mapped Tasklet was already inferred to be float64, hence the strange casting in the generated code. If you change the connector type (in the SDFG viewer, click on the Mapped Tasklet, and then find the in_connectors in its properties), it should work.
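The same edit can also be done programmatically rather than in the viewer; a hedged sketch, assuming the attached SDFG file and the __in2 connector name from the generated code above:

import dace

# Load the attached SDFG and retype the stale float64 connector to float32.
sdfg = dace.SDFG.from_file('program.sdfg')
for node, _ in sdfg.all_nodes_recursive():
    if isinstance(node, dace.nodes.Tasklet) and node.in_connectors.get('__in2') == dace.float64:
        node.in_connectors['__in2'] = dace.float32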

@FlorianDeconinck
Contributor Author

Between your explanation, shuffling the code so things are cast to np.float32 (sketched below), and #1494, I was able to tame it all.
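
For reference, a minimal sketch of that shuffling applied to the original snippet (casting the constant so the RHS already promotes to float32):

import numpy as np
import dace

@dace.program
def dace_float_cast(A, B, timestep: dace.float32):
    # Cast the literal up front: float32 * float32 stays float32, so no
    # float64 intermediate tasklet is generated.
    dt2: dace.float32 = np.float32(2) * timestep
    A[:] = B[:] * dt2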

I still believe this warrants further work, but we have a workable solution for now. Mixed precision is a required feature for weather and climate models, and it will become even more important as the models run at smaller and smaller resolutions.
