
Error message "Unsupported HVX type: float32x32" #8170

Open
jxl1080 opened this issue Apr 2, 2024 · 7 comments

Comments

jxl1080 commented Apr 2, 2024

Hi,

I get the error message below when I run my generator with the Adams2019 auto-scheduler; it doesn't happen when I run the generator without an auto-scheduler. I don't really understand what it is telling me or what I should do:

Unhandled exception: Internal Error at /glnxa64/Halide/src/HexagonOptimize.cpp:105 triggered by user code at : Unsupported HVX type: float32x32

Below is my Halide Generator Class:

```cpp
#include "Halide.h"
#include <stdio.h>

using namespace Halide;

class mMatmul_matmul_out1_fcn_halide_generator : public Halide::Generator<mMatmul_matmul_out1_fcn_halide_generator> {

public:
    Input<Buffer<float>> B1{"B1", 2};
    Input<Buffer<float>> A1{"A1", 2};
    Output<Buffer<float>> matmul_out1_fcn{"matmul_out1_fcn", 2};

    void generate() {
        RDom r(0, 100);
        matmul_out1(d1, d2) = sum(A1(d1, r) * B1(r, d2));
        matmul_out1_fcn(d1, d2) = matmul_out1(d1, d2);
    }

    void schedule() {
        // Schedule is determined by the autoscheduler. Need to set estimates on buffers.
        if (using_autoscheduler()) {
            B1.dim(1).set_estimate(0, 100);
            B1.dim(0).set_estimate(0, 100);
            A1.dim(1).set_estimate(0, 100);
            A1.dim(0).set_estimate(0, 100);
            matmul_out1_fcn.set_estimate(d1, 0, 100).set_estimate(d2, 0, 100);
        } else {
            // Default schedule
        }
    }

private:
    Var d1{"d1"};
    Var d2{"d2"};
    Func matmul_out1{"matmul_out1"};
};

HALIDE_REGISTER_GENERATOR(mMatmul_matmul_out1_fcn_halide_generator, mMatmul_matmul_out1_fcn_halide_gen)
```

Thank you!

abadams (Member) commented Apr 2, 2024

The error means you're trying to compile to hvx, but your pipeline uses vectorized floats. I think our hexagon backend doesn't support the newer versions of hvx that support float vectors.

I think it isn't triggering without the autoscheduler, because then the schedule uses scalar floats only, which is fine. The autoscheduler isn't aware of that restriction on hexagon so it's trying to just vectorize everything.

jxl1080 (Author) commented Apr 2, 2024

@abadams Thank you so much for your quick reply! Is there any suggestion on how to resolve this error message?

> The error means you're trying to compile to hvx, but your pipeline uses vectorized floats. I think our hexagon backend doesn't support the newer versions of hvx that support float vectors.
>
> I think it isn't triggering without the autoscheduler, because then the schedule uses scalar floats only, which is fine. The autoscheduler isn't aware of that restriction on hexagon so it's trying to just vectorize everything.

abadams (Member) commented Apr 2, 2024

Don't try to do a floating point matrix multiply on hexagon. (Or at least the versions of hvx that Halide supports). It's not a good processor for running that algorithm, because you can't vectorize it. Do a fixed-point matrix multiply instead.
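The arithmetic behind a fixed-point matrix multiply can be sketched in plain C++ (this is not Halide code; the function name `matmul_u8` and the shapes are illustrative). The key point is that the uint8 inputs are widened before multiplying and accumulated in a 32-bit integer, so a reduction of length 100 cannot overflow (100 × 255 × 255 < 2³²):

```cpp
#include <cstdint>
#include <vector>

// Fixed-point matmul sketch: uint8 inputs, widened products, 32-bit accumulator.
// A is M x K, B is K x N, both row-major; returns C = A * B as M x N.
std::vector<uint32_t> matmul_u8(const std::vector<uint8_t> &A,
                                const std::vector<uint8_t> &B,
                                int M, int K, int N) {
    std::vector<uint32_t> C(M * N, 0);
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            uint32_t acc = 0;  // wide accumulator: no overflow over the reduction
            for (int k = 0; k < K; k++) {
                acc += uint32_t(A[i * K + k]) * uint32_t(B[k * N + j]);
            }
            C[i * N + j] = acc;
        }
    }
    return C;
}
```

In Halide terms, this corresponds to casting the uint8 inputs up before the `sum()` so the reduction happens at a width that both holds the result and maps onto HVX's integer multiply-accumulate lanes.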

jxl1080 (Author) commented Apr 2, 2024

@abadams I'm not sure whether I misunderstood your point about not doing a floating-point matrix multiply. I changed my data type to 'uint8_t', but now I'm in a worse situation when I run my generator with Adams2019: I get a segmentation fault with no error message at all.

abadams (Member) commented Apr 2, 2024

Can you share a repro that crashes (including the build commands you're using)?

jxl1080 (Author) commented Apr 3, 2024

> Can you share a repro that crashes (including the build commands you're using)?

@abadams Thank you for your help! Below is the code of my Halide Generator Class:

```cpp
#include "Halide.h"
#include <stdio.h>

using namespace Halide;

class mMatmul_matmul_out1_fcn_halide_generator : public Halide::Generator<mMatmul_matmul_out1_fcn_halide_generator> {

public:
    Input<Buffer<uint8_t>> B1{"B1", 2};
    Input<Buffer<uint8_t>> A1{"A1", 2};
    Output<Buffer<uint16_t>> matmul_out1_fcn{"matmul_out1_fcn", 2};

    void generate() {
        RDom r(0, 100);
        matmul_out1(d1, d2) = sum(cast<uint16_t>(A1(d1, r)) * cast<uint16_t>(B1(r, d2)));
        matmul_out1_fcn(d1, d2) = matmul_out1(d1, d2);
    }

    void schedule() {
        // Schedule is determined by the autoscheduler. Need to set estimates on buffers.
        if (using_autoscheduler()) {
            B1.dim(1).set_estimate(0, 100);
            B1.dim(0).set_estimate(0, 100);
            A1.dim(1).set_estimate(0, 100);
            A1.dim(0).set_estimate(0, 100);
            matmul_out1_fcn.set_estimate(d1, 0, 100).set_estimate(d2, 0, 100);
        } else {
            // Default schedule
        }
    }

private:
    Var d1{"d1"};
    Var d2{"d2"};
    Func matmul_out1{"matmul_out1"};
};

HALIDE_REGISTER_GENERATOR(mMatmul_matmul_out1_fcn_halide_generator, mMatmul_matmul_out1_fcn_halide_gen)
```

I used binary 'Halide-17.0.1-x86-64-linux-52541176253e74467dabc42eeee63d9a62c199f6.tar.gz' downloaded from: https://github.com/halide/Halide/releases

My command for compiling the Halide Generator class is:

```shell
g++ mMatmul_matmul_out1_fcn_halide.cpp -std=c++17 ....../Halide-17.0.1-x86-64-linux/share/Halide/tools/GenGen.cpp -L ....../Halide-17.0.1-x86-64-linux/lib -lHalide -I ....../Halide-17.0.1-x86-64-linux/include -o mMatmul_matmul_out1_fcn_halide
```

My command for running the generator with Adams2019 (which gives the segmentation fault):

```shell
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:....../Halide-17.0.1-x86-64-linux/lib
./mMatmul_matmul_out1_fcn_halide -f myPipeline -g mMatmul_matmul_out1_fcn_halide_gen -e h,o target=hexagon-32-noos-hvx-no_runtime autoscheduler.parallelism=2 autoscheduler=Adams2019 -p ....../Halide-17.0.1-x86-64-linux/lib/libautoschedule_adams2019.so -o ./
```

My command for running the generator with no auto-scheduler (which works for me):

```shell
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:....../Halide-17.0.1-x86-64-linux/lib
./mMatmul_matmul_out1_fcn_halide -f myPipeline -g mMatmul_matmul_out1_fcn_halide_gen -e h,o target=hexagon-32-noos-hvx-no_runtime -o ./
```

abadams (Member) commented Apr 11, 2024

Looks like it's a compiler bug caused by the adams autoscheduler not really understanding what to do on hexagon, and producing some very strange code that then hit a corner case bug in the simplifier.

Let's use the human Adams autoscheduler instead. A reasonable schedule for this pipeline is:

```cpp
matmul_out1_fcn.vectorize(d1, 128).parallel(d2, (B1.dim(1).extent() + 3) / 4);
```

but a more typical matmul schedule (for large matrices) is

```cpp
void generate() {
    RDom r(0, 100);
    // Note: changed from sum to += so that I can schedule the reduction var
    matmul_out1(d1, d2) += cast<uint16_t>(A1(d1, r)) * cast<uint16_t>(B1(r, d2));
    matmul_out1_fcn(d1, d2) = matmul_out1(d1, d2);

    Var d1i, d2i, d1o, d2o;
    matmul_out1_fcn.tile(d1, d2, d1o, d2o, d1i, d2i, 3 * 128, 4).vectorize(d1i, 128).unroll(d1i).unroll(d2i).parallel(d2o);
    matmul_out1.compute_at(matmul_out1_fcn, d1o).vectorize(d1, 128).unroll(d1).unroll(d2);
    matmul_out1.update().reorder(d1, d2, r).vectorize(d1, 128).unroll(d1).unroll(d2);
}
```

I usually do my scheduling inside the generate() method. In this case I needed to access the RDom. You could also make the RDom a class member instead of a local.

For a great schedule, you need to start worrying about things like managing dmas into Hexagon's cache.
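The loop nest that the tiling schedule above describes can be sketched in plain C++ (this is not Halide output; the function name and tile sizes are illustrative). The output is tiled, each tile gets a local accumulator (the role of the `compute_at`'d `matmul_out1`), and within a tile the reduction loop runs outermost with `d1` innermost, matching `reorder(d1, d2, r)` so the inner loop is the one that vectorizes:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Tiled matmul sketch: A is M x K, B is K x N (row-major), C = A * B.
// Mirrors the schedule's structure: tile the output, accumulate per tile,
// reduction loop k outermost within the tile, d1 (ii) innermost.
void matmul_tiled(const std::vector<uint16_t> &A, const std::vector<uint16_t> &B,
                  std::vector<uint32_t> &C, int M, int K, int N,
                  int tile_i, int tile_j) {
    for (int io = 0; io < M; io += tile_i) {
        for (int jo = 0; jo < N; jo += tile_j) {
            int ih = std::min(tile_i, M - io);
            int jh = std::min(tile_j, N - jo);
            // Per-tile accumulator: the compute_at'd intermediate.
            std::vector<uint32_t> acc(ih * jh, 0);
            for (int k = 0; k < K; k++) {             // r outermost in the tile
                for (int jj = 0; jj < jh; jj++) {     // d2
                    for (int ii = 0; ii < ih; ii++) { // d1 innermost (vector lane)
                        acc[jj * ih + ii] +=
                            uint32_t(A[(io + ii) * K + k]) * B[k * N + (jo + jj)];
                    }
                }
            }
            // Write the finished tile back to the output.
            for (int jj = 0; jj < jh; jj++)
                for (int ii = 0; ii < ih; ii++)
                    C[(io + ii) * N + (jo + jj)] = acc[jj * ih + ii];
        }
    }
}
```

Keeping the accumulator tile-sized is what lets the unrolled inner loops stay in registers; in the Halide schedule that is the effect of `compute_at(matmul_out1_fcn, d1o)` combined with the unrolls.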
