TornadoVM stuck when handling the exceptions #240

Open
TaisZ opened this issue Apr 19, 2023 · 7 comments
Labels: bug (Something isn't working), compiler

Comments

@TaisZ
Contributor

TaisZ commented Apr 19, 2023

Describe the bug

When running the following code, TornadoVM neither throws an exception nor terminates: the program hangs.

How To Reproduce

package uk.ac.manchester.tornado.examples.Broken;

import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;

import java.util.Arrays;

/**
 * Execution never terminates once line 18 (the System.out.println call) is added. DEADLOCK?
 */
public class TryCatch {
    public static void divided(int[] a, int[] b, float[] c) {
        try {
            for (@Parallel int i = 0; i < c.length; i++) {
                System.out.println(b[i]);
                c[i] = (float) a[i] / b[i];
            }
        } catch (Exception e) {
            System.out.println("Got an error!");
        }

    }

    public static void main(String[] args) {

        final int numElements = 8;
        int[] a = new int[numElements];
        int[] b = new int[numElements];
        float[] c = new float[numElements];

        Arrays.fill(a, 5);
        Arrays.fill(b, 0);
        Arrays.fill(c, 0);

        TaskGraph taskGraph = new TaskGraph("s0") //
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b) //
                .task("t0", TryCatch::divided, a, b, c) //
                .transferToHost(DataTransferMode.EVERY_EXECUTION, c);

        ImmutableTaskGraph immutableTaskGraph = taskGraph.snapshot();
        TornadoExecutionPlan executor = new TornadoExecutionPlan(immutableTaskGraph);
        executor.execute();
    }
}

Expected behavior

GPUs have no mechanism for handling exceptions such as the one raised by a division by 0. TornadoVM should therefore either run the code on the CPU or report an error message.
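One way to get an error message without relying on device-side exceptions is to validate the inputs on the host before offloading. The following is only a minimal sketch in plain Java, not TornadoVM's behaviour: the class DivisorCheck and the helpers firstZero and checkedDivide are hypothetical names, and the call to divided stands in for executing the task graph from the reproducer.

```java
import java.util.Arrays;

class DivisorCheck {
    // Hypothetical helper: index of the first zero divisor, or -1 if there is none.
    static int firstZero(int[] b) {
        for (int i = 0; i < b.length; i++) {
            if (b[i] == 0) {
                return i;
            }
        }
        return -1;
    }

    // Kernel-style body: no try/catch and no I/O inside the loop.
    static void divided(int[] a, int[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = (float) a[i] / b[i];
        }
    }

    // Host-side guard: reject bad input before any offloading would happen.
    static void checkedDivide(int[] a, int[] b, float[] c) {
        int bad = firstZero(b);
        if (bad >= 0) {
            throw new IllegalArgumentException("zero divisor at index " + bad);
        }
        // Assumption: in a TornadoVM program this call would be the execution
        // of the task graph; here it simply runs sequentially.
        divided(a, b, c);
    }

    public static void main(String[] args) {
        int[] a = new int[8];
        int[] b = new int[8];
        float[] c = new float[8];
        Arrays.fill(a, 5);
        Arrays.fill(b, 0);
        try {
            checkedDivide(a, b, c);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected on the host: " + e.getMessage());
        }
    }
}
```

The check costs one extra pass over b on the host, which is usually small compared with the data transfer itself.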

Computing system setup (please complete the following information):

  • OS: Ubuntu 22.04
  • Driver versions: Nvidia 530.30.02
  • OpenCL versions: 3.0
  • CUDA Driver versions: 12.1.68
  • TornadoVM commit id: 7a5e148
@TaisZ
Contributor Author

TaisZ commented Apr 19, 2023

When I remove the line System.out.println(b[i]); TornadoVM terminates properly, but there is no output or error message.

@stratika
Collaborator

Hi @TaisZ, I think your example is not supported by default. A method that is offloaded for hardware acceleration (e.g. divided) cannot contain traps or exception handling, since hardware accelerators do not support these operations. See paragraph 4 in this document.
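As an illustration of a method body without traps, exceptions, or I/O, the sketch below records problems in an extra output array that the host inspects afterwards. This is an assumption about how the example could be restructured, not an official TornadoVM recipe: the class FlaggedDivide and the status parameter are hypothetical, and the loop is written sequentially so that the snippet runs without TornadoVM on the classpath.

```java
class FlaggedDivide {
    // No try/catch and no println: errors are written to a status buffer instead.
    // Assumption: in TornadoVM the loop variable would carry @Parallel and
    // status would be copied back to the host together with c.
    static void divided(int[] a, int[] b, float[] c, int[] status) {
        for (int i = 0; i < c.length; i++) {
            if (b[i] == 0) {
                status[i] = 1;   // flag the element instead of throwing
                c[i] = 0.0f;
            } else {
                status[i] = 0;
                c[i] = (float) a[i] / b[i];
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 5, 5, 5};
        int[] b = {1, 0, 2, 0};
        float[] c = new float[4];
        int[] status = new int[4];

        divided(a, b, c, status);

        // Error reporting happens on the host, after the computation.
        for (int i = 0; i < status.length; i++) {
            if (status[i] != 0) {
                System.out.println("Division by zero at index " + i);
            }
        }
    }
}
```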

@jjfumero
Member

This type of code is not supported in TornadoVM. However, TornadoVM should throw an exception in that case. Instead, the execution currently hangs, probably during an early phase of the TornadoVM JIT compiler.

@TaisZ, can you run with the --debug and --fullDebug options and report the output you get? These will dump a trace of the internals of the JIT compiler and the runtime system.
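Until the hang is fixed, a caller could at least detect it by running the execution under a timeout. The sketch below is a generic Java pattern and makes several assumptions: ExecutionWatchdog and runWithTimeout are hypothetical names, and the Runnable stands in for a call such as executor.execute() from the reproducer. A thread stuck inside the compiler may ignore interruption, so the worker is marked as a daemon thread to let the JVM exit anyway.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ExecutionWatchdog {
    // Returns true if work finished within the timeout, false otherwise.
    static boolean runWithTimeout(Runnable work, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "watchdog-worker");
            t.setDaemon(true); // do not keep the JVM alive if the work never returns
            return t;
        });
        Future<?> future = pool.submit(work);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: the worker may not react to interruption
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new RuntimeException("work failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Placeholder workload; a real caller would pass something like executor::execute.
        boolean finished = runWithTimeout(() -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException ignored) {
                // cancelled by the watchdog
            }
        }, 200);
        System.out.println(finished ? "Finished in time" : "Timed out: execution appears stuck");
    }
}
```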

@jjfumero jjfumero self-assigned this Apr 20, 2023
@jjfumero jjfumero added bug Something isn't working compiler labels Apr 20, 2023
@TaisZ
Contributor Author

TaisZ commented Apr 20, 2023

@jjfumero The output:

$ tornado --jvm="-Dtornado.fullDebug=True"  --debug --threadInfo --printKernel -m tornado.examples/uk.ac.manchester.tornado.examples.Broken.TryCatch

WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
Loading DRIVER: uk.ac.manchester.tornado.drivers.ptx.PTXTornadoDriverProvider@57f23557

@TaisZ
Contributor Author

TaisZ commented Apr 20, 2023

@jjfumero However, when I remove the line System.out.println(b[i]); TornadoVM terminates properly, without raising an exception.

$ tornado --jvm="-Dtornado.fullDebug=True"  --debug --threadInfo --printKernel -m tornado.examples/uk.ac.manchester.tornado.examples.Broken.TryCatch
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
Loading DRIVER: uk.ac.manchester.tornado.drivers.ptx.PTXTornadoDriverProvider@57f23557
.version 7.6 
.target sm_86 
.address_size 64 

.visible .entry s0_t0_divided_int8_int8_float8(.param .u64 .ptr .global .align 8 kernel_context, .param .u64 .ptr .global .align 8 a, .param .u64 .ptr .global .align 8 b, .param .u64 .ptr .global .align 8 c) {
        .reg .pred rpb<2>;
        .reg .s64 rsd<4>;
        .reg .f32 rfi<4>;
        .reg .u32 rui<5>;
        .reg .s32 rsi<7>;
        .reg .u64 rud<9>;

BLOCK_0:
        ld.param.u64    rud0, [kernel_context];
        ld.param.u64    rud1, [a];
        ld.param.u64    rud2, [b];
        ld.param.u64    rud3, [c];
        mov.u32 rui0, %nctaid.x;
        mov.u32 rui1, %ntid.x;
        mul.wide.u32    rud4, rui0, rui1;
        cvt.s32.u64     rsi0, rud4;
        mov.u32 rui2, %tid.x;
        mov.u32 rui3, %ctaid.x;
        mad.lo.s32      rsi1, rui3, rui1, rui2;

BLOCK_1:
        mov.s32 rsi2, rsi1;
LOOP_COND_1:
        setp.lt.s32     rpb0, rsi2, 8;
        @!rpb0 bra      BLOCK_3;

BLOCK_2:
        cvt.s64.s32     rsd0, rsi2;
        shl.b64 rsd1, rsd0, 2;
        add.s64 rsd2, rsd1, 24;
        add.u64 rud5, rud1, rsd2;
        ld.global.s32   rsi3, [rud5];
        add.u64 rud6, rud2, rsd2;
        ld.global.s32   rsi4, [rud6];
        add.u64 rud7, rud3, rsd2;
        cvt.rn.f32.s32  rfi0, rsi3;
        cvt.rn.f32.s32  rfi1, rsi4;
        div.full.f32    rfi2, rfi0, rfi1;
        st.global.f32   [rud7], rfi2;
        add.s32 rsi5, rsi0, rsi2;
        mov.s32 rsi2, rsi5;
        bra.uni LOOP_COND_1;

BLOCK_3:
        ret;
}

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3080 GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]


@jjfumero
Member

jjfumero commented Apr 20, 2023

@TaisZ, can you use the latest version from the master branch? We fixed the full debug mode in a recent commit.

@TaisZ
Contributor Author

TaisZ commented Apr 20, 2023

Yes, all runs are based on the latest commit:

$ tornado --fullDebug --debug -m tornado.examples/uk.ac.manchester.tornado.examples.Broken.TryCatch

WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
Loading DRIVER: uk.ac.manchester.tornado.drivers.ptx.PTXTornadoDriverProvider@57f23557

Status: In Progress