[fix] Problem with FPGA execution for multiple tasks and the default scheduler #416
Description
This PR provides a fix for the issue described in #401.
Note: This PR was tested in Intel emulation mode. I do not have access to a Xilinx FPGA to test it.
Problem description
There are two identified problems:
1. In the OCLCodeCache class we have a method that checks whether force compilation has been triggered, and the FPGA compilers for Intel are invoked only if that check is true. This seems to be an old check from the time we had the lookupbuffer kernel, when we waited until all LAUNCH bytecodes corresponding to all task indices (all tasks within a TaskGraph) had been issued before triggering the forceCompilation() method from the TornadoVM class. See here.
2. The executor.withDefaultScheduler() configuration in the ExecutionPlan seems to break the execution and results in an OpenCL error (CL_INVALID_WORK_GROUP_SIZE) when the clEnqueueNDRangeKernel function is invoked.
To fix the first problem, I removed the shouldCompile check that existed in OCLCodeCache. To my understanding, this is an old check that is no longer required since we deprecated the lookupbuffer kernel.
To fix the second problem, I did a short refactoring of OCLKernelScheduler (an abstract class) and OCLFPGAScheduler, which extends it, to assess the default local work group size for FPGAs when executor.withDefaultScheduler() is enabled in a TornadoExecutionPlan.
This change prompted me to also test the BlurFilter example with a WorkerGrid, and I applied a small update in OCLGridInfo to check the default FPGA local work group.
Backend/s tested
Mark the backends affected by this PR.
OS tested
Mark the OS where this PR is tested.
Did you check on FPGAs?
If it is applicable, check your changes on FPGAs.
How to test the new patch?
Then you can run the example, as also described in issue #401:
rm -rf fpga-source-comp
tornado --debug --threadInfo --jvm="-Dblur.red.device=0:3 -Dblur.green.device=0:3 -Dblur.blue.device=0:3 -Dtornado.recover.bailout=False" -m tornado.examples/uk.ac.manchester.tornado.examples.compute.BlurFilter
Output:
You can download, apply the patch and build TornadoVM:
and then run the same example:
rm -rf fpga-source-comp
tornado --debug --threadInfo --jvm="-Dblur.red.device=0:3 -Dblur.green.device=0:3 -Dblur.blue.device=0:3 -Dtornado.recover.bailout=False" -m tornado.examples/uk.ac.manchester.tornado.examples.compute.BlurFilter
Output:
MultipleTasks example that runs two kernels on the FPGA.
You can download, apply the patch and build TornadoVM:
and then run the same example:
rm -rf fpga-source-comp
tornado --threadInfo --jvm="-Dexample.foo.device=0:3 -Dexample.bar.device=0:3" -m tornado.examples/uk.ac.manchester.tornado.examples.MultipleTasks
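For context on the second problem described above: in OpenCL, clEnqueueNDRangeKernel can return CL_INVALID_WORK_GROUP_SIZE when the requested local work size exceeds the device limit or, on devices without non-uniform work-group support, does not evenly divide the global work size. The following is a minimal, self-contained Java sketch of that constraint. It is not the code in this PR: the class name, the method names and the default size of 64 are all hypothetical, and the actual logic in OCLFPGAScheduler may differ.

```java
public class LocalWorkGroupSketch {

    // Hypothetical default local work-group size for an FPGA device.
    // This value is an assumption for illustration, not TornadoVM's setting.
    static final long ASSUMED_FPGA_LOCAL_SIZE = 64;

    /**
     * Returns the largest size that is no greater than the preferred size
     * and evenly divides the global size.
     */
    static long fitLocalSize(long globalSize, long preferred) {
        if (globalSize <= 0 || preferred <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long candidate = Math.min(globalSize, preferred);
        // Terminates at 1 in the worst case, since 1 divides every size.
        while (globalSize % candidate != 0) {
            candidate--;
        }
        return candidate;
    }

    /**
     * Checks the two conditions discussed above: the local size must not
     * exceed the device maximum and must evenly divide the global size.
     */
    static boolean isValidLocalSize(long globalSize, long localSize, long maxWorkGroupSize) {
        return localSize > 0
                && localSize <= maxWorkGroupSize
                && globalSize % localSize == 0;
    }

    public static void main(String[] args) {
        long[] globalSizes = {1024, 1000, 7};
        for (long global : globalSizes) {
            long local = fitLocalSize(global, ASSUMED_FPGA_LOCAL_SIZE);
            System.out.println("global=" + global + " local=" + local);
        }
    }
}
```

With these assumptions, a global size of 1000 would be paired with a local size of 50 rather than 64, because 64 does not divide 1000.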