
fpga benchmark support #14

Open
jzhoulon opened this issue Oct 20, 2022 · 7 comments


@jzhoulon

Currently, NPBench seems to only support CPU and GPU. Is there any support for FPGA? Thanks

@alexnick83
Contributor

There is no support for automatically compiling the DaCe versions for FPGA through the run_framework and run_benchmark scripts. However, the capability exists in DaCe if you have the necessary toolchains installed. I will check if we can add experimental support in NPBench and get back to you.

@jzhoulon
Author

thanks

@jzhoulon
Author

@alexnick83 Is there any experimental code with which I can reproduce the FPGA performance data shown in the paper? Thanks

@alexnick83
Contributor

alexnick83 commented Oct 28, 2022

> @alexnick83 Is there any experimental code with which I can reproduce the FPGA performance data shown in the paper? Thanks

Yes, apart from the paper's artifact, there are tests in the DaCe repository. In the paper, the samples under polybench were run. Note that the FPGA tests may have some new transformations compared to the paper, but I suppose you are looking for the latest developments.

@jzhoulon
Author

@alexnick83 Thanks for the info. However, when I tried to benchmark the tests under polybench, dace_cpu and dace_gpu were much slower than NumPy (8-10x slower), for example with the following code (cholesky_test.py). I have precompiled the SDFG with sdfg.compile(). Do you have any suggestions? Thanks very much

```python
import argparse
import time

import dace
import numpy as np

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--target", default='cpu',
                        choices=['cpu', 'gpu', 'fpga'], help='Target platform')

    args = vars(parser.parse_args())
    target = args["target"]
    sdfg = None
    if target == "cpu":
        sdfg = run_cholesky(dace.dtypes.DeviceType.CPU)
    elif target == "gpu":
        sdfg = run_cholesky(dace.dtypes.DeviceType.GPU)
    elif target == "fpga":
        sdfg = run_cholesky(dace.dtypes.DeviceType.FPGA)

    N = sizes["medium"]
    A = init_data(N)
    gt_A = np.copy(A)
    sdfg_binary = sdfg.compile()

    start = time.time()
    for i in range(10):
        sdfg_binary(A=A, N=N)
    end = time.time()
    # (end - start) * 1000 ms / 10 iterations == (end - start) * 100
    print("accelerator", target, "time is", (end - start) * 100, "ms")

    start = time.time()
    for i in range(10):
        ground_truth(N, gt_A)
    end = time.time()
    print("numpy time is", (end - start) * 100, "ms")
```
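As a side note on the timing methodology (an editor's sketch, not code from the thread): `time.time()` with no warm-up folds one-time costs such as allocation, code loading, or JIT work into the measurement. A more robust harness uses `time.perf_counter`, a warm-up phase, and best-of-N timing. The `bench` helper and the `np.linalg.cholesky` stand-in workload below are illustrative, not part of NPBench or the DaCe tests:

```python
import time
import numpy as np

def bench(fn, *args, reps=10, warmup=2):
    """Return the best-of-`reps` runtime of fn(*args) in milliseconds."""
    for _ in range(warmup):
        fn(*args)  # warm-up runs absorb one-time setup costs
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return min(times) * 1000.0  # min is least noisy for short kernels

N = 256
rng = np.random.default_rng(0)
M = rng.random((N, N))
spd = M @ M.T + N * np.eye(N)  # symmetric positive definite input
ms = bench(np.linalg.cholesky, spd)
print(f"numpy cholesky ({N}x{N}): {ms:.3f} ms")
```

Reporting the minimum (or median) of repeated runs, rather than the total divided by the iteration count, also reduces the influence of OS jitter on short kernels.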

@alexnick83
Contributor

alexnick83 commented Oct 31, 2022

For performance runs on CPU and GPU, I would use NPBench rather than the DaCe tests. The tests' purpose is to track functional regressions in the auto-optimizer under parameters controlled by the CI (for example, whether to apply simplify when generating the initial SDFG). Furthermore, the tests use a very small dataset size by default so they finish quickly, so on some of them you may be measuring library overheads. Still, the CPU being 8-10x slower seems strange. In the latest NPBench data (latest master branch), DaCe CPU is 14.3x faster than NumPy, and GPU is 4.6x slower than NumPy, on the same hardware and dataset as in the paper, which more or less matches the published results. Another thing to note is that you must have optimized BLAS libraries installed for CPU execution: if a test contains a matrix multiplication but DaCe cannot find MKL (or OpenBLAS), it will generate the equivalent of the naive algorithm, which runs painfully slow.
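The BLAS point above can be made concrete. The sketch below is an editor's illustration, not code from the thread; `naive_matmul` is a stand-in for the naive code generated when no optimized BLAS is found, compared against NumPy's BLAS-backed `@` on a deliberately tiny matrix:

```python
import time
import numpy as np

def naive_matmul(A, B):
    """Triple-loop matrix multiply: roughly what you get without a BLAS."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for l in range(k):
                s += A[i, l] * B[l, j]
            C[i, j] = s
    return C

n = 64  # tiny on purpose; the gap grows with n (cubic work, cache effects)
rng = np.random.default_rng(0)
A, B = rng.random((n, n)), rng.random((n, n))

t0 = time.perf_counter(); C_naive = naive_matmul(A, B); t_naive = time.perf_counter() - t0
t0 = time.perf_counter(); C_blas = A @ B; t_blas = time.perf_counter() - t0

assert np.allclose(C_naive, C_blas)  # same result, very different speed
print(f"naive: {t_naive * 1e3:.2f} ms, BLAS: {t_blas * 1e3:.2f} ms")
```

Even at this size the pure-Python loop is typically orders of magnitude slower, which is why a missing MKL/OpenBLAS can easily explain a DaCe binary losing to NumPy.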

@alexnick83
Contributor

alexnick83 commented Oct 31, 2022

I just ran the modified Cholesky test you posted on my main machine (i7-7700). This is what I got for CPU with different parameters:

automatic_simplification=False

accelerator  cpu  time is  2.161860466003418 ms
numpy time is  202.2195816040039 ms

automatic_simplification=False, OMP_NUM_THREADS=4

accelerator  cpu  time is  2.292203903198242 ms
numpy time is  137.15152740478516 ms

automatic_simplification=True

accelerator  cpu  time is  1.9000768661499023 ms
numpy time is  134.70430374145508 ms

automatic_simplification=True, OMP_NUM_THREADS=4

accelerator  cpu  time is  1.9898653030395508 ms
numpy time is  136.15012168884277 ms
