AXI4-Lite support #185

Open
sei-jgwohlbier opened this issue Jul 14, 2023 · 44 comments

@sei-jgwohlbier

sei-jgwohlbier commented Jul 14, 2023

Hi,
Is AXI4-Lite support in a state where it can be used? I tried to enable it with

panda_USE_EXPERIMENTAL=yes ../configure --enable-flopoco --enable-opt --prefix=/opt/panda-exp

but the build fails with

bash: /working_dir/PandA-bambu/etc/macros/../../ext/trng-4.17/configure: No such file or directory
configure: error: "Error in trng configuration"

Maybe I need to be on the feature/AXI branch? Or maybe I shouldn't be trying to test it.
Thanks.

@Ansaya
Collaborator

Ansaya commented Jul 14, 2023

Hello,

Currently only the standard AXI4 master interface is available; AXI4-Lite is not there yet.
If you would like to contribute and implement that kind of interface, you are welcome to do so by branching from dev/panda.

Just as a note, the feature/AXI branch is there to fix some issues with the AXI caches; it is not related to the AXI protocol support itself.
Also, about the experimental flag you enabled for the compilation: that is an old option which is currently being removed, since it does not bring anything new to the tool with respect to the standard build, and it is no longer possible to compile with it (as you discovered yourself).

@sei-jgwohlbier
Author

Thanks. So the AXI4 master interface is available in the standard release, and I don't need to do a special build?

@Ansaya
Collaborator

Ansaya commented Jul 14, 2023

Exactly.
Also, if you would like to use the AppImage version, I suggest you download it from Bambu Releases. The latest stable version is bambu-2023.1.AppImage, but I suggest you go directly with bambu-dev-panda.AppImage to avoid some silly issues that have already been solved in the latest builds.
Furthermore, the dev/panda build offers a new testbench environment supporting C/C++ testbench implementations.

@sei-jgwohlbier
Author

Ok, thanks. I'm thinking about importing synthesized accelerators into an SoC that has an AXI4 master for DMAs but needs either AXI4-Lite or APB for configuration of the accelerator. I can't yet tell whether bambu provides this.

@Ansaya
Collaborator

Ansaya commented Jul 14, 2023

With the latest dev/panda version, it is possible to generate a memory-mapped top-level interface passing the --memory-mapped-top option. In this case, the top module will expose a slave memory bus which you may use to initialize the accelerator and start the computation. The available protocols for this interface are the Wishbone B4 protocol and the internal memory bus protocol used by Bambu. The latter is a straightforward protocol that you may adapt to AXI4-Lite if this suits your needs.

@sei-jgwohlbier
Author

Ok, thanks very much. I see the build issue with fileIO.hpp. I have a patch for it, which I'm sure you also have.

@sei-jgwohlbier
Author

Hi, I tried to add --memory-mapped-top to the soda-opt pytorch tutorial and I get the following error:

error -> unexpected case (unsigned char)  __exp_bits_23853_[2] 8unsigned char old-bw=4 new-bw=4059 from ../../src/frontend_analysis/IR_analysis/Bit_Value_opt.cpp:315 (tree_helper::Size(old_val) >= tree_helper::Size(new_val))

Any idea where to start looking?
Thanks!

@Ansaya
Collaborator

Ansaya commented Jul 19, 2023

Hi, can you please provide the input files and the full command line you used to call Bambu?

@sei-jgwohlbier
Author

sei-jgwohlbier commented Jul 19, 2023

The source code is from the pytorch tutorial in soda-opt. The only change to the Makefile that lowers to llvm is the addition of --memory-mapped-top to the bambu invocation, which results in the following commands.

ToyCNN(
  (conv1): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc): Linear(in_features=16, out_features=4, bias=True)
)
/opt/soda/scripts/tosa_to_linalg.sh output/01_tosa.mlir output/02_linalg.mlir
soda-opt output/02_linalg.mlir -o output/03-01_linalg_searched.mlir -convert-operation-to-soda="anchor-op=linalg.batch_matmul"
soda-opt output/03-01_linalg_searched.mlir -o output/03-02_linalg_outlined.mlir -soda-outline-bambu-code -soda-extract-arguments-to-xml=using-bare-ptr
mv forward_kernel_interface.xml  ./output/forward_kernel_interface.xml
mv forward_kernel_test.xml ./output/forward_kernel_test.xml
soda-opt output/03-02_linalg_outlined.mlir -o output/03-03_linalg_isolated.mlir -soda-generate-bambu-accelcode=no-aa
soda-opt output/03-03_linalg_isolated.mlir -o output/04_llvm_baseline.mlir -lower-all-to-llvm=use-bare-ptr-memref-call-conv
mlir-translate output/04_llvm_baseline.mlir -o output/05_llvm_baseline.ll --mlir-to-llvmir -opaque-pointers=0
test -d ./output/bambu/baseline || mkdir -p ./output/bambu/baseline; \
cd ./output/bambu/baseline; \
bambu \
	-v3 --print-dot \
	-lm --soft-float \
	--compiler=I386_CLANG12 \
	--device=xc7z020-1clg484-VVD \
	--clock-period=5 \
	--experimental-setup=BAMBU-BALANCED-MP \
	--channels-number=2 \
	--memory-allocation-policy=ALL_BRAM \
	--disable-function-proxy \
	--generate-tb=../../forward_kernel_test.xml \
	--simulate --simulator=VERILATOR \
	--top-fname=forward_kernel \
                --memory-mapped-top \
	../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
completed
 ==  Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG12 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --top-fname=forward_kernel --memory-mapped-top ../../../output/05_llvm_baseline.ll

I'm attaching the final output of soda 05_llvm_baseline.ll.

Thanks!

@wohlbier

With the latest dev/panda version, it is possible to generate a memory-mapped top-level interface passing the --memory-mapped-top option. In this case, the top module will expose a slave memory bus which you may use to initialize the accelerator and start the computation. The available protocols for this interface are the Wishbone B4 protocol and the internal memory bus protocol used by Bambu. The latter is a straightforward protocol that you may adapt to AXI4-Lite if this suits your needs.

Is there any documentation on this interface?

@Ansaya
Collaborator

Ansaya commented Jul 24, 2023

The source code is from the pytorch tutorial in soda-opt. The only change to the Makefile that lowers to llvm is the addition of --memory-mapped-top to the bambu invocation, which results in the following commands. […]

I tried to perform the synthesis with the provided command line and input description, but I had no issues with that. Which version of bambu are you using? I tried that with this AppImage.

@sei-jgwohlbier
Author

sei-jgwohlbier commented Jul 24, 2023 via email

@sei-jgwohlbier
Author

diff --git a/src/utility/fileIO.hpp b/src/utility/fileIO.hpp
index 43cb3d010..5058e9572 100644
--- a/src/utility/fileIO.hpp
+++ b/src/utility/fileIO.hpp
@@ -321,7 +321,7 @@ inline void CopyFile(boost::filesystem::path file_source, boost::filesystem::pat
    }
    else
    {
-      boost::filesystem::copy_file(file_source, file_target, boost::filesystem::copy_options::overwrite_existing);
+      boost::filesystem::copy_file(file_source, file_target, boost::filesystem::copy_option::overwrite_if_exists);
    }
 }

@Ansaya
Collaborator

Ansaya commented Jul 24, 2023

With the latest dev/panda version, it is possible to generate a memory-mapped top-level interface passing the --memory-mapped-top option. In this case, the top module will expose a slave memory bus which you may use to initialize the accelerator and start the computation. The available protocols for this interface are the Wishbone B4 protocol and the internal memory bus protocol used by Bambu. The latter is a straightforward protocol that you may adapt to AXI4-Lite if this suits your needs.

Is there any documentation on this interface?

The minimal interface is not documented anywhere yet; I will add something to the wiki as soon as possible. Until then, here is a short description that may be helpful to you.
The default Bambu memory interface, the minimal interface, comprises seven signals: two inputs and five outputs. The default configuration yields a pipelined memory interface, so each request is asserted for a single cycle. A non-pipelined version is also available and may be enabled through a memory controller module parameter.
Read and write requests are asserted using the Mout_oe_ram and Mout_we_ram signals, respectively. The memory address is passed on the Mout_addr_ram bus, along with the data bitsize (Mout_data_ram_size) and, for write requests, the write data (Mout_Wdata_ram). The M_DataRdy input signal is expected to be asserted when the read/write request has been completed by the slave. The M_Rdata_ram bus must contain the read data in the same cycle M_DataRdy is asserted.
Here is an example of the expected waveform for write and read transactions. The write transaction is completed in the same cycle it is issued, and the read transaction is completed in the cycle after Mout_oe_ram has been asserted.
(waveform image attached)
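
For reference, a rough behavioral sketch of a slave memory model that obeys the timing described above (writes acknowledged in the same cycle, read data returned one cycle after Mout_oe_ram) might look as follows. The module name, the 32-bit widths, the tiny word-addressed storage, and the fact that Mout_data_ram_size is ignored are illustrative assumptions, not Bambu's actual implementation.

// Hypothetical slave model for Bambu's minimal memory interface (sketch only).
module minimal_if_mem_model #(
  parameter ADDR_BITS = 32,
  parameter DATA_BITS = 32
)(
  input                      clock,
  input                      Mout_oe_ram,        // read request from the accelerator
  input                      Mout_we_ram,        // write request from the accelerator
  input  [ADDR_BITS-1:0]     Mout_addr_ram,
  input  [DATA_BITS-1:0]     Mout_Wdata_ram,
  input  [5:0]               Mout_data_ram_size, // ignored in this sketch
  output reg [DATA_BITS-1:0] M_Rdata_ram,
  output                     M_DataRdy
);
  reg [DATA_BITS-1:0] mem [0:255];  // tiny word-addressed storage (assumption)
  reg                 rd_pending;

  // Writes are acknowledged in the same cycle they are issued;
  // reads are acknowledged the cycle after Mout_oe_ram is asserted.
  assign M_DataRdy = Mout_we_ram | rd_pending;

  always @(posedge clock) begin
    rd_pending <= Mout_oe_ram;
    if (Mout_we_ram)
      mem[Mout_addr_ram[9:2]] <= Mout_Wdata_ram;   // word-aligned write
    if (Mout_oe_ram)
      M_Rdata_ram <= mem[Mout_addr_ram[9:2]];      // registered read data
  end
endmodule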

@Ansaya
Collaborator

Ansaya commented Jul 24, 2023

I built the dev/panda branch. I had to apply the following patch to get the code to compile. Hopefully this isn't the cause of the error.

I do not think that the patch is causing any issues.
Can you please add the --no-clean option to your command line and share the panda-temp/<input_filename>.gimplePSSA file that is generated? That is the IR dump generated starting from the frontend compiler IR (the clang-12 compiler in your case).
Also, can you share the clang-12 version string (the output of "clang-12 --version" command)?

@sei-jgwohlbier
Author

Thanks for the reply!

$ clang-12 --version
Ubuntu clang version 12.0.0-3ubuntu1~20.04.5
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

I didn't mention previously that I have to edit the IR that comes out of soda-opt. Without editing it, bambu fails to ingest it, with this error.

/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/../../../output/05_llvm_baseline.ll:1327:57: error: unterminated attribute group
attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
                                                        ^
1 error generated.
Error in compilation
/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/../../../output/05_llvm_baseline.ll:1327:57: error: unterminated attribute group
attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
                                                        ^
1 error generated.
error -> Front-end compiler returns an error during compilation 2

I remove the memory(argmem: readwrite) part of the attribute to get to the IR file that I sent previously. Using the resulting ll file I get the following output and the gimplePSSA.zip file attached.

warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
WARNING: this target does not support the llvm.stacksave intrinsic.
!! Unknown ext. calls:
memrefCopy
1 warning generated.
 (in-process)  /usr/local/include  /usr/lib/llvm-12/lib/clang/12.0.0/include  /usr/include/x86_64-linux-gnu  /usr/include  warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
WARNING: this target does not support the llvm.stacksave intrinsic.
!! Unknown ext. calls:
memrefCopy
1 warning generated.

error -> unexpected case (unsigned char)  __exp_bits_27944_[2] 8unsigned char old-bw=4 new-bw=4059 from ../../src/frontend_analysis/IR_analysis/Bit_Value_opt.cpp:315 (tree_helper::Size(old_val) >= tree_helper::Size(new_val))

Please report bugs to <panda-info@polimi.it>

@Ansaya
Collaborator

Ansaya commented Jul 25, 2023

I am going to check the .gimplePSSA as soon as I can. In the meantime, just a note on the soda-generated IR: it may have been produced with a newer version of the LLVM toolchain than the Clang 12 version, which may cause issues with the parser. Bambu also supports Clang 16 as a frontend, so you may be able to avoid the .ll editing if you use that as the frontend compiler.

@sei-jgwohlbier
Author

Ok, thanks. I am working from the latest soda-provided Docker image, but I can go ahead and build soda and bambu with Clang 16.

@Ansaya
Collaborator

Ansaya commented Jul 25, 2023

The dev-panda AppImage is shipped with Clang 16 too, if you want to avoid the build.

@sei-jgwohlbier
Author

Ok, thanks. Since I have to rebuild soda it's not much more trouble to build everything.

@sei-jgwohlbier
Author

sei-jgwohlbier commented Jul 28, 2023

I managed to use the AppImage bambu with Clang 16 specified, so that I don't need to edit the IR. It fails differently than it did previously. Below is the whole output.

make synth-baseline
python torchscript.py output/01_tosa.mlir --dialect=tosa
ToyCNN(
  (conv1): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc): Linear(in_features=16, out_features=4, bias=True)
)
/opt/soda/scripts/tosa_to_linalg.sh output/01_tosa.mlir output/02_linalg.mlir
soda-opt output/02_linalg.mlir -o output/03-01_linalg_searched.mlir -convert-operation-to-soda="anchor-op=linalg.batch_matmul"
soda-opt output/03-01_linalg_searched.mlir -o output/03-02_linalg_outlined.mlir -soda-outline-bambu-code -soda-extract-arguments-to-xml=using-bare-ptr
mv forward_kernel_interface.xml  ./output/forward_kernel_interface.xml
mv forward_kernel_test.xml ./output/forward_kernel_test.xml
soda-opt output/03-02_linalg_outlined.mlir -o output/03-03_linalg_isolated.mlir -soda-generate-bambu-accelcode=no-aa
soda-opt output/03-03_linalg_isolated.mlir -o output/04_llvm_baseline.mlir -lower-all-to-llvm=use-bare-ptr-memref-call-conv
mlir-translate output/04_llvm_baseline.mlir -o output/05_llvm_baseline.ll --mlir-to-llvmir -opaque-pointers=0
test -d ./output/bambu/baseline || mkdir -p ./output/bambu/baseline; \
cd ./output/bambu/baseline; \
bambu \
	-v3 --print-dot \
	-lm --soft-float \
	--compiler=I386_CLANG16 \
	--device=xc7z020-1clg484-VVD \
	--clock-period=5 \
	--experimental-setup=BAMBU-BALANCED-MP \
	--channels-number=2 \
	--memory-allocation-policy=ALL_BRAM \
	--disable-function-proxy \
	--generate-tb=../../forward_kernel_test.xml \
	--simulate --simulator=VERILATOR \
	--top-fname=forward_kernel \
                --memory-mapped-top \
                --no-clean \
	../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
 ==  Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --top-fname=forward_kernel --memory-mapped-top --no-clean ../../../output/05_llvm_baseline.ll 


********************************************************************************
                    ____                  _
                   | __ )  __ _ _ __ ___ | |_   _   _
                   |  _ \ / _` | '_ ` _ \| '_ \| | | |
                   | |_) | (_| | | | | | | |_) | |_| |
                   |____/ \__,_|_| |_| |_|_.__/ \__,_|

********************************************************************************
                         High-Level Synthesis Tool

                         Politecnico di Milano - DEIB
                          System Architectures Group
********************************************************************************
                Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 2023.06 - Revision 8dad23e15331c7737e7969dfa4a4f652d043934f-dev/panda

Parameters parsed in 0.08 seconds

Target technology = FPGA
Library Name     : STD_FU
  Total cells    : 3
  - combinational: 0
  - others: 3

Library Name     : STD_FU
  Total cells    : 10
  - combinational: 0
  - others: 10

Library Name     : STD_FU
  Total cells    : 33
  - combinational: 0
  - others: 33

Library Name     : STD_FU
  Total cells    : 8
  - combinational: 0
  - others: 8

Library Name     : STD_FU
  Total cells    : 56
  - combinational: 0
  - others: 56

Library Name     : STD_FU
  Total cells    : 1
  - combinational: 0
  - others: 1

Library Name     : CS_COMPONENT
  Total cells    : 16
  - combinational: 0
  - others: 16

Library Name     : STD_FU
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD_FU
  Total cells    : 0
  - combinational: 0
  - others: 0

Library Name     : STD_FU
  Total cells    : 3
  - combinational: 0
  - others: 3

Library Name     : STD_FU
  Total cells    : 21
  - combinational: 0
  - others: 21

Library Name     : STD
  Total cells    : 14
  - combinational: 0
  - others: 14

Library Name     : STD_COMMON
  Total cells    : 57
  - combinational: 0
  - others: 57

Library Name     : STD_FU
  Total cells    : 33
  - combinational: 0
  - others: 33

Library Name     : STD_PC
  Total cells    : 16
  - combinational: 0
  - others: 16

Library Name     : STD_SOFT_FLOAT
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD
  Total cells    : 95
  - combinational: 0
  - others: 95

Library Name     : STD_FU
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD_FU
  Total cells    : 9
  - combinational: 0
  - others: 9

Library Name     : WBWrapper
  Total cells    : 12
  - combinational: 0
  - others: 12

Available devices:
 - 5CSEMA5F31C6
 - 5SGXEA7N2F45C1
 - EP2C70F896C6
 - EP2C70F896C6-R
 - EP4SGX530KH40C2
 - LFE335EA8FN484C
 - LFE5U85F8BG756C
 - LFE5UM85F8BG756C
 - asap7-BC
 - asap7-TC
 - asap7-WC
 - nangate45
 - nx1h140tsp
 - nx1h35S
 - nx2h540tsc
 - xc4vlx100-10ff1513
 - xc5vlx110t-1ff1136
 - xc5vlx330t-2ff1738
 - xc5vlx50-3ff1153
 - xc6vlx240t-1ff1156
 - xc7a100t-1csg324-VVD
 - xc7vx330t-1ffg1157
 - xc7vx485t-2ffg1761-VVD
 - xc7vx690t-3ffg1930-VVD
 - xc7z020-1clg484
 - xc7z020-1clg484-VVD
 - xc7z020-1clg484-YOSYS-VVD
 - xc7z045-2ffg900-VVD
 - xcku060-3ffva1156-VVD
 - xcu280-2Lfsvh2892-VVD
Library Name     : STD_FU
  Total cells    : 3931
  - combinational: 0
  - others: 3931

warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
 (in-process)  /usr/local/include  /usr/include/x86_64-linux-gnu  /usr/include  warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.

  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 2
  Bit Value Opt: plus_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 2
  Bit Value Opt: cond_expr optimized, nbits = 3
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 2
  Bit Value Opt: cond_expr optimized, nbits = 3
  Bit Value Opt: cond_expr optimized, nbits = 4
  Bit Value Opt: cond_expr optimized, nbits = 5
  Bit Value Opt: bit_and_expr optimized, nbits = 1
  Bit Value Opt: ne_expr optimized, nbits = 1
  Bit Value Opt: bit_xor_expr optimized, nbits = 2
  Bit Value Opt: bit_and_expr optimized, nbits = 2
  Bit Value Opt: ne_expr optimized, nbits = 2
  Bit Value Opt: plus_expr optimized, nbits = 2
  Bit Value Opt: bit_and_expr optimized, nbits = 11
  Bit Value Opt: eq_expr optimized, nbits = 11
  Bit Value Opt: bit_and_expr optimized, nbits = 19
  Bit Value Opt: eq_expr optimized, nbits = 19
  Bit Value Opt: bit_and_expr optimized, nbits = 23
  Bit Value Opt: eq_expr optimized, nbits = 23
  Bit Value Opt: bit_and_expr optimized, nbits = 25
  Bit Value Opt: eq_expr optimized, nbits = 25
  Bit Value Opt: bit_and_expr optimized, nbits = 26
  Bit Value Opt: eq_expr optimized, nbits = 26
  Bit Value Opt: bit_and_expr optimized, nbits = 26
  Bit Value Opt: ne_expr optimized, nbits = 26
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: bit_and_expr optimized, nbits = 1
  Bit Value Opt: ne_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: bit_and_expr optimized, nbits = 22
  Bit Value Opt: ne_expr optimized, nbits = 22
  Bit Value Opt: bit_and_expr optimized, nbits = 47
  Bit Value Opt: ne_expr optimized, nbits = 47
  Bit Value Opt: bit_and_expr optimized, nbits = 9
  Bit Value Opt: ne_expr optimized, nbits = 9
  Bit Value Opt: bit_and_expr optimized, nbits = 32
  Bit Value Opt: ne_expr optimized, nbits = 32
  Bit Value Opt: plus_expr optimized, nbits = 4
  Bit Value Opt: plus_expr optimized, nbits = 4
  Bit Value Opt: plus_expr optimized, nbits = 4

  Memory allocation information:
  Sparse memory alignemnt set to 1024 bytes
    Function: forward_kernel
    Id: 495627
    Base Address: 1024
    Size: 1
    Parameter P0 of Function forward_kernel
      Id: 495628
      Base Address: 1040
      Size: 4
    Parameter P1 of Function forward_kernel
      Id: 495629
      Base Address: 1056
      Size: 4
    Parameter P2 of Function forward_kernel
      Id: 495630
      Base Address: 1072
      Size: 4
Warning: This function uses unknown addresses: forward_kernel
    BRAM bitsize: 16
    Spec may not exploit DATA bus width
    Spec accesses data having an address unknown at compile time
    Internal data is not externally accessible
    DATA bus bitsize: 32
    ADDRESS bus bitsize: 32
    SIZE bus bitsize: 6
    Total amount of memory allocated for memory mapped parameters: 1024
    Internally allocated memory (no private memories): 1024
    Internally allocated memory: 1024
  Time to perform memory allocation: 0.00 seconds


  Module allocation information for function __float_adde8m23b_127nih:
    Number of complex operations: 0
    Number of complex operations: 0
  Time to perform module allocation: 0.05 seconds


  Module allocation information for function __float_mule8m23b_127nih:
    Number of complex operations: 1
    Number of complex operations: 1
  Time to perform module allocation: 0.02 seconds


  Scheduling Information of function __float_adde8m23b_127nih:
    Number of control steps: 9
    Minimum slack: 0.010964990999999147
    Estimated max frequency (MHz): 200.43956360218834
  Time to perform scheduling: 0.03 seconds

  Number of function call sites = 19

  State Transition Graph Information of function __float_adde8m23b_127nih:
    Number of operations: 257
    Number of basic blocks: 3
    Number of states: 8
    Minimum number of cycles: 8
    Maximum number of cycles 8
    Parameters are registered
    Done port is registered
  Time to perform creation of STG: 0.01 seconds


  Scheduling Information of function __float_mule8m23b_127nih:
    Number of control steps: 8
    Minimum slack: 0.056999993999999221
    Estimated max frequency (MHz): 202.30629148010561
  Time to perform scheduling: 0.01 seconds

  Number of function call sites = 19

  State Transition Graph Information of function __float_mule8m23b_127nih:
    Number of operations: 104
    Number of basic blocks: 3
    Number of states: 7
    Minimum number of cycles: 7
    Maximum number of cycles 7
    Parameters are registered
    Done port is registered
  Time to perform creation of STG: 0.01 seconds


  Easy binding information for function __float_adde8m23b_127nih:
    Bound operations:192/257
  Time to perform easy binding: 0.00 seconds


  Easy binding information for function __float_mule8m23b_127nih:
    Bound operations:85/104
  Time to perform easy binding: 0.01 seconds


  Storage Value Information of function __float_adde8m23b_127nih:
    Number of storage values inserted: 89
  Time to compute storage value information: 0.00 seconds


  Storage Value Information of function __float_mule8m23b_127nih:
    Number of storage values inserted: 16
  Time to compute storage value information: 0.00 seconds

  Slack computed in 0.00 seconds
  Weight computation completed in 0.00 seconds
  False-loop computation completed in 0.00 seconds
  Iteration 0 completed in 0.00 seconds

  Register binding information for function __float_adde8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
  Time to perform register binding: 0.01 seconds

  Iteration 1 completed in 0.00 seconds
  Clique covering computation completed in 0.01 seconds

  Module binding information for function __float_adde8m23b_127nih:
    Number of modules instantiated: 257
    Number of performance conflicts: 13
    Estimated resources area (no Muxes and address logic): 2746
    Estimated area of MUX21: 0
    Total estimated area: 2746
    Estimated number of DSPs: 0
  Time to perform module binding: 0.01 seconds


  Register binding information for function __float_adde8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
  Time to perform register binding: 0.00 seconds

  Total number of flip-flops in function __float_adde8m23b_127nih: 488
  Slack computed in 0.00 seconds
  Weight computation completed in 0.00 seconds
  False-loop computation completed in 0.00 seconds
  Iteration 0 completed in 0.00 seconds

  Register binding information for function __float_mule8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
  Time to perform register binding: 0.00 seconds

  Iteration 1 completed in 0.00 seconds
  Clique covering computation completed in 0.00 seconds

  Module binding information for function __float_mule8m23b_127nih:
    Number of modules instantiated: 104
    Number of performance conflicts: 0
    Estimated resources area (no Muxes and address logic): 1100
    Estimated area of MUX21: 0
    Total estimated area: 1100
    Estimated number of DSPs: 3
  Time to perform module binding: 0.00 seconds


  Register binding information for function __float_mule8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
  Time to perform register binding: 0.00 seconds

  Total number of flip-flops in function __float_mule8m23b_127nih: 197

  Module allocation information for function forward_kernel:
    Number of complex operations: 99
    Number of complex operations: 99
  Time to perform module allocation: 0.04 seconds


  Scheduling Information of function forward_kernel:
    Number of control steps: 353
    Minimum slack: 0.010964988999952796
    Estimated max frequency (MHz): 200.43956352183443
  Time to perform scheduling: 0.05 seconds


  State Transition Graph Information of function forward_kernel:
    Number of operations: 428
    Number of basic blocks: 10
    Number of states: 353
    Done port is registered
  Time to perform creation of STG: 0.21 seconds


  Easy binding information for function forward_kernel:
    Bound operations:243/428
  Time to perform easy binding: 0.00 seconds


  Storage Value Information of function forward_kernel:
    Number of storage values inserted: 157
  Time to compute storage value information: 0.00 seconds

  Slack computed in 0.00 seconds
  Weight computation completed in 0.02 seconds
  False-loop computation completed in 0.00 seconds
  cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
  Iteration 0 completed in 0.01 seconds

  Register binding information for function forward_kernel:
    Register allocation algorithm obtains a sub-optimal result: 150 registers(LB:41)
  Time to perform register binding: 0.02 seconds

  cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
  Iteration 1 completed in 0.01 seconds
  Clique covering computation completed in 0.04 seconds

  Module binding information for function forward_kernel:
    Number of modules instantiated: 333
    Number of performance conflicts: 147
    Estimated resources area (no Muxes and address logic): 5699
    Estimated area of MUX21: 1332.3333333333333
    Total estimated area: 7031.333333333333
    Estimated number of DSPs: 0
  Time to perform module binding: 0.06 seconds


  Register binding information for function forward_kernel:
    Register allocation algorithm obtains a sub-optimal result: 150 registers(LB:41)
  Time to perform register binding: 0.02 seconds


  Connection Binding Information for function forward_kernel:
    Number of allocated multiplexers (2-to-1 equivalent): 141
    Total number of bit-level multiplexers: 4640
  Time to perform interconnection binding: 0.01 seconds

  Total number of flip-flops in function forward_kernel: 4767
  C-based testbench generation for function forward_kernel: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output//simulation/cosim.c
  Prepared testbench
error -> BOOL only supports single bit values: 2 - bambu_testbench_impl/master_P0/S_oe_ram (new_bit_size == 1)

Please report bugs to <panda-info@polimi.it>

@sei-jgwohlbier
Author

Can you point me to some verilog that shows an interface coming from use of the --memory-mapped-top option?

@Ansaya
Collaborator

Ansaya commented Aug 4, 2023

Hi, I had the chance to debug the issue, and I can tell that it is only related to the testbench generation.
I manually tested the interface, which works fine with the proper testbench configuration. Thus I will fix the issue and push the changes to the dev/panda branch.
Until then, if you want to run the simulation to verify the generated design, I suggest you remove the --memory-mapped-top parameter and the testbench generation should work correctly. On the other hand, to generate the memory-mapped top design, you should remove the --generate-tb and --simulate options so that no testbench is generated.

@sei-jgwohlbier
Author

Ok, thanks, I'll try this. I notice that when I try to compile the testbenches I get errors of the following form. I expect it is because I am including the AppImage in the soda-opt container, which is Ubuntu 20.04, whereas I see some paths in the AppImage that indicate 18.04, so I think there are some include-path inconsistencies.

However, I was able to use the --memory-mapped-top option to get a Verilog file. I see the data paths are 64 bits wide. I tried using --data-bus-bitsize and --addr-bus-bitsize to set them to 32, but that did not work.

Thanks again!

clang-16: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang-16: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang-16: warning: argument unused during compilation: '-I /usr/bin/../share/verilator/include/vltstd' [-Wunused-command-line-argument]
clang-16: warning: argument unused during compilation: '-I /usr/include' [-Wunused-command-line-argument]
warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
clang-16: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang-16: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
In file included from /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output//simulation/cosim.c:20:
/usr/include/stdio.h:33:10: fatal error: 'stddef.h' file not found
#include <stddef.h>
         ^~~~~~~~~~
1 error generated.
clang-16: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang-16: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang-16: warning: argument unused during compilation: '-I /usr/bin/../share/verilator/include/vltstd' [-Wunused-command-line-argument]
clang-16: warning: argument unused during compilation: '-I /usr/include' [-Wunused-command-line-argument]
warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
clang-16: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang-16: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
In file included from /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output//simulation/cosim.c:20:
/usr/include/stdio.h:33:10: fatal error: 'stddef.h' file not found
#include <stddef.h>
         ^~~~~~~~~~
1 error generated.
error -> Returned error code!

Please report bugs to <panda-info@polimi.it>

@sei-jgwohlbier
Author

sei-jgwohlbier commented Aug 6, 2023

Using the dev/panda branch results in a different error for the torchscript.py reproducer. IR attached.

bambu \
	-v3 --print-dot \
	-lm --soft-float \
	--compiler=I386_CLANG16 \
	--device=xc7z020-1clg484-VVD \
	--clock-period=5 \
	--experimental-setup=BAMBU-BALANCED-MP \
	--channels-number=2 \
	--memory-allocation-policy=ALL_BRAM \
	--disable-function-proxy \
	--top-fname=forward_kernel \
                 \
	../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
 ==  Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel ../../../output/05_llvm_baseline.ll 


********************************************************************************
                    ____                  _
                   | __ )  __ _ _ __ ___ | |_   _   _
                   |  _ \ / _` | '_ ` _ \| '_ \| | | |
                   | |_) | (_| | | | | | | |_) | |_| |
                   |____/ \__,_|_| |_| |_|_.__/ \__,_|

********************************************************************************
                         High-Level Synthesis Tool

                         Politecnico di Milano - DEIB
                          System Architectures Group
********************************************************************************
                Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 2023.08 - Revision 7fb2f6c62adef935ec7eed79f3fd2365f5a0bdbe-dev/panda

Parameters parsed in 0.07 seconds

Target technology = FPGA
Library Name     : STD_FU
  Total cells    : 3
  - combinational: 0
  - others: 3

Library Name     : STD_FU
  Total cells    : 10
  - combinational: 0
  - others: 10

Library Name     : STD_FU
  Total cells    : 33
  - combinational: 0
  - others: 33

Library Name     : STD_FU
  Total cells    : 8
  - combinational: 0
  - others: 8

Library Name     : STD_FU
  Total cells    : 56
  - combinational: 0
  - others: 56

Library Name     : STD_FU
  Total cells    : 1
  - combinational: 0
  - others: 1

Library Name     : CS_COMPONENT
  Total cells    : 16
  - combinational: 0
  - others: 16

Library Name     : STD_FU
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD_FU
  Total cells    : 0
  - combinational: 0
  - others: 0

Library Name     : STD_FU
  Total cells    : 3
  - combinational: 0
  - others: 3

Library Name     : STD_FU
  Total cells    : 21
  - combinational: 0
  - others: 21

Library Name     : STD
  Total cells    : 14
  - combinational: 0
  - others: 14

Library Name     : STD_COMMON
  Total cells    : 57
  - combinational: 0
  - others: 57

Library Name     : STD_FU
  Total cells    : 33
  - combinational: 0
  - others: 33

Library Name     : STD_PC
  Total cells    : 16
  - combinational: 0
  - others: 16

Library Name     : STD_SOFT_FLOAT
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD
  Total cells    : 95
  - combinational: 0
  - others: 95

Library Name     : STD_FU
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD_FU
  Total cells    : 9
  - combinational: 0
  - others: 9

Library Name     : WBWrapper
  Total cells    : 12
  - combinational: 0
  - others: 12

Available devices:
 - 5CSEMA5F31C6
 - 5SGXEA7N2F45C1
 - EP2C70F896C6
 - EP2C70F896C6-R
 - EP4SGX530KH40C2
 - LFE335EA8FN484C
 - LFE5U85F8BG756C
 - LFE5UM85F8BG756C
 - asap7-BC
 - asap7-TC
 - asap7-WC
 - nangate45
 - nx1h140tsp
 - nx1h35S
 - nx2h540tsc
 - xc4vlx100-10ff1513
 - xc5vlx110t-1ff1136
 - xc5vlx330t-2ff1738
 - xc5vlx50-3ff1153
 - xc6vlx240t-1ff1156
 - xc7a100t-1csg324-VVD
 - xc7vx330t-1ffg1157
 - xc7vx485t-2ffg1761-VVD
 - xc7vx690t-3ffg1930-VVD
 - xc7z020-1clg484
 - xc7z020-1clg484-VVD
 - xc7z020-1clg484-YOSYS-VVD
 - xc7z045-2ffg900-VVD
 - xcku060-3ffva1156-VVD
 - xcu280-2Lfsvh2892-VVD
Library Name     : STD_FU
  Total cells    : 3931
  - combinational: 0
  - others: 3931

warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
 (in-process)  /usr/lib/llvm-16/lib/clang/16/include  /usr/local/include  /usr/include/x86_64-linux-gnu  /usr/include  warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.

error -> unexpected case (unsigned char)  __exp_bits_24767_[2] 8unsigned char old-bw=4 new-bw=4059 from ../../src/frontend_analysis/IR_analysis/Bit_Value_opt.cpp:314 (tree_helper::Size(old_val) >= tree_helper::Size(new_val))

Please report bugs to <panda-info@polimi.it>

@fabrizioferrandi
Collaborator

Using the dev/panda branch results in a different error for the torchscript.py reproducer. IR attached. […]

I tried on my local version with the latest dev/panda and I'm able to synthesize the code. Since the hash of the branch is the same, I would like to understand which version of clang you are using. I'm using the binaries downloaded from the GitHub release page of Clang: https://github.com/llvm/llvm-project/releases/download/llvmorg-16.0.0/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz

@fabrizioferrandi
Collaborator

fabrizioferrandi commented Aug 7, 2023

Ok, thanks, I'll try this. I notice that when I try to compile the testbenches I get errors of the following form. I expect it is because I am including the AppImage in the soda-opt container, which is Ubuntu 20.04, whereas I see some paths in the AppImage that indicate 18.04, so I think there are some include-path inconsistencies.
The references to 18.04 are actually to the clang binaries we include in the binary distribution.

However, I was able to use the --memory-mapped-top option to get a Verilog file. I see the data paths are 64 bits wide. I tried using --data-bus-bitsize and --addr-bus-bitsize to set them to 32, but that did not work.

Bambu can manage IR generated by clang for different Intel target architectures. The default one is the 32-bit architecture, where -m32 is passed to clang. We also added support for -mx32 and -m64, but in the latter case the address space is 64 bits. So, in order to control the size of the address bus, we added a bambu option, but it controls the number of bits used by the minimal interface bus.
In your case, -m32 is used and the address as well as the data bus size is 32 bits.
The following signals are declared as 64 bits:
input [63:0] M_Rdata_ram;
input [63:0] S_addr_ram;
input [63:0] S_Wdata_ram;
output [63:0] Mout_addr_ram;
output [63:0] Mout_Wdata_ram;
output [63:0] Sout_Rdata_ram;

since you asked for a multi-channel minimal interface. The minimal interface may have one or two channels; two channels means that you may have two in-flight memory transactions.
Calling Bambu with:

bambu -v3 --print-dot -lm \
--soft-float \
--compiler=I386_CLANG16 \
--device=xc7z020-1clg484-VVD \
--clock-period=5 \
--experimental-setup=BAMBU-BALANCED \
--memory-allocation-policy=ALL_BRAM \
--disable-function-proxy \
--top-fname=forward_kernel 05_llvm_baseline.ll \
--memory-mapped-top

you will get a single-channel minimal interface bus, and so you will have this top-level interface:
input clock;
input reset;
input [31:0] M_Rdata_ram;
input M_DataRdy;
input S_oe_ram;
input S_we_ram;
input [31:0] S_addr_ram;
input [31:0] S_Wdata_ram;
input [5:0] S_data_ram_size;
// OUT
output done_port;
output Mout_oe_ram;
output Mout_we_ram;
output [31:0] Mout_addr_ram;
output [31:0] Mout_Wdata_ram;
output [5:0] Mout_data_ram_size;
output [31:0] Sout_Rdata_ram;
output Sout_DataRdy;

that should be what you actually expected. Isn't it?
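
Since the original question was about exposing an AXI4-Lite configuration port, the sketch below shows one hypothetical way the single-channel slave bus listed above could be wrapped behind an AXI4-Lite slave. The FSM, the AXI-side port names, the OKAY-only responses, and the assumption that a single-cycle S_we_ram/S_oe_ram pulse followed by waiting on Sout_DataRdy is sufficient are all assumptions for illustration; this is not something Bambu generates.

// Hypothetical AXI4-Lite-to-minimal-bus bridge for the memory-mapped top (sketch only).
module axi4lite_to_minimal #(
  parameter ADDR_BITS = 32,
  parameter DATA_BITS = 32
)(
  input                       clock,
  input                       reset,          // active-high synchronous reset (assumption)
  // AXI4-Lite slave side (driven by the SoC configuration master)
  input                       s_axi_awvalid,
  output reg                  s_axi_awready,
  input      [ADDR_BITS-1:0]  s_axi_awaddr,
  input                       s_axi_wvalid,
  output reg                  s_axi_wready,
  input      [DATA_BITS-1:0]  s_axi_wdata,
  output reg                  s_axi_bvalid,
  input                       s_axi_bready,
  output     [1:0]            s_axi_bresp,
  input                       s_axi_arvalid,
  output reg                  s_axi_arready,
  input      [ADDR_BITS-1:0]  s_axi_araddr,
  output reg                  s_axi_rvalid,
  input                       s_axi_rready,
  output reg [DATA_BITS-1:0]  s_axi_rdata,
  output     [1:0]            s_axi_rresp,
  // Bambu minimal slave bus exposed by --memory-mapped-top
  output reg                  S_oe_ram,
  output reg                  S_we_ram,
  output reg [ADDR_BITS-1:0]  S_addr_ram,
  output reg [DATA_BITS-1:0]  S_Wdata_ram,
  output     [5:0]            S_data_ram_size,
  input      [DATA_BITS-1:0]  Sout_Rdata_ram,
  input                       Sout_DataRdy
);
  assign s_axi_bresp = 2'b00;               // always OKAY in this sketch
  assign s_axi_rresp = 2'b00;
  assign S_data_ram_size = DATA_BITS;       // full-word transfers only (assumption)

  localparam IDLE = 3'd0, WREQ = 3'd1, WRESP = 3'd2, RREQ = 3'd3, RRESP = 3'd4;
  reg [2:0] state;

  always @(posedge clock) begin
    if (reset) begin
      state <= IDLE;
      {s_axi_awready, s_axi_wready, s_axi_arready} <= 3'b0;
      {s_axi_bvalid, s_axi_rvalid} <= 2'b0;
      {S_oe_ram, S_we_ram} <= 2'b0;
    end else begin
      // ready strobes and bus requests are single-cycle pulses
      {s_axi_awready, s_axi_wready, s_axi_arready} <= 3'b0;
      {S_oe_ram, S_we_ram} <= 2'b0;
      case (state)
        IDLE:
          if (s_axi_awvalid && s_axi_wvalid) begin      // write: wait for both AW and W
            s_axi_awready <= 1'b1; s_axi_wready <= 1'b1;
            S_addr_ram <= s_axi_awaddr; S_Wdata_ram <= s_axi_wdata;
            S_we_ram <= 1'b1; state <= WREQ;
          end else if (s_axi_arvalid) begin             // read request
            s_axi_arready <= 1'b1;
            S_addr_ram <= s_axi_araddr;
            S_oe_ram <= 1'b1; state <= RREQ;
          end
        WREQ:
          if (Sout_DataRdy) begin s_axi_bvalid <= 1'b1; state <= WRESP; end
        WRESP:
          if (s_axi_bready) begin s_axi_bvalid <= 1'b0; state <= IDLE; end
        RREQ:
          if (Sout_DataRdy) begin
            s_axi_rdata <= Sout_Rdata_ram; s_axi_rvalid <= 1'b1; state <= RRESP;
          end
        RRESP:
          if (s_axi_rready) begin s_axi_rvalid <= 1'b0; state <= IDLE; end
      endcase
    end
  end
endmodule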

@sei-jgwohlbier
Author

I tried on my local version with the latest dev/pand and I'm able to synthesize the code. Since the hash of the branch is the same, I would like to understand which version of clang you are using. I'm using the binaries downloaded from the github release page of Clang: https://github.com/llvm/llvm-project/releases/download/llvmorg-16.0.0/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz

I am using the Clang 16 release on Ubuntu 20.04. Here are the steps I use to install it in my Docker image.

RUN echo "deb http://apt.llvm.org/focal/ llvm-toolchain-focal-16 main" >> /etc/apt/sources.list && \
    echo "deb-src http://apt.llvm.org/focal/ llvm-toolchain-focal-16 main" >> /etc/apt/sources.list && \
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 15CF4D18AF4F7421 && \
    apt update && \
    apt-get install -y \
    clang-16 \
    libclang-16-dev

RUN update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-16 100 && \
    update-alternatives --install /usr/bin/clang clang /usr/bin/clang-16 100 && \
    update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 100 && \
    update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 100

ENV CC=/usr/bin/clang
ENV CXX=/usr/bin/clang++
RUN rm -rf /opt/panda && \
    git clone https://github.com/ferrandi/PandA-bambu.git && \
    cd PandA-bambu && \
    git checkout dev/panda && \
    make -f Makefile.init && \
    mkdir obj && \
    cd obj && \
    ../configure --enable-flopoco --enable-opt --prefix=/opt/panda --enable-release && \
    make -j4 && \
    make install

@fabrizioferrandi
Collaborator

Configuring with --enable-release implies passing -DNDEBUG to the compiler. This hides some errors, leaving error checking only to the THROW_ASSERTs. One way to improve the tracking of the bug would be to configure with --disable-release instead of --enable-release and see where the issue pops up.
A simpler way to track the error is to share the file panda-temp/05_llvm_baseline.ll.gimplePSSA, so I can see whether it allows me to understand where the issue is. You have to call Bambu passing the --no-clean option.

@sei-jgwohlbier
Author

sei-jgwohlbier commented Aug 7, 2023

I rebuilt with --disable-release. Output is below, including soda-opt processing. LLVM IR and gimple file attached.

python torchscript.py output/01_tosa.mlir --dialect=tosa
ToyCNN(
  (conv1): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc): Linear(in_features=16, out_features=4, bias=True)
)
/opt/soda/scripts/tosa_to_linalg.sh output/01_tosa.mlir output/02_linalg.mlir
soda-opt output/02_linalg.mlir -o output/03-01_linalg_searched.mlir -convert-operation-to-soda="anchor-op=linalg.batch_matmul"
soda-opt output/03-01_linalg_searched.mlir -o output/03-02_linalg_outlined.mlir -soda-outline-bambu-code -soda-extract-arguments-to-xml=using-bare-ptr
mv forward_kernel_interface.xml  ./output/forward_kernel_interface.xml
mv forward_kernel_test.xml ./output/forward_kernel_test.xml
soda-opt output/03-02_linalg_outlined.mlir -o output/03-03_linalg_isolated.mlir -soda-generate-bambu-accelcode=no-aa
soda-opt output/03-03_linalg_isolated.mlir -o output/04_llvm_baseline.mlir -lower-all-to-llvm=use-bare-ptr-memref-call-conv
mlir-translate output/04_llvm_baseline.mlir -o output/05_llvm_baseline.ll --mlir-to-llvmir -opaque-pointers=0
test -d ./output/bambu/baseline || mkdir -p ./output/bambu/baseline; \
cd ./output/bambu/baseline; \
bambu \
	-v3 --print-dot \
	-lm --soft-float \
	--compiler=I386_CLANG16 \
	--device=xc7z020-1clg484-VVD \
	--clock-period=5 \
	--experimental-setup=BAMBU-BALANCED-MP \
	--channels-number=2 \
	--memory-allocation-policy=ALL_BRAM \
	--disable-function-proxy \
	--top-fname=forward_kernel \
                 --no-clean \
	../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
 ==  Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --no-clean ../../../output/05_llvm_baseline.ll 


********************************************************************************
                    ____                  _
                   | __ )  __ _ _ __ ___ | |_   _   _
                   |  _ \ / _` | '_ ` _ \| '_ \| | | |
                   | |_) | (_| | | | | | | |_) | |_| |
                   |____/ \__,_|_| |_| |_|_.__/ \__,_|

********************************************************************************
                         High-Level Synthesis Tool

                         Politecnico di Milano - DEIB
                          System Architectures Group
********************************************************************************
                Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 2023.08 - Revision 7fb2f6c62adef935ec7eed79f3fd2365f5a0bdbe-dev/panda

Parameters parsed in 0.11 seconds

Target technology = FPGA
Library Name     : STD_FU
  Total cells    : 3
  - combinational: 0
  - others: 3

Library Name     : STD_FU
  Total cells    : 10
  - combinational: 0
  - others: 10

Library Name     : STD_FU
  Total cells    : 33
  - combinational: 0
  - others: 33

Library Name     : STD_FU
  Total cells    : 8
  - combinational: 0
  - others: 8

Library Name     : STD_FU
  Total cells    : 56
  - combinational: 0
  - others: 56

Library Name     : STD_FU
  Total cells    : 1
  - combinational: 0
  - others: 1

Library Name     : CS_COMPONENT
  Total cells    : 16
  - combinational: 0
  - others: 16

Library Name     : STD_FU
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD_FU
  Total cells    : 0
  - combinational: 0
  - others: 0

Library Name     : STD_FU
  Total cells    : 3
  - combinational: 0
  - others: 3

Library Name     : STD_FU
  Total cells    : 21
  - combinational: 0
  - others: 21

Library Name     : STD
  Total cells    : 14
  - combinational: 0
  - others: 14

Library Name     : STD_COMMON
  Total cells    : 57
  - combinational: 0
  - others: 57

Library Name     : STD_FU
  Total cells    : 33
  - combinational: 0
  - others: 33

Library Name     : STD_PC
  Total cells    : 16
  - combinational: 0
  - others: 16

Library Name     : STD_SOFT_FLOAT
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD
  Total cells    : 95
  - combinational: 0
  - others: 95

Library Name     : STD_FU
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD_FU
  Total cells    : 9
  - combinational: 0
  - others: 9

Library Name     : WBWrapper
  Total cells    : 12
  - combinational: 0
  - others: 12

Available devices:
 - 5CSEMA5F31C6
 - 5SGXEA7N2F45C1
 - EP2C70F896C6
 - EP2C70F896C6-R
 - EP4SGX530KH40C2
 - LFE335EA8FN484C
 - LFE5U85F8BG756C
 - LFE5UM85F8BG756C
 - asap7-BC
 - asap7-TC
 - asap7-WC
 - nangate45
 - nx1h140tsp
 - nx1h35S
 - nx2h540tsc
 - xc4vlx100-10ff1513
 - xc5vlx110t-1ff1136
 - xc5vlx330t-2ff1738
 - xc5vlx50-3ff1153
 - xc6vlx240t-1ff1156
 - xc7a100t-1csg324-VVD
 - xc7vx330t-1ffg1157
 - xc7vx485t-2ffg1761-VVD
 - xc7vx690t-3ffg1930-VVD
 - xc7z020-1clg484
 - xc7z020-1clg484-VVD
 - xc7z020-1clg484-YOSYS-VVD
 - xc7z045-2ffg900-VVD
 - xcku060-3ffva1156-VVD
 - xcu280-2Lfsvh2892-VVD
Library Name     : STD_FU
  Total cells    : 3931
  - combinational: 0
  - others: 3931

warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.02 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.03 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Tree merging time: 0.12 seconds;
 (in-process)  /usr/lib/llvm-16/lib/clang/16/include  /usr/local/include  /usr/include/x86_64-linux-gnu  /usr/include  
error -> unexpected case (unsigned char)  __exp_bits_24767_[2] 8unsigned char old-bw=4 new-bw=4015 from ../../src/frontend_analysis/IR_analysis/Bit_Value_opt.cpp:314 (tree_helper::Size(old_val) >= tree_helper::Size(new_val))
	void Bit_Value_opt::propagateValue(const ssa_name *, tree_managerRef, tree_nodeRef, tree_nodeRef, const std::string)
	../../src/frontend_analysis/IR_analysis/Bit_Value_opt.cpp:270
Please report bugs to <panda-info@polimi.it>

@fabrizioferrandi
Collaborator

The error you see seems to be due to an undetected buffer overflow during the minimum bit-width computation. I cannot reproduce the issue, so please let me know whether #206 fixes the problem.

@sei-jgwohlbier
Author

That works using the --memory-mapped-top option! Thanks!

bambu \
	-v3 --print-dot \
	-lm --soft-float \
	--compiler=I386_CLANG16 \
	--device=xc7z020-1clg484-VVD \
	--clock-period=5 \
	--experimental-setup=BAMBU-BALANCED-MP \
	--channels-number=2 \
	--memory-allocation-policy=ALL_BRAM \
	--disable-function-proxy \
	--top-fname=forward_kernel \
                 --memory-mapped-top --no-clean \
	../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log

It fails when using the --generate-tb --simulate --simulator=VERILATOR options, as shown below. I will work on creating a reproducer, but it will take a few days to get permission from my organization to publish the code and the Dockerfile.

python torchscript.py output/01_tosa.mlir --dialect=tosa
ToyCNN(
  (conv1): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc): Linear(in_features=16, out_features=4, bias=True)
)
/opt/soda/scripts/tosa_to_linalg.sh output/01_tosa.mlir output/02_linalg.mlir
soda-opt output/02_linalg.mlir -o output/03-01_linalg_searched.mlir -convert-operation-to-soda="anchor-op=linalg.batch_matmul"
soda-opt output/03-01_linalg_searched.mlir -o output/03-02_linalg_outlined.mlir -soda-outline-bambu-code -soda-extract-arguments-to-xml=using-bare-ptr
mv forward_kernel_interface.xml  ./output/forward_kernel_interface.xml
mv forward_kernel_test.xml ./output/forward_kernel_test.xml
soda-opt output/03-02_linalg_outlined.mlir -o output/03-03_linalg_isolated.mlir -soda-generate-bambu-accelcode=no-aa
soda-opt output/03-03_linalg_isolated.mlir -o output/04_llvm_baseline.mlir -lower-all-to-llvm=use-bare-ptr-memref-call-conv
mlir-translate output/04_llvm_baseline.mlir -o output/05_llvm_baseline.ll --mlir-to-llvmir -opaque-pointers=0
test -d ./output/bambu/baseline || mkdir -p ./output/bambu/baseline; \
cd ./output/bambu/baseline; \
bambu \
	-v3 --print-dot \
	-lm --soft-float \
	--compiler=I386_CLANG16 \
	--device=xc7z020-1clg484-VVD \
	--clock-period=5 \
	--experimental-setup=BAMBU-BALANCED-MP \
	--channels-number=2 \
	--memory-allocation-policy=ALL_BRAM \
	--disable-function-proxy \
	--top-fname=forward_kernel \
                 --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean \
	../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
 ==  Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean ../../../output/05_llvm_baseline.ll 


********************************************************************************
                    ____                  _
                   | __ )  __ _ _ __ ___ | |_   _   _
                   |  _ \ / _` | '_ ` _ \| '_ \| | | |
                   | |_) | (_| | | | | | | |_) | |_| |
                   |____/ \__,_|_| |_| |_|_.__/ \__,_|

********************************************************************************
                         High-Level Synthesis Tool

                         Politecnico di Milano - DEIB
                          System Architectures Group
********************************************************************************
                Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 2023.08 - Revision f6bcd3bdaf988ef69272e21724bd338199baefc8-fix/minorIssues

Parameters parsed in 0.07 seconds

Target technology = FPGA
Library Name     : STD_FU
  Total cells    : 3
  - combinational: 0
  - others: 3

Library Name     : STD_FU
  Total cells    : 10
  - combinational: 0
  - others: 10

Library Name     : STD_FU
  Total cells    : 33
  - combinational: 0
  - others: 33

Library Name     : STD_FU
  Total cells    : 8
  - combinational: 0
  - others: 8

Library Name     : STD_FU
  Total cells    : 56
  - combinational: 0
  - others: 56

Library Name     : STD_FU
  Total cells    : 1
  - combinational: 0
  - others: 1

Library Name     : CS_COMPONENT
  Total cells    : 16
  - combinational: 0
  - others: 16

Library Name     : STD_FU
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD_FU
  Total cells    : 0
  - combinational: 0
  - others: 0

Library Name     : STD_FU
  Total cells    : 3
  - combinational: 0
  - others: 3

Library Name     : STD_FU
  Total cells    : 21
  - combinational: 0
  - others: 21

Library Name     : STD
  Total cells    : 14
  - combinational: 0
  - others: 14

Library Name     : STD_COMMON
  Total cells    : 57
  - combinational: 0
  - others: 57

Library Name     : STD_FU
  Total cells    : 33
  - combinational: 0
  - others: 33

Library Name     : STD_PC
  Total cells    : 16
  - combinational: 0
  - others: 16

Library Name     : STD_SOFT_FLOAT
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD
  Total cells    : 95
  - combinational: 0
  - others: 95

Library Name     : STD_FU
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD_FU
  Total cells    : 9
  - combinational: 0
  - others: 9

Library Name     : WBWrapper
  Total cells    : 12
  - combinational: 0
  - others: 12

Available devices:
 - 5CSEMA5F31C6
 - 5SGXEA7N2F45C1
 - EP2C70F896C6
 - EP2C70F896C6-R
 - EP4SGX530KH40C2
 - LFE335EA8FN484C
 - LFE5U85F8BG756C
 - LFE5UM85F8BG756C
 - asap7-BC
 - asap7-TC
 - asap7-WC
 - nangate45
 - nx1h140tsp
 - nx1h35S
 - nx2h540tsc
 - xc4vlx100-10ff1513
 - xc5vlx110t-1ff1136
 - xc5vlx330t-2ff1738
 - xc5vlx50-3ff1153
 - xc6vlx240t-1ff1156
 - xc7a100t-1csg324-VVD
 - xc7vx330t-1ffg1157
 - xc7vx485t-2ffg1761-VVD
 - xc7vx690t-3ffg1930-VVD
 - xc7z020-1clg484
 - xc7z020-1clg484-VVD
 - xc7z020-1clg484-YOSYS-VVD
 - xc7z045-2ffg900-VVD
 - xcku060-3ffva1156-VVD
 - xcu280-2Lfsvh2892-VVD
Library Name     : STD_FU
  Total cells    : 3931
  - combinational: 0
  - others: 3931

warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.02 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.03 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Tree merging time: 0.11 seconds;
 (in-process)  /usr/lib/llvm-16/lib/clang/16/include  /usr/local/include  /usr/include/x86_64-linux-gnu  /usr/include  
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 2
  Bit Value Opt: plus_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 2
  Bit Value Opt: cond_expr optimized, nbits = 3
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 2
  Bit Value Opt: cond_expr optimized, nbits = 3
  Bit Value Opt: cond_expr optimized, nbits = 4
  Bit Value Opt: cond_expr optimized, nbits = 5
  Bit Value Opt: bit_and_expr optimized, nbits = 1
  Bit Value Opt: ne_expr optimized, nbits = 1
  Bit Value Opt: bit_xor_expr optimized, nbits = 2
  Bit Value Opt: bit_and_expr optimized, nbits = 2
  Bit Value Opt: ne_expr optimized, nbits = 2
  Bit Value Opt: plus_expr optimized, nbits = 2
  Bit Value Opt: bit_and_expr optimized, nbits = 11
  Bit Value Opt: eq_expr optimized, nbits = 11
  Bit Value Opt: bit_and_expr optimized, nbits = 19
  Bit Value Opt: eq_expr optimized, nbits = 19
  Bit Value Opt: bit_and_expr optimized, nbits = 23
  Bit Value Opt: eq_expr optimized, nbits = 23
  Bit Value Opt: bit_and_expr optimized, nbits = 25
  Bit Value Opt: eq_expr optimized, nbits = 25
  Bit Value Opt: bit_and_expr optimized, nbits = 26
  Bit Value Opt: eq_expr optimized, nbits = 26
  Bit Value Opt: bit_and_expr optimized, nbits = 26
  Bit Value Opt: ne_expr optimized, nbits = 26
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: bit_and_expr optimized, nbits = 1
  Bit Value Opt: ne_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: bit_and_expr optimized, nbits = 22
  Bit Value Opt: ne_expr optimized, nbits = 22
  Bit Value Opt: bit_and_expr optimized, nbits = 47
  Bit Value Opt: ne_expr optimized, nbits = 47
  Bit Value Opt: bit_and_expr optimized, nbits = 9
  Bit Value Opt: ne_expr optimized, nbits = 9
  Bit Value Opt: bit_and_expr optimized, nbits = 32
  Bit Value Opt: ne_expr optimized, nbits = 32
  Bit Value Opt: plus_expr optimized, nbits = 4
  Bit Value Opt: plus_expr optimized, nbits = 4
  Bit Value Opt: plus_expr optimized, nbits = 4

  Functions to be synthesized:
    forward_kernel
    __float_mule8m23b_127nih
    __float_adde8m23b_127nih


  Memory allocation information:
  Sparse memory alignemnt set to 1024 bytes
Warning: This function uses unknown addresses: forward_kernel
    BRAM bitsize: 16
    Spec may not exploit DATA bus width
    Spec accesses data having an address unknown at compile time
    Internal data is not externally accessible
    DATA bus bitsize: 32
    ADDRESS bus bitsize: 32
    SIZE bus bitsize: 6
    Internally allocated memory (no private memories): 0
    Internally allocated memory: 0
  Time to perform memory allocation: 0.00 seconds


  Module allocation information for function __float_adde8m23b_127nih:
    Number of complex operations: 0
    Number of complex operations: 0
  Time to perform module allocation: 0.05 seconds


  Module allocation information for function __float_mule8m23b_127nih:
    Number of complex operations: 1
    Number of complex operations: 1
  Time to perform module allocation: 0.02 seconds


  Scheduling Information of function __float_adde8m23b_127nih:
    Number of control steps: 9
    Minimum slack: 0.010964990999998037
    Estimated max frequency (MHz): 200.43956360218831
  Time to perform scheduling: 0.03 seconds

  Number of function call sites = 19

  State Transition Graph Information of function __float_adde8m23b_127nih:
    Number of operations: 257
    Number of basic blocks: 3
    Number of states: 8
    Minimum number of cycles: 8
    Maximum number of cycles 8
    Parameters are registered
    Done port is registered
  Time to perform creation of STG: 0.02 seconds


  Scheduling Information of function __float_mule8m23b_127nih:
    Number of control steps: 8
    Minimum slack: 0.056999993999998111
    Estimated max frequency (MHz): 202.30629148010559
  Time to perform scheduling: 0.01 seconds

  Number of function call sites = 19

  State Transition Graph Information of function __float_mule8m23b_127nih:
    Number of operations: 104
    Number of basic blocks: 3
    Number of states: 7
    Minimum number of cycles: 7
    Maximum number of cycles 7
    Parameters are registered
    Done port is registered
  Time to perform creation of STG: 0.01 seconds


  Easy binding information for function __float_adde8m23b_127nih:
    Bound operations:192/257
  Time to perform easy binding: 0.00 seconds


  Easy binding information for function __float_mule8m23b_127nih:
    Bound operations:85/104
  Time to perform easy binding: 0.00 seconds


  Storage Value Information of function __float_adde8m23b_127nih:
    Number of storage values inserted: 89
  Time to compute storage value information: 0.00 seconds


  Storage Value Information of function __float_mule8m23b_127nih:
    Number of storage values inserted: 16
  Time to compute storage value information: 0.00 seconds

  Slack computed in 0.00 seconds
  Weight computation completed in 0.00 seconds
  False-loop computation completed in 0.00 seconds
  Iteration 0 completed in 0.00 seconds

  Register binding information for function __float_adde8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
  Time to perform register binding: 0.00 seconds

  Iteration 1 completed in 0.00 seconds
  Clique covering computation completed in 0.00 seconds

  Module binding information for function __float_adde8m23b_127nih:
    Number of modules instantiated: 257
    Number of performance conflicts: 13
    Estimated resources area (no Muxes and address logic): 2746
    Estimated area of MUX21: 0
    Total estimated area: 2746
    Estimated number of DSPs: 0
  Time to perform module binding: 0.01 seconds


  Register binding information for function __float_adde8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
  Time to perform register binding: 0.01 seconds

  Total number of flip-flops in function __float_adde8m23b_127nih: 488
  Slack computed in 0.00 seconds
  Weight computation completed in 0.00 seconds
  False-loop computation completed in 0.00 seconds
  Iteration 0 completed in 0.00 seconds

  Register binding information for function __float_mule8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
  Time to perform register binding: 0.00 seconds

  Iteration 1 completed in 0.00 seconds
  Clique covering computation completed in 0.00 seconds

  Module binding information for function __float_mule8m23b_127nih:
    Number of modules instantiated: 104
    Number of performance conflicts: 0
    Estimated resources area (no Muxes and address logic): 1100
    Estimated area of MUX21: 0
    Total estimated area: 1100
    Estimated number of DSPs: 3
  Time to perform module binding: 0.00 seconds


  Register binding information for function __float_mule8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
  Time to perform register binding: 0.00 seconds

  Total number of flip-flops in function __float_mule8m23b_127nih: 197

  Module allocation information for function forward_kernel:
    Number of complex operations: 99
    Number of complex operations: 99
  Time to perform module allocation: 0.03 seconds


  Scheduling Information of function forward_kernel:
    Number of control steps: 353
    Minimum slack: 0.010964988999987213
    Estimated max frequency (MHz): 200.43956352183582
  Time to perform scheduling: 0.06 seconds

  Number of function call sites = 0

  State Transition Graph Information of function forward_kernel:
    Number of operations: 428
    Number of basic blocks: 10
    Number of states: 353
    Done port is registered
  Time to perform creation of STG: 0.20 seconds


  Easy binding information for function forward_kernel:
    Bound operations:243/428
  Time to perform easy binding: 0.00 seconds


  Storage Value Information of function forward_kernel:
    Number of storage values inserted: 157
  Time to compute storage value information: 0.00 seconds

  Slack computed in 0.00 seconds
  Weight computation completed in 0.02 seconds
  False-loop computation completed in 0.00 seconds
  cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
  Iteration 0 completed in 0.01 seconds

  Register binding information for function forward_kernel:
    Register allocation algorithm obtains a sub-optimal result: 150 registers(LB:41)
  Time to perform register binding: 0.02 seconds

  cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
  Iteration 1 completed in 0.02 seconds
  Clique covering computation completed in 0.05 seconds

  Module binding information for function forward_kernel:
    Number of modules instantiated: 333
    Number of performance conflicts: 147
    Estimated resources area (no Muxes and address logic): 5699
    Estimated area of MUX21: 1332.3333333333333
    Total estimated area: 7031.333333333333
    Estimated number of DSPs: 0
  Time to perform module binding: 0.07 seconds


  Register binding information for function forward_kernel:
    Register allocation algorithm obtains a sub-optimal result: 150 registers(LB:41)
  Time to perform register binding: 0.02 seconds


  Connection Binding Information for function forward_kernel:
    Number of allocated multiplexers (2-to-1 equivalent): 141
    Total number of bit-level multiplexers: 4640
  Time to perform interconnection binding: 0.01 seconds

  Total number of flip-flops in function forward_kernel: 4767
  C-based testbench generation for function forward_kernel: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/cosim.c
  Prepared testbench

  Summary of resources:
     - ASSIGN_UNSIGNED_FU: 1
     - BMEMORY_CTRLN: 1
     - IUdata_converter_FU: 3
     - MUX_GATE: 141
     - OR_GATE: 2
     - UIdata_converter_FU: 3
     - UUdata_converter_FU: 263
     - constant_value: 139
     - flipflop_AR: 2
     - lshift_expr_FU: 3
     - lut_expr_FU: 71
     - multi_read_cond_FU: 1
     - read_cond_FU: 2
     - register_SE: 161
     - register_STD: 98
     - rshift_expr_FU: 3
     - ui_bit_and_expr_FU: 34
     - ui_bit_ior_concat_expr_FU: 4
     - ui_bit_ior_expr_FU: 39
     - ui_bit_xor_expr_FU: 2
     - ui_cond_expr_FU: 12
     - ui_eq_expr_FU: 3
     - ui_extract_bit_expr_FU: 101
     - ui_lshift_expr_FU: 65
     - ui_lt_expr_FU: 5
     - ui_minus_expr_FU: 1
     - ui_mult_expr_FU: 1
     - ui_ne_expr_FU: 6
     - ui_plus_expr_FU: 12
     - ui_pointer_plus_expr_FU: 41
     - ui_rshift_expr_FU: 26
     - ui_ternary_plus_expr_FU: 1
     - ui_ternary_pm_expr_FU: 1
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang: warning: argument unused during compilation: '-I /usr/bin/../share/verilator/include/vltstd' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-I /opt/panda/include' [-Wunused-command-line-argument]
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
make[1]: Entering directory '/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output/verilator_beh/verilator_obj'
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o bambu_testbench.o /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/bambu_testbench.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o verilated.o /usr/share/verilator/include/verilated.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o verilated_dpi.o /usr/share/verilator/include/verilated_dpi.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o Vbambu_testbench.o Vbambu_testbench.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o Vbambu_testbench___024unit.o Vbambu_testbench___024unit.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o Vbambu_testbench__Dpi.o Vbambu_testbench__Dpi.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o Vbambu_testbench__Syms.o Vbambu_testbench__Syms.cpp
ar -cr Vbambu_testbench__ALL.a Vbambu_testbench.o Vbambu_testbench___024unit.o Vbambu_testbench__Dpi.o Vbambu_testbench__Syms.o
ranlib Vbambu_testbench__ALL.a
g++    bambu_testbench.o verilated.o verilated_dpi.o Vbambu_testbench__ALL.a   -m32 -lpthread /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//verilator_beh/libtb.so  -o Vbambu_testbench -lm -lstdc++ 
make[1]: Leaving directory '/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output/verilator_beh/verilator_obj'
Results file: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/results.txt
Reset active: LOW
Co-sim: Co-simulation started
Co-sim: Memory size for parameter 0 set to 64 bytes.
Co-sim: Memory size for parameter 1 set to 64 bytes.
Co-sim: Memory size for parameter 2 set to 16 bytes.
Co-sim: Address 0xF72152E0 mapped at 0x40000000 (64 bytes)
Co-sim: Address 0xF72152A0 mapped at 0x40000040 (64 bytes)
Co-sim: Address 0xF7215290 mapped at 0x40000080 (16 bytes)
Co-sim: Pointer parameter 0xF72152E0 mapped at 0x40000000
Co-sim: Parameter 0 is 32 bits at 0xF7215258
Co-sim: Pointer parameter 0xF72152A0 mapped at 0x40000040
Co-sim: Parameter 1 is 32 bits at 0xF7215254
Co-sim: Pointer parameter 0xF7215290 mapped at 0x40000080
Co-sim: Parameter 2 is 32 bits at 0xF7215250
ERROR: Sim: Nearest memory space is 0x40000080->0xF7215290 to 0x40000090->0xF72152A0 (16 bytes).
ERROR: Sim: Read to non-mapped address 0x40000090.
File "/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/results.txt" opened
error -> Unable to parse simulation time report: check simulator output for errors.
	void SimulationTool::DetermineCycles(unsigned long long &, unsigned long long &)
	../../src/wrapper/simulation/SimulationTool.cpp:223
Please report bugs to <panda-info@polimi.it>

@fabrizioferrandi
Collaborator

I was able to run the simulation, but to be sure I need forward_kernel_test.xml. From what I understand, the top function signature is different from the one used in the original tutorial.

@sei-jgwohlbier
Author

sei-jgwohlbier commented Aug 10, 2023

Here is a reproducer. I get different errors when running on Linux vs. an Intel Mac. Please let me know if anything is broken in the reproducer. Thanks!

git clone --recursive git@github.com:cmu-sei/soda-opt-docker.git
cd soda-opt-docker
docker build --rm --pull -f ./Dockerfile -t soda-opt:dev-panda .
docker run --rm -it --network=host --privileged -e DISPLAY=$DISPLAY -e UID=$(id -u) -e GID=$(id -g) -v `pwd`/env:/home/soda-opt-user/env:rw -v `pwd`/work:/home/soda-opt-user/work soda-opt:dev-panda
# in the container
cd work/pytorch-iris/
./getmakefile.sh
make synth-baseline

@sei-jgwohlbier
Author

Hi, did you get a chance to try to reproduce my errors? Thanks!

@fabrizioferrandi
Collaborator

I'm working on it. The setup is just taking longer than expected.

@fabrizioferrandi
Collaborator

Hi,
I've recently changed how some files are generated to manage opaque pointers.
One big change concerns the file describing the top function signature. This file is needed when the starting point is a .ll file: opaque pointers make all pointers equivalent to void *, so the size of the objects the pointers point to has to be specified manually. This can be done by passing Bambu the option '--interface-xml-filename=<filename>'.
This file is automatically generated by soda-opt, so the following line
bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean ../../../output/05_llvm_baseline.ll

needs to be changed to

bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean ../../../output/05_llvm_baseline.ll --interface-xml-filename=../../forward_kernel_interface.xml

This last option fixes the issue, but you can get more out of this example.

Instead of the minimal interface, you may use the option --generate-interface=INFER, which follows the same assumptions adopted by Vitis HLS (see these pragmas). In this case, Bambu can infer an interface that connects the three parameters to three different BRAMs. Since the bus is no longer a constraint, you will halve the number of cycles. The array protocol requires knowing exactly how large the attached BRAM is, and from .ll files it is impossible to specify the size of the array and the size of the base elements (at least with opaque pointers), so I've recently extended the forward_kernel_interface.xml file by adding a new attribute to the parameters.
The newer version for your example is:

<?xml version="1.0"?>
<module>
 <function id="forward_kernel">
  <arg id="P0" SizeInBytes="256" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="64" interface_typename_include=""/>
  <arg id="P1" SizeInBytes="256" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="64" interface_typename_include=""/>
  <arg id="P2" SizeInBytes="64" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="16" interface_typename_include=""/>
 </function>
</module>

SizeInBytes allows Bambu to understand the memory layout of the function parameters. This new attribute will soon be added by @agostini01 to the soda-opt infrastructure.
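For reference, here is a hypothetical C-level declaration matching what the XML above describes; the actual kernel signature comes from the .ll file, so this is only an illustration of the implied memory layout:

/* Hypothetical view of the interface described above: three float
 * parameters of 64, 64, and 16 elements, i.e. 256, 256, and 64 bytes. */
void forward_kernel(float P0[64], float P1[64], float P2[16]);
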
So, if you now run
bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean ../../../output/05_llvm_baseline.ll --interface-xml-filename=../../forward_kernel_interface.xml --generate-interface=INFER
You should obtain a core taking 2266 cycles to complete.
Concerning the --memory-mapped-top option, it needs to be fixed. We are working on it, but fixing it may take some time.

@Ansaya
Collaborator

Ansaya commented Sep 7, 2023

Hi,
I just fixed the issue preventing testbench generation for memory-mapped kernels with the latest dev/panda branch commits. Now, you should be able to use the --memory-mapped-top, --generate-tb, and --simulate options to let Bambu generate a proper testbench environment and run the simulation.

@sei-jgwohlbier
Author

Great, I'll give it a try!

@sei-jgwohlbier
Author

I still get an error with the reproducer, which is not using --memory-mapped-top.

bambu \
	-v3 --print-dot \
	-lm --soft-float \
	--compiler=I386_CLANG16 \
	--device=xc7z020-1clg484-VVD \
	--clock-period=5 \
	--experimental-setup=BAMBU-BALANCED-MP \
	--channels-number=2 \
	--memory-allocation-policy=ALL_BRAM \
	--disable-function-proxy \
	--top-fname=forward_kernel \
                 --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean \
	../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
 ==  Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean ../../../output/05_llvm_baseline.ll 


********************************************************************************
                    ____                  _
                   | __ )  __ _ _ __ ___ | |_   _   _
                   |  _ \ / _` | '_ ` _ \| '_ \| | | |
                   | |_) | (_| | | | | | | |_) | |_| |
                   |____/ \__,_|_| |_| |_|_.__/ \__,_|

********************************************************************************
                         High-Level Synthesis Tool

                         Politecnico di Milano - DEIB
                          System Architectures Group
********************************************************************************
                Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 2023.08 - Revision d0cb0caebaf1cc24e6fc6eb235156bc55fe21318-dev/panda

Parameters parsed in 0.10 seconds

Library Name     : STD_FU
  Total cells    : 3
  - combinational: 0
  - others: 3

Library Name     : STD_FU
  Total cells    : 10
  - combinational: 0
  - others: 10

Library Name     : STD_FU
  Total cells    : 33
  - combinational: 0
  - others: 33

Library Name     : STD_FU
  Total cells    : 8
  - combinational: 0
  - others: 8

Library Name     : STD_FU
  Total cells    : 56
  - combinational: 0
  - others: 56

Library Name     : STD_FU
  Total cells    : 1
  - combinational: 0
  - others: 1

Library Name     : CS_COMPONENT
  Total cells    : 16
  - combinational: 0
  - others: 16

Library Name     : STD_FU
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD_FU
  Total cells    : 0
  - combinational: 0
  - others: 0

Library Name     : STD_FU
  Total cells    : 3
  - combinational: 0
  - others: 3

Library Name     : STD_FU
  Total cells    : 21
  - combinational: 0
  - others: 21

Library Name     : STD
  Total cells    : 14
  - combinational: 0
  - others: 14

Library Name     : STD_COMMON
  Total cells    : 57
  - combinational: 0
  - others: 57

Library Name     : STD_FU
  Total cells    : 33
  - combinational: 0
  - others: 33

Library Name     : STD_PC
  Total cells    : 16
  - combinational: 0
  - others: 16

Library Name     : STD_SOFT_FLOAT
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD
  Total cells    : 95
  - combinational: 0
  - others: 95

Library Name     : STD_FU
  Total cells    : 2
  - combinational: 0
  - others: 2

Library Name     : STD_FU
  Total cells    : 9
  - combinational: 0
  - others: 9

Library Name     : WBWrapper
  Total cells    : 12
  - combinational: 0
  - others: 12

Available devices:
 - 5CSEMA5F31C6
 - 5SGXEA7N2F45C1
 - EP2C70F896C6
 - EP2C70F896C6-R
 - EP4SGX530KH40C2
 - LFE335EA8FN484C
 - LFE5U85F8BG756C
 - LFE5UM85F8BG756C
 - asap7-BC
 - asap7-TC
 - asap7-WC
 - nangate45
 - nx1h140tsp
 - nx1h35S
 - nx2h540tsc
 - xc4vlx100-10ff1513
 - xc5vlx110t-1ff1136
 - xc5vlx330t-2ff1738
 - xc5vlx50-3ff1153
 - xc6vlx240t-1ff1156
 - xc7a100t-1csg324-VVD
 - xc7vx330t-1ffg1157
 - xc7vx485t-2ffg1761-VVD
 - xc7vx690t-3ffg1930-VVD
 - xc7z020-1clg484
 - xc7z020-1clg484-VVD
 - xc7z020-1clg484-YOSYS-VVD
 - xc7z045-2ffg900-VVD
 - xcku060-3ffva1156-VVD
 - xcu280-2Lfsvh2892-VVD
Library Name     : STD_FU
  Total cells    : 3931
  - combinational: 0
  - others: 3931

warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.02 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.03 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Tree merging time: 0.18 seconds;
 (in-process)  /usr/lib/llvm-16/lib/clang/16/include  /usr/local/include  /usr/include/x86_64-linux-gnu  /usr/include  
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 2
  Bit Value Opt: plus_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 2
  Bit Value Opt: cond_expr optimized, nbits = 3
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 2
  Bit Value Opt: cond_expr optimized, nbits = 3
  Bit Value Opt: cond_expr optimized, nbits = 4
  Bit Value Opt: cond_expr optimized, nbits = 5
  Bit Value Opt: bit_and_expr optimized, nbits = 1
  Bit Value Opt: ne_expr optimized, nbits = 1
  Bit Value Opt: bit_xor_expr optimized, nbits = 2
  Bit Value Opt: bit_and_expr optimized, nbits = 2
  Bit Value Opt: ne_expr optimized, nbits = 2
  Bit Value Opt: plus_expr optimized, nbits = 2
  Bit Value Opt: bit_and_expr optimized, nbits = 11
  Bit Value Opt: eq_expr optimized, nbits = 11
  Bit Value Opt: bit_and_expr optimized, nbits = 19
  Bit Value Opt: eq_expr optimized, nbits = 19
  Bit Value Opt: bit_and_expr optimized, nbits = 23
  Bit Value Opt: eq_expr optimized, nbits = 23
  Bit Value Opt: bit_and_expr optimized, nbits = 25
  Bit Value Opt: eq_expr optimized, nbits = 25
  Bit Value Opt: bit_and_expr optimized, nbits = 26
  Bit Value Opt: eq_expr optimized, nbits = 26
  Bit Value Opt: bit_and_expr optimized, nbits = 26
  Bit Value Opt: ne_expr optimized, nbits = 26
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: bit_and_expr optimized, nbits = 1
  Bit Value Opt: ne_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: cond_expr optimized, nbits = 1
  Bit Value Opt: bit_and_expr optimized, nbits = 22
  Bit Value Opt: ne_expr optimized, nbits = 22
  Bit Value Opt: bit_and_expr optimized, nbits = 47
  Bit Value Opt: ne_expr optimized, nbits = 47
  Bit Value Opt: bit_and_expr optimized, nbits = 9
  Bit Value Opt: ne_expr optimized, nbits = 9
  Bit Value Opt: bit_and_expr optimized, nbits = 32
  Bit Value Opt: ne_expr optimized, nbits = 32
  Bit Value Opt: plus_expr optimized, nbits = 4
  Bit Value Opt: plus_expr optimized, nbits = 4
  Bit Value Opt: plus_expr optimized, nbits = 4

  Functions to be synthesized:
    forward_kernel
    __float_mule8m23b_127nih
    __float_adde8m23b_127nih


  Memory allocation information:
  Sparse memory alignemnt set to 1024 bytes
Warning: This function uses unknown addresses: forward_kernel
    BRAM bitsize: 16
    Spec may not exploit DATA bus width
    Spec accesses data having an address unknown at compile time
    Internal data is not externally accessible
    DATA bus bitsize: 32
    ADDRESS bus bitsize: 32
    SIZE bus bitsize: 6
    Internally allocated memory (no private memories): 0
    Internally allocated memory: 0
  Time to perform memory allocation: 0.00 seconds


  Module allocation information for function __float_adde8m23b_127nih:
    Number of complex operations: 0
    Number of complex operations: 0
  Time to perform module allocation: 0.09 seconds


  Module allocation information for function __float_mule8m23b_127nih:
    Number of complex operations: 1
    Number of complex operations: 1
  Time to perform module allocation: 0.03 seconds


  Scheduling Information of function __float_adde8m23b_127nih:
    Number of control steps: 9
    Minimum slack: 0.010964990999998037
    Estimated max frequency (MHz): 200.43956360218831
  Time to perform scheduling: 0.05 seconds

  Number of function call sites = 19

  State Transition Graph Information of function __float_adde8m23b_127nih:
    Number of operations: 257
    Number of basic blocks: 3
    Number of states: 8
    Minimum number of cycles: 8
    Maximum number of cycles 8
    Parameters are registered
    Done port is registered
  Time to perform creation of STG: 0.03 seconds


  Scheduling Information of function __float_mule8m23b_127nih:
    Number of control steps: 8
    Minimum slack: 0.056999993999998111
    Estimated max frequency (MHz): 202.30629148010559
  Time to perform scheduling: 0.02 seconds

  Number of function call sites = 19

  State Transition Graph Information of function __float_mule8m23b_127nih:
    Number of operations: 104
    Number of basic blocks: 3
    Number of states: 7
    Minimum number of cycles: 7
    Maximum number of cycles 7
    Parameters are registered
    Done port is registered
  Time to perform creation of STG: 0.02 seconds


  Easy binding information for function __float_adde8m23b_127nih:
    Bound operations:192/257
  Time to perform easy binding: 0.00 seconds


  Easy binding information for function __float_mule8m23b_127nih:
    Bound operations:85/104
  Time to perform easy binding: 0.00 seconds


  Storage Value Information of function __float_adde8m23b_127nih:
    Number of storage values inserted: 89
  Time to compute storage value information: 0.00 seconds


  Storage Value Information of function __float_mule8m23b_127nih:
    Number of storage values inserted: 16
  Time to compute storage value information: 0.00 seconds

  Slack computed in 0.00 seconds
  Weight computation completed in 0.00 seconds
  False-loop computation completed in 0.00 seconds
  Iteration 0 completed in 0.00 seconds

  Register binding information for function __float_adde8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
  Time to perform register binding: 0.00 seconds

  Iteration 1 completed in 0.00 seconds
  Clique covering computation completed in 0.00 seconds

  Module binding information for function __float_adde8m23b_127nih:
    Number of modules instantiated: 257
    Number of performance conflicts: 13
    Estimated resources area (no Muxes and address logic): 2745
    Estimated area of MUX21: 0
    Total estimated area: 2745
    Estimated number of DSPs: 0
  Time to perform module binding: 0.01 seconds


  Register binding information for function __float_adde8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
  Time to perform register binding: 0.01 seconds

  Total number of flip-flops in function __float_adde8m23b_127nih: 488
  Slack computed in 0.00 seconds
  Weight computation completed in 0.00 seconds
  False-loop computation completed in 0.00 seconds
  Iteration 0 completed in 0.00 seconds

  Register binding information for function __float_mule8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
  Time to perform register binding: 0.00 seconds

  Iteration 1 completed in 0.00 seconds
  Clique covering computation completed in 0.00 seconds

  Module binding information for function __float_mule8m23b_127nih:
    Number of modules instantiated: 104
    Number of performance conflicts: 0
    Estimated resources area (no Muxes and address logic): 1100
    Estimated area of MUX21: 0
    Total estimated area: 1100
    Estimated number of DSPs: 3
  Time to perform module binding: 0.00 seconds


  Register binding information for function __float_mule8m23b_127nih:
    Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
  Time to perform register binding: 0.00 seconds

  Total number of flip-flops in function __float_mule8m23b_127nih: 197

  Module allocation information for function forward_kernel:
    Number of complex operations: 99
    Number of complex operations: 99
  Time to perform module allocation: 0.06 seconds


  Scheduling Information of function forward_kernel:
    Number of control steps: 353
    Minimum slack: 0.14839999500031809
    Estimated max frequency (MHz): 206.11756924921215
  Time to perform scheduling: 0.11 seconds

  Number of function call sites = 0

  State Transition Graph Information of function forward_kernel:
    Number of operations: 428
    Number of basic blocks: 10
    Number of states: 353
    Done port is registered
  Time to perform creation of STG: 0.45 seconds


  Easy binding information for function forward_kernel:
    Bound operations:243/428
  Time to perform easy binding: 0.00 seconds


  Storage Value Information of function forward_kernel:
    Number of storage values inserted: 156
  Time to compute storage value information: 0.00 seconds

  Slack computed in 0.00 seconds
  Weight computation completed in 0.03 seconds
  False-loop computation completed in 0.00 seconds
  cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
  Iteration 0 completed in 0.03 seconds

  Register binding information for function forward_kernel:
    Register allocation algorithm obtains a sub-optimal result: 149 registers(LB:41)
  Time to perform register binding: 0.04 seconds

  cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
  cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
  Iteration 1 completed in 0.03 seconds
  Clique covering computation completed in 0.10 seconds

  Module binding information for function forward_kernel:
    Number of modules instantiated: 333
    Number of performance conflicts: 147
    Estimated resources area (no Muxes and address logic): 5694
    Estimated area of MUX21: 1332.3333333333333
    Total estimated area: 7026.333333333333
    Estimated number of DSPs: 0
  Time to perform module binding: 0.14 seconds


  Register binding information for function forward_kernel:
    Register allocation algorithm obtains a sub-optimal result: 149 registers(LB:41)
  Time to perform register binding: 0.03 seconds


  Connection Binding Information for function forward_kernel:
    Number of allocated multiplexers (2-to-1 equivalent): 140
    Total number of bit-level multiplexers: 4608
  Time to perform interconnection binding: 0.01 seconds

  Total number of flip-flops in function forward_kernel: 4735
  C-based testbench generation for function forward_kernel: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/cosim.c
  Prepared testbench

  Summary of resources:
     - ASSIGN_UNSIGNED_FU: 1
     - BMEMORY_CTRLN: 1
     - IUdata_converter_FU: 3
     - MUX_GATE: 140
     - OR_GATE: 2
     - UIdata_converter_FU: 3
     - UUdata_converter_FU: 263
     - constant_value: 139
     - flipflop_AR: 2
     - lshift_expr_FU: 3
     - lut_expr_FU: 71
     - multi_read_cond_FU: 1
     - read_cond_FU: 2
     - register_SE: 160
     - register_STD: 98
     - rshift_expr_FU: 3
     - ui_bit_and_expr_FU: 34
     - ui_bit_ior_concat_expr_FU: 4
     - ui_bit_ior_expr_FU: 39
     - ui_bit_xor_expr_FU: 2
     - ui_cond_expr_FU: 12
     - ui_eq_expr_FU: 3
     - ui_extract_bit_expr_FU: 101
     - ui_lshift_expr_FU: 65
     - ui_lt_expr_FU: 5
     - ui_minus_expr_FU: 1
     - ui_mult_expr_FU: 1
     - ui_ne_expr_FU: 6
     - ui_plus_expr_FU: 12
     - ui_pointer_plus_expr_FU: 41
     - ui_rshift_expr_FU: 26
     - ui_ternary_plus_expr_FU: 1
     - ui_ternary_pm_expr_FU: 1
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang: warning: argument unused during compilation: '-I /usr/bin/../share/verilator/include/vltstd' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-I /opt/panda/include' [-Wunused-command-line-argument]
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
make[1]: Entering directory '/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output/verilator_beh/verilator_obj'
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o bambu_testbench.o /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/bambu_testbench.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o verilated.o /usr/share/verilator/include/verilated.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o verilated_dpi.o /usr/share/verilator/include/verilated_dpi.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o Vbambu_testbench.o Vbambu_testbench.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o Vbambu_testbench___024unit.o Vbambu_testbench___024unit.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o Vbambu_testbench__Dpi.o Vbambu_testbench__Dpi.cpp
g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow  -fstrict-aliasing   -m32   -c -o Vbambu_testbench__Syms.o Vbambu_testbench__Syms.cpp
ar -cr Vbambu_testbench__ALL.a Vbambu_testbench.o Vbambu_testbench___024unit.o Vbambu_testbench__Dpi.o Vbambu_testbench__Syms.o
ranlib Vbambu_testbench__ALL.a
g++    bambu_testbench.o verilated.o verilated_dpi.o Vbambu_testbench__ALL.a   -m32 -lpthread /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//verilator_beh/libtb.so  -o Vbambu_testbench -lm -lstdc++ 
make[1]: Leaving directory '/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output/verilator_beh/verilator_obj'
Results file: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/results.txt
Reset active: LOW
Co-sim: Co-simulation started
Co-sim: Memory size for parameter 0 set to 64 bytes.
Co-sim: Memory size for parameter 1 set to 64 bytes.
Co-sim: Memory size for parameter 2 set to 16 bytes.
Co-sim: Address 0xF71882E0 mapped at 0x40000000 (64 bytes)
Co-sim: Address 0xF71882A0 mapped at 0x40000040 (64 bytes)
Co-sim: Address 0xF7188290 mapped at 0x40000080 (16 bytes)
Co-sim: Pointer parameter 0xF71882E0 mapped at 0x40000000
Co-sim: Parameter 0 is 32 bits at 0xF7188258
Co-sim: Pointer parameter 0xF71882A0 mapped at 0x40000040
Co-sim: Parameter 1 is 32 bits at 0xF7188254
Co-sim: Pointer parameter 0xF7188290 mapped at 0x40000080
Co-sim: Parameter 2 is 32 bits at 0xF7188250
ERROR: Sim: Nearest memory space is 0x40000080->0xF7188290 to 0x40000090->0xF71882A0 (16 bytes).
ERROR: Sim: Read to non-mapped address 0x40000090.
File "/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/results.txt" opened
error -> Unable to parse simulation time report: check simulator output for errors.
	void SimulationTool::DetermineCycles(unsigned long long &, unsigned long long &)
	../../src/wrapper/simulation/SimulationTool.cpp:222
Please report bugs to <panda-info@polimi.it>

@Ansaya
Collaborator

Ansaya commented Sep 8, 2023

That error means the kernel is trying to access a memory area that is not allocated on the accelerator. The error lines are intended to be similar to Valgrind output, if you are familiar with that. The first error line, ERROR: Sim: Nearest memory space is 0x40000080->0xF7188290 to 0x40000090->0xF71882A0 (16 bytes)., reports information about the surrounding memory space, which in this case seems to be related to the third parameter. The second error line, ERROR: Sim: Read to non-mapped address 0x40000090., reports the illegal memory operation itself, which seems to be a read right after the third parameter's memory space ends.
This may be caused by an error in the computation or, more likely, by a wrong testbench memory initialization. As a quick check, you may look at the beginning of the simulation log and verify that each parameter size is as expected.

Co-sim: Memory size for parameter 0 set to 64 bytes.
Co-sim: Memory size for parameter 1 set to 64 bytes.
Co-sim: Memory size for parameter 2 set to 16 bytes.
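
For illustration only (this is not the actual kernel code): assuming the third parameter is a 4-element float array (16 bytes), a read like the one below lands exactly at the first address past its mapped range (0x40000090 in the log above) and would trigger the same diagnostic.

/* Hypothetical sketch of an off-by-one read past a 16-byte (4-float) buffer. */
float sum_past_end(const float P2[4])
{
    float acc = 0.0f;
    for (int i = 0; i <= 4; ++i) { /* bug: i == 4 reads one element past the end */
        acc += P2[i];
    }
    return acc;
}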

Also, it may be useful to write a C/C++ testbench to check the kernel functionality before synthesis. As a starting point, you may use the generated testbench, which you can find in HLS_output/simulation/cosim.c: copy just the main function implementation into a separate C file, compile that along with 05_llvm_baseline.ll, and check that the executable runs fine (maybe under Valgrind too).
If you can share both 05_llvm_baseline.ll and forward_kernel_test.xml, I can help you with that.
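
A rough sketch of that approach is shown below; the forward_kernel signature and the array sizes are assumptions taken from the interface XML discussed above, not from the generated cosim.c.

/* Minimal hand-written C testbench sketch (assumed signature: three
 * float-array parameters of 64, 64, and 16 elements, as in the
 * interface XML above).
 * Build roughly as: clang -m32 tb.c 05_llvm_baseline.ll -o tb
 * and run it under Valgrind to catch out-of-bounds accesses. */
#include <stdio.h>

void forward_kernel(float *P0, float *P1, float *P2);

int main(void)
{
    float in0[64] = {0};
    float in1[64] = {0};
    float out[16] = {0};

    /* Initialize the inputs with a known pattern. */
    for (int i = 0; i < 64; ++i) {
        in0[i] = (float)i;
        in1[i] = (float)(64 - i);
    }

    forward_kernel(in0, in1, out);

    for (int i = 0; i < 16; ++i) {
        printf("out[%d] = %f\n", i, out[i]);
    }
    return 0;
}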

@fabrizioferrandi
Collaborator

You need to pass --interface-xml-filename=../../forward_kernel_interface.xml with forward_kernel_interface.xml having this content:

<?xml version="1.0"?>
<module>
 <function id="forward_kernel">
  <arg id="P0" SizeInBytes="256" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="64" interface_typename_include=""/>
  <arg id="P1" SizeInBytes="256" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="64" interface_typename_include=""/>
  <arg id="P2" SizeInBytes="64" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="16" interface_typename_include=""/>
 </function>
</module>

@sei-jgwohlbier
Author

It works! Thanks!

@sei-jgwohlbier
Author

I changed to another neural network and ran into new issues with the --memory-mapped-top option. If you would like to reproduce it, you can just update the submodule from the reproducer steps above.

This is the error.

clang: warning: argument unused during compilation: '-I /usr/bin/../share/verilator/include/vltstd' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-I /opt/panda/include' [-Wunused-command-line-argument]
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
%Warning-MODDUP: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/bambu_testbench.v:70: Duplicate declaration of module: 'join_signal'
module join_signal(in1,
       ^~~~~~~~~~~
                 /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:18397: ... Location of original declaration
module join_signal(in1,
       ^~~~~~~~~~~
                 ... Use "/* verilator lint_off MODDUP */" and lint_on around source to disable this message.
%Warning-MODDUP: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/bambu_testbench.v:93: Duplicate declaration of module: 'split_signal'
module split_signal(in1,
       ^~~~~~~~~~~~
                 /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:18420: ... Location of original declaration
module split_signal(in1,
       ^~~~~~~~~~~~
%Warning-MODDUP: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/bambu_testbench.v:1080: Duplicate declaration of module: 'bus_merger'
module bus_merger(in1,
       ^~~~~~~~~~
                 /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:18366: ... Location of original declaration
module bus_merger(in1,
       ^~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43182: Expecting expression to be constant, but variable isn't const: 'MEM_var_394383_495177'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
  datapath_forward_kernel #(.MEM_var_394383_495177(MEM_var_394383_495177),
                                                   ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43183: Expecting expression to be constant, but variable isn't const: 'MEM_var_394386_393256'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_394386_393256(MEM_var_394386_393256),
                           ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43184: Expecting expression to be constant, but variable isn't const: 'MEM_var_394391_393256'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_394391_393256(MEM_var_394391_393256),
                           ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43185: Expecting expression to be constant, but variable isn't const: 'MEM_var_395284_393256'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_395284_393256(MEM_var_395284_393256),
                           ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43186: Expecting expression to be constant, but variable isn't const: 'MEM_var_439985_403892'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_439985_403892(MEM_var_439985_403892),
                           ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43187: Expecting expression to be constant, but variable isn't const: 'MEM_var_440165_403892'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_440165_403892(MEM_var_440165_403892),
                           ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43188: Expecting expression to be constant, but variable isn't const: 'MEM_var_440251_403892'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_440251_403892(MEM_var_440251_403892),
                           ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43189: Expecting expression to be constant, but variable isn't const: 'MEM_var_495234_495177'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_495234_495177(MEM_var_495234_495177),
                           ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43190: Expecting expression to be constant, but variable isn't const: 'MEM_var_496077_495177'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_496077_495177(MEM_var_496077_495177),
                           ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43191: Expecting expression to be constant, but variable isn't const: 'MEM_var_496299_495177'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_496299_495177(MEM_var_496299_495177)) Datapath_i (.Mout_oe_ram(Mout_oe_ram),
                           ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43182: Can't convert defparam value to constant: Param 'MEM_var_394383_495177' of 'Datapath_i'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
  datapath_forward_kernel #(.MEM_var_394383_495177(MEM_var_394383_495177),
                             ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43183: Can't convert defparam value to constant: Param 'MEM_var_394386_393256' of 'Datapath_i'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_394386_393256(MEM_var_394386_393256),
     ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43184: Can't convert defparam value to constant: Param 'MEM_var_394391_393256' of 'Datapath_i'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_394391_393256(MEM_var_394391_393256),
     ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43185: Can't convert defparam value to constant: Param 'MEM_var_395284_393256' of 'Datapath_i'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_395284_393256(MEM_var_395284_393256),
     ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43186: Can't convert defparam value to constant: Param 'MEM_var_439985_403892' of 'Datapath_i'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_439985_403892(MEM_var_439985_403892),
     ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43187: Can't convert defparam value to constant: Param 'MEM_var_440165_403892' of 'Datapath_i'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_440165_403892(MEM_var_440165_403892),
     ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43188: Can't convert defparam value to constant: Param 'MEM_var_440251_403892' of 'Datapath_i'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_440251_403892(MEM_var_440251_403892),
     ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43189: Can't convert defparam value to constant: Param 'MEM_var_495234_495177' of 'Datapath_i'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_495234_495177(MEM_var_495234_495177),
     ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43190: Can't convert defparam value to constant: Param 'MEM_var_496077_495177' of 'Datapath_i'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_496077_495177(MEM_var_496077_495177),
     ^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43191: Can't convert defparam value to constant: Param 'MEM_var_496299_495177' of 'Datapath_i'
                                                                                          : ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
    .MEM_var_496299_495177(MEM_var_496299_495177)) Datapath_i (.Mout_oe_ram(Mout_oe_ram),
     ^~~~~~~~~~~~~~~~~~~~~
%Error: Exiting due to 20 error(s)
error -> Returned error code!
	int ToolManager::execute_command(const std::string &, const std::string &, const std::string &, bool, bool)
	../../src/wrapper/ToolManager.cpp:94
Please report bugs to <panda-info@polimi.it>
