

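## Pipeline Experiment

### 1. Objective

On the SMART platform, change the branch prediction configuration and observe the CPI, branch prediction accuracy, and related metrics of the test programs, in order to understand how each configuration affects CPU performance.

### 2. Procedure (including results, recorded data, and screenshots)

(1) Modify and replace the corresponding files in the SMART platform: crt0.s, the Dhrystone program, and the CoreMark program.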

```
admin:/home/ECDesign/ecd11/smarty9_release/lib>[111]cd ..
admin:/home/ECDesign/ecd11/smarty9_release>[112]ls
./ case/ lib/ readme run_smart* tb/ tools/
../ debug_test/ mem_intf/ rtl/ setup.csh tmp/ workdir/
admin:/home/ECDesign/ecd11/smarty9_release>[113]cp -f /home/
ECDesign/ cypro/ lhpro/ rjypro/ wkpro/ xxypro/ zppro/
admin/ czpro/ ljmpro/ sqppro/ wllpro/ xyfpro/ zxnprom/
ccxpro/ fybpro/ lypro/ tjpro/ wmypro/ ychpro/ zxyprom/
cgspro/ gzqpro/ lyppro/ tzwpro/ xcpro/ yefpro/
cjlpro/ gzypro/ mslpro/ wclpro/ xhtpro/ yfpro/
clamav/ hjpro/ nrhpro/ wcpro/ xkpro/ yjpro/
cppro/ home_on_gpfs postgraduate/ wjypro/ xpcpro/ ynpro/
admin:/home/ECDesign/ecd11/smarty9_release>[113]cp -f /home/ECDesign/ECDesign_share/lab
lab7/ lab8/ lab9/
admin:/home/ECDesign/ecd11/smarty9_release>[113]cp -f /home/ECDesign/ECDesign_share/lab8/
Main.c core_main.c crt0.s
admin:/home/ECDesign/ecd11/smarty9_release>[113]cp -f /home/ECDesign/ECDesign_share/lab8/crt0.s lib/
Makefile clib/ core_init.h core_ls.s crt0.s linker.lcf
admin:/home/ECDesign/ecd11/smarty9_release>[113]cp -f /home/ECDesign/ECDesign_share/lab8/crt0.s lib/crt0.s

cp: overwrite 'lib/crt0.s'? y
admin:/home/ECDesign/ecd11/smarty9_release>[114]
```

```
admin:/home/ECDesign/ecd11/smarty9_release>[114]cp -f /home/ECDesign/ECDesign_share/lab8/
Main.c core_main.c crt0.s
admin:/home/ECDesign/ecd11/smarty9_release>[114]cp -f /home/ECDesign/ECDesign_share/lab8/Main.c case/dhry/Main.c
Main.c Main.c~ Main.elf* Main.hex* Main.obj Main_data.hex* Main_inst.hex*
admin:/home/ECDesign/ecd11/smarty9_release>[114]cp -f /home/ECDesign/ECDesign_share/lab8/Main.c case/dhry/Main.c
cp: overwrite 'case/dhry/Main.c'? y
admin:/home/ECDesign/ecd11/smarty9_release>[115]
```

```
core_list_join.c core_main.hex* core_main_inst.hex* core_portme.h coremark.h
core_main.c core_main.obj core_matrix.c core_state.c
core_main.elf* core_main_data.hex* core_portme.c core_util.c
admin:/home/ECDesign/ecd11/smarty9_release>[115]cp -f /home/ECDesign/ECDesign_share/lab8/core_main.c case/coremark/core_main.c
cp: overwrite 'case/coremark/core_main.c'? y
admin:/home/ECDesign/ecd11/smarty9_release>[116]ll
total 22
drwxr-xr-x 11 ecd11 ECDesign 4096 Apr 16 20:47 .
drwxr-xr-x 22 ecd11 ECDesign 4096 Apr 26 22:56 ..
drwxr-xr-x 16 ecd11 ECDesign 4096 Apr 16 21:32 case/
drwxr-xr-x 2 ecd11 ECDesign 4096 Apr 13 15:42 debug_test/
drwxr-xr-x 3 ecd11 ECDesign 4096 Apr 26 22:56 lib/
drwxr-xr-x 2 ecd11 ECDesign 4096 Apr 13 15:42 mem_intf/
-rw-r--r-- 1 ecd11 ECDesign 700 Apr 16 16:26 readme
drwxr-xr-x 5 ecd11 ECDesign 4096 Apr 13 15:42 rtl/
-rwxr-xr-x 1 ecd11 ECDesign 5733 Apr 13 15:42 run_smart*
-rw-r--r-- 1 ecd11 ECDesign 2714 Apr 13 15:42 setup.csh
drwxr-xr-x 2 ecd11 ECDesign 4096 Apr 13 15:42 tb/
drwxr-xr-x 2 ecd11 ECDesign 4096 Apr 26 12:02 tmp/
drwxr-xr-x 3 ecd11 ECDesign 4096 Apr 13 15:42 tools/
drwxr-xr-x 4 ecd11 ECDesign 8192 Apr 26 20:45 workdir/
admin:/home/ECDesign/ecd11/smarty9_release>[117]
```

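(2) Select the branch prediction configuration in the startup file crt0.s, then simulate the Dhrystone and CoreMark programs. The four logs below show, in order: Dhrystone and CoreMark with all predictors on, then Dhrystone and CoreMark with all predictors off.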

```
num_cycle is 1140992
num_instret is 2040028
num_conditional_branch_mis is 19
num_indirect_branch_mis is 0
num_indirect_branch_inst is 0

VCUNT_SIM: dhystone is 4.991228 dmips/MHz
Int_1_Loc:      5
    should be:  5
Int_2_Loc:      13
    should be: 13
Int_3_Loc:      7
    should be:  7
*****
* simulation finished successfully *
*****
$finish called from file "../tb/tb.v", line 315.
$finish at simulation time 121934550
V C S S i m u l a t i o n R e p o r t
Time: 12193455000 ps
CPU Time: 897.310 seconds; Data structure size: 1028.4Mb
Thu Apr 27 13:52:28 2023
CPU time: 23.426 seconds to compile + 1.881 seconds to elab + .297 seconds to link + 897.382 seconds in simulation
```

```
num_cycle is 5684994
num_instret is 9474454
num_conditional_branch_mis is 59995
num_indirect_branch_mis is 5
num_indirect_branch_inst is 320

VCUNT_SIM: CoreMark has been run 40 times, one times cost 142124 cycles !

VCUNT_SIM: CoreMark 1.0 : 7.036109 CoreMark/MHz
2K performance run parameters for coremark.
CoreMark Size      : 666
Total ticks       : -1
CoreMark/MHz      : 7.036109
Iterations        : 40
Compiler version  : GCC8.1.0
Compiler flags     : -O3
Memory location   : Please put data memory location here
                    (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist        : 0xe714
[0]crcmatrix      : 0x1fd7
[0]crcstate        : 0x8e3a
```

```

Program compiled without 'register' attribute
Execution starts, 10000 runs through Dhystone

num_cycle is 3040575
num_instret is 2040028
num_conditional_branch_mis is 69999
num_indirect_branch_mis is 0
num_indirect_branch_inst is 0

VCNT_SIM: dhystone is 1.871711 dmips/MHz
Int_1_Loc:      5
    should be: 5
Int_2_Loc:      13
    should be: 13
Int_3_Loc:      7
    should be: 7
*****
* simulation finished successfully *
*****
$finish called from file "./tb/tb.v", line 315.
$finish at simulation time          322791850
V C S S i m u l a t i o n R e p o r t
Time: 32279185000 ps

```

```

num_cycle is 12875906
num_instret is 9474454
num_conditional_branch_mis is 682829
num_indirect_branch_mis is 320
num_indirect_branch_inst is 320

VCNT_SIM: CoreMark has been run 40 times, one times cost 321897 cycles !

VCNT_SIM: CoreMark 1.0 : 3.106584 CoreMark/MHz
2K performance run parameters for coremark.
CoreMark Size      : 666
Total ticks       : -1
CoreMark/MHz      : 3.106584
Iterations        : 40
Compiler version  : GCC8.1.0
Compiler flags     : -O3
Memory location   : Please put data memory location here
                    (e.g. code in flash, data on heap etc)
seedcrc           : 0xe9f5
[0]crclist        : 0xe714
[0]crcmatrix      : 0x1fd7
[0]crcstate        : 0x8e3a
[0]crcfinal        : 0x65c5

```

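(3) After simulating the Dhrystone and CoreMark programs under each branch prediction configuration, inspect the simulation output, record the data, and summarize it in the two tables below.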
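Recording the counters by hand is error-prone, so the extraction step can be scripted. The following is a minimal sketch, assuming the log keeps the `num_<name> is <value>` line format seen in the simulation output above; the helper name `parse_counters` and the `sample_log` excerpt are illustrative and not part of the SMART platform.

```python
import re

# Counter lines in the simulation log look like "num_cycle is 1140992".
COUNTER_RE = re.compile(r"^(num_\w+) is (\d+)[ \t]*$", re.MULTILINE)


def parse_counters(log_text):
    """Return a dict mapping each counter name to its integer value."""
    return {name: int(value) for name, value in COUNTER_RE.findall(log_text)}


# Excerpt of the Dhrystone log with all predictors enabled (copied from above).
sample_log = """\
num_cycle is 1140992
num_instret is 2040028
num_conditional_branch_mis is 19
num_indirect_branch_mis is 0
num_indirect_branch_inst is 0
"""

counters = parse_counters(sample_log)
for name, value in counters.items():
    print(f"{name:30s} {value}")
```

Lines that do not match the pattern, such as the benchmark banner or the VCS report, are simply ignored, so the whole log file can be passed in unchanged.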

Table 1: Dhrystone test

|                         | all prediction on | all prediction off | BTB,L0BTB off | BPE off     |
|-------------------------|-------------------|--------------------|---------------|-------------|
| cycle                   | 1140992           | 3040575            | 1385950       | 2278724     |
| insts                   | 2040028           | 2040028            | 2040028       | 2040028     |
| CPI                     | 0.559302127       | 1.490457484        | 0.67937793    | 1.117006237 |
| conditional branch miss | 19                | 69999              | 19            | 69999       |
| indirect branch miss    | 0                 | 0                  | 0             | 0           |
| indirect branch inst    | 0                 | 0                  | 0             | 0           |
| DMIPS(dmips/MHz)        | 4.9912            | 1.8717             | 4.1231        | 2.5066      |

Table 2: CoreMark test

|                                  | all prediction on | all prediction off | BTB,L0BTB off | BPE off     |
|----------------------------------|-------------------|--------------------|---------------|-------------|
| cycle                            | 5684994           | 12875906           | 7050750       | 12568284    |
| insts                            | 9474454           | 9474454            | 9474454       | 9474454     |
| CPI                              | 0.600033944       | 1.359012984        | 0.744185364   | 1.326544411 |
| conditional branch miss          | 59995             | 682829             | 62141         | 682829      |
| indirect branch miss             | 5                 | 320                | 5             | 5           |
| indirect branch inst             | 320               | 320                | 320           | 320         |
| CoreMark point<br>(CoreMark/MHz) | 7.0361            | 3.1066             | 5.6721        | 3.1826      |

Here CPI = cycle / insts, that is, the cycle count divided by the number of retired instructions.
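The CPI rows can be reproduced directly from the cycle and instruction counts. The sketch below copies the (cycle, insts) pairs from Tables 1 and 2; the dictionary layout and the function name `cpi` are illustrative choices. A CPI below 1 means that, on average, more than one instruction retires per cycle.

```python
# (cycle, insts) pairs copied from Tables 1 and 2.
RESULTS = {
    "dhrystone": {
        "all prediction on": (1140992, 2040028),
        "all prediction off": (3040575, 2040028),
        "BTB,L0BTB off": (1385950, 2040028),
        "BPE off": (2278724, 2040028),
    },
    "coremark": {
        "all prediction on": (5684994, 9474454),
        "all prediction off": (12875906, 9474454),
        "BTB,L0BTB off": (7050750, 9474454),
        "BPE off": (12568284, 9474454),
    },
}


def cpi(cycles, insts):
    """Cycles per instruction: total cycles divided by retired instructions."""
    return cycles / insts


for bench, configs in RESULTS.items():
    for config, (cycles, insts) in configs.items():
        print(f"{bench:10s} {config:20s} CPI = {cpi(cycles, insts):.6f}")
```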

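### 3. Analysis and Summary

- (1) With all predictors enabled the CPU reaches its best performance, and with all predictors disabled it scores lowest on both benchmarks (Dhrystone drops from about 4.99 to 1.87 DMIPS/MHz, CoreMark from about 7.04 to 3.11 CoreMark/MHz). This shows that the branch predictors are highly effective.
- (2) Disabling the BTB and the BPE separately shows that the BPE affects performance more than the BTB. On Dhrystone, CPI rises from 0.559 to 0.679 with BTB and L0BTB off, but to 1.117 with BPE off; CoreMark shows the same ordering (0.744 versus 1.327).
- (3) Dhrystone results can be influenced by memory behavior and by the compiler, so its CPI differs from that of CoreMark. Dhrystone is also more sensitive to switching branch prediction on and off: its cycle count grows about 2.66× with all predictors off, compared with about 2.26× for CoreMark. This is consistent with Dhrystone's compiler-generated code containing branches that are almost perfectly predictable (only 19 conditional mispredictions with all predictors on), although these measurements alone do not prove a link to compiler optimization.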
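The comparisons in the points above can be checked by normalizing each configuration's cycle count to the all-predictors-on run. This is a small sketch using the cycle counts from Tables 1 and 2; the names `CYCLES` and `slowdown` are illustrative.

```python
# Cycle counts copied from Tables 1 and 2.
CYCLES = {
    "dhrystone": {
        "all prediction on": 1140992,
        "all prediction off": 3040575,
        "BTB,L0BTB off": 1385950,
        "BPE off": 2278724,
    },
    "coremark": {
        "all prediction on": 5684994,
        "all prediction off": 12875906,
        "BTB,L0BTB off": 7050750,
        "BPE off": 12568284,
    },
}


def slowdown(bench):
    """Cycle count of each configuration relative to the all-on baseline."""
    base = CYCLES[bench]["all prediction on"]
    return {config: cycles / base for config, cycles in CYCLES[bench].items()}


for bench in CYCLES:
    for config, ratio in slowdown(bench).items():
        print(f"{bench:10s} {config:20s} {ratio:.2f}x")
```

Because the instruction count is identical across configurations within each benchmark, the same ratios apply to CPI.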

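### 4. Takeaways, Open Issues, and Suggested Improvements

The simulations gave a direct view of how strongly branch predictors affect CPU performance, and also highlighted the differences between the two CPU benchmarking methods.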