VPR Placer runtime issue when all design clusters have fixed locations #2484

Open
rachelselinar opened this issue Feb 5, 2024 · 6 comments
@rachelselinar

VPR Placer has a large runtime when all the input design clusters have fixed locations.

Expected Behaviour

When all clusters are fixed, VPR's place stage should complete very fast as all the clusters are already placed.

Current Behaviour

When all clusters have fixed locations, VPR's placer takes a very long time to complete. In contrast, when only the IO/PLL clusters are fixed, VPR's annealing placer completes much faster.

Example / Steps to Reproduce

Used the Titan23 benchmarks and all designs have similar behavior. Sharing example from 'gaussianblur' design to show how large the runtime difference can be.

(i) All cluster locations are fixed

Command:

vpr gaussianblur.400x296.stratixiv_arch.timing.xml gaussianblur.blif --place --route --timing_analysis on --route_chan_width 300 --max_router_iterations 400 --astar_fac 1 --verify_file_digests off --timing_report_detail aggregated --device 400x296 --sdc_file gaussianblur.sdc --fix_clusters gaussianblur.fix_clusters --net_file gaussianblur.net


Placement Snippet from vpr_stdout.log:

Moves per temperature: 2498181
Warning 567: Starting t: 0 of 105710 configurations accepted.


Tnum  Time (sec)  T        Av Cost  Av BB Cost  Av TD Cost  CPD (ns)  sTNS (ns)  sWNS (ns)  Ac Rate  Std Dev  R lim  Crit Exp  Tot Moves  Alpha
1     106145.3    0.0e+00  1.000    90098.80    0.00054943  794.866   -7.11e+07  -793.866   0.000    0.0000   399.0  1.00      2498181    0.200
2     66540.4     0.0e+00  1.000    90098.80    0.00054943  794.866   -7.11e+07  -793.866   0.000    0.0000   399.0  1.00      4996362    0.950

Placement Quench took 66540.41 seconds (max_rss 22758.9 MiB)

post-quench CPD = 794.866 (ns)

BB estimate of min-dist (placement) wire length: 27029640

Completed placement consistency check successfully.

(ii) Only IO/PLL cluster locations are fixed:

Command:

vpr gaussianblur.400x296.stratixiv_arch.timing.xml gaussianblur.blif --init_place_file gaussianblur.place --fix_clusters gaussianblur.io_pll.fix_clusters --place --route --timing_analysis on --route_chan_width 300 --max_router_iterations 400 --astar_fac 1 --verify_file_digests off --timing_report_detail aggregated --device 400x296 --sdc_file gaussianblur.sdc --net_file gaussianblur.net


Placement Snippet from vpr_stdout.log:

Moves per temperature: 2498181
Warning 567: Starting t: 801 of 105710 configurations accepted.


Tnum  Time (sec)  T  Av Cost  Av BB Cost  Av TD Cost  CPD (ns)  sTNS (ns)  sWNS (ns)  Ac Rate  Std Dev  R lim  Crit Exp  Tot Moves  Alpha


1 123.9 1.4e-04 6.004 846618.76 0.0014513 799.617 -8.1e+07 -798.617 0.858 1.6785 399.0 1.00 2498181 0.200
2 126.7 1.2e-04 1.065 1045081.66 0.0010934 2501.707 -3.03e+08 -2500.707 0.972 0.0137 399.0 1.00 4996362 0.900
3 128.3 6.2e-05 0.973 1042872.91 0.0012203 3923.444 -3.05e+08 -3922.444 0.954 0.0036 399.0 1.00 7494543 0.500
4 129.4 5.6e-05 0.977 1040438.53 0.0010769 3697.168 -3.04e+08 -3696.168 0.948 0.0042 399.0 1.00 9992724 0.900
5 130.3 5.0e-05 0.988 1038464.20 0.0011874 3238.814 -3.27e+08 -3237.814 0.944 0.0021 399.0 1.00 12490905 0.900
6 130.2 4.5e-05 0.978 1035637.29 0.0010805 3219.246 -3.18e+08 -3218.246 0.941 0.0032 399.0 1.00 14989086 0.900
7 132.0 4.1e-05 0.983 1035091.99 0.0012707 3066.179 -3.16e+08 -3065.179 0.933 0.0023 399.0 1.00 17487267 0.900
8 132.2 3.7e-05 0.983 1033164.71 0.0011694 2955.221 -3.13e+08 -2954.221 0.928 0.0023 399.0 1.00 19985448 0.900
9 134.1 3.3e-05 0.980 1031057.35 0.0011307 2851.145 -2.93e+08 -2850.145 0.921 0.0034 399.0 1.00 22483629 0.900
10 134.6 3.0e-05 0.986 1029670.86 0.0012626 2423.375 -3.19e+08 -2422.375 0.915 0.0020 399.0 1.00 24981810 0.900
...
100 101.9 1.8e-07 0.998 105965.24 1.9476e-05 783.091 -1.03e+08 -782.091 0.318 0.0006 2.7 7.97 249818100 0.950
101 101.5 1.7e-07 0.998 105678.35 1.9439e-05 783.138 -1.02e+08 -782.138 0.302 0.0005 2.3 7.98 252316281 0.950
102 101.3 1.6e-07 0.999 105452.23 1.9483e-05 782.958 -1.02e+08 -781.958 0.284 0.0005 2.0 7.98 254814462 0.950
103 101.6 1.5e-07 0.998 105173.43 1.9406e-05 783.111 -1.02e+08 -782.111 0.430 0.0005 1.7 7.99 257312643 0.950
104 101.5 1.5e-07 0.999 104988.39 1.9386e-05 782.943 -1.02e+08 -781.943 0.412 0.0003 1.7 7.99 259810824 0.950
105 100.9 1.4e-07 0.999 104812.00 1.8895e-05 783.019 -1.02e+08 -782.019 0.397 0.0003 1.6 7.99 262309005 0.950
106 101.0 1.3e-07 0.999 104640.99 1.9355e-05 783.120 -1.02e+08 -782.120 0.379 0.0003 1.6 7.99 264807186 0.950
107 101.6 1.3e-07 0.999 104484.99 1.8941e-05 782.887 -1.02e+08 -781.887 0.364 0.0003 1.5 7.99 267305367 0.950
108 101.3 1.2e-07 0.999 104334.25 1.9334e-05 783.284 -1.02e+08 -782.284 0.346 0.0003 1.4 7.99 269803548 0.950
109 100.9 1.1e-07 0.999 104198.63 1.9255e-05 782.902 -1.02e+08 -781.902 0.329 0.0003 1.2 8.00 272301729 0.950
110 100.8 1.1e-07 0.999 104060.84 1.937e-05 783.105 -1.02e+08 -782.105 0.312 0.0003 1.1 8.00 274799910 0.950
111 100.7 1.0e-07 0.999 103926.18 1.9347e-05 782.882 -1.02e+08 -781.882 0.296 0.0002 1.0 8.00 277298091 0.950
112 100.2 9.8e-08 0.999 103813.07 1.9351e-05 783.188 -1.01e+08 -782.188 0.281 0.0002 1.0 8.00 279796272 0.950
113 100.0 9.3e-08 0.999 103700.57 1.9312e-05 783.090 -1.02e+08 -782.090 0.265 0.0002 1.0 8.00 282294453 0.950
114 99.5 8.8e-08 0.999 103600.40 1.9289e-05 783.259 -1.02e+08 -782.259 0.251 0.0002 1.0 8.00 284792634 0.950
115 99.3 8.4e-08 0.999 103496.22 1.9182e-05 782.929 -1.02e+08 -781.929 0.237 0.0002 1.0 8.00 287290815 0.950
116 99.2 7.9e-08 0.999 103395.91 1.9239e-05 783.180 -1.02e+08 -782.180 0.221 0.0002 1.0 8.00 289788996 0.950
117 99.0 7.6e-08 1.000 103315.76 1.9263e-05 782.944 -1.02e+08 -781.944 0.207 0.0002 1.0 8.00 292287177 0.950
118 98.7 7.2e-08 1.000 103233.56 1.9216e-05 783.056 -1.02e+08 -782.056 0.194 0.0002 1.0 8.00 294785358 0.950
119 98.8 6.8e-08 1.000 103162.01 1.9174e-05 783.219 -1.02e+08 -782.219 0.181 0.0002 1.0 8.00 297283539 0.950
120 98.6 6.5e-08 1.000 103093.70 1.9185e-05 783.123 -1.03e+08 -782.123 0.169 0.0001 1.0 8.00 299781720 0.950
121 98.6 6.2e-08 1.000 103027.63 1.9178e-05 783.001 -1.02e+08 -782.001 0.158 0.0001 1.0 8.00 302279901 0.950
122 98.3 5.8e-08 1.000 102973.50 1.9192e-05 783.163 -1.02e+08 -782.163 0.146 0.0001 1.0 8.00 304778082 0.950
123 97.7 4.7e-08 0.999 102855.63 1.913e-05 782.973 -1.02e+08 -781.973 0.111 0.0002 1.0 8.00 307276263 0.800
Agent's 2nd state:
Checkpoint saved: bb_costs=102820, TD costs=1.92158e-05, CPD=782.944 (ns)
124 96.8 3.7e-08 1.000 102746.76 1.9211e-05 782.944 -1.02e+08 -781.944 0.064 0.0002 1.0 8.00 309774444 0.800
125 96.6 3.0e-08 1.000 102678.19 1.9135e-05 783.157 -1.02e+08 -782.157 0.046 0.0001 1.0 8.00 312272625 0.800
126 96.6 2.4e-08 1.000 102633.81 1.9131e-05 782.944 -1.02e+08 -781.944 0.034 0.0001 1.0 8.00 314770806 0.800
127 96.2 1.9e-08 1.000 102604.47 1.9148e-05 783.063 -1.02e+08 -782.063 0.025 0.0001 1.0 8.00 317268987 0.800
128 96.1 1.5e-08 1.000 102590.45 1.9118e-05 782.944 -1.02e+08 -781.944 0.019 0.0000 1.0 8.00 319767168 0.800
Checkpoint saved: bb_costs=102588, TD costs=1.91182e-05, CPD=782.939 (ns)
129 96.0 1.2e-08 1.000 102581.34 1.9117e-05 782.939 -1.02e+08 -781.939 0.015 0.0000 1.0 8.00 322265349 0.800
130 95.9 9.8e-09 1.000 102575.79 1.9107e-05 783.129 -1.02e+08 -782.129 0.011 0.0000 1.0 8.00 324763530 0.800
Checkpoint saved: bb_costs=102576, TD costs=1.91221e-05, CPD=782.878 (ns)
131 96.0 7.8e-09 1.000 102572.58 1.9121e-05 782.878 -1.02e+08 -781.878 0.009 0.0000 1.0 8.00 327261711 0.800
132 95.9 6.3e-09 1.000 102571.26 1.9109e-05 783.129 -1.02e+08 -782.129 0.007 0.0000 1.0 8.00 329759892 0.800
133 95.7 5.0e-09 1.000 102569.54 1.9121e-05 782.939 -1.02e+08 -781.939 0.006 0.0000 1.0 8.00 332258073 0.800
134 95.6 0.0e+00 1.000 102569.82 1.9114e-05 783.129 -1.02e+08 -782.129 0.001 0.0000 1.0 8.00 334756254 0.800

Placement Quench took 95.62 seconds (max_rss 22746.4 MiB)

post-quench CPD = 782.939 (ns)

Checkpoint restored

BB estimate of min-dist (placement) wire length: 30772745

Completed placement consistency check successfully.

Inputs for the above example can be found in this shared drive.

Context

Trying to compare 'no refinement' vs 'refinement' of cluster locations in VPR.

Your Environment

  • VPR used: eb3c95d
  • Operating System and version: Ubuntu 20.04.6 LTS
@soheilshahrouz soheilshahrouz self-assigned this Feb 6, 2024
@soheilshahrouz
Contributor

Thank you for opening this issue.

It seems that some move generators call the pick_from_block() function to select a block at random. This function has a while loop that keeps drawing random blocks until it finds a movable one. Since all blocks in your run are fixed, the loop's exit condition is not met until all possible options (all clustered blocks) are exhausted.

ClusterBlockId pick_from_block() {
    /* Some blocks may be fixed, and should never be moved from their *
     * initial positions. If we randomly selected such a block try    *
     * another random block.                                          *
     *                                                                *
     * We need to track the blocks we have tried to avoid an infinite *
     * loop if all blocks are fixed.                                  */
    auto& cluster_ctx = g_vpr_ctx.clustering();
    auto& place_ctx = g_vpr_ctx.mutable_placement();

    std::unordered_set<ClusterBlockId> tried_from_blocks;

    //Keep selecting random blocks as long as there are any untried blocks
    //Can get slow if there are many blocks but only a few (or none) can move
    while (tried_from_blocks.size() < cluster_ctx.clb_nlist.blocks().size()) {
        //Pick a block at random
        ClusterBlockId b_from = ClusterBlockId(vtr::irand((int)cluster_ctx.clb_nlist.blocks().size() - 1));

        //Record it as tried
        tried_from_blocks.insert(b_from);

        if (place_ctx.block_locs[b_from].is_fixed) {
            continue; //Fixed location, try again
        }

        //Found a movable block
        return b_from;
    }

    //No movable blocks found
    return ClusterBlockId::INVALID();
}
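
As a rough estimate of why this loop gets so expensive when every block is fixed: the loop exits only after all N clustered blocks have been drawn at least once, and each draw is made with replacement. Assuming the random draws are uniform, this is the coupon-collector setting, so the expected number of iterations per call is

$$
\mathbb{E}[\text{iterations}] = N \sum_{k=1}^{N} \frac{1}{k} = N H_N \approx N(\ln N + 0.577)
$$

Each call therefore costs on the order of N ln N hash-set insertions before it returns INVALID, and that cost is paid again on every move attempt that reaches this function.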

If you want to skip placement altogether, I guess you can pass the placement file using the --place_file option.

@vaughnbetz What is your opinion on this? Should I change the pick_from_block() function? We can fix this by trying only a few clustered blocks to find a movable one. Alternatively, we can store all movable blocks in a separate container and select an element of this container at random.
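
The first option (giving up after a few attempts) could look roughly like the sketch below. It is only an illustration and not VPR code: the `std::vector<bool>` of fixed flags, the `std::mt19937` generator, the function name `pick_from_block_bounded`, and the default of 10 attempts are all stand-ins chosen for this example.

```cpp
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Hypothetical stand-in for VPR's per-block is_fixed flag: is_fixed[i] is true
// when block i must not move. Returns a movable block index, or std::nullopt
// if none was found within max_attempts random draws.
std::optional<std::size_t> pick_from_block_bounded(const std::vector<bool>& is_fixed,
                                                   std::mt19937& rng,
                                                   int max_attempts = 10) {
    if (is_fixed.empty()) {
        return std::nullopt;
    }

    std::uniform_int_distribution<std::size_t> dist(0, is_fixed.size() - 1);

    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        std::size_t b_from = dist(rng); // pick a block at random
        if (!is_fixed[b_from]) {
            return b_from; // found a movable block
        }
    }

    // Gave up: the caller would treat this like an aborted move.
    return std::nullopt;
}
```

The cost per call is bounded by max_attempts regardless of how many blocks are fixed, at the price of occasionally missing a movable block when only a few exist.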

@vaughnbetz
Contributor

Thanks @soheilshahrouz. Probably we should put all the movable blocks (and only movable blocks) in a container and select them randomly, so we are always efficient.
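
A minimal sketch of that container idea, under stated assumptions: the class name `MovableBlockPicker`, the plain `std::size_t` block indices, and the `std::mt19937` generator are hypothetical stand-ins for this example, not VPR's actual types or random number generator.

```cpp
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Collects the indices of all movable blocks once, then picks among them in O(1).
// is_fixed[i] is an assumed stand-in for VPR's per-block fixed flag.
class MovableBlockPicker {
  public:
    explicit MovableBlockPicker(const std::vector<bool>& is_fixed) {
        for (std::size_t i = 0; i < is_fixed.size(); ++i) {
            if (!is_fixed[i]) {
                movable_.push_back(i);
            }
        }
    }

    // Returns a uniformly chosen movable block, or std::nullopt if every block is fixed.
    std::optional<std::size_t> pick(std::mt19937& rng) const {
        if (movable_.empty()) {
            return std::nullopt;
        }
        std::uniform_int_distribution<std::size_t> dist(0, movable_.size() - 1);
        return movable_[dist(rng)];
    }

    std::size_t num_movable() const { return movable_.size(); }

  private:
    std::vector<std::size_t> movable_;
};
```

With this layout the all-fixed case returns immediately, and the list only needs rebuilding if the set of fixed blocks changes.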

@rachelselinar
Author

rachelselinar commented Feb 12, 2024

I was able to convert a .fix_clusters file to a .place file by adding a '0' subtile entry to every line and prepending the netlist checksum and array size information as a header.

By passing the .place file using the --place_file option, the router ran successfully (and much faster) and the results are the same.

Thank you

@vaughnbetz
Contributor

Great, thanks Rachel. I think without the checksum it will just give a warning, so if that's a pain to maintain I think you can skip it (it's intended to protect people from accidentally using a .place file for the wrong circuit or architecture).

@rachelselinar
Author

I was able to obtain the netlist checksum and array size information from runs that did not fix all clusters in the .fix_clusters file. In addition, the VPR placer errors out if the .place file doesn't have these 3 lines in its header:

  1. Netlist checksum
  2. Array size

Without either (or with 3 empty lines as the header), the placer errors out at the first entry:

Error 1:
Type: Placement file
File: ../gaussianblur.place
Line: 3
Message: Invalid line 'step0:grp_step0_fu_168|step0_grp_fu_2167_ACMP_fmul_6:step0_grp_fu_2167_ACMP_fmul_6_U|ACMP_fmul:ACMP_fmul_U|AESL_WP_FMul:ACMP_FMul_U|lpm_mult:Mult0|mult_8at:auto_generated|mac_mult2 166 126 0 0 #0' in placement file header


  3. Empty line

If only the checksum and array size are provided without an empty line, the placer errors out:


Error 1:
Type: Placement
File: ~/vtr-verilog-to-routing/vpr/src/base/read_place.cpp
Line: 297
Message: Block 0 has not been read from the place file.


For gaussianblur, the .place file generated by VPR placer contains this header:

Netlist_File: gaussianblur.net Netlist_ID: SHA256:d6e68433c9959dc959dc0758f8f9f026cd79b00d8fbbfce6c211b820e50589dd
Array size: 400 x 296 logic blocks

#block name x y subblk layer block number
#---------- -- -- ------ ----- ------------
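
For anyone repeating the conversion, here is a rough sketch of it. It assumes each input line has the form "block_name x y" (the exact .fix_clusters layout may differ) and writes the three header lines listed above followed by each entry with a '0' subtile appended. The function name `write_place_file` and its parameters are hypothetical, and whether the layer and block number columns are also needed may depend on the VPR version.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes a .place-style file from "block_name x y" lines (assumed input layout).
// Header: netlist line, array size line, empty line. Returns the number of entries written.
int write_place_file(std::istream& fix_clusters,
                     std::ostream& place,
                     const std::string& netlist_file,
                     const std::string& netlist_id,
                     int width,
                     int height) {
    place << "Netlist_File: " << netlist_file << " Netlist_ID: " << netlist_id << "\n";
    place << "Array size: " << width << " x " << height << " logic blocks\n";
    place << "\n";

    int count = 0;
    std::string line;
    while (std::getline(fix_clusters, line)) {
        std::istringstream fields(line);
        std::string name;
        int x = 0;
        int y = 0;
        if (!(fields >> name >> x >> y)) {
            continue; // skip blank or malformed lines
        }
        if (name[0] == '#') {
            continue; // skip comment lines
        }
        place << name << " " << x << " " << y << " 0\n"; // append subtile 0
        ++count;
    }
    return count;
}
```

The netlist ID string and array dimensions have to come from elsewhere, for example from a .place file produced by a run that did not fix all clusters, as described above.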

@vaughnbetz
Contributor

Thanks; we should probably make this a bit more robust (keep parsing and give a warning).
