



# Seven Steps to an Accurate Worst-Case Power Analysis using the Xilinx Power Estimator

XAPP1348 (v1.0) November 16, 2020

## Summary

Power and thermal specifications for FPGAs, MPSoCs, and RFSoCs must be determined in the early stages of a product design cycle, often even when the register transfer level (RTL) description is incomplete. The Xilinx® Power Estimator (XPE) can be used during the pre-design and pre-implementation stages of the design cycle for power analysis.

With the increasing complexity and compute power of modern programmable devices, an early analysis of power and thermal specifications is essential for a successful product design. The ability to estimate power consumption in a design is imperative for efficient part selection, board design, and system reliability. An accurate worst-case analysis early on helps you avoid the pitfalls of overdesigning or underdesigning the power or thermal portions of your system. This application note describes a seven-step procedure for analyzing the power requirements of your design using the [Xilinx Power Estimator \(XPE\)](#) spreadsheet tool.

## Introduction

As a necessary step in any board design, power and thermal specifications need to be properly set to create a functioning and reliable system. In most cases, these specifications should be set prior to PCB design. However, due to the flexibility of Xilinx devices, the FPGA, MPSoC, or RFSoC design is often not complete, or not even started, prior to system design and/or PCB fabrication. This presents an interesting challenge for system designers, because power and thermal characteristics can vary dramatically depending on the bitstream (design), clocking, and data put into the device. Underdesigning the power or thermal system can make the device operate out of specification, which could result in the device not operating at the expected performance, or in more serious consequences including reliability and lifetime degradation. Overdesigning the power or thermal solution is generally less serious but still undesirable, because it can add unnecessary cost, size, weight, complexity, and potentially time to the overall device design.

Power estimation prior to completing a design is not trivial. This application note simplifies the task by splitting it into seven steps that lead to an accurate worst-case power estimation before the design is complete. The focus is primarily on power analysis.

## Step 1: Updating to the Most Current XPE Version

Obtain the latest version of the Xilinx® Power Estimator (XPE) for the selected target device. It is important to make sure you are using the latest version of the XPE tool because power information is updated periodically to reflect the latest power modeling and characterization data. See the [Xilinx Power Estimator \(XPE\)](#) spreadsheet tool.

It is also helpful to check this web site occasionally during the design process to determine whether a newer version is available. You can import the data from a previous version of XPE into the updated XPE by selecting the **Import File** button on the Summary tab. Keeping XPE updated ensures that the system power analysis uses the most accurate power information throughout the design cycle.

Figure 1: Importing an Existing XPE Sheet



## Step 2: Complete the Device Information on the Summary Tab

Make sure to set each field in the **Device** section of the Summary tab because each field can have a significant effect on the final power calculation, particularly the static and clocking power. The following steps describe the choices in the tool.

*Figure 2: Device Information—Summary Tab*



1. **Select Family:** Choose the appropriate family from the drop-down menu. XC, XA, and XQ families are available.
2. **Select Device:** Choose the smallest device that meets your requirements. An improperly chosen device can lead to incorrect static and dynamic power estimations such as the dynamic power reported for clocks and logic. An incorrect device setting can also result in improperly reported available resources.
3. **Select Package:** The package selection can affect the heat dissipation and the end junction temperature of the chosen device. An incorrect junction temperature can result in an incorrect static power calculation. The package selection also impacts the maximum current specification, and a notification is triggered when the maximum is exceeded. Certain Xilinx device families are available in lidless packages (denoted by an S or B in the package name), which offer the lowest thermal resistance to the user thermal solution. The junction temperature of *any* device depends on your thermal solution, and the only way to understand the effectiveness of the thermal solution in the end environment is to run a thermal simulation. Xilinx provides DELPHI thermal models for Siemens FloTHERM and Ansys Icepak. These models are available for download on the Xilinx website under the [Device Models](#) tab.
4. **Select Speed Grade:** Choose the appropriate device speed grade for your design. Some devices are offered at a lower  $V_{CCINT}$  voltage operation. The factors to consider when choosing these devices include reduced performance and lower power. These devices are indicated in the tool as -2L (0.72V) or -1L (0.72V).
5. **Select Temp Grade:** Choose the appropriate temperature grade for the device. Typically, the selection is **Commercial**, **Extended** or **Industrial**. This setting allows for the proper display of junction temperature limits for the chosen device. Some devices offer an excursion temperature operation, which raises the upper temperature operating limit to 110°C for a limited period. Refer to *Extending the Thermal Solution by Utilizing Excursion Temperatures* ([WP517](#)) for further information.
6. **Select Process:** For the purposes of a worst-case analysis, the recommendation is to set the process to **Maximum**. While the default setting (**Typical**) results in a statistically accurate measurement, changing the setting to **Maximum** modifies the power specification to worst-case values. Setting the process to **Maximum** ensures that the power and thermal delivery solution works with any device that is shipped.



7. **Characterization:** This read-only field shows the characterization level of the data. The accuracy for each characterization level is as follows:
  - Preview ( $\pm 30\%$  accuracy)
  - Advance ( $\pm 25\%$  accuracy)
  - Preliminary ( $\pm 20\%$  accuracy)
  - Production ( $\pm 15\%$  accuracy)



**TIP:** The characterization accuracy level describes the accuracy of the model within XPE. The accuracy of the estimation depends on the accuracy of the information added by the user.
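As a rough illustration of how the characterization level bounds the model accuracy, the following sketch applies the percentages listed above to a power estimate. The function name and the 5.0 W example value are hypothetical and are not part of XPE. The band covers only the accuracy of the XPE model, not errors in the user-entered data.

```python
# Model accuracy per characterization level, taken from the list above.
CHARACTERIZATION_ACCURACY_PCT = {
    "Preview": 30.0,
    "Advance": 25.0,
    "Preliminary": 20.0,
    "Production": 15.0,
}


def model_power_band(estimate_w: float, level: str) -> tuple[float, float]:
    """Return the (low, high) power band implied by the model accuracy.

    estimate_w is a power estimate in watts; level is one of the
    characterization levels shown in the read-only Characterization field.
    """
    tolerance = CHARACTERIZATION_ACCURACY_PCT[level] / 100.0
    return estimate_w * (1.0 - tolerance), estimate_w * (1.0 + tolerance)


if __name__ == "__main__":
    # Hypothetical 5.0 W estimate for a device with Production data.
    low, high = model_power_band(5.0, "Production")
    print(f"Model band: {low:.2f} W to {high:.2f} W")
```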

## Step 3: Complete the Environment and Implementation Information on the Summary Tab

Set the proper thermal conditions in the environment section because they are important when calculating static power.

*Figure 3: Environment and Implementation Information – Summary Tab*



1. **Set Junction Temperature:** This field is used to force the device junction temperature to a specific value. For worst-case analysis, select **User Override** and set the value to the maximum  $T_J$, which is 100°C for the E and I temperature grade devices. With the process set to **Maximum** and  $T_J$  set to the maximum allowed for the temperature grade, XPE reports the worst-case power, against which an adequate thermal solution can be designed. When a low-power design is required, target a temperature below the maximum  $T_J$: improving the thermal solution to reduce  $T_J$  directly reduces power by lowering the static power component.

| Environment          |                                                   |                  |
|----------------------|---------------------------------------------------|------------------|
| Junction Temperature | <input checked="" type="checkbox"/> User Override | 100.0 °C         |
| Ambient Temp         |                                                   | 97.3 °C          |
| Effective ΘJA        | <input type="checkbox"/> User Override            |                  |
| Airflow              |                                                   |                  |
| Heat Sink            |                                                   | Medium Profile   |
| ΘSA                  |                                                   | 104.2 °C/W       |
| Board Selection      |                                                   | Medium (10"x10") |
| # of Board Layers    |                                                   |                  |
| ΘJB                  |                                                   |                  |
| Board Temperature    |                                                   |                  |

Overriding the Junction Temperature causes all the fields to be grayed out (except the Heat Sink and Board Selection fields). Use Ambient Temp and Effective ΘJA when the values are derived from thermal simulations.



**RECOMMENDED:** While there are additional environmental settings available, Xilinx recommends performing a thermal simulation to determine the Effective ΘJA. Xilinx provides DELPHI models for FloTHERM and Icepak at [Downloads](#).
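The relationship between ambient temperature, power, and Effective ΘJA can be sketched with the standard steady-state thermal relation T_J = T_A + P × ΘJA. This is a simplified, hypothetical helper rather than the calculation XPE performs: in practice static power itself depends on T_J, so the result is iterative, and a thermal simulation remains the recommended source for ΘJA. All numeric values below are illustrative.

```python
def junction_temp_c(ambient_c: float, power_w: float, theta_ja_c_per_w: float) -> float:
    """Steady-state junction temperature: T_J = T_A + P * theta_JA."""
    return ambient_c + power_w * theta_ja_c_per_w


def max_theta_ja_c_per_w(tj_max_c: float, ambient_c: float, power_w: float) -> float:
    """Largest effective theta_JA that keeps T_J at or below tj_max_c."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return (tj_max_c - ambient_c) / power_w


if __name__ == "__main__":
    # Illustrative numbers only: 10 W device, 50 C ambient, theta_JA of 2.0 C/W.
    print(junction_temp_c(50.0, 10.0, 2.0))          # junction temperature in C
    # Thermal resistance needed to stay at a 100 C junction limit.
    print(max_theta_ja_c_per_w(100.0, 50.0, 10.0))   # in C/W
```

If the required ΘJA is lower than what the chosen heat sink and airflow can provide, either the thermal solution or the power budget needs to change.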

2. **Set Power Optimization:** Set the power optimization to **Default** to be in line with the default Vivado® Design Suite option. The following image shows the default power optimization setting for worst-case analysis in the Xilinx Power Estimator.



## Step 4: Set Worst-case Voltage on All Supplies

By default, each voltage rail for a device is set to its nominal value in the Xilinx Power Estimator tool. To get a worst-case power estimation, specify the maximum DC offset of the regulator (in general, 1% above nominal) for each rail. If you are not using some of the V<sub>CCO</sub> or MGT voltage sources, keep the default values in those source-specific rows.

**Figure 4: Power Supply Settings in the Xilinx Power Estimator Tool with Maximum Voltage Settings**



**TIP:** Use nominal voltages for power delivery design. This allows for enough margin (positive and negative) to design an appropriate power supply. If the maximum or minimum levels are used, ANY ripple on the supplies could mean a violation of the operating specifications.
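The worst-case voltage entries for this step can be derived from the nominal rail values as in the sketch below. The 1% offset is the general figure quoted above, and the actual value should come from your regulator's specification. The rail dictionary is a hypothetical example using the V<sub>CCINT</sub> and V<sub>CCBRAM</sub> nominal values shown in the figures of this document.

```python
def worst_case_voltages(nominal_v: dict[str, float], dc_offset_pct: float = 1.0) -> dict[str, float]:
    """Raise each nominal rail voltage by the regulator's maximum DC offset.

    dc_offset_pct defaults to 1.0 (1% above nominal); replace it with the
    value from the regulator data sheet.
    """
    factor = 1.0 + dc_offset_pct / 100.0
    return {rail: round(volts * factor, 4) for rail, volts in nominal_v.items()}


if __name__ == "__main__":
    # Example rails only; a real design lists every supply used by the device.
    nominal = {"VCCINT": 0.720, "VCCBRAM": 0.850}
    for rail, volts in worst_case_voltages(nominal).items():
        print(f"{rail}: {volts:.4f} V")
```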

## Step 5: Enter Clock and Resource Information

If your design has already been through the Vivado tools or a previous revision of the design was run, those versions are a good starting point for the analysis. The output file (\*.xpe) from Vivado Report\_Power can be imported into the Xilinx Power Estimator tool to help fill out clock and resource information. To do this, use the Import File option located in the Summary tab of the Xilinx Power Estimator tool.

**Figure 5: Importing Vivado Report Power Output File into the Xilinx Power Estimator Tool**



After importing the \*.xpe file, additional information and adjustments are often needed to create a complete estimation. For each of the resource tabs, fill out the expected resources to be used in the design. Refer to *Xilinx Power Estimator User Guide (UG440)* for detailed information on specific fields in the individual tabs.



**TIP:** When starting a new estimation, using a previous design that was correlated with hardware is the best starting point to give accurate toggle and switching rates.

## Clock Tree Power

Enter the various clocks and their frequencies in individual rows. For dynamic power calculations, the important factors to consider are the activity and the load capacitance being switched by each clock network in the design. Some factors for determining load capacitance are fanout and wire length. Clock nets typically have higher activity and fanout than other nets, which makes the values entered here important. Fanout is managed in [Step 6](#). During early power estimation calculations, the default Fanout/Site value is recommended. For imported \*.xpe files, the fanout value is provided by Vivado and is based on the place and route results, which improves clock power accuracy.
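The dependence on activity and load capacitance follows the standard first-order CMOS switching-power relation, shown here for orientation only. It is a textbook approximation and not the model XPE uses internally. The symbols are assumptions introduced for this illustration: $\alpha$ is the switching activity, $C_L$ the switched load capacitance, $V$ the supply voltage, and $f$ the clock frequency.

$$
P_{dyn} \approx \alpha \, C_L \, V^{2} \, f
$$

Because a clock net toggles every cycle and drives a large fanout, both $\alpha$ and $C_L$ are high, which is why clock entries can dominate the dynamic power estimate.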

*Figure 6: Clock Tree Power Tab*

The screenshot shows the 'Clock Tree Power' tab. At the top, there's a summary table with V<sub>CCINT</sub> at 0.720V and Power at 0.000W, indicating 0% of total on-chip power. To the right are links to 'Clocking Resources User Guide', 'XPE User Guide', and 'Introduction to XPE (video)'. Below the summary is a detailed table with columns for Name, Frequency (MHz), Fanout, Fanout/Site, Clock Buffer Enable, Slice Clock Enable, and Power (W). The table lists three clock domains: System Logic Clock, Auxiliary Logic Clock, and Low Power Logic Clock, each with a frequency of 300.0 MHz, 200.0 MHz, and 100.0 MHz respectively, and a fanout of 0.

| Name                  | Frequency (MHz) | Fanout | Fanout/Site | Clock Buffer Enable | Slice Clock Enable | Power (W) |
|-----------------------|-----------------|--------|-------------|---------------------|--------------------|-----------|
| System Logic Clock    | 300.0           | 0      | 6.5         | 100%                | 100%               | 0.000     |
| Auxiliary Logic Clock | 200.0           | 0      | 6.5         | 100%                | 50%                | 0.000     |
| Low Power Logic Clock | 100.0           | 0      | 6.5         | 100%                | 50%                | 0.000     |

## Logic Power

Use this tab to enter an estimate for the number of LUT resources configured as Logic, Shift Registers, and Distributed RAMs, and for the number of Registers used in the design. Use the Add Memory button to simplify adding distributed memory to the design. Use different rows to separate the logic based on the clock domain it operates in. In the absence of a better estimate for your design, leave the Toggle Rate and Routing Complexity at the default values.
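The Signal Rate column in this tab appears to be the product of the clock frequency and the toggle rate. This is an inference from the example values in Figure 7 rather than a documented formula. A minimal sketch under that assumption, with a hypothetical function name:

```python
def signal_rate_mtrs(clock_mhz: float, toggle_rate_pct: float) -> float:
    """Signal rate in millions of transitions per second (Mtr/s).

    Assumes signal rate = clock frequency * toggle rate, which matches
    the example rows in the Logic Power tab.
    """
    return clock_mhz * toggle_rate_pct / 100.0


if __name__ == "__main__":
    # Clock domains from the example, all at the default 12.5% toggle rate.
    for name, clock_mhz in [("System Logic", 300.0),
                            ("Auxiliary Logic", 200.0),
                            ("Low Power Logic", 100.0)]:
        print(f"{name}: {signal_rate_mtrs(clock_mhz, 12.5)} Mtr/s")
```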

*Figure 7: Logic Power Tab*

The screenshot shows the 'Logic Power' tab. At the top, there's a summary table with V<sub>CCINT</sub> at 0.720V and Power at 0.431W, indicating 36% of total on-chip power. To the right are links to 'CLB User Guide', 'XPE User Guide', and 'Introduction to XPE (video)'. Below the summary is a utilization table with columns for Registers, LUTs, Combinatorial, Shift Registers, and Distributed RAMs, showing values like 18,000 registers and 57,700 LUTs. At the bottom is a detailed table with columns for Name, Clock (MHz), and LUTs as Logic, Shift Registers, Distributed RAMs, Registers, Toggle Rate, Routing Complexity, Signal Rate (Mtr/s), and Power (W). The table lists three logic domains: System Logic, Auxiliary Logic, and Low Power Logic, each with a clock frequency of 300.0 MHz, 200.0 MHz, and 100.0 MHz respectively, and a toggle rate of 12.5%.

| Name            | Clock (MHz) | LUTs as Logic | LUTs as Shift Registers | LUTs as Distributed RAMs | Registers | Toggle Rate | Routing Complexity | Signal Rate (Mtr/s) | Power (W) |
|-----------------|-------------|---------------|-------------------------|--------------------------|-----------|-------------|--------------------|---------------------|-----------|
| System Logic    | 300.0       | 25000         | 1000                    | 4000                     | 10000     | 12.5%       | 10.00              | 37.5                | 0.287     |
| Auxiliary Logic | 200.0       | 15000         | 500                     | 3000                     | 5000      | 12.5%       | 10.00              | 25.0                | 0.115     |
| Low Power Logic | 100.0       | 8000          | 200                     | 1000                     | 3000      | 12.5%       | 10.00              | 12.5                | 0.029     |
|                 |             |               |                         |                          |           | 12.5%       | 10.00              | 0.0                 | 0.000     |
|                 |             |               |                         |                          |           | 12.5%       | 10.00              | 0.0                 | 0.000     |

During the early stages of design, it can be difficult to get accurate numbers for these resources. It is best to work with large round numbers that are close to a realistic estimation. A good practice is to consider the data entered into the Xilinx Power Estimator tool early as a constraint to the design. If you specify 25,000 LUTs for a portion of the design, then pay attention to that portion to be sure it stays within budget. If it grows beyond the initial power margin budget, then it is possible to take early action in the design. By making the budget parameters less of a guessing game and more of a guidance for the design, the early resource estimates are more controllable. If the design has an earlier revision, use it as the starting point by using the Import feature to populate the fields and build upon that base design.
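Treating the early estimate as a design constraint can be as simple as comparing current utilization against the figure entered in XPE. The sketch below is a hypothetical helper. The 90% warning threshold is an arbitrary choice for illustration and is not a Xilinx recommendation.

```python
def check_lut_budget(budget_luts: int, used_luts: int, warn_fraction: float = 0.9) -> str:
    """Classify utilization against the LUT budget entered in XPE.

    Returns "ok", "warning" (at or above warn_fraction of the budget),
    or "over" (budget exceeded).
    """
    if budget_luts <= 0:
        raise ValueError("budget_luts must be positive")
    usage = used_luts / budget_luts
    if usage > 1.0:
        return "over"
    if usage >= warn_fraction:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # 25,000 LUTs budgeted for a block, as in the example above.
    for used in (20000, 23000, 26000):
        print(used, check_lut_budget(25000, used))
```

Running a check like this after each synthesis run flags a block that is growing beyond its budget early enough to act on it.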



**TIP:** When entering the clock frequency information, use the capabilities of Excel to relate that cell to the cell populated in the Clock Tree Power tab. To do this, select the desired **Clock (MHz)** cell in the logic view, type =, and select the cell in the **Clock Tree Power** tab corresponding to the clock source for that logic. The cell is populated with the value in the Clock Tree Power tab. The primary benefit of this methodology is that if the clock frequency is ever changed, either by a specification or when exploring power trade-offs vs. frequency, the value is only updated in one place and can be reflected throughout the Xilinx Power Estimator tool. This methodology can also reduce the chance of errors and inconsistencies during the data entry.

## I/O Power

With faster switching speeds and higher capacitive loads, I/O power can be a substantial part of the total power consumption of the device. Therefore, it is important to accurately define and fill in the I/O related parameters to get an accurate overall estimation of all rails of the device. Depending on the selected I/O standard and I/O circuitry, a significant amount of power can be consumed not only in the  $V_{CCO}$  rail but also the  $V_{CCINT}$  and  $V_{CCAUX}$  rails. By specifying each device interface separately and breaking out the interface signals to the data, control, and clock signals, it is easier to provide different I/O standards as well as other I/O characteristics, such as load and toggle rates.



**TIP:** Use the Add Memory Interface button to add a memory interface into the I/O spreadsheet.

Figure 8: I/O Power Tab

| Bank                      |          | I/O Settings          |            |             |           |           |           |            |                  | Activity     |           |             |           | On Chip Power (W) |              |             |              | External |                    | Off Chip        |                 |             |           |                      |           |           |        |
|---------------------------|----------|-----------------------|------------|-------------|-----------|-----------|-----------|------------|------------------|--------------|-----------|-------------|-----------|-------------------|--------------|-------------|--------------|----------|--------------------|-----------------|-----------------|-------------|-----------|----------------------|-----------|-----------|--------|
| Name                      | I/O Type | I/O Standard          | Input Pins | Output Pins | Bidir Pins | BITSLICE  | IBUF      | Input Term | Output Impedance | Pre Emphasis | Clock MHz | Toggle Rate | Data Rate | Output Enable     | Term Disable | IDB Disable | DIBL Disable | (pF)     | Signal Rate (Mtr/s) | $V_{CCINT\_IO}$ | $V_{CCAUX\_IO}$ | $V_{CCINT}$ | $V_{CCO}$ | External Termination | $V_{CCO}$ | All rails |        |
| System Logic Clock        | HP       | LVDS 1.8V (pair)      | 1          | No          | High Perf | DIFF_TERM |           | 300.0      | 200.0%           | SDR          | 0.0%      | 0.0%        | 600.0     | 0.000             | 0.003        | 0.000       | 0.001        | 100      | 0.001              | -0.001          | -0.001          | -0.001      | None      | -0.001               | -0.001    |           |        |
| System Logic Control      | HP       | SSTL Class 1 DCI 1.8V | 20         | 30          | No        | High Perf | RDRV_40   | 300.0      | 12.5%            | SDR          | 50.0%     | 0.0%        | 5         | 37.5              | 0.014        | 0.027       | 0.000        | 40       | 0.00               | -0.005          | -0.005          | -0.005      | None      | -0.005               | -0.005    |           |        |
| System Logic Data         | HP       | SSTL Class 1 DCI 1.8V | 40         | 40          | 20        | Yes       | High Perf | RTT_40     | RDRV_40          | 300.0        | 12.5%     | SDR         | 75.0%     | 0.0%              | 5            | 37.5        | 0.008        | 0.247    | 0.001              | 40              | 0.00            | -0.13       | -0.13     | -0.13                | None      | -0.13     | -0.13  |
| Auxiliary Logic Clock     | HP       | LVDS 1.8V (pair)      | 1          | No          | High Perf | DIFF_TERM |           | 200.0      | 200.0%           | SDR          | 0.0%      | 0.0%        | 400.0     | 0.000             | 0.003        | 0.000       | 0.001        | 100      | 0.001              | -0.001          | -0.001          | -0.001      | None      | -0.001               | -0.001    |           |        |
| Auxiliary Logic Control   | HP       | SSTL 1.5V             | 10         | 10          | No        | High Perf | RTT_48    | RDRV_48    | 200.0            | 12.5%        | SDR       | 100.0%      | 0.0%      | 5                 | 25.0         | 0.007       | 0.054        | 0.000    | 48                 | 0.00            | -0.013          | -0.013      | -0.013    | None                 | -0.013    | -0.013    |        |
| Auxiliary Logic Data      | HP       | SSTL 1.5V             | 15         | 15          | 8         | No        | High Perf | RTT_48     | RDRV_48          | 200.0        | 12.5%     | SDR         | 75.0%     | 0.0%              | 5            | 25.0        | 0.011        | 0.052    | 0.000              | 48              | 0.00            | -0.029      | -0.029    | -0.029               | None      | -0.029    | -0.029 |
| Low Power Logic Clock     | HP       | LVCMOS 1.8V 8mA       | 1          | No          | Low Power |           |           | 100.0      | 200.0%           | SDR          | 0.0%      | 0.0%        | 200.0     | 0.000             | 0.000        | 0.000       | 0.000        | None     | 0.000              | -0.000          | -0.000          | -0.000      | None      | -0.000               | -0.000    |           |        |
| Low Power Logic Control   | HP       | LVCMOS 1.8V 8mA       | 4          | 4           | No        | Low Power |           | 100.0      | 12.5%            | SDR          | 100.0%    | 0.0%        | 5         | 12.5              | 0.000        | 0.001       | 0.000        | 40       | 0.00               | -0.000          | -0.000          | -0.000      | None      | -0.000               | -0.000    |           |        |
| Low Power Logic Data      | HP       | LVCMOS 1.8V 8mA       | 8          | 8           | No        | Low Power |           | 100.0      | 12.5%            | SDR          | 100.0%    | 0.0%        | 5         | 12.5              | 0.003        | 0.000       | 0.003        | 40       | 0.00               | -0.000          | -0.000          | -0.000      | None      | -0.000               | -0.000    |           |        |
| DDR4_x16/ddr4_ck          | HP       | DDR SSTL 1.2V (pair)  | 1          | Yes         | Low Power | RDRV_40   |           | 1200.0     | 100.0%           | Clock        | 100.0%    | 0.0%        | 5         | 2400.0            | 0.007        | 0.035       | 0.001        | 40       | 0.005              | 0.005           | 0.005           | 0.005       | None      | -0.005               | -0.005    |           |        |
| DDR4_x16/ddr4_addr/ba/bog | HP       | SSTL 1.2V             | 18         | Yes         | Low Power | RDRV_40   | Yes       | 1200.0     | 20.0%            | SDR          | 100.0%    | 0.0%        | 5         | 240.0             | 0.042        | 0.036       | 0.001        | 40       | 0.024              | 0.024           | 0.024           | 0.024       | None      | -0.024               | -0.024    |           |        |
| DDR4_x16/ddr4_dq<0:15>    | HP       | POD DCI 1.2V          | 16         | Yes         | High Perf | DCI 400   | RDRV_40   | 1200.0     | 35.0%            | DDR          | 50.0%     | 0.0%        | 5         | 840.0             | 0.000        | 0.089       | 0.000        | 40       | 0.039              | 0.039           | 0.039           | 0.039       | None      | -0.039               | -0.039    |           |        |
| DDR4_x16/ddr4_dq>15:31>   | HP       | POD DCI 1.2V (pair)   | 2          | Yes         | Low Power | DCI 400   | RDRV_40   | 1200.0     | 35.0%            | DDR          | 50.0%     | 0.0%        | 5         | 840.0             | 0.011        | 0.141       | 0.001        | 40       | 0.06               | 0.06            | 0.06            | 0.06        | None      | -0.06                | -0.06     |           |        |
| DDR4_x16/ddr4_db<x>1>     | HP       | POD DCI 1.2V          | 2          | Yes         | Low Power | DCI 400   | RDRV_40   | 1200.0     | 35.0%            | DDR          | 50.0%     | 0.0%        | 5         | 840.0             | 0.006        | 0.009       | 0.000        | 40       | 0.015              | 0.015           | 0.015           | 0.015       | None      | -0.015               | -0.015    |           |        |
| DDR4_x16/ddr4_sys_clk     | HP       | DDR SSTL 1.2V (pair)  | 1          | No          | Low Power | RDRV_40   |           | 300.0      | Clock            | 100.0%       | 0.0%      | 0.0%        | 600.0     | 0.000             | 0.003        | 0.000       | 40           | 0.006    | 0.006              | 0.006           | 0.006           | None        | -0.006    | -0.006               |           |           |        |
| DDR4_x16/ddr4_ce/odt/cs/a | HP       | SSTL 1.2V             | 4          | Yes         | Low Power | RDRV_40   |           | 1200.0     | 12.5%            | SDR          | 100.0%    | 0.0%        | 5         | 150.0             | 0.009        | 0.005       | 0.000        | 40       | 0.005              | 0.005           | 0.005           | 0.005       | None      | -0.005               | -0.005    |           |        |
| DDR4_x16/ddr4_ce/odt/cs/a | HP       | LVCMOS 1.8V 12mA      | 1          | No          | Low Power | RDRV_40   |           | 12.5%      | SDR              | 100.0%       | 0.0%      | 0.0%        | 5         | 0.0               | 0.000        | 0.000       | 0.000        | 40       | 0.000              | 0.000           | 0.000           | 0.000       | None      | -0.000               | -0.000    |           |        |

Differential pins are defined in pairs and are specified as a single entry. For example, if there is one differential input signal and four differential output signals using the LVDS 1.8V I/O standard, enter a 1 in the input column and 4 in the output column for that module name.

## Block RAM Power

Setting the number of block RAMs used in the design, their configuration, and the block RAM parameters accurately requires a good understanding of the device resources and configuration possibilities. Make sure to adjust the **Enable Rate** for Port A and Port B, because the dynamic power a block RAM consumes is directly proportional to the amount of time it is enabled. Entering the proper value for this parameter is important for an accurate block RAM power estimation.

Use the Add Memory button to add the various types of block RAMs with the required configurations as rows in the Block RAM spreadsheet. Refer to the section on using the Block RAM spreadsheet in the *Xilinx Power Estimator User Guide* ([UG440](#)) for guidelines.
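As a first-order illustration of why the Enable Rate matters, the sketch below scales a block RAM dynamic power figure linearly with the enable rate, following the proportionality described above. It is a hypothetical back-of-the-envelope helper, not the XPE model: it ignores static power and any components that do not scale with the enable rate, and the example numbers are illustrative.

```python
def scale_bram_dynamic_power(ref_power_w: float, ref_enable_pct: float, new_enable_pct: float) -> float:
    """Scale block RAM dynamic power linearly with the port enable rate.

    ref_power_w is the dynamic power at ref_enable_pct; the result is the
    estimated dynamic power at new_enable_pct.
    """
    if ref_enable_pct <= 0:
        raise ValueError("ref_enable_pct must be positive")
    return ref_power_w * new_enable_pct / ref_enable_pct


if __name__ == "__main__":
    # Illustrative: 0.088 W at a 50% enable rate, re-estimated at 25%.
    print(f"{scale_bram_dynamic_power(0.088, 50.0, 25.0):.3f} W")
```

Leaving the enable rate at an unrealistically high value therefore inflates the estimate by roughly the same ratio, so it is worth entering a considered worst-case value per port.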

*Figure 9: Block RAM Power Tab*

| Rail                | Voltage | Power  |
|---------------------|---------|--------|
| V<sub>CCINT</sub>   | 0.720V  | 0.154W |
| V<sub>CCBRAM</sub>  | 0.850V  | 0.009W |

Block RAM power is 3% of the total on-chip power of 4.894W.

| Resource | Used | Utilization |
|----------|------|-------------|
| RAMB18   | 250  | 17%         |
| RAMB36   | 100  | 14%         |

| Name                 | Block RAMs | Cascade Group Size | Mode      | Toggle Rate | Port A Clock (MHz) | Port A Enable Rate | Port A Bit Width | Port A Write Mode | Port A Write Enable | Port B Clock (MHz) | Port B Enable Rate | Port B Bit Width | Port B Write Mode | Port B Write Enable | Signal Rate (Mtr/s) | V<sub>CCINT</sub> Power (W) | V<sub>CCBRAM</sub> Power (W) |
|----------------------|------------|--------------------|-----------|-------------|--------------------|--------------------|------------------|-------------------|---------------------|--------------------|--------------------|------------------|-------------------|---------------------|---------------------|-----------------------------|------------------------------|
| System Logic BRAM    | 200        | 10                 | RAMB18SDP | 12.5%       | 300.0              | 50.0%              | 36               | NO_CHANGE         |                     | 300.0              | 50.0%              | 36               | WRITE_FIRST       | 12.5%               | 37.500              | 0.088                       | 0.004                        |
| Auxiliary Logic BRAM | 100        | 4                  | RAMB36    | 12.5%       | 200.0              | 25.0%              | 1                | READ_FIRST        | 12.5%               | 200.0              | 25.0%              | 1                | NO_CHANGE         | 12.5%               | 25.000              | 0.060                       | 0.004                        |
| Low Power Logic BRAM | 50         | 4                  | RAMB18    | 12.5%       | 100.0              | 12.5%              | 1                | NO_CHANGE         | 12.5%               | 100.0              | 12.5%              | 1                | NO_CHANGE         | 12.5%               | 12.500              | 0.006                       | 0.000                        |
|                      |            | 4                  | RAMB18    | 12.5%       |                    | 25.0%              | 1                | NO_CHANGE         | 12.5%               |                    | 25.0%              | 1                | NO_CHANGE         | 12.5%               | 0.000               | 0.000                       | 0.000                        |
|                      |            | 4                  | RAMB18    | 12.5%       |                    | 25.0%              | 1                | NO_CHANGE         | 12.5%               |                    | 25.0%              | 1                | NO_CHANGE         | 12.5%               | 0.000               | 0.000                       | 0.000                        |

## UltraRAM Power

UltraScale+™ devices support a high-density 288 Kb memory block (UltraRAM) that coexists with the block RAMs and enables deeper memory implementation. Dedicated routing in an UltraRAM column enables the entire column height to be connected.

Set the number and configuration of the UltraRAMs, paying attention to the **Enable Rate** for Port A and Port B, because the dynamic power an UltraRAM consumes is directly proportional to the amount of time it is enabled.

*Figure 10: UltraRAM Power Tab*

| Power                            |        | Utilization         |         | UltraRAM Power |            |                     |                             |                    |             |            |             |              |            |             |              |                             |                     |  |
|----------------------------------|--------|---------------------|---------|----------------|------------|---------------------|-----------------------------|--------------------|-------------|------------|-------------|--------------|------------|-------------|--------------|-----------------------------|---------------------|--|
| V <sub>CCINT</sub>               | 0.720V | V <sub>CCBRAM</sub> | 0.116W  | URAM288        | 41         | 13%                 | Memory Resources User Guide |                    |             |            |             |              |            |             |              |                             |                     |  |
| V <sub>CCINT</sub>               | 0.850V | V <sub>CCBRAM</sub> | 0.001W  | XPE User Guide |            |                     |                             |                    |             |            |             |              |            |             |              | Introduction to XPE (video) |                     |  |
| 3% of total on-chip power 5.058W |        |                     |         |                |            |                     |                             |                    |             |            |             |              |            |             |              |                             |                     |  |
| Name                             | URAMs  | Cascade Group Size  | Latency | Mode           | Sleep Rate | Avg Inactive Cycles | Input Toggle Rate           | Output Toggle Rate | Clock (MHz) | Data Width | Enable Rate | Write Enable | Data Width | Enable Rate | Write Enable | V <sub>CCINT</sub>          | V <sub>CCBRAM</sub> |  |
| System Logic URAM                | 24     | 4                   | 1       | URAM288        | 0.0%       | 10                  | 12.5%                       | 12.5%              | 300.0       | 72         | 50.0%       | 12.5%        | 72         | 50.0%       | 12.5%        | 0.080                       | 0.001               |  |
| Auxiliary Logic URAM             | 12     | 2                   | 1       | URAM288        | 0.0%       | 10                  | 12.5%                       | 12.5%              | 200.0       | 72         | 25.0%       | 12.5%        | 72         | 25.0%       | 12.5%        | 0.029                       | 0.000               |  |
| Low Power Logic URAM             | 5      | 1                   | 0       | URAM288        | 0.0%       | 10                  | 12.5%                       | 12.5%              | 100.0       | 72         | 25.0%       | 12.5%        | 72         | 25.0%       | 12.5%        | 0.007                       | 0.000               |  |
|                                  |        | 1                   | 0       | URAM288        | 0.0%       | 10                  | 12.5%                       | 12.5%              |             | 72         | 25.0%       | 12.5%        | 72         | 25.0%       | 12.5%        | 0.000                       | 0.000               |  |

## DSP48 Power

Complete the DSP48 Power tab with the required details. DSP blocks can be used for multipliers, counters, filters, and other common functions.

Figure 11: DSP48 Power Tab



**TIP:** XPE assumes a default DSP configuration of 27x18, so the toggle rate must be scaled accordingly for accurate power estimation. For example, if an 18x18 DSP is expected to toggle at 25%, scale the rate by 0.8 and enter 20% into XPE. Similarly, scale the actual toggle rate by 0.53 for a 12x12 configuration. The 0.8 scaling factor is obtained as follows: $1 - ((27 + 18) - (18 + 18)) / (27 + 18) = 1 - 9/45 = 0.8$.
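The scaling arithmetic in this tip can be sketched in a few lines of Python. This is only an illustration of the formula given above, not part of XPE. The function names `dsp_toggle_scale` and `scaled_toggle_rate` are hypothetical.

```python
def dsp_toggle_scale(a_width, b_width, ref_a=27, ref_b=18):
    """Scaling factor relative to the 27x18 DSP configuration assumed by XPE."""
    ref_bits = ref_a + ref_b        # 27 + 18 = 45 input bits in the reference
    used_bits = a_width + b_width   # input bits actually used by the design
    return 1 - (ref_bits - used_bits) / ref_bits


def scaled_toggle_rate(actual_rate_pct, a_width, b_width):
    """Toggle rate (in percent) to enter into XPE for the given multiplier size."""
    return actual_rate_pct * dsp_toggle_scale(a_width, b_width)


print(round(dsp_toggle_scale(18, 18), 2))          # 0.8
print(round(dsp_toggle_scale(12, 12), 2))          # 0.53
print(round(scaled_toggle_rate(25.0, 18, 18), 1))  # 20.0
```

The same function reproduces both factors quoted in the tip, which makes it easy to check any other operand widths before entering a toggle rate.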

| Power                             |            | Utilization |             |            | DSP48 User Guide |               |                     |                |
|-----------------------------------|------------|-------------|-------------|------------|------------------|---------------|---------------------|----------------|
| V <sub>CCINT</sub>                | 0.720V     | 0.070W      |             | DSP48      | 74               | 3%            |                     | XPE User Guide |
| 1% of total on-chip power 5.129W  |            |             |             |            |                  |               |                     |                |
| Name                              | DSP Slices | Clock (MHz) | Toggle Rate | MULT Used? | MREG Used?       | Pre-add Used? | Signal Rate (Mtr/s) | Power (W)      |
| Multiplier with pipeline register | 30         | 300.0       | 12.5%       | Yes        | Yes              | No            | 37.500              | 0.026          |
| Multiplier accumulate             | 30         | 300.0       | 12.5%       | Yes        | Yes              | No            | 43.125              | 0.029          |
| Filter                            | 14         | 200.0       | 25.0%       | Yes        | Yes              | No            | 50.000              | 0.014          |
|                                   |            |             | 12.5%       | Yes        | Yes              | No            | 0.000               | 0.000          |

## Clock Manager Power

If an MMCM/PLL is used in the design, enter the corresponding use and configuration of each. During the early stages of the design, the complete clocking details might not be known. Enter what is known to estimate the power and revisit as the design progresses.

Figure 12: Clock Manager Power Tab

| Power                            |                 | Clock Tree Power |             |                     |                    |           | Clocking Resources User Guide |                             |
|----------------------------------|-----------------|------------------|-------------|---------------------|--------------------|-----------|-------------------------------|-----------------------------|
| V <sub>CCINT</sub>               | 0.720V          | 0.000W           |             |                     |                    |           | XPE User Guide                |                             |
| 0% of total on-chip power 1.209W |                 |                  |             |                     |                    |           |                               | Introduction to XPE (video) |
| Name                             | Frequency (MHz) | Fanout           | Fanout/Site | Clock Buffer Enable | Slice Clock Enable | Power (W) |                               |                             |
| System Logic Clock               | 300.0           | 0                | 6.5         | 100%                | 100%               | 0.000     |                               |                             |
|                                  |                 |                  | 6.5         | 100%                | 50%                | 0.000     |                               |                             |
| Auxiliary Logic Clock            | 200.0           | 0                | 6.5         | 50%                 | 100%               | 0.000     |                               |                             |
|                                  |                 |                  | 6.5         | 100%                | 50%                | 0.000     |                               |                             |
| Low Power Logic Clock            | 100.0           | 0                | 6.5         | 100%                | 50%                | 0.000     |                               |                             |
|                                  |                 |                  | 6.5         | 100%                | 50%                | 0.000     |                               |                             |
|                                  |                 |                  | 6.5         | 100%                | 50%                | 0.000     |                               |                             |

## Transceiver Power

Depending on the device selected, the transceiver can be a GTH, GTY, or GTM. The best way to fill in the sheet is to use the **Add GTx Interface** button (where x = H, Y, or M).

XPE calculates power for each channel including the power of all associated circuits, shared resources between channels, I/O buffers, reference clock circuitry, and so forth. Therefore, the use of these resources should not be entered on any other sheet (for example, clock or I/O) to describe the transceiver resources.

Figure 13: GTY Transceiver Power Tab



## Other Block Power

Complete this sheet with the details and information on the SYSMON and configuration blocks.

Figure 14: Other Block Power Tab



**TIP:** Xilinx only supports thermal measurements using SYSMON. Because it is integrated into the silicon, it gives the most accurate measurement possible ($\pm 3^{\circ}\text{C}$). Use SYSMON in all designs to monitor $T_J$ and ensure that the device junction temperature, including the measurement error, does not exceed the specification. For example, if $100^{\circ}\text{C}$ is the maximum $T_J$ of the device, the maximum $T_J$ measured using SYSMON should not be more than $97^{\circ}\text{C}$ ($100^{\circ}\text{C} - 3^{\circ}\text{C}$). The SYSMON error value can be found in the device data sheets.
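As a small illustration of the margin described in this tip, the following Python sketch subtracts the SYSMON error from the maximum junction temperature to get the highest acceptable reading. The function names are hypothetical. The 3°C default mirrors the example above, and the actual error should be taken from the device data sheet.

```python
def max_sysmon_reading(t_j_max_c, sysmon_error_c=3.0):
    """Highest SYSMON reading that still guarantees T_J <= t_j_max_c."""
    return t_j_max_c - sysmon_error_c


def reading_is_safe(reading_c, t_j_max_c, sysmon_error_c=3.0):
    """True if the measured value leaves room for the SYSMON error."""
    return reading_c <= max_sysmon_reading(t_j_max_c, sysmon_error_c)


print(max_sysmon_reading(100.0))     # 97.0
print(reading_is_safe(96.5, 100.0))  # True
print(reading_is_safe(98.0, 100.0))  # False
```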

## Step 6: Set the Toggle and Connectivity Parameters

For each tab in XPE containing a Toggle Rate, Average Fanout, or Enable Rate, review the set value. For clock fanout, it is crucial that the fanout is entered correctly because it could dramatically impact the clock power. The clock fanout should be the sum of all the entries that are clocked by each clock in the other tabs.



**TIP:** To ensure the clock fanout is entered correctly, create an equation that sums all of the synchronous elements for any particular clock domain. For instance, in the **Fanout** field for a given clock, type =SUM( and then select all of the cells that specify the number of synchronous elements sourced by that clock (that is, shift registers, distributed RAMs, block RAMs, DSPs, and so on), and close the parenthesis. The **Fanout** cell is then populated with the appropriate number. The resulting Excel equation would be similar to this:

=SUM(LOGIC!G12:I12, BRAM!E10, DSP!E8)

=SUM(IO!I19:K19)

---

This method of entering clock fanout has the added advantage of automatically updating when adjustments are made to the spreadsheet resource counts.
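The same bookkeeping can be cross-checked outside the spreadsheet. The following Python sketch sums the synchronous elements per clock domain, mirroring the =SUM equations above. The entries are illustrative only: the BRAM, URAM, and DSP counts echo the example tabs, while the LOGIC counts and the assignment of rows to clock domains are assumptions.

```python
from collections import defaultdict

# Illustrative entries only: (XPE tab, clock domain, synchronous element count).
# LOGIC counts and the mapping of rows to clock domains are assumptions.
ENTRIES = [
    ("LOGIC", "System Logic Clock", 40000),
    ("BRAM", "System Logic Clock", 200),
    ("URAM", "System Logic Clock", 24),
    ("DSP", "System Logic Clock", 60),
    ("LOGIC", "Auxiliary Logic Clock", 15000),
    ("BRAM", "Auxiliary Logic Clock", 100),
    ("URAM", "Auxiliary Logic Clock", 12),
    ("DSP", "Auxiliary Logic Clock", 14),
]


def fanout_per_clock(entries):
    """Sum the synchronous elements driven by each clock across all tabs."""
    totals = defaultdict(int)
    for _tab, clock, count in entries:
        totals[clock] += count
    return dict(totals)


for clock, fanout in fanout_per_clock(ENTRIES).items():
    print(f"{clock}: fanout = {fanout}")
```

Comparing these totals with the **Fanout** cells is a quick way to catch a resource row that was left out of a clock's equation.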

For toggle and enable rates, in the absence of any other information or knowledge, leave these settings at their defaults. However, if you determine that the default might not represent the characteristics of your design, make the necessary adjustments. For instance, if you know that a memory interface has a training pattern routine that exercises a sustained high toggle rate on that interface, the **Toggle Rate** might need to be raised to reflect this additional activity. Alternatively, if a portion of a circuit is clock enabled in a way that reduces the overall activity of the circuit, the toggle rate might need to be reduced. More information on methods to determine toggle rate is found in *Xilinx Power Estimator User Guide* ([UG440](#)).

For I/O Output Load, enter a capacitive load for each design output, because the load affects the dynamic power of the driven output. The Output Load value is primarily made up of the sum of the input capacitances of the devices connected to that output. The input capacitance is described in the device data sheets.
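As a rough sketch of how an Output Load value might be assembled, the following Python snippet sums the input capacitances of the receivers on one net. The capacitance values are placeholders, not data sheet figures. The optional `extra_pf` term for any additional load is an assumption added for illustration.

```python
def output_load_pf(receiver_caps_pf, extra_pf=0.0):
    """Total capacitive load on one output, in picofarads.

    receiver_caps_pf: input capacitance of each device on the net (from its data sheet).
    extra_pf: optional allowance for other load on the net (assumption, default 0).
    """
    return sum(receiver_caps_pf) + extra_pf


# Placeholder values: one output driving three receivers.
receivers_pf = [3.0, 3.0, 2.5]
print(output_load_pf(receivers_pf))  # 8.5
```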

## Step 7: Analyze the Results and Constrain the Design

Update Steps 1 through 6, if necessary, and after completing these steps, analyze the results.

Figure 15: Summary Panel



1. The Total On-Chip Power reported is the maximum power for the design and should not exceed the power budget. Check whether this power is within the desired power and thermal budget for the project. If it is higher than the budget, adjust the resource and power characteristics of the design until an acceptable result is reached.
2. Analyze the various trade-offs to derive the desired functionality with a tighter power budget. The best time to explore these options is early in the design process. After all the data is entered and the design is operating within the thermal limits of the selected device temperature grade, use the power reported by XPE to specify the rails and thermal design adjustments. Depending upon your confidence in the data entered, adding margin to these values can help avoid under designing the system for the device selected.
3. Use the Total On-Chip Power value in thermal simulations to model the entire system including the device heat sink, board, other heat sources, case closure, and airflow patterns. Various uncertainties including the thermal model, SYSMON error, power estimation, heat sink, fan, TIM, and PCB irregularities need to be accounted for to build margin in the simulations and to calculate the maximum $T_J$. To provide a more accurate environment setting, derive an effective $\theta_{JA}$ and local ambient values to feed back into the XPE spreadsheet. Ensure that the maximum $T_J$ is not exceeded. Periodically update XPE while the design matures to check that the power and thermal margins are still adequate.

There are three factors that describe the thermal design:

- $T_J$: Junction temperature.
- $T_A$: Ambient temperature.
- $P_D$: Power dissipation.

Their relationship is shown in the following equation:

$$\theta_{JA} = (T_J - T_A) / P_D$$

$\theta_{JA}$ expresses this relationship as a degrees Celsius per watt (°C/W) value: for every watt dissipated, the junction temperature increases by a known amount. This value can be reduced by improving the effectiveness of the thermal solution. For designs that exceed the maximum junction temperature where changes to the thermal solution are not possible, one of these attributes must be changed:

- $T_J$: If possible, select a higher temperature grade or utilize the excursion to 110°C. For more information, see *Extending the Thermal Solution by Utilizing Excursion Temperatures (WP517)*.
- $T_A$: Can the ambient temperature of the product be reduced?
- $P_D$: Reduce the power of the design. Reduce toggling, investigate clock gating, or move to a low-voltage part, which can reduce power by up to 30%.

For devices available in lidless packages, choosing the lidless option can reduce $\theta_{JA}$ because there is a lower thermal resistance to the user's thermal solution. Investigate this option where possible.
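The relationship $\theta_{JA} = (T_J - T_A) / P_D$ can be rearranged to answer the questions raised above. The following Python sketch shows the three forms. The numeric values are placeholders chosen for easy arithmetic, not figures from a data sheet or from XPE.

```python
def theta_ja(t_j_c, t_a_c, p_d_w):
    """Effective junction-to-ambient thermal resistance in degC/W."""
    return (t_j_c - t_a_c) / p_d_w


def junction_temperature(t_a_c, p_d_w, theta_ja_c_per_w):
    """Junction temperature for a given ambient, power, and theta_JA."""
    return t_a_c + p_d_w * theta_ja_c_per_w


def max_power(t_j_max_c, t_a_c, theta_ja_c_per_w):
    """Largest power dissipation that keeps T_J at or below t_j_max_c."""
    return (t_j_max_c - t_a_c) / theta_ja_c_per_w


# Placeholder operating point: 50 degC ambient, 5 W, 8 degC/W thermal solution.
print(junction_temperature(50.0, 5.0, 8.0))  # 90.0
print(max_power(100.0, 50.0, 8.0))           # 6.25
print(theta_ja(90.0, 50.0, 5.0))             # 8.0
```

With these placeholder numbers, lowering the ambient temperature, lowering the power, or improving $\theta_{JA}$ each buys junction temperature headroom, which is the trade-off the three options above describe.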

4. After the power budget is defined, constrain the Vivado design using the power-related Xilinx design constraints (XDC). The XDC constraints are generated by selecting the XDC Constraints file type. In the dialog box, select **XDC Constraints** and specify a file name. Add the constraints to the XDC file of the project to allow the report\_power command to analyze the design and report the margin based on the power budget defined at the estimation stage. The goal is to keep the power in compliance even when the application is run on hardware, avoiding added cost or delay to the product.



5. The final step is to add the total power (W) required in the Total Power Budget dialog.



## Conclusion

The Xilinx Power Estimator tool produces accurate power estimations when accurate data is entered. During the early stages of system design, determining the exact power requirement can be a challenge. However, with the seven steps discussed in this application note, the problem is broken down into smaller phases that are easier to define and understand, which improves both data entry and data accuracy. After a power estimation has been finalized, it directly impacts all aspects of the product design, such as power delivery, board, thermal solution, and potentially mechanics. Ensuring that the original power estimation is adhered to is critical for a successful and fast time to market. Any deviation from the power estimation should be acted on as early as possible to reduce impact on the design cycle.

---

## References

These documents provide supplemental material useful with this guide:

1. UltraScale™ and UltraScale+™ device data sheets:
  - Kintex UltraScale FPGAs Data Sheet: DC and AC Switching Characteristics ([DS892](#))
  - Virtex UltraScale FPGAs Data Sheet: DC and AC Switching Characteristics ([DS893](#))
  - Kintex UltraScale+ FPGAs Data Sheet: DC and AC Switching Characteristics ([DS922](#))
  - Virtex UltraScale+ FPGA Data Sheet: DC and AC Switching Characteristics ([DS923](#))
  - Zynq UltraScale+ MPSoC Data Sheet: DC and AC Switching Characteristics ([DS925](#))
  - Zynq UltraScale+ RFSoC Data Sheet: DC and AC Switching Characteristics ([DS926](#))

---

## Revision History

The following table shows the revision history for this document.

| Section                | Revision Summary |
|------------------------|------------------|
| 11/16/2020 Version 1.0 |                  |
| Initial release.       | N/A              |

---

## Please Read: Important Legal Notices

The information disclosed to you hereunder (the "Materials") is provided solely for the selection and use of Xilinx products. To the maximum extent permitted by applicable law: (1) Materials are made available "AS IS" and with all faults, Xilinx hereby DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and (2) Xilinx shall not be liable (whether in contract or tort, including negligence, or under any other theory of liability) for any loss or damage of any kind or nature related to, arising under, or in connection with, the Materials (including your use of the Materials), including for any direct, indirect, special, incidental, or consequential loss or damage (including loss of data, profits, goodwill, or any type of loss or damage suffered as a result of any action brought by a third party) even if such damage or loss was reasonably foreseeable or Xilinx had been advised of the possibility of the same. Xilinx assumes no obligation to correct any errors contained in the Materials or to notify you of updates to the Materials or to product specifications. You may not reproduce, modify, distribute, or publicly display the Materials without prior written consent. Certain products are subject to the terms and conditions of Xilinx's limited warranty, please refer to Xilinx's Terms of Sale which can be viewed at <https://www.xilinx.com/legal.htm#tos>; IP cores may be subject to warranty and support terms contained in a license issued to you by Xilinx. Xilinx products are not designed or intended to be fail-safe or for use in any application requiring fail-safe performance; you assume sole risk and liability for use of Xilinx products in such critical applications, please refer to Xilinx's Terms of Sale which can be viewed at <https://www.xilinx.com/legal.htm#tos>.

## AUTOMOTIVE APPLICATIONS DISCLAIMER

AUTOMOTIVE PRODUCTS (IDENTIFIED AS "XA" IN THE PART NUMBER) ARE NOT WARRANTED FOR USE IN THE DEPLOYMENT OF AIRBAGS OR FOR USE IN APPLICATIONS THAT AFFECT CONTROL OF A VEHICLE ("SAFETY APPLICATION") UNLESS THERE IS A SAFETY CONCEPT OR REDUNDANCY FEATURE CONSISTENT WITH THE ISO 26262 AUTOMOTIVE SAFETY STANDARD ("SAFETY DESIGN"). CUSTOMER SHALL, PRIOR TO USING OR DISTRIBUTING ANY SYSTEMS THAT INCORPORATE PRODUCTS, THOROUGHLY TEST SUCH SYSTEMS FOR SAFETY PURPOSES. USE OF PRODUCTS IN A SAFETY APPLICATION WITHOUT A SAFETY DESIGN IS FULLY AT THE RISK OF CUSTOMER, SUBJECT ONLY TO APPLICABLE LAWS AND REGULATIONS GOVERNING LIMITATIONS ON PRODUCT LIABILITY.

## Copyright

© Copyright 2020 Xilinx, Inc. Xilinx, the Xilinx logo, Alveo, Artix, Kintex, Spartan, Versal, Virtex, Vivado, Zynq, and other designated brands included herein are trademarks of Xilinx in the United States and other countries. All other trademarks are the property of their respective owners.