

**Table 19-3. Non-Architectural Performance Events of the Processor Core Supported in Intel® Xeon® Processor Scalable Family with Skylake Microarchitecture (Contd.)**

| Event Num. | Umask Value | Event Mask Mnemonic       | Description                                                                                                                                                                                                                                                                                                          | Comment                |
|------------|-------------|---------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|
| D4H        | 04H         | MEM_LOAD_MISC_RETIRED.UC  | Retired instructions with at least 1 uncacheable load or lock.                                                                                                                                                                                                                                                       | Precise event capable. |
| E6H        | 01H         | BACLEARS.ANY              | Counts the number of times the front-end is resteered when it finds a branch instruction in a fetch line. This occurs the first time a branch instruction is fetched or when the branch is not tracked by the BPU (Branch Prediction Unit) anymore.                                                                  |                        |
| F0H        | 40H         | L2_TRANS.L2_WB            | Counts L2 writebacks that access L2 cache.                                                                                                                                                                                                                                                                           |                        |
| F1H        | 1FH         | L2_LINES_IN.ALL           | Counts the number of L2 cache lines filling the L2. Counting does not cover rejects.                                                                                                                                                                                                                                 |                        |
| F2H        | 01H         | L2_LINES_OUT.SILENT       | Counts the number of lines that are silently dropped by L2 cache when triggered by an L2 cache fill. These lines are typically in Shared state. A non-threaded event.                                                                                                                                                |                        |
| F2H        | 02H         | L2_LINES_OUT.NON_SILENT   | Counts the number of lines that are evicted by L2 cache when triggered by an L2 cache fill. Those lines can be either in modified state or clean state. Modified lines may either be written back to L3 or directly written to memory and not allocated in L3. Clean lines may either be allocated in L3 or dropped. |                        |
| F2H        | 04H         | L2_LINES_OUT.USELESS_PREF | Counts the number of lines that have been hardware prefetched but not used and now evicted by L2 cache.                                                                                                                                                                                                              |                        |
| F2H        | 04H         | L2_LINES_OUT.USELESS_HWPF | Counts the number of lines that have been hardware prefetched but not used and now evicted by L2 cache.                                                                                                                                                                                                              |                        |
| F4H        | 10H         | SQ_MISC.SPLIT_LOCK        | Counts the number of cache line split locks sent to the uncore.                                                                                                                                                                                                                                                      |                        |
| FEH        | 02H         | IDI_MISC.WB_UPGRADE       | Counts number of cache lines that are allocated and written back to L3 with the intention that they are more likely to be reused shortly.                                                                                                                                                                            |                        |
| FEH        | 04H         | IDI_MISC.WB_DOWNGRADE     | Counts number of cache lines that are dropped and not written back to L3 as they are deemed to be less likely to be reused shortly.                                                                                                                                                                                  |                        |

## 19.3 PERFORMANCE MONITORING EVENTS FOR 6TH GENERATION INTEL® CORE™ PROCESSOR AND 7TH GENERATION INTEL® CORE™ PROCESSOR

6th Generation Intel® Core™ processors are based on the Skylake microarchitecture. They support the architectural performance-monitoring events listed in Table 19-1. Fixed counters in the core PMU support the architectural events defined in Table 19-2. Non-architectural performance-monitoring events in the processor core are listed in Table 19-4. The events in Table 19-4 apply to processors with a CPUID signature of DisplayFamily\_DisplayModel encoding 06\_4EH or 06\_5EH. Table 19-10 lists performance events that support Intel TSX (see Section 18.3.6.5); these events apply to processors based on the Skylake microarchitecture. Where the Skylake microarchitecture implements TSX-related event semantics that differ from Table 19-10, the differences are listed in Table 19-5.

7th Generation Intel® Core™ processors are based on the Kaby Lake microarchitecture. Non-architectural performance-monitoring events in the processor core are listed in Table 19-4. The events in Table 19-4 apply to processors with a CPUID signature of DisplayFamily\_DisplayModel encoding 06\_8EH or 06\_9EH.
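The signature check described above can be sketched in a few lines. This is a minimal illustration, assuming the caller has already obtained the EAX value returned by CPUID leaf 01H; the function names and the sample EAX values are hypothetical and are not taken from this document.

```python
# Sketch: derive the DisplayFamily_DisplayModel signature from CPUID.01H:EAX
# and check whether Table 19-4 applies. The signature list restates the text
# above; the sample EAX values used below are illustrative assumptions.

TABLE_19_4_SIGNATURES = {
    "06_4EH": "Skylake (6th Generation Intel Core)",
    "06_5EH": "Skylake (6th Generation Intel Core)",
    "06_8EH": "Kaby Lake (7th Generation Intel Core)",
    "06_9EH": "Kaby Lake (7th Generation Intel Core)",
}


def display_signature(eax: int) -> str:
    """Format CPUID.01H:EAX as DisplayFamily_DisplayModel, e.g. '06_5EH'."""
    family = (eax >> 8) & 0xF        # bits 11:8
    model = (eax >> 4) & 0xF         # bits 7:4
    ext_model = (eax >> 16) & 0xF    # bits 19:16
    ext_family = (eax >> 20) & 0xFF  # bits 27:20

    # The extended family is added only when the base family field is 0FH.
    display_family = family + ext_family if family == 0xF else family
    # The extended model is prepended only for family 06H or 0FH.
    display_model = (ext_model << 4) | model if family in (0x6, 0xF) else model
    return f"{display_family:02X}_{display_model:02X}H"


def table_19_4_microarchitecture(eax: int):
    """Return the microarchitecture name if Table 19-4 applies, else None."""
    return TABLE_19_4_SIGNATURES.get(display_signature(eax))


if __name__ == "__main__":
    for sample in (0x000506E3, 0x000806E9, 0x000306C3):
        print(f"{sample:#010x}", display_signature(sample),
              table_19_4_microarchitecture(sample))
```

A signature outside the four listed values returns `None`, which signals that a different event table should be consulted.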


The Comment column in Table 19-4 uses abbreviations to indicate additional conditions that apply to the Event Mask Mnemonic. For event umasks in Table 19-4 whose Comment column does not show "AnyT", users should refrain from programming "AnyThread = 1" in IA32\_PERFEVTSELx.
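As an illustration of how a table row maps onto an event-select value, the sketch below packs the Event Num., Umask Value, and Comment-column qualifiers into a 32-bit word. It assumes the architectural performance-monitoring bit layout (event select in bits 7:0, umask in 15:8, USR 16, OS 17, E 18, AnyThread 21, EN 22, INV 23, CMask in 31:24) and assumes that CMSKn, INV, EDG, and AnyT in the Comment column correspond to the CMask, INV, E, and AnyThread fields. The function name `perfevtsel` and its parameters are hypothetical.

```python
# Sketch: build an event-select value from a Table 19-4 row.
# Bit positions are the assumed architectural layout described in the
# lead-in; the mapping of Comment abbreviations to fields is an assumption.

def perfevtsel(event, umask, *, cmask=0, inv=False, edge=False,
               any_thread=False, usr=True, os=True, enable=True) -> int:
    """Pack event number, umask, and qualifiers into one 32-bit value."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF and 0 <= cmask <= 0xFF):
        raise ValueError("event, umask, and cmask must each fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)
    value |= int(usr) << 16         # count in user mode
    value |= int(os) << 17          # count in kernel mode
    value |= int(edge) << 18        # EDG: count rising edges only
    value |= int(any_thread) << 21  # AnyT: only for rows marked AnyT
    value |= int(enable) << 22      # enable the counter
    value |= int(inv) << 23         # INV: invert the CMask comparison
    return value


# Rows copied from Table 19-4; rows that share an event/umask pair differ
# only in their qualifiers.
EXAMPLES = {
    "DTLB_LOAD_MISSES.WALK_PENDING": perfevtsel(0x08, 0x10),
    "DTLB_LOAD_MISSES.WALK_ACTIVE": perfevtsel(0x08, 0x10, cmask=1),
    "RS_EVENTS.EMPTY_END": perfevtsel(0x5E, 0x01, cmask=1, inv=True),
    "IDQ.MS_SWITCHES": perfevtsel(0x79, 0x30, edge=True),
    "L1D_PEND_MISS.PENDING_CYCLES_ANY":
        perfevtsel(0x48, 0x01, cmask=1, any_thread=True),
}

if __name__ == "__main__":
    for name, value in EXAMPLES.items():
        print(f"{name:40s} {value:#010x}")
```

Keeping `any_thread` as an explicit keyword that defaults to off mirrors the guidance above: it should be set only for rows whose Comment column shows "AnyT".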

**Table 19-4. Non-Architectural Performance Events of the Processor Core Supported by Skylake Microarchitecture and Kaby Lake Microarchitecture**

| Event Num. | Umask Value | Event Mask Mnemonic                 | Description                                                                                                                                                               | Comment    |
|------------|-------------|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|
| 03H        | 02H         | LD_BLOCKS.STORE_FORWARD             | Loads blocked by overlapping with store buffer that cannot be forwarded.                                                                                                  |            |
| 03H        | 08H         | LD_BLOCKS.NO_SR                     | The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.                                  |            |
| 07H        | 01H         | LD_BLOCKS_PARTIAL.ADDRESS_ALIAS     | False dependencies in MOB due to partial compare on address.                                                                                                              |            |
| 08H        | 01H         | DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK | Load misses in all TLB levels that cause a page walk of any page size.                                                                                                    |            |
| 08H        | 0EH         | DTLB_LOAD_MISSES.WALK_COMPLETED     | Load misses in all TLB levels that cause a page walk that completes (all page sizes).                                                                                     |            |
| 08H        | 10H         | DTLB_LOAD_MISSES.WALK_PENDING       | Counts 1 per cycle for each PMH that is busy with a page walk for a load.                                                                                                 |            |
| 08H        | 10H         | DTLB_LOAD_MISSES.WALK_ACTIVE        | Cycles when at least one PMH is busy with a walk for a load.                                                                                                              | CMSK1      |
| 08H        | 20H         | DTLB_LOAD_MISSES.STLB_HIT           | Loads that miss the DTLB but hit STLB.                                                                                                                                    |            |
| 0DH        | 01H         | INT_MISC.RECOVERY_CYCLES            | Core cycles the allocator was stalled due to recovery from earlier machine clear event for this thread (for example, misprediction or memory order conflict).             |            |
| 0DH        | 01H         | INT_MISC.RECOVERY_CYCLES_ANY        | Core cycles the allocator was stalled due to recovery from earlier machine clear event for any logical thread in this processor core.                                     | AnyT       |
| 0DH        | 80H         | INT_MISC.CLEAR_RESTEER_CYCLES       | Cycles the issue-stage is waiting for front end to fetch from resteered path following branch misprediction or machine clear events.                                      |            |
| 0EH        | 01H         | UOPS_ISSUED.ANY                     | The number of uops issued by the RAT to RS.                                                                                                                               |            |
| 0EH        | 01H         | UOPS_ISSUED.STALL_CYCLES            | Cycles when the RAT does not issue uops to RS for the thread.                                                                                                             | CMSK1, INV |
| 0EH        | 02H         | UOPS_ISSUED.VECTOR_WIDTH_MISMATCH   | Uops inserted at issue-stage in order to preserve upper bits of vector registers.                                                                                         |            |
| 0EH        | 20H         | UOPS_ISSUED.SLOW_LEA                | Number of slow LEA or similar uops allocated. Such uop has 3 sources (for example, 2 sources + immediate) regardless of whether it is a result of LEA instruction or not. |            |
| 14H        | 01H         | ARITH.FPU_DIVIDER_ACTIVE            | Cycles when divider is busy executing divide or square root operations. Accounts for FP operations including integer divides.                                             |            |
| 24H        | 21H         | L2_RQSTS.DEMAND_DATA_RD_MISS        | Demand Data Read requests that missed L2, no rejects.                                                                                                                     |            |
| 24H        | 22H         | L2_RQSTS.RFO_MISS                   | RFO requests that missed L2.                                                                                                                                              |            |
| 24H        | 24H         | L2_RQSTS.CODE_RD_MISS               | L2 cache misses when fetching instructions.                                                                                                                               |            |
| 24H        | 27H         | L2_RQSTS.ALL_DEMAND_MISS            | Demand requests that missed L2.                                                                                                                                           |            |

**Table 19-4. Non-Architectural Performance Events of the Processor Core Supported by Skylake Microarchitecture and Kaby Lake Microarchitecture (Contd.)**

| Event Num. | Umask Value | Event Mask Mnemonic                       | Description                                                                                                                                                                                                                                                                 | Comment         |
|------------|-------------|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|
| 24H        | 38H         | L2_RQSTS.PF_MISS                          | Requests from the L1/L2/L3 hardware prefetchers or load software prefetches that miss L2 cache.                                                                                                                                                                             |                 |
| 24H        | 3FH         | L2_RQSTS.MISS                             | All requests that missed L2.                                                                                                                                                                                                                                                |                 |
| 24H        | 41H         | L2_RQSTS.DEMAND_DATA_RD_HIT               | Demand Data Read requests that hit L2 cache.                                                                                                                                                                                                                                |                 |
| 24H        | 42H         | L2_RQSTS.RFO_HIT                          | RFO requests that hit L2 cache.                                                                                                                                                                                                                                             |                 |
| 24H        | 44H         | L2_RQSTS.CODE_RD_HIT                      | L2 cache hits when fetching instructions.                                                                                                                                                                                                                                   |                 |
| 24H        | D8H         | L2_RQSTS.PF_HIT                           | Prefetches that hit L2.                                                                                                                                                                                                                                                     |                 |
| 24H        | E1H         | L2_RQSTS.ALL_DEMAND_DATA_RD               | All demand data read requests to L2.                                                                                                                                                                                                                                        |                 |
| 24H        | E2H         | L2_RQSTS.ALL_RFO                          | All L RFO requests to L2.                                                                                                                                                                                                                                                   |                 |
| 24H        | E4H         | L2_RQSTS.ALL_CODE_RD                      | All L2 code requests.                                                                                                                                                                                                                                                       |                 |
| 24H        | E7H         | L2_RQSTS.ALL_DEMAND_REFERENCES            | All demand requests to L2.                                                                                                                                                                                                                                                  |                 |
| 24H        | F8H         | L2_RQSTS.ALL_PF                           | All requests from the L1/L2/L3 hardware prefetchers or load software prefetches.                                                                                                                                                                                            |                 |
| 24H        | EFH         | L2_RQSTS.REFERENCES                       | All requests to L2.                                                                                                                                                                                                                                                         |                 |
| 2EH        | 4FH         | LONGEST_LAT_CACHE.REFERENCE               | This event counts requests originating from the core that reference a cache line in the L3 cache.                                                                                                                                                                           | See Table 19-1. |
| 2EH        | 41H         | LONGEST_LAT_CACHE.MISS                    | This event counts each cache miss condition for references to the L3 cache.                                                                                                                                                                                                 | See Table 19-1. |
| 3CH        | 00H         | CPU_CLK_UNHALTED.THREAD_P                 | Cycles while the logical processor is not in a halt state.                                                                                                                                                                                                                  | See Table 19-1. |
| 3CH        | 00H         | CPU_CLK_UNHALTED.THREAD_P_ANY             | Cycles while at least one logical processor is not in a halt state.                                                                                                                                                                                                         | AnyT            |
| 3CH        | 01H         | CPU_CLK_THREAD_UNHALTED.REF_XCLK          | Core crystal clock cycles when the thread is unhalted.                                                                                                                                                                                                                      | See Table 19-1. |
| 3CH        | 01H         | CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY      | Core crystal clock cycles when at least one thread on the physical core is unhalted.                                                                                                                                                                                        | AnyT            |
| 3CH        | 02H         | CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE | Core crystal clock cycles when this thread is unhalted and the other thread is halted.                                                                                                                                                                                      |                 |
| 48H        | 01H         | L1D_PEND_MISS.PENDING                     | Increments the number of outstanding L1D misses every cycle.                                                                                                                                                                                                                |                 |
| 48H        | 01H         | L1D_PEND_MISS.PENDING_CYCLES              | Cycles with at least one outstanding L1D miss from this logical processor.                                                                                                                                                                                                  | CMSK1           |
| 48H        | 01H         | L1D_PEND_MISS.PENDING_CYCLES_ANY          | Cycles with at least one outstanding L1D miss from any logical processor in this core.                                                                                                                                                                                      | CMSK1, AnyT     |
| 48H        | 02H         | L1D_PEND_MISS.FB_FULL                     | Number of times a request needed a FB entry but there was no entry available for it. That is, the FB unavailability was the dominant reason for blocking the request. A request includes cacheable/uncacheable demand that is load, store or SW prefetch. HWP are excluded. |                 |
| 49H        | 01H         | DTLB_STORE_MISSES.MISS_CAUSES_A_WALK      | Store misses in all TLB levels that cause page walks.                                                                                                                                                                                                                       |                 |

**Table 19-4. Non-Architectural Performance Events of the Processor Core Supported by Skylake Microarchitecture and Kaby Lake Microarchitecture (Contd.)**

| Event Num. | Umask Value | Event Mask Mnemonic                                     | Description                                                                                                                              | Comment    |
|------------|-------------|---------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|------------|
| 49H        | 0EH         | DTLB_STORE_MISSES.WALK_COMPLETED                        | Counts completed page walks in any TLB levels due to store misses (all page sizes).                                                      |            |
| 49H        | 10H         | DTLB_STORE_MISSES.WALK_PENDING                          | Counts 1 per cycle for each PMH that is busy with a page walk for a store.                                                               |            |
| 49H        | 10H         | DTLB_STORE_MISSES.WALK_ACTIVE                           | Cycles when at least one PMH is busy with a page walk for a store.                                                                       | CMSK1      |
| 49H        | 20H         | DTLB_STORE_MISSES.STLB_HIT                              | Store misses that missed DTLB but hit STLB.                                                                                              |            |
| 4CH        | 01H         | LOAD_HIT_PRE.SW_PF                                      | Demand load dispatches that hit fill buffer allocated for software prefetch.                                                             |            |
| 4FH        | 10H         | EPT.WALK_PENDING                                        | Counts 1 per cycle for each PMH that is busy with an EPT walk for any request type.                                                      |            |
| 51H        | 01H         | L1D.REPLACEMENT                                         | Counts the number of lines brought into the L1 data cache.                                                                               |            |
| 5EH        | 01H         | RS_EVENTS.EMPTY_CYCLES                                  | Cycles the RS is empty for the thread.                                                                                                   |            |
| 5EH        | 01H         | RS_EVENTS.EMPTY_END                                     | Counts end of periods where the Reservation Station (RS) was empty. Could be useful to precisely locate Front-end Latency Bound issues.  | CMSK1, INV |
| 60H        | 01H         | OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD             | Increment each cycle of the number of offcore outstanding Demand Data Read transactions in SQ to uncore.                                 |            |
| 60H        | 01H         | OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA_RD | Cycles with at least one offcore outstanding Demand Data Read transactions in SQ to uncore.                                              | CMSK1      |
| 60H        | 01H         | OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD_GE_6        | Cycles with at least 6 offcore outstanding Demand Data Read transactions in SQ to uncore.                                                | CMSK6      |
| 60H        | 02H         | OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD             | Increment each cycle of the number of offcore outstanding demand code read transactions in SQ to uncore.                                 |            |
| 60H        | 02H         | OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD | Cycles with at least one offcore outstanding demand code read transactions in SQ to uncore.                                              | CMSK1      |
| 60H        | 04H         | OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO                 | Increment each cycle of the number of offcore outstanding RFO store transactions in SQ to uncore. Set Cmask=1 to count cycles.           |            |
| 60H        | 04H         | OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO     | Cycles with at least one offcore outstanding RFO transactions in SQ to uncore.                                                           | CMSK1      |
| 60H        | 08H         | OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD                | Increment each cycle of the number of offcore outstanding cacheable data read transactions in SQ to uncore. Set Cmask=1 to count cycles. |            |
| 60H        | 08H         | OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD        | Cycles with at least one offcore outstanding data read transactions in SQ to uncore.                                                     | CMSK1      |
| 60H        | 10H         | OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD     | Increment each cycle of the number of offcore outstanding demand data read requests from SQ that missed L3.                              |            |

**Table 19-4. Non-Architectural Performance Events of the Processor Core Supported by Skylake Microarchitecture and Kaby Lake Microarchitecture (Contd.)**

| Event Num. | Umask Value | Event Mask Mnemonic                                             | Description                                                                                                       | Comment |
|------------|-------------|-----------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|---------|
| 60H        | 10H         | OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_L3_MISS_DEMAND_DATA_RD | Cycles with at least one offcore outstanding demand data read requests from SQ that missed L3.                    | CMSK1   |
| 60H        | 10H         | OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD_GE_6        | Cycles with at least 6 offcore outstanding demand data read requests from SQ that missed L3.                      | CMSK6   |
| 63H        | 02H         | LOCK_CYCLES.CACHE_LOCK_DURATION                                 | Cycles in which the L1D is locked.                                                                                |         |
| 79H        | 04H         | IDQ.MITE_UOPS                                                   | Increment each cycle # of uops delivered to IDQ from MITE path.                                                   |         |
| 79H        | 04H         | IDQ.MITE_CYCLES                                                 | Cycles when uops are being delivered to IDQ from MITE path.                                                       | CMSK1   |
| 79H        | 08H         | IDQ.DSB_UOPS                                                    | Increment each cycle # of uops delivered to IDQ from DSB path.                                                    |         |
| 79H        | 08H         | IDQ.DSB_CYCLES                                                  | Cycles when uops are being delivered to IDQ from DSB path.                                                        | CMSK1   |
| 79H        | 10H         | IDQ.MS_DSB_UOPS                                                 | Increment each cycle # of uops delivered to IDQ by DSB when MS_busy.                                              |         |
| 79H        | 18H         | IDQ.ALL_DSB_CYCLES_ANY_UOPS                                     | Cycles when the DSB delivers at least one uop.                                                                    | CMSK1   |
| 79H        | 18H         | IDQ.ALL_DSB_CYCLES_4_UOPS                                       | Cycles when the DSB delivers four uops.                                                                           | CMSK4   |
| 79H        | 20H         | IDQ.MS_MITE_UOPS                                                | Increment each cycle # of uops delivered to IDQ by MITE when MS_busy.                                             |         |
| 79H        | 24H         | IDQ.ALL_MITE_CYCLES_ANY_UOPS                                    | Cycles when MITE delivers at least one uop.                                                                       | CMSK1   |
| 79H        | 24H         | IDQ.ALL_MITE_CYCLES_4_UOPS                                      | Cycles when MITE delivers four uops.                                                                              | CMSK4   |
| 79H        | 30H         | IDQ.MS_UOPS                                                     | Increment each cycle # of uops delivered to IDQ while MS is busy.                                                 |         |
| 79H        | 30H         | IDQ.MS_SWITCHES                                                 | Number of switches from DSB or MITE to MS.                                                                        | EDG     |
| 79H        | 30H         | IDQ.MS_CYCLES                                                   | Cycles when the MS delivers at least one uop.                                                                     | CMSK1   |
| 80H        | 04H         | ICACHE_16B.IFDATA_STALL                                         | Cycles where a code fetch is stalled due to L1 instruction cache miss.                                            |         |
| 80H        | 04H         | ICACHE_64B.IFDATA_STALL                                         | Cycles where a code fetch is stalled due to L1 instruction cache tag miss.                                        |         |
| 83H        | 01H         | ICACHE_64B.IFTAG_HIT                                            | Instruction fetch tag lookups that hit in the instruction cache (L1I). Counts at 64-byte cache-line granularity.  |         |
| 83H        | 02H         | ICACHE_64B.IFTAG_MISS                                           | Instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity. |         |
| 85H        | 01H         | ITLB_MISSES.MISS_CAUSES_A_WALK                                  | Misses at all ITLB levels that cause page walks.                                                                  |         |
| 85H        | 0EH         | ITLB_MISSES.WALK_COMPLETED                                      | Counts completed page walks in any TLB level due to code fetch misses (all page sizes).                           |         |
| 85H        | 10H         | ITLB_MISSES.WALK_PENDING                                        | Counts 1 per cycle for each PMH that is busy with a page walk for an instruction fetch request.                   |         |
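
The Event Num., Umask Value, and Comment columns (CMSK, INV, EDG) correspond to fields of an IA32_PERFEVTSELx register. The following is a minimal sketch of that packing, assuming the architectural layout (event select in bits 7:0, unit mask in bits 15:8, USR/OS in bits 16/17, edge detect in bit 18, enable in bit 22, invert in bit 23, counter mask in bits 31:24). The function name `perfevtsel` and its parameters are hypothetical and chosen only for illustration.

```python
def perfevtsel(event, umask, cmask=0, inv=False, edge=False,
               usr=True, os=True, enable=True):
    """Pack event fields into an IA32_PERFEVTSELx value (assumed architectural layout)."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF and 0 <= cmask <= 0xFF):
        raise ValueError("event, umask and cmask must each fit in 8 bits")
    value = event | (umask << 8)   # bits 7:0 event select, bits 15:8 unit mask
    value |= int(usr) << 16        # count in user mode
    value |= int(os) << 17         # count in kernel mode
    value |= int(edge) << 18       # EDG: count rising edges of the condition
    value |= int(enable) << 22     # enable the counter
    value |= int(inv) << 23        # INV: invert the counter-mask comparison
    value |= cmask << 24           # CMSKn: counter-mask threshold
    return value


# IDQ.MS_CYCLES: event 79H, umask 30H, CMSK1
ms_cycles = perfevtsel(0x79, 0x30, cmask=1)
# IDQ.MS_SWITCHES: event 79H, umask 30H, EDG (as listed in the table)
ms_switches = perfevtsel(0x79, 0x30, edge=True)
print(hex(ms_cycles), hex(ms_switches))
```

The three IDQ.MS_* rows share one event/umask pair and differ only in the CMSK and EDG modifiers, which is why the Comment column matters when programming them.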

**Table 19-4. Non-Architectural Performance Events of the Processor Core Supported by Skylake Microarchitecture and Kaby Lake Microarchitecture (Contd.)**

| Event Num. | Umask Value | Event Mask Mnemonic                               | Description                                                                                                                      | Comment                     |
|------------|-------------|---------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------|-----------------------------|
| 85H        | 20H         | ITLB_MISSES.STLB_HIT                              | ITLB misses that hit STLB.                                                                                                       |                             |
| 87H        | 01H         | ILD_STALL.LCP                                     | Stalls caused by changing prefix length of the instruction.                                                                      |                             |
| 9CH        | 01H         | IDQ_UOPS_NOT_DELIVERED.CORE                       | Count issue pipeline slots where no uop was delivered from the front end to the back end when there is no back-end stall.        |                             |
| 9CH        | 01H         | IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOP_DELIV.CORE    | Cycles in which 4 issue pipeline slots had no uop delivered from the front end to the back end when there is no back-end stall.  | CMSK4                       |
| 9CH        | 01H         | IDQ_UOPS_NOT_DELIVERED.CYCLES_LE_n_UOP_DELIV.CORE | Cycles in which "4-n" issue pipeline slots had no uop delivered from the front end to the back end when there is no back-end stall. | Set CMSK = 4-n; n = 1, 2, 3 |
| 9CH        | 01H         | IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK           | Cycles in which the front end delivered 4 uops or the RAT was stalling FE.                                                       | CMSK, INV                   |
| A1H        | 01H         | UOPS_DISPATCHED_PORT.PORT_0                       | Counts the number of cycles in which a uop is dispatched to port 0.                                                              |                             |
| A1H        | 02H         | UOPS_DISPATCHED_PORT.PORT_1                       | Counts the number of cycles in which a uop is dispatched to port 1.                                                              |                             |
| A1H        | 04H         | UOPS_DISPATCHED_PORT.PORT_2                       | Counts the number of cycles in which a uop is dispatched to port 2.                                                              |                             |
| A1H        | 08H         | UOPS_DISPATCHED_PORT.PORT_3                       | Counts the number of cycles in which a uop is dispatched to port 3.                                                              |                             |
| A1H        | 10H         | UOPS_DISPATCHED_PORT.PORT_4                       | Counts the number of cycles in which a uop is dispatched to port 4.                                                              |                             |
| A1H        | 20H         | UOPS_DISPATCHED_PORT.PORT_5                       | Counts the number of cycles in which a uop is dispatched to port 5.                                                              |                             |
| A1H        | 40H         | UOPS_DISPATCHED_PORT.PORT_6                       | Counts the number of cycles in which a uop is dispatched to port 6.                                                              |                             |
| A1H        | 80H         | UOPS_DISPATCHED_PORT.PORT_7                       | Counts the number of cycles in which a uop is dispatched to port 7.                                                              |                             |
| A2H        | 01H         | RESOURCE_STALLS.ANY                               | Resource-related stall cycles.                                                                                                   |                             |
| A2H        | 08H         | RESOURCE_STALLS.SB                                | Cycles stalled due to no store buffers available (not including draining from sync).                                             |                             |
| A3H        | 01H         | CYCLE_ACTIVITY.CYCLES_L2_MISS                     | Cycles while L2 cache miss demand load is outstanding.                                                                           | CMSK1                       |
| A3H        | 02H         | CYCLE_ACTIVITY.CYCLES_L3_MISS                     | Cycles while L3 cache miss demand load is outstanding.                                                                           | CMSK2                       |
| A3H        | 04H         | CYCLE_ACTIVITY.STALLS_TOTAL                       | Total execution stalls.                                                                                                          | CMSK4                       |
| A3H        | 05H         | CYCLE_ACTIVITY.STALLS_L2_MISS                     | Execution stalls while L2 cache miss demand load is outstanding.                                                                 | CMSK5                       |
| A3H        | 06H         | CYCLE_ACTIVITY.STALLS_L3_MISS                     | Execution stalls while L3 cache miss demand load is outstanding.                                                                 | CMSK6                       |
| A3H        | 08H         | CYCLE_ACTIVITY.CYCLES_L1D_MISS                    | Cycles while L1 data cache miss demand load is outstanding.                                                                      | CMSK8                       |
| A3H        | 0CH         | CYCLE_ACTIVITY.STALLS_L1D_MISS                    | Execution stalls while L1 data cache miss demand load is outstanding.                                                            | CMSK12                      |
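
As a rough illustration of how the IDQ_UOPS_NOT_DELIVERED counts are commonly turned into a ratio, the sketch below divides undelivered slots by the total number of issue slots. It assumes four issue slots per cycle (matching the "4 issue pipeline slots" wording in the table) and a core-cycle count taken from a separate unhalted-cycles event that is not listed in this section. The function and parameter names are hypothetical.

```python
ISSUE_SLOTS_PER_CYCLE = 4  # assumption: taken from the "4 issue pipeline slots" wording


def undelivered_slot_fraction(not_delivered_slots, core_cycles):
    """Fraction of issue slots left empty by the front end.

    not_delivered_slots: count of IDQ_UOPS_NOT_DELIVERED.CORE (event 9CH, umask 01H)
    core_cycles: unhalted core cycles over the same interval (assumed input)
    """
    if core_cycles <= 0:
        raise ValueError("core_cycles must be positive")
    total_slots = ISSUE_SLOTS_PER_CYCLE * core_cycles
    return not_delivered_slots / total_slots


# Example with made-up counts: 1,000,000 empty slots over 1,000,000 cycles.
print(undelivered_slot_fraction(1_000_000, 1_000_000))
```

Both counts should cover the same measurement interval and the same logical processor; otherwise the ratio is not meaningful.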

| Event Num. | Umask Value | Event Mask Mnemonic                     | Description                                                                                                                                   | Comment   |
|------------|-------------|-----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-----------|
| A3H        | 10H         | CYCLE_ACTIVITY.CYCLES_MEM_ANY           | Cycles while memory subsystem has an outstanding load.                                                                                        | CMSK16    |
| A3H        | 14H         | CYCLE_ACTIVITY.STALLS_MEM_ANY           | Execution stalls while memory subsystem has an outstanding load.                                                                              | CMSK20    |
| A6H        | 01H         | EXE_ACTIVITY.EXE_BOUND_0_PORTS          | Cycles for which no uops began execution, the Reservation Station was not empty, the Store Buffer was full and there was no outstanding load. |           |
| A6H        | 02H         | EXE_ACTIVITY.1_PORTS_UTIL               | Cycles for which one uop began execution on any port, and the Reservation Station was not empty.                                              |           |
| A6H        | 04H         | EXE_ACTIVITY.2_PORTS_UTIL               | Cycles for which two uops began execution, and the Reservation Station was not empty.                                                         |           |
| A6H        | 08H         | EXE_ACTIVITY.3_PORTS_UTIL               | Cycles for which three uops began execution, and the Reservation Station was not empty.                                                       |           |
| A6H        | 10H         | EXE_ACTIVITY.4_PORTS_UTIL               | Cycles for which four uops began execution, and the Reservation Station was not empty.                                                        |           |
| A6H        | 40H         | EXE_ACTIVITY.BOUND_ON_STORES            | Cycles where the Store Buffer was full and no outstanding load.                                                                               |           |
| A8H        | 01H         | LSD.UOPS                                | Number of uops delivered by the LSD.                                                                                                          |           |
| A8H        | 01H         | LSD.CYCLES_ACTIVE                       | Cycles with at least one uop delivered by the LSD and none from the decoder.                                                                  | CMSK1     |
| A8H        | 01H         | LSD.CYCLES_4_UOPS                       | Cycles with 4 uops delivered by the LSD and none from the decoder.                                                                            | CMSK4     |
| ABH        | 02H         | DSB2MITE_SWITCHES.PENALTY_CYCLES        | DSB-to-MITE switch true penalty cycles.                                                                                                       |           |
| AEH        | 01H         | ITLB.ITLB_FLUSH                         | Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages.                                                                        |           |
| B0H        | 01H         | OFFCORE_REQUESTS.DEMAND_DATA_RD         | Demand data read requests sent to uncore.                                                                                                     |           |
| B0H        | 02H         | OFFCORE_REQUESTS.DEMAND_CODE_RD         | Demand code read requests sent to uncore.                                                                                                     |           |
| B0H        | 04H         | OFFCORE_REQUESTS.DEMAND_RFO             | Demand RFO read requests sent to uncore, including regular RFOs, locks, ItoM.                                                                 |           |
| B0H        | 08H         | OFFCORE_REQUESTS.ALL_DATA_RD            | Data read requests sent to uncore (demand and prefetch).                                                                                      |           |
| B0H        | 10H         | OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD | Demand data read requests that missed L3.                                                                                                     |           |
| B0H        | 80H         | OFFCORE_REQUESTS.ALL_REQUESTS           | Any memory transaction that reached the SQ.                                                                                                   |           |
| B1H        | 01H         | UOPS_EXECUTED.THREAD                    | Counts the number of uops that begin execution across all ports.                                                                              |           |
| B1H        | 01H         | UOPS_EXECUTED.STALL_CYCLES              | Cycles where there were no uops that began execution.                                                                                         | CMSK, INV |
| B1H        | 01H         | UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC      | Cycles where there was at least one uop that began execution.                                                                                 | CMSK1     |
| B1H        | 01H         | UOPS_EXECUTED.CYCLES_GE_2_UOP_EXEC      | Cycles where there were at least two uops that began execution.                                                                               | CMSK2     |
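
The CYCLE_ACTIVITY rows (event A3H) show a regular pattern: each listed CMSK equals the umask value, and each STALLS_x umask is the matching CYCLES_x umask OR-ed with the STALLS_TOTAL bit (04H). The sketch below only checks that observation against the values in the table. It is a consistency check on the listed rows, not a documented encoding rule, and the helper name `check_pattern` is hypothetical.

```python
# (umask, cmask) pairs copied from the CYCLE_ACTIVITY rows (event A3H).
CYCLE_ACTIVITY = {
    "CYCLES_L2_MISS": (0x01, 1),
    "CYCLES_L3_MISS": (0x02, 2),
    "STALLS_TOTAL": (0x04, 4),
    "STALLS_L2_MISS": (0x05, 5),
    "STALLS_L3_MISS": (0x06, 6),
    "CYCLES_L1D_MISS": (0x08, 8),
    "STALLS_L1D_MISS": (0x0C, 12),
    "CYCLES_MEM_ANY": (0x10, 16),
    "STALLS_MEM_ANY": (0x14, 20),
}


def check_pattern(table):
    """Return a list of rows that break the observed umask/cmask pattern."""
    stalls_bit = table["STALLS_TOTAL"][0]
    problems = []
    for name, (umask, cmask) in table.items():
        if cmask != umask:
            problems.append(f"{name}: cmask {cmask} != umask {umask:#04x}")
        if name.startswith("STALLS_") and name != "STALLS_TOTAL":
            base = "CYCLES_" + name[len("STALLS_"):]
            if umask != (table[base][0] | stalls_bit):
                problems.append(f"{name}: umask is not {base} | STALLS_TOTAL")
    return problems


print(check_pattern(CYCLE_ACTIVITY))
```

A check like this can help catch transcription errors when copying umask and CMSK values out of the table by hand.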

| Event Num. | Umask Value | Event Mask Mnemonic                | Description                                                                                                                                 | Comment            |
|------------|-------------|------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|--------------------|
| B1H        | 01H         | UOPS_EXECUTED.CYCLES_GE_3_UOP_EXEC | Cycles where there were at least three uops that began execution.                                                                           | CMSK3              |
| B1H        | 01H         | UOPS_EXECUTED.CYCLES_GE_4_UOP_EXEC | Cycles where there were at least four uops that began execution.                                                                            | CMSK4              |
| B1H        | 02H         | UOPS_EXECUTED.CORE                 | Counts the number of uops from any logical processor in this core that begin execution.                                                     |                    |
| B1H        | 02H         | UOPS_EXECUTED.CORE_CYCLES_GE_1     | Cycles where there was at least one uop, from any logical processor in this core, that began execution.                                     | CMSK1              |
| B1H        | 02H         | UOPS_EXECUTED.CORE_CYCLES_GE_2     | Cycles where there were at least two uops, from any logical processor in this core, that began execution.                                   | CMSK2              |
| B1H        | 02H         | UOPS_EXECUTED.CORE_CYCLES_GE_3     | Cycles where there were at least three uops, from any logical processor in this core, that began execution.                                 | CMSK3              |
| B1H        | 02H         | UOPS_EXECUTED.CORE_CYCLES_GE_4     | Cycles where there were at least four uops, from any logical processor in this core, that began execution.                                  | CMSK4              |
| B1H        | 02H         | UOPS_EXECUTED.CORE_CYCLES_NONE     | Cycles where there were no uops from any logical processor in this core that began execution.                                               | CMSK1, INV         |
| B1H        | 10H         | UOPS_EXECUTED.X87                  | Counts the number of X87 uops that begin execution.                                                                                         |                    |
| B2H        | 01H         | OFF_CORE_REQUEST_BUFFER.SQ_FULL    | Offcore requests buffer cannot take more entries for this core.                                                                             |                    |
| B7H        | 01H         | OFF_CORE_RESPONSE_0                | See Section 18.3.4.5, "Off-core Response Performance Monitoring".                                                                           | Requires MSR 01A6H |
| BBH        | 01H         | OFF_CORE_RESPONSE_1                | See Section 18.3.4.5, "Off-core Response Performance Monitoring".                                                                           | Requires MSR 01A7H |
| BDH        | 01H         | TLB_FLUSH.DTLB_THREAD              | DTLB flush attempts of the thread-specific entries.                                                                                         |                    |
| BDH        | 20H         | TLB_FLUSH.STLB_ANY                 | STLB flush attempts.                                                                                                                        |                    |
| C0H        | 00H         | INST_RETIRED.ANY_P                 | Number of instructions at retirement.                                                                                                       | See Table 19-1.    |
| C0H        | 01H         | INST_RETIRED.PREC_DIST             | Precise instruction retired event with Hw to reduce effect of PEBS shadow in IP distribution.                                               | PMC1 only.         |
| C0H        | 01H         | INST_RETIRED.TOTAL_CYCLES          | Number of cycles using always true condition applied to PEBS instructions retired event.                                                    | CMSK10, PS         |
| C1H        | 3FH         | OTHER_ASSISTS.ANY                  | Number of times a microcode assist is invoked by Hw other than FP-assist. Examples include AD (page Access Dirty) and AVX* related assists. |                    |
| C2H        | 01H         | UOPS_RETIRED.STALL_CYCLES          | Cycles without actually retired uops.                                                                                                       | CMSK1, INV         |
| C2H        | 01H         | UOPS_RETIRED.TOTAL_CYCLES          | Cycles with fewer than 10 actually retired uops.                                                                                            | CMSK10, INV        |
| C2H        | 02H         | UOPS_RETIRED.RETIRE_SLOTS          | Retirement slots used.                                                                                                                      |                    |
| C3H        | 01H         | MACHINE_CLEAR.COUNT                | Number of machine clears of any type.                                                                                                       | CMSK1, EDG         |
| C3H        | 02H         | MACHINE_CLEAR.MEMORY_ORDERING      | Counts the number of machine clears due to memory order conflicts.                                                                          |                    |
| C3H        | 04H         | MACHINE_CLEAR.SMC                  | Number of self-modifying-code machine clears detected.                                                                                      |                    |
| C4H        | 00H         | BR_INST_RETIRED.ALL_BRANCHES       | Branch instructions that retired.                                                                                                           | See Table 19-1.    |
| C4H        | 01H         | BR_INST_RETIRED.CONDITIONAL        | Counts the number of conditional branch instructions retired.                                                                               | PS                 |
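
Rows that share an event/umask pair and differ only in their Comment modifiers (for example the UOPS_EXECUTED.CORE_CYCLES_* rows, or the machine-clear COUNT row with CMSK1, EDG) have to be requested with those modifiers spelled out. The sketch below formats such a row as a raw event string, assuming a Linux `perf` tool whose `cpu` PMU exposes the `event`, `umask`, `cmask`, `inv`, and `edge` format terms. The helper `perf_raw_event` is hypothetical and only builds the string; it does not run perf.

```python
def perf_raw_event(event, umask, cmask=0, inv=False, edge=False):
    """Build a 'cpu/.../' raw event string from table fields (assumed perf syntax)."""
    terms = [f"event={event:#04x}", f"umask={umask:#04x}"]
    if cmask:
        terms.append(f"cmask={cmask}")   # CMSKn from the Comment column
    if inv:
        terms.append("inv=1")            # INV from the Comment column
    if edge:
        terms.append("edge=1")           # EDG from the Comment column
    return "cpu/" + ",".join(terms) + "/"


# UOPS_EXECUTED.CORE_CYCLES_NONE: event B1H, umask 02H, CMSK1, INV
print(perf_raw_event(0xB1, 0x02, cmask=1, inv=True))
# Machine-clear COUNT row: event C3H, umask 01H, CMSK1, EDG
print(perf_raw_event(0xC3, 0x01, cmask=1, edge=True))
```

The resulting strings would typically be passed to an option such as `perf stat -e`; whether a given term is accepted depends on the kernel and PMU in use.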

| Event Num. | Umask Value | Event Mask Mnemonic                     | Description                                                                                                                                                                                                                                        | Comment                                       |
|------------|-------------|-----------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|
| C4H        | 02H         | BR_INST_RETIRED.NEAR_CALL               | Direct and indirect near call instructions retired.                                                                                                                                                                                                | PS                                            |
| C4H        | 04H         | BR_INST_RETIRED.NEAR_BRANCHES           | Counts the number of branch instructions retired.                                                                                                                                                                                                  | PS                                            |
| C4H        | 08H         | BR_INST_RETIRED.NEAR_RETURN             | Counts the number of near return instructions retired.                                                                                                                                                                                             | PS                                            |
| C4H        | 10H         | BR_INST_RETIRED.NOT_TAKEN               | Counts the number of not taken branch instructions retired.                                                                                                                                                                                        |                                               |
| C4H        | 20H         | BR_INST_RETIRED.NEAR_TAKEN              | Number of near taken branches retired.                                                                                                                                                                                                             | PS                                            |
| C4H        | 40H         | BR_INST_RETIRED.FAR_BRANCHES            | Number of far branches retired.                                                                                                                                                                                                                    | PS                                            |
| C5H        | 00H         | BR_MISP_RETIRED.NEAR_BRANCHES           | Mispredicted branch instructions at retirement.                                                                                                                                                                                                    | See Table 19-1.                               |
| C5H        | 01H         | BR_MISP_RETIRED.CONDITIONAL             | Mispredicted conditional branch instructions retired.                                                                                                                                                                                              | PS                                            |
| C5H        | 04H         | BR_MISP_RETIRED.NEAR_BRANCHES           | Mispredicted macro branch instructions retired.                                                                                                                                                                                                    | PS                                            |
| C5H        | 20H         | BR_MISP_RETIRED.NEAR_TAKEN              | Number of near branch instructions retired that were mispredicted and taken.                                                                                                                                                                       | PS                                            |
| C6H        | 01H         | FRONTEND_RETIRED.DSB_MISS               | Retired instructions which experienced DSB miss. Specify MSR_PEBS_FRONTEND.EVTSEL=11H.                                                                                                                                                             | PS                                            |
| C6H        | 01H         | FRONTEND_RETIRED.L1I_MISS               | Retired instructions which experienced instruction L1 cache true miss. Specify MSR_PEBS_FRONTEND.EVTSEL=12H.                                                                                                                                       | PS                                            |
| C6H        | 01H         | FRONTEND_RETIRED.L2_MISS                | Retired instructions which experienced L2 cache true miss. Specify MSR_PEBS_FRONTEND.EVTSEL=13H.                                                                                                                                                   | PS                                            |
| C6H        | 01H         | FRONTEND_RETIRED.ITLB_MISS              | Retired instructions which experienced ITLB true miss. Specify MSR_PEBS_FRONTEND.EVTSEL=14H.                                                                                                                                                       | PS                                            |
| C6H        | 01H         | FRONTEND_RETIRED.STLB_MISS              | Retired instructions which experienced STLB true miss. Specify MSR_PEBS_FRONTEND.EVTSEL=15H.                                                                                                                                                       | PS                                            |
| C6H        | 01H         | FRONTEND_RETIRED.LATENCY_GE_16          | Retired instructions that are fetched after an interval where the front end delivered no uops for at least 16 cycles. Specify the following fields in MSR_PEBS_FRONTEND: EVTSEL=16H, IDQ_Bubble_Length =16, IDQ_Bubble_Width = 4.                  | PS                                            |
| C6H        | 01H         | FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_m | Retired instructions that are fetched after an interval where the front end had 'm' IDQ slots delivered, no uops for at least 2 cycles. Specify the following fields in MSR_PEBS_FRONTEND: EVTSEL=16H, IDQ_Bubble_Length =2, IDQ_Bubble_Width = m. | PS, m = 1, 2, 3                               |
| C7H        | 01H         | FP_ARITH_INST_RETIRED.SCALAR_DOUBLE     | Number of double-precision, floating-point, scalar SSE/AVX computational instructions that are retired. Each scalar FMA instruction counts as 2.                                                                                                   | Software may treat each count as one DP FLOP. |
| C7H        | 02H         | FP_ARITH_INST_RETIRED.SCALAR_SINGLE     | Number of single-precision, floating-point, scalar SSE/AVX computational instructions that are retired. Each scalar FMA instruction counts as 2.                                                                                                   | Software may treat each count as one SP FLOP. |
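
The retired-branch counts (event C4H, umask 00H), the mispredicted-branch counts (event C5H, umask 00H), and the retired-instruction count (event C0H, umask 00H) can be combined into two simple ratios. The following is a minimal sketch with hypothetical function names; the input counts are assumed to come from the same measurement interval.

```python
def mispredict_ratio(mispredicted_branches, retired_branches):
    """Share of retired branches that were mispredicted (C5H/00H over C4H/00H)."""
    if retired_branches == 0:
        return 0.0
    return mispredicted_branches / retired_branches


def branch_mpki(mispredicted_branches, retired_instructions):
    """Mispredicted branches per 1000 retired instructions (C5H/00H over C0H/00H)."""
    if retired_instructions == 0:
        return 0.0
    return 1000.0 * mispredicted_branches / retired_instructions


# Example with made-up counts.
print(mispredict_ratio(5_000, 200_000))
print(branch_mpki(5_000, 1_000_000))
```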

| Event Num. | Umask Value | Event Mask Mnemonic                      | Description                                                                                                                                                   | Comment                                          |
|------------|-------------|------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------|
| C7H        | 04H         | FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE | Number of double-precision, floating-point, 128-bit SSE/AVX computational instructions that are retired. Each 128-bit FMA or (V)DPPD instruction counts as 2. | Software may treat each count as two DP FLOPs.   |
| C7H        | 08H         | FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE | Number of single-precision, floating-point, 128-bit SSE/AVX computational instructions that are retired. Each 128-bit FMA or (V)DPPS instruction counts as 2. | Software may treat each count as four SP FLOPs.  |
| C7H        | 10H         | FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE | Number of double-precision, floating-point, 256-bit SSE/AVX computational instructions that are retired. Each 256-bit FMA instruction counts as 2.            | Software may treat each count as four DP FLOPs.  |
| C7H        | 20H         | FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE | Number of single-precision, floating-point, 256-bit SSE/AVX computational instructions that are retired. Each 256-bit FMA or VDPPS instruction counts as 2.   | Software may treat each count as eight SP FLOPs. |
| CAH        | 1EH         | FP_ASSIST.ANY                            | Cycles with any input/output SSE* or FP assists.                                                                                                              | CMSK1                                            |
| CBH        | 01H         | HW_INTERRUPTS.RECEIVED                   | Number of hardware interrupts received by the processor.                                                                                                      |                                                  |
| CDH        | 01H         | MEM_TRANS_RETIRED.LOAD_LATENCY           | Randomly sampled loads whose latency is above a user defined threshold. A small fraction of the overall loads are sampled due to randomization.               | Specify threshold in MSR 3F6H.<br>PSDLA          |
| D0H        | 11H         | MEM_INST_RETIRED.STLB_MISS_LOADS         | Retired load instructions that miss the STLB.                                                                                                                 | PSDLA                                            |
| D0H        | 12H         | MEM_INST_RETIRED.STLB_MISS_STORES        | Retired store instructions that miss the STLB.                                                                                                                | PSDLA                                            |
| D0H        | 21H         | MEM_INST_RETIRED.LOCK_LOADS              | Retired load instructions with locked access.                                                                                                                 | PSDLA                                            |
| D0H        | 41H         | MEM_INST_RETIRED.SPLIT_LOADS             | Number of load instructions retired with cache-line splits that may impact performance.                                                                       | PSDLA                                            |
| D0H        | 42H         | MEM_INST_RETIRED.SPLIT_STORES            | Number of store instructions retired with line-split.                                                                                                         | PSDLA                                            |
| D0H        | 81H         | MEM_INST_RETIRED.ALL_LOADS               | All retired load instructions.                                                                                                                                | PSDLA                                            |
| D0H        | 82H         | MEM_INST_RETIRED.ALL_STORES              | All retired store instructions.                                                                                                                               | PSDLA                                            |
| D1H        | 01H         | MEM_LOAD_RETIRED.L1_HIT                  | Retired load instructions with L1 cache hits as data sources.                                                                                                 | PSDLA                                            |
| D1H        | 02H         | MEM_LOAD_RETIRED.L2_HIT                  | Retired load instructions with L2 cache hits as data sources.                                                                                                 | PSDLA                                            |
| D1H        | 04H         | MEM_LOAD_RETIRED.L3_HIT                  | Retired load instructions with L3 cache hits as data sources.                                                                                                 | PSDLA                                            |
| D1H        | 08H         | MEM_LOAD_RETIRED.L1_MISS                 | Retired load instructions that missed L1 cache as data sources.                                                                                               | PSDLA                                            |
| D1H        | 10H         | MEM_LOAD_RETIRED.L2_MISS                 | Retired load instructions that missed L2. Excludes unknown data source.                                                                                       | PSDLA                                            |
| D1H        | 20H         | MEM_LOAD_RETIRED.L3_MISS                 | Retired load instructions that missed L3. Excludes unknown data source.                                                                                       | PSDLA                                            |

**Table 19-4. Non-Architectural Performance Events of the Processor Core Supported by Skylake Microarchitecture and Kaby Lake Microarchitecture (Contd.)**

| Event Num. | Umask Value | Event Mask Mnemonic               | Description                                                                                                                                        | Comment |
|------------|-------------|-----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|---------|
| D1H        | 40H         | MEM_LOAD_RETIRED.FB_HIT           | Retired load instructions where data sources were load uops missed L1 but hit FB due to preceding miss to the same cache line with data not ready. | PSDLA   |
| D2H        | 01H         | MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS | Retired load instructions where data sources were L3 hit and cross-core snoop missed in on-pkg core cache.                                         | PSDLA   |
| D2H        | 02H         | MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT  | Retired load instructions where data sources were L3 and cross-core snoop hits in on-pkg core cache.                                               | PSDLA   |
| D2H        | 04H         | MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM | Retired load instructions where data sources were HitM responses from shared L3.                                                                   | PSDLA   |
| D2H        | 08H         | MEM_LOAD_L3_HIT_RETIRED.XSNP_NONE | Retired load instructions where data sources were hits in L3 without snoops required.                                                              | PSDLA   |
| E6H        | 01H         | BACLEARS.ANY                      | Number of front end re-steers due to BPU misprediction.                                                                                            |         |
| F0H        | 40H         | L2_TRANS.L2_WB                    | L2 writebacks that access L2 cache.                                                                                                                |         |
| F1H        | 07H         | L2_LINES_IN.ALL                   | L2 cache lines filling L2.                                                                                                                         |         |

CMSK1: CounterMask = 1 required; CMSK4: CounterMask = 4 required; CMSK6: CounterMask = 6 required; CMSK8: CounterMask = 8 required; CMSK10: CounterMask = 10 required; CMSK12: CounterMask = 12 required; CMSK16: CounterMask = 16 required; CMSK20: CounterMask = 20 required.<br>AnyT: AnyThread = 1 required.<br>INV: Invert = 1 required.<br>EDG: EDGE = 1 required.<br>PSDLA: Also supports PEBS and DataLA.<br>PS: Also supports PEBS.

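As an illustration of how the Event Num., Umask Value, and Comment columns of Table 19-4 map onto an event-select register, the following minimal sketch packs the fields using the architectural IA32_PERFEVTSELx layout (event select in bits 7:0, unit mask in bits 15:8, USR bit 16, OS bit 17, edge detect bit 18, AnyThread bit 21, enable bit 22, invert bit 23, counter mask in bits 31:24). The function name `perfevtsel`, its parameters, and its defaults are hypothetical helpers chosen for this example; the sketch only computes a value and does not write any MSR or configure PEBS.

```python
# Bit positions in IA32_PERFEVTSELx (architectural layout).
USR = 1 << 16         # count while CPL > 0
OS = 1 << 17          # count while CPL = 0
EDGE = 1 << 18        # "EDG" in the table comments
ANY_THREAD = 1 << 21  # "AnyT" in the table comments
ENABLE = 1 << 22      # enable the counter
INVERT = 1 << 23      # "INV" in the table comments


def perfevtsel(event, umask, cmask=0, edge=False, any_thread=False,
               invert=False, usr=True, os=True, enable=True):
    """Pack one event-select value (illustrative helper, not an Intel API)."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF and 0 <= cmask <= 0xFF):
        raise ValueError("event, umask and cmask must each fit in 8 bits")
    # Event select in bits 7:0, unit mask in bits 15:8, counter mask in bits 31:24.
    value = event | (umask << 8) | (cmask << 24)
    for flag, bit in ((usr, USR), (os, OS), (edge, EDGE),
                      (any_thread, ANY_THREAD), (enable, ENABLE),
                      (invert, INVERT)):
        if flag:
            value |= bit
    return value


# MEM_LOAD_RETIRED.L3_MISS: Event Num. D1H, Umask Value 20H.
print(hex(perfevtsel(0xD1, 0x20)))  # 0x4320d1
```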
Table 19-10 lists performance events supporting Intel TSX (see Section 18.3.6.5); these events apply to processors based on Skylake microarchitecture. Where Skylake microarchitecture implements TSX-related event semantics that differ from Table 19-10, the differences are listed in Table 19-5.

**Table 19-5. Intel® TSX Performance Event Addendum in Processors based on Skylake Microarchitecture**

| Event Num. | Umask Value | Event Mask Mnemonic   | Description                                                                                                             | Comment |
|------------|-------------|-----------------------|-------------------------------------------------------------------------------------------------------------------------|---------|
| 54H        | 02H         | TX_MEM.ABORT_CAPACITY | Number of times a transactional abort was signaled due to a data capacity limitation for transactional reads or writes. |         |

## 19.4 PERFORMANCE MONITORING EVENTS FOR INTEL® XEON PHI™ PROCESSOR 3200, 5200, 7200 SERIES

Intel® Xeon Phi™ processors 3200/5200/7200 series are based on the Knights Landing microarchitecture. Non-architectural performance-monitoring events in the processor core are listed in Table 19-6. The events in Table 19-6 apply to processors with a CPUID signature of DisplayFamily\_DisplayModel 06\_57H.
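
The DisplayFamily\_DisplayModel value above is derived from the version information returned in EAX by CPUID leaf 01H. The sketch below shows that derivation under the usual rules: the extended family field is added only when the family field is 0FH, and the extended model field is prepended only when the family field is 06H or 0FH. The function names and the sample EAX value are assumptions made for illustration; the sample was constructed to decode to 06\_57H and was not read from hardware.

```python
def display_family_model(eax):
    """Decode (DisplayFamily, DisplayModel) from a CPUID.01H:EAX value."""
    family = (eax >> 8) & 0xF        # bits 11:8
    model = (eax >> 4) & 0xF         # bits 7:4
    ext_family = (eax >> 20) & 0xFF  # bits 27:20
    ext_model = (eax >> 16) & 0xF    # bits 19:16

    # The extended family is added only when the base family field is 0FH.
    display_family = family + ext_family if family == 0xF else family
    # The extended model supplies the high nibble for families 06H and 0FH.
    if family in (0x6, 0xF):
        display_model = (ext_model << 4) | model
    else:
        display_model = model
    return display_family, display_model


def signature(eax):
    """Format the result in the DisplayFamily_DisplayModel style used here."""
    family, model = display_family_model(eax)
    return f"{family:02X}_{model:02X}H"


# Constructed sample value (an assumption for illustration, stepping field = 0).
sample_eax = 0x00050670
print(signature(sample_eax))  # 06_57H
```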