| Unit | Event Number | Mnemonic Event Name | Unit Mask | Description | Comments |
|---|---|---|---|---|---|
| Data Cache Unit (DCU) | 43H | PP_DATA_MEM_REFS | 00H | All loads from any memory type and all stores to any memory type. Each part of a split access is counted separately. Note: 80-bit floating-point accesses are double counted, since they are decomposed into a 16-bit exponent load and a 64-bit mantissa load. | |
| | 45H | PP_DCU_LINES_IN | 00H | Total number of lines allocated in the DCU. | |
| | 46H | PP_DCU_M_LINES_IN | 00H | Number of Modified-state lines allocated in the DCU. | |
| | 47H | PP_DCU_M_LINES_OUT | 00H | Number of Modified-state lines evicted from the DCU. This includes evictions caused by external snoops, internal intervention, or the natural replacement algorithm. | |
| | 48H | PP_DCU_MISS_OUTSTANDING | 00H | Weighted number of cycles during which a DCU miss is outstanding: in each cycle, the count is incremented by the number of cache misses outstanding at that time. Only cacheable read requests are considered; uncacheable requests are excluded. Read-for-ownership requests are counted, as are line fills, invalidates, and stores. | An access that also misses the L2 is under-counted by 2 cycles (i.e., if the count is N cycles, it should be N+2 cycles). Subsequent loads to the same cache line do not produce any additional counts. The count value is not precise, but is still useful. |
| Instruction Fetch Unit (IFU) | 80H | PP_IFU_IFETCH | 00H | Number of instruction fetches, both cacheable and non-cacheable, including UC fetches. | Incremented by 1 for each cacheable line fetched and by 1 for each uncached instruction fetched. |
| | 81H | PP_IFU_IFETCH_MISS | 00H | Number of instruction fetch misses: all instruction fetches that do not hit the IFU, i.e., that produce memory requests. Includes UC accesses. | |
| | 85H | PP_ITLB_MISS | 00H | Number of ITLB misses. | |
| | 86H | PP_IFU_MEM_STALL | 00H | Number of cycles that instruction fetch is stalled, for any reason. Includes IFU cache misses, ITLB misses, ITLB faults, and other minor stalls. | |
| | 87H | PP_ILD_STALL | 00H | Number of cycles that the instruction length decoder stage of the processor's pipeline is stalled. | |
| L2 Cache (Note 1) | 28H | PP_L2_IFETCH | MESI (0FH) | Number of L2 instruction fetches. This event indicates that a normal instruction fetch was received by the L2. The count includes only L2-cacheable instruction fetches; it does not include UC instruction fetches or ITLB miss accesses. | |
| | 29H | PP_L2_LD | MESI (0FH) | Number of L2 data loads. This event indicates that a normal, unlocked load memory access was received by the L2. It includes only L2-cacheable memory accesses; it does not include I/O accesses, other non-memory accesses, or memory accesses such as UC/WT accesses. It does include L2-cacheable TLB miss memory accesses. | |
| | 2AH | PP_L2_ST | MESI (0FH) | Number of L2 data stores. This event indicates that a normal, unlocked store memory access was received by the L2; specifically, that the DCU sent a read-for-ownership request to the L2. It also includes Invalid-to-Modified requests sent by the DCU to the L2. It includes only L2-cacheable store memory accesses; it does not include I/O accesses, other non-memory accesses, or memory accesses such as UC/WT stores. It does include TLB miss memory accesses. | |
| | 24H | PP_L2_LINES_IN | 00H | Number of lines allocated in the L2. | |
| | 26H | PP_L2_LINES_OUT | 00H | Number of lines removed from the L2 for any reason. | |
| | 25H | PP_L2_M_LINES_INM | 00H | Number of Modified-state lines allocated in the L2. | |
| | 27H | PP_L2_M_LINES_OUTM | 00H | Number of Modified-state lines removed from the L2 for any reason. | |
| | 2EH | PP_L2_RQSTS | MESI (0FH) | Total number of L2 requests. | |
| | 21H | PP_L2_ADS | 00H | Number of L2 address strobes. | |
| | 22H | PP_L2_DBUS_BUSY | 00H | Number of cycles during which the L2 cache data bus was busy. | |
| | 23H | PP_L2_DBUS_BUSY_RD | 00H | Number of cycles during which the data bus was busy transferring read data from the L2 to the processor. | |
| External Bus Logic (EBL) (Note 2) | 62H | PP_BUS_DRDY_CLOCKS | 00H (Self), 20H (Any) | Number of clocks during which DRDY# is asserted; essentially, the utilization of the external system data bus. | Unit Mask = 00H counts bus clocks when this processor is driving DRDY#. Unit Mask = 20H counts processor clocks when any agent is driving DRDY#. |
| | 63H | PP_BUS_LOCK_CLOCKS | 00H (Self), 20H (Any) | Number of clocks during which LOCK# is asserted on the external system bus. | Always counts in processor clocks. |
| | 60H | PP_BUS_REQ_OUTSTANDING | 00H (Self) | Number of bus requests outstanding. This counter is incremented by the number of cacheable read bus requests outstanding in each cycle. | Counts only DCU full-line cacheable reads, not RFOs, writes, instruction fetches, or anything else. A request counts as outstanding until the bus transaction completes (last data chunk received). |
| | 65H | PP_BUS_TRAN_BRD | 00H (Self), 20H (Any) | Number of bus burst read transactions. | |
| | 66H | PP_BUS_TRAN_RFO | 00H (Self), 20H (Any) | Number of completed bus read-for-ownership transactions. | |
| | 67H | PP_BUS_TRANS_WB | 00H (Self), 20H (Any) | Number of completed bus write-back transactions. | |
| | 68H | PP_BUS_TRAN_IFETCH | 00H (Self), 20H (Any) | Number of completed bus instruction fetch transactions. | |
| | 69H | PP_BUS_TRAN_INVAL | 00H (Self), 20H (Any) | Number of completed bus invalidate transactions. | |
| | 6AH | PP_BUS_TRAN_PWR | 00H (Self), 20H (Any) | Number of completed bus partial write transactions. | |
| | 6BH | PP_BUS_TRANS_P | 00H (Self), 20H (Any) | Number of completed bus partial transactions. | |
| | 6CH | PP_BUS_TRANS_IO | 00H (Self), 20H (Any) | Number of completed bus I/O transactions. | |
| | 6DH | PP_BUS_TRAN_DEF | 00H (Self), 20H (Any) | Number of completed bus deferred transactions. | |
| | 6EH | PP_BUS_TRAN_BURST | 00H (Self), 20H (Any) | Number of completed bus burst transactions. | |
| | 70H | PP_BUS_TRAN_ANY | 00H (Self), 20H (Any) | Number of all completed bus transactions, including special cycles. Address bus utilization can be calculated from this count if the minimum address bus occupancy per transaction is known. | |
| | 6FH | PP_BUS_TRAN_MEM | 00H (Self), 20H (Any) | Number of completed memory transactions. | |
| | 64H | PP_BUS_DATA_RCV | 00H (Self) | Number of bus clock cycles during which this processor is receiving data. | |
| | 61H | PP_BUS_BNR_DRV | 00H (Self) | Number of bus clock cycles during which this processor is driving the BNR# pin. | |
| | 7AH | PP_BUS_HIT_DRV | 00H (Self) | Number of bus clock cycles during which this processor is driving the HIT# pin. | Includes cycles due to snoop stalls. |
| | 7BH | PP_BUS_HITM_DRV | 00H (Self) | Number of bus clock cycles during which this processor is driving the HITM# pin. | Includes cycles due to snoop stalls. |
| | 7EH | PP_BUS_SNOOP_STALL | 00H (Self) | Number of clock cycles during which the bus is snoop-stalled. | |
| Floating-Point Unit | C1H | PP_FLOPS | 00H | Number of computational floating-point operations retired. Excludes floating-point computational operations that cause traps or assists. Includes floating-point computational operations executed by the assist handler. | Counter 0 only. |
| | 10H | PP_FP_COMP_OPS_EXE | 00H | Number of computational floating-point operations executed: FADD, FSUB, FCOM, FMUL, integer MUL and IMUL, FDIV, FPREM, FSQRT, and integer DIV and IDIV. This is the number of operations, not the number of cycles. This event does not distinguish an FADD used in the middle of a transcendental flow from a separate FADD instruction. | Counter 0 only. |
| | 11H | PP_FP_ASSIST | 00H | Number of floating-point exception cases handled by microcode. | Counter 1 only. This event includes counts due to speculative execution. |
| | 12H | PP_MUL | 00H | Number of multiplies, including both integer and FP multiplies. | Counter 1 only. This event includes counts due to speculative execution. |
| | 13H | PP_DIV | 00H | Number of divides, including both integer and FP divides. | Counter 1 only. This event includes counts due to speculative execution. |
| | 14H | PP_CYCLES_DIV_BUSY | 00H | Number of cycles during which the divider is busy and cannot accept new divides. Includes integer and FP divides, FPREM, FSQRT, etc. | Counter 0 only. This event includes counts due to speculative execution. |
| Memory Ordering | 03H | PP_LD_BLOCKS | 00H | Number of store buffer blocks. Includes counts caused by preceding stores whose addresses are unknown, preceding stores whose addresses are known to conflict but whose data is unknown, and preceding stores that conflict with the load but only partially overlap it. | |
| | 04H | PP_SB_DRAINS | 00H | Number of store buffer drain cycles. Incremented during every cycle in which the store buffer is draining. Draining is caused by serializing operations such as CPUID, synchronizing operations such as XCHG, interrupt acknowledgment, and other conditions such as cache flushing. | |
| | 05H | PP_MISALIGN_MEM_REF | 00H | Number of misaligned data memory references. Incremented by 1 every cycle during which either the Pentium Pro load or store pipeline dispatches a misaligned uop. The count is incremented whether the uop is the first or second half of the access, and whether it is blocked, squashed, or misses. | MISALIGN_MEM_REF is only an approximation of the true number of misaligned memory references. The value returned is roughly proportional to the number of misaligned memory accesses, i.e., to the size of the problem. |
| Instruction Decoding and Retirement | C0H | PP_INST_RETIRED | 00H | Total number of instructions retired. | |
| | C2H | PP_UOPS_RETIRED | 00H | Total number of uops retired. | |
| | D0H | PP_INST_DECODER | 00H | Total number of instructions decoded. | |
| Interrupts | C8H | PP_HW_INT_RX | 00H | Total number of hardware interrupts received. | |
| | C6H | PP_CYCLES_INT_MASKED | 00H | Total number of processor cycles during which interrupts are disabled. | |
| | C7H | PP_CYCLES_INT_PENDING_AND_MASKD | 00H | Total number of processor cycles during which interrupts are disabled and interrupts are pending. | |
| Branches | C4H | PP_BR_INST_RETIRED | 00H | Total number of branch instructions retired. | |
| | C5H | PP_BR_MISS_PRED_RETIRED | 00H | Total number of branch mispredictions that reach retirement. Includes not-taken conditional branches. | |
| | C9H | PP_BR_TAKEN_RETIRED | 00H | Total number of taken branches retired. | |
| | CAH | PP_BR_MISS_PRED_TAKEN_RET | 00H | Total number of taken but mispredicted branches that reach retirement. Includes conditional branches only when taken. | |
| | E0H | PP_BR_INST_DECODED | 00H | Total number of branch instructions decoded. | |
| | E2H | PP_BTB_MISSES | 00H | Total number of branches for which the BTB did not produce a prediction. | |
| | E4H | PP_BR_BOGUS | 00H | Total number of branch predictions generated for instructions that are not actually branches. | |
| | E6H | PP_BACLEARS | 00H | Total number of times BACLEAR is asserted. This is the number of times a static branch prediction was made, i.e., the branch decoder made a prediction because the BTB did not. | |
| Stalls | A2H | PP_RESOURCE_STALLS | 00H | Incremented by one during every cycle in which there is a resource-related stall. Includes stalls on register renaming buffer entries (ROB entries) and memory buffer entries (LB and SB entries). Does not include stalls due to a full bus queue, too many cache misses, etc. In addition to resource-related stalls, this event counts some other events. | |
| | D2H | PP_PARTIAL_RAT_STALLS | 00H | Number of cycles or events for partial stalls, including partial flag stalls. | |
| Segment Register Loads | 06H | PP_SEGMENT_REG_LOADS | 00H | Number of segment register loads. | |
| Clocks | 79H | PP_CPU_CLK_UNHALTED | 00H | Number of cycles during which the processor is not halted. | |
|
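Several of the events above are most useful as ratios. The following Python sketch shows one way to combine raw counts, assuming the values have already been read and stored in a dictionary keyed by the mnemonics in the table. The function names, the dictionary layout, and the particular ratios are illustrative assumptions, not part of this document. Because only two counters (PerfEvtSel0/PerfEvtSel1) are available, and some events are restricted to one counter, these values would in practice come from several runs. The `l2_missing_accesses` and `min_occupancy_bus_clocks` parameters are hypothetical inputs the caller must supply; neither value is given here.

```python
from __future__ import annotations

from typing import Mapping


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 if the denominator is zero (nothing counted)."""
    return num / den if den else 0.0


def derived_metrics(c: Mapping[str, int], l2_missing_accesses: int = 0) -> dict[str, float]:
    """Combine raw event counts (keyed by mnemonic) into common ratios.

    l2_missing_accesses is a hypothetical, caller-supplied estimate of how many
    DCU misses also missed the L2; each of them is under-counted by 2 cycles in
    PP_DCU_MISS_OUTSTANDING, so 2 cycles are added back per such access.
    """
    weighted_miss_cycles = c["PP_DCU_MISS_OUTSTANDING"] + 2 * l2_missing_accesses
    return {
        # Instructions retired per unhalted processor clock.
        "ipc": _ratio(c["PP_INST_RETIRED"], c["PP_CPU_CLK_UNHALTED"]),
        # Average number of uops per retired instruction.
        "uops_per_inst": _ratio(c["PP_UOPS_RETIRED"], c["PP_INST_RETIRED"]),
        # Fraction of retired branches that were mispredicted.
        "br_mispredict_rate": _ratio(c["PP_BR_MISS_PRED_RETIRED"], c["PP_BR_INST_RETIRED"]),
        # DCU lines allocated per data reference: a rough miss ratio
        # (80-bit FP accesses are double counted in PP_DATA_MEM_REFS).
        "dcu_lines_in_per_ref": _ratio(c["PP_DCU_LINES_IN"], c["PP_DATA_MEM_REFS"]),
        # Summed outstanding-miss cycles divided by the number of misses
        # approximates the average DCU miss latency in cycles (Little's law).
        "avg_dcu_miss_cycles": _ratio(weighted_miss_cycles, c["PP_DCU_LINES_IN"]),
    }


def address_bus_utilization(tran_any: int, min_occupancy_bus_clocks: float,
                            elapsed_bus_clocks: float) -> float:
    """Lower-bound address bus utilization from a PP_BUS_TRAN_ANY count.

    min_occupancy_bus_clocks is an assumed input: the minimum number of bus
    clocks a single transaction occupies the address bus. elapsed_bus_clocks
    must be measured in bus clocks, not processor clocks.
    """
    return _ratio(tran_any * min_occupancy_bus_clocks, elapsed_bus_clocks)
```

The zero-denominator guard keeps a run that never triggered an event from raising an exception. The latency figure inherits the imprecision noted for PP_DCU_MISS_OUTSTANDING, and using PP_DCU_LINES_IN as the miss count is itself an approximation.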
Notes
1. Several L2 cache events, where noted, can be further qualified using the Unit Mask (UMSK) field in the PerfEvtSel0 and PerfEvtSel1 registers. The lower 4 bits of the Unit Mask field are used with L2 events to indicate the cache state or states involved. The Pentium Pro processor identifies cache states using the MESI protocol, so each of these bits represents one of the four states: UMSK[3] = M (8H), UMSK[2] = E (4H), UMSK[1] = S (2H), and UMSK[0] = I (1H). UMSK[3:0] = MESI (FH) should be used to collect data for all states; for the applicable events, UMSK = 0H results in nothing being counted.
2. All of the external bus logic (EBL) events, except where noted, can be further qualified using the Unit Mask (UMSK) field in the PerfEvtSel0 and PerfEvtSel1 registers. Bit 5 of the UMSK field is used with the EBL events to indicate whether the processor should count only the transactions it generates itself (UMSK[5] = 0) or the transactions of any processor on the bus (UMSK[5] = 1).
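The notes above describe how the unit mask qualifies L2 and EBL events. The following Python sketch packs an event number and unit mask into a PerfEvtSel register value. It assumes the usual P6-family PerfEvtSel layout (event select in bits 0-7, unit mask in bits 8-15, USR at bit 16, OS at bit 17, EN at bit 22), which this document does not itself spell out. The helper name `perfevtsel` and its parameters are hypothetical, and writing the value into the register is left to whatever driver or tool is in use.

```python
# Unit-mask bits for L2 events (Note 1): one bit per MESI state.
MESI_M, MESI_E, MESI_S, MESI_I = 0x8, 0x4, 0x2, 0x1
MESI_ALL = MESI_M | MESI_E | MESI_S | MESI_I  # 0FH: count all states

# Unit-mask bit 5 for EBL events (Note 2): clear = self, set = any agent.
EBL_ANY = 0x20

# Control bits in the assumed P6-family PerfEvtSel layout.
USR = 1 << 16  # count at privilege levels 1-3
OS = 1 << 17   # count at privilege level 0
EN = 1 << 22   # enables both counters; assumed present only in PerfEvtSel0


def perfevtsel(event: int, umask: int = 0x00, user: bool = True,
               kernel: bool = True, enable: bool = True) -> int:
    """Pack an event number and unit mask into a PerfEvtSel value.

    Pass enable=False when building the value for PerfEvtSel1, where the
    EN bit is assumed to be reserved.
    """
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event number and unit mask must each fit in 8 bits")
    value = event | (umask << 8)  # event select in bits 0-7, unit mask in 8-15
    if user:
        value |= USR
    if kernel:
        value |= OS
    if enable:
        value |= EN
    return value
```

For example, `perfevtsel(0x29, MESI_ALL)` gives 0x430F29 for PP_L2_LD across all cache states, and `perfevtsel(0x70, EBL_ANY)` gives 0x432070 for PP_BUS_TRAN_ANY counted for any bus agent. Per Note 1, leaving `umask` at 0 for an L2 event would count nothing.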
Acknowledgement and Disclaimer