Monitors
The process of setting up and reading from the various performance monitoring counters is delegated to several Monitor
types. These monitors are:
CounterTools.CoreMonitor
: Collects Core level counters.
These counters work at CPU-level granularity and can capture information such as the number of retired instructions, the number of floating point instructions, L1/L2 accesses, etc.
CounterTools.IMCMonitor
: Manages counters on the Integrated Memory Controller (iMC).
This can record events such as number of DRAM reads and writes.
CounterTools.CHAMonitor
: Manages counters for the Caching Home Agents (CHA) in the system.
This can record events such as number of L3 hits and misses.
Monitor API
After creation, all monitors share the same simple API. The most common method is read
, which reads from all of the PMUs currently controlled by the monitor and returns the raw counter values in a CounterTools.Record
. See the CounterTools.Record
documentation for details on working with that data structure.
Two additional methods specified by each Monitor are CounterTools.program!
and CounterTools.reset!
. These methods configure the PMUs and reset the PMUs to their default state respectively. Normally, you will not have to call these methods directly since programming is usually done during monitor creation and CounterTools.reset!
is automatically called when the Monitor is garbage collected.
A simple usage of this would look like:
monitor = # create monitor
# Read once from the counters
first = read(monitor)
# Read again from the counters
second = read(monitor)
# Automatically compute the counter deltas
deltas = second - first
# Aggregate all deltas
CounterTools.aggregate(deltas)
Additionally, if you are working with multiple samples, the following can serve as a template for your code.
monitor = # create monitor
data = map(1:10) do i
    sleep(0.1)
    read(monitor)
end
# `data` is a `Vector{<:Record}`
# To compute the counter difference across all samples, we can call Julia's `diff` function:
deltas = diff(data)
# Finally, we can aggregate each diff.
CounterTools.aggregate.(deltas)
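To make the shape of the multi-sample template concrete, here is a minimal sketch that uses plain integer vectors in place of real `Record` values. The sample numbers are made up, and `sum` is only a hypothetical stand-in for an aggregation step; it is not meant to describe what CounterTools.aggregate actually computes.

```julia
# Hypothetical stand-in data: each inner vector plays the role of one
# sample containing two raw counter values.
samples = [[100, 20], [150, 25], [175, 45]]

# `diff` subtracts each sample from the one that follows it, so N samples
# yield N - 1 deltas.
deltas = diff(samples)

# Stand-in aggregation: add up the counters within each delta.
totals = sum.(deltas)
```

Note that three samples produce two deltas, so a loop that takes 10 reads yields 9 intervals.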
Raw counter values will be wrapped in a CounterTools.CounterValue
type that will automatically detect and correct for counter overflow when subtracting counter values.
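The idea behind overflow correction can be sketched as follows. This is an illustration only, not the CounterTools.CounterValue implementation: the `WrappingCounter` type is hypothetical, the 48-bit counter width is an assumption, and the sketch assumes the counter wraps at most once between two reads.

```julia
# Hypothetical illustration; not the CounterTools.CounterValue implementation.
# Assumes a 48-bit hardware counter that wraps at most once between reads.
const COUNTER_BITS = 48
const COUNTER_MODULUS = UInt64(1) << COUNTER_BITS

struct WrappingCounter
    value::UInt64
end

# Subtract two raw readings, adding back the modulus if the counter wrapped.
function Base.:-(later::WrappingCounter, earlier::WrappingCounter)
    if later.value >= earlier.value
        return Int64(later.value - earlier.value)
    else
        return Int64(later.value + COUNTER_MODULUS - earlier.value)
    end
end

before = WrappingCounter(COUNTER_MODULUS - 10)  # 10 counts away from wrapping
after = WrappingCounter(5)                       # wrapped, then counted 5 more
delta = after - before
```

Here the later reading is numerically smaller than the earlier one, yet the subtraction still reports the 15 events that occurred between the reads.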
Monitor Documentation
Monitors
CounterTools.CoreMonitor
— Type
CoreMonitor(events, cpus; program = true)
Construct a CoreMonitor
monitoring events
on cpus
. Argument events
should be a Tuple
of CounterTools.CoreSelectRegister
and cpus
is any iterable collection of CPU indices.
If program == true
, then also program the performance counters on each CPU.
CounterTools.IMCMonitor
— Type
IMCMonitor(events, socket; [program = true, finalize = true])
Monitor the Integrated Memory Controller (IMC) for events
on a single selected CPU socket
. This can gather information such as the number of DRAM read and write operations. Argument events
should be a Tuple
of CounterTools.UncoreSelectRegister
and socket
should be either an Integer
or IndexZero
.
If finalize = true
is passed, a finalizer will be attached to the IMCMonitor
to clean up the hardware counter's state.
CounterTools.CHAMonitor
— Type
CHAMonitor(events, socket, cpu; [program = true], [filter0], [filter1])
Monitor the Caching Home Agent counters for events
on a single selected CPU socket
. This can gather information such as the number of L3 hits and misses. Argument events
should be a Tuple
of CounterTools.UncoreSelectRegister
and socket
should be either an Integer
or IndexZero
. Further, cpu
is the CPU that will be actually reading the counters. For best performance, cpu
should be located on socket
.
Filters
The CHA Performance Monitoring Units allow counters to be filtered in various ways, such as by issuing Core or Thread ID, request opcode, etc.
These can be passed via the filter0
and filter1
keyword arguments and correspond to the CHA filters 0 and 1 respectively.
Note: filter0
should be a CounterTools.CHAFilter0
and filter1
should be a CounterTools.CHAFilter1
.
API
Base.read
— Method
read(monitor) -> Record
Read from all counters currently managed by monitor
and return the results as a CounterTools.Record
. The structure of the CounterTools.Record
usually reflects the hierarchical structure of the counters being monitored.
CounterTools.program!
— Function
program!(monitor)
Program the PMUs managed by monitor
. This must be called before any results returned from read
will be meaningful.
This method is called automatically when monitor
is created unless the program = false
keyword is passed to the monitor constructor function.
CounterTools.reset!
— Function
reset!(monitor)
Set the PMUs managed by monitor
back to their original state.
This method is called automatically when monitor
is garbage collected unless the reset = false
keyword is passed to the monitor constructor function.
Select Registers
CounterTools.CoreSelectRegister
— Type
CoreSelectRegister(; kw...)
Construct a bitmask for programming Core
level counters.
Keywords
event::UInt
: Select the event to be counted. Default: 0x00.
umask::UInt
: Select the subevent to be counted within the selected event. Default: 0x00.
usr::Bool
: Specifies the counter should be active when the processor is operating at privilege modes 1, 2, and 3. Default: true.
os::Bool
: Specifies the counter should be active when the processor is operating at privilege mode 0. Default: true.
e::Bool
: Edge detect. Default: false.
en::Bool
: Enable the counter. Default: true.
inv::Bool
: When set, inverts the counter-mask (CMASK) comparison, so that both greater than or equal to and less than comparisons can be made (0: greater than or equal; 1: less than). Note that if the counter-mask is programmed to zero, the INV flag is ignored. Default: false.
cmask::Bool
: When this field is not zero, a logical processor compares this mask to the event count of the detected microarchitectural condition during a single cycle. If the event count is greater than or equal to this mask, the counter is incremented by one. Otherwise the counter is not incremented. This mask is intended for software to characterize microarchitectural conditions that can count multiple occurrences per cycle (for example, two or more instructions retired per clock; or bus queue occupations). If the counter-mask field is 0, then the counter is incremented each cycle by the event count associated with multiple occurrences.
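For orientation, the keywords above correspond to fields of the IA32_PERFEVTSELx registers described in the Intel Software Developer's Manual. The following sketch packs those fields into a single value. The function `encode_core_select` is hypothetical and written only for illustration; it is not part of CounterTools, and CoreSelectRegister may be implemented differently. Here `cmask` is treated as an 8-bit integer, following the hardware layout.

```julia
# Hypothetical encoder following the IA32_PERFEVTSELx bit layout.
# Not CounterTools code; CoreSelectRegister may differ internally.
function encode_core_select(; event = 0x00, umask = 0x00, usr = true, os = true,
                            e = false, en = true, inv = false, cmask = 0x00)
    value = UInt64(event)            # bits 0-7: event select
    value |= UInt64(umask) << 8      # bits 8-15: unit mask (subevent)
    value |= UInt64(usr) << 16       # bit 16: count at privilege levels 1-3
    value |= UInt64(os) << 17        # bit 17: count at privilege level 0
    value |= UInt64(e) << 18         # bit 18: edge detect
    value |= UInt64(en) << 22        # bit 22: enable the counter
    value |= UInt64(inv) << 23       # bit 23: invert the cmask comparison
    value |= UInt64(cmask) << 24     # bits 24-31: counter mask
    return value
end

# Retired instructions (architectural event 0xC0, umask 0x00), default flags.
reg = encode_core_select(event = 0xC0, umask = 0x00)
```

With the default flags (`usr`, `os` and `en` set), this yields `0x4300C0`, the raw encoding commonly quoted for the retired-instructions event.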
CounterTools.UncoreSelectRegister
— Type
UncoreSelectRegister(; kw...)
Construct a bitmask for programming Uncore
level counters.
Keywords
event::UInt
: Select the event to be counted. Default: 0x00.
umask::UInt
: Select subevents to be counted within the selected event. Default: 0x00.
reset::Bool
: When set to 1, the corresponding counter will be cleared to 0. Default: false.
edge_detect::Bool
: When set to 1, rather than measuring the event in each cycle it is active, the corresponding counter will increment when a 0 to 1 transition (i.e. rising edge) is detected. When 0, the counter will increment in each cycle that the event is asserted. NOTE: edge_detect is in series following thresh. Due to this, the thresh field must be set to a non-zero value. For events that increment by no more than 1 per cycle, set thresh to 0x1. Default: false.
overflow_enable::Bool
: When this bit is set to 1 and the corresponding counter overflows, an overflow message is sent to the UBox's global logic. The message identifies the unit that sent it. Default: false.
en::Bool
: Local Counter Enable. Default: true.
invert::Bool
: Invert comparison against Threshold. 0: the comparison is "is event increment >= threshold?". 1: the comparison is inverted, "is event increment < threshold?". For example, for a 64-entry queue, if software wanted to know how many cycles the queue had fewer than 4 entries, it should set the threshold to 4 and set the invert bit to 1. Default: false.
thresh::UInt
: Threshold is used, along with the invert bit, to compare against the counter's incoming increment value, i.e. the value that will be added to the counter. For events that increment by more than 1 per cycle, if the threshold is set to a value greater than 1, the data register will accumulate instances in which the event increment is >= threshold. Default: 0x00.
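The interaction between thresh and invert can be modeled in a few lines. The sketch below is a software model only and is not CounterTools code: the function `count_cycles` and the occupancy trace are hypothetical, and the model covers only the case of a non-zero threshold, where the counter accumulates one count per qualifying cycle.

```julia
# Hypothetical software model of the threshold comparison (non-zero thresh).
# `increments` holds the per-cycle event increment, e.g. a queue occupancy.
function count_cycles(increments; thresh, invert = false)
    return count(increments) do inc
        # invert = false: count cycles where the increment is >= thresh
        # invert = true:  count cycles where the increment is <  thresh
        invert ? inc < thresh : inc >= thresh
    end
end

# Made-up occupancy trace of a queue over eight cycles.
occupancy = [0, 2, 3, 4, 7, 1, 5, 3]

fewer_than_4 = count_cycles(occupancy; thresh = 4, invert = true)
at_least_4 = count_cycles(occupancy; thresh = 4)
```

This mirrors the queue example given for invert: setting the threshold to 4 with the invert bit set counts the cycles in which the queue held fewer than 4 entries.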
CounterTools.CHAFilter0
— Type
CHAFilter0(; kw...)
CounterTools.CHAFilter1
— Type
CHAFilter1(; kw...)