Monitors

The process of setting up and reading from various performance monitoring counters is delegated to various Monitor types. These monitors are:

  • CoreMonitor: These counters work at CPU-level granularity and can capture information such as the number of retired instructions, the number of floating point instructions, L1/L2 accesses, etc.

  • IMCMonitor: This can record events such as the number of DRAM reads and writes.

  • CHAMonitor: This can record events such as the number of L3 hits and misses.

Monitor API

After creation, all monitors have the same simple API. The most common method is read, which reads from all of the PMUs currently controlled by the monitor and returns the raw counter values in a CounterTools.Record. See the CounterTools.Record documentation for details on working with that data structure.

Two additional methods specified by each Monitor are CounterTools.program! and CounterTools.reset!. These methods configure the PMUs and reset the PMUs to their default state respectively. Normally, you will not have to call these methods directly since programming is usually done during monitor creation and CounterTools.reset! is automatically called when the Monitor is garbage collected.

A simple usage of this would look like:

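using CounterTools

# Create a monitor. As an example, count retired instructions
# (Intel event 0xC0, umask 0x00) with a CoreMonitor.
events = (CounterTools.CoreSelectRegister(event = 0xC0, umask = 0x00),)
cpus = 1:4  # example CPU indices; adjust for your machine
monitor = CounterTools.CoreMonitor(events, cpus)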

# Read once from the counters
first = read(monitor)

# Read again from the counters
second = read(monitor)

# Automatically compute the counter deltas
deltas = second - first

# Aggregate all deltas
CounterTools.aggregate(deltas)

Additionally, if you are working with multiple samples, the following can serve as a template for your code.

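using CounterTools

# Create a monitor. As an example, count retired instructions
# (Intel event 0xC0, umask 0x00) with a CoreMonitor.
events = (CounterTools.CoreSelectRegister(event = 0xC0, umask = 0x00),)
cpus = 1:4  # example CPU indices; adjust for your machine
monitor = CounterTools.CoreMonitor(events, cpus)

# Take 10 samples, 0.1 seconds apart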
data = map(1:10) do i
    sleep(0.1)
    read(monitor)
end

# `data` is a `Vector{<:Record}`
# To compute the counter difference across all samples, we can call Julia's `diff` function:
deltas = diff(data)

# Finally, we can aggregate each diff.
CounterTools.aggregate.(deltas)

Note

Raw counter values will be wrapped in a CounterTools.CounterValue type that will automatically detect and correct for counter overflow when subtracting counter values.
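
To illustrate the kind of correction described in the note above, the following is a minimal sketch of an overflow-safe subtraction for a fixed-width hardware counter. It is not the CounterTools.CounterValue implementation: the helper name `wrapped_delta` is hypothetical, the 48-bit counter width is an assumption, and the sketch assumes the counter wrapped at most once between the two readings.

```julia
# Hypothetical helper, not part of CounterTools.
# Difference between two raw readings of a counter that is `width` bits wide
# (assumed 48 here) and that may have wrapped around once between the readings.
function wrapped_delta(later::UInt64, earlier::UInt64; width::Integer = 48)
    mask = (UInt64(1) << width) - UInt64(1)
    # Unsigned subtraction wraps modulo 2^64; masking reduces it modulo 2^width.
    return (later - earlier) & mask
end

# No overflow between the two readings
plain = wrapped_delta(UInt64(1500), UInt64(1000))      # 500

# The counter wrapped: it was 10 counts below 2^48, then read back as 5
near_top = (UInt64(1) << 48) - UInt64(10)
wrapped = wrapped_delta(UInt64(5), near_top)           # 15
```

A naive `later - earlier` on the second pair would give a huge, meaningless value, which is why wrapping the raw values in a dedicated type is convenient when taking deltas.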

Monitor Documentation

Monitors

CounterTools.CoreMonitor (Type)
CoreMonitor(events, cpus; program = true)

Construct a CoreMonitor monitoring events on cpus. The events argument should be a Tuple of CounterTools.CoreSelectRegister values, and cpus can be any iterable collection of CPU indices.

If program == true, the performance counters on each CPU are also programmed.

CounterTools.IMCMonitor (Type)
IMCMonitor(events, socket; [program = true, finalize = true])

Monitor the Integrated Memory Controller (IMC) for events on a single selected CPU socket. This can gather information such as the number of DRAM read and write operations. The events argument should be a Tuple of CounterTools.UncoreSelectRegister values, and socket should be either an Integer or IndexZero.

If finalize = true is passed, a finalizer will be attached to the IMCMonitor to clean up the hardware counter's state.

CounterTools.CHAMonitor (Type)
CHAMonitor(events, socket, cpu; [program = true], [filter0], [filter1])

Monitor the Caching Home Agent (CHA) counters for events on a single selected CPU socket. This can gather information such as the number of L3 hits and misses. The events argument should be a Tuple of CounterTools.UncoreSelectRegister values, and socket should be either an Integer or IndexZero. Further, cpu is the CPU that will actually read the counters. For best performance, cpu should be located on socket.

Filters

The CHA Performance Monitoring Units allow counters to be filtered in various ways, such as by issuing core or thread ID, request opcode, etc.

These filters can be passed via the filter0 and filter1 keyword arguments, which correspond to CHA filters 0 and 1 respectively.

Note: filter0 should be a CounterTools.CHAFilter0 and filter1 should be a CounterTools.CHAFilter1.


API

Base.read (Method)
read(monitor) -> Record

Read from all counters currently managed by monitor and return the results as a CounterTools.Record. The structure of the CounterTools.Record usually reflects the hierarchical structure of the counters being monitored.

CounterTools.program! (Function)
program!(monitor)

Program the PMUs managed by monitor. This must be called before any results returned from read will be meaningful.

This method is called automatically when monitor is created unless the program = false keyword was passed to the monitor constructor function.

CounterTools.reset! (Function)
reset!(monitor)

Set the PMUs managed by monitor back to their original state.

This method is called automatically when monitor is garbage collected unless the reset = false keyword is passed to the monitor constructor function.


Select Registers

CounterTools.CoreSelectRegister (Type)
CoreSelectRegister(; kw...)

Construct a bitmask for programming Core level counters.

Keywords

  • event::UInt: Select the event to be counted. Default: 0x00

  • umask::UInt: Select the subevent to be counted within the selected event. Default: 0x00

  • usr::Bool: Specifies the counter should be active when the processor is operating at privilege modes 1, 2, and 3. Default: true.

  • os::Bool: Specifies the counter should be active when the processor is operating at privilege mode 0. Default: true.

  • e::Bool: Edge detect. Default: false.

  • en::Bool: Enable the counter. Default: true.

  • inv::Bool: When set, inverts the counter-mask (CMASK) comparison, so that both greater than or equal to and less than comparisons can be made (0: greater than or equal; 1: less than). Note if counter-mask is programmed to zero, INV flag is ignored. Default: false.

  • cmask::UInt: When this field is not zero, a logical processor compares this mask to the event count of the detected microarchitectural condition during a single cycle. If the event count is greater than or equal to this mask, the counter is incremented by one. Otherwise the counter is not incremented.

    This mask is intended for software to characterize microarchitectural conditions that can count multiple occurrences per cycle (for example, two or more instructions retired per clock; or bus queue occupations). If the counter-mask field is 0, then the counter is incremented each cycle by the event count associated with multiple occurrences.
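
As an illustration of how the keywords above map onto a select register value, the sketch below packs them into the bit layout of Intel's IA32_PERFEVTSELx registers (event in bits 0 to 7, umask in bits 8 to 15, usr at bit 16, os at bit 17, e at bit 18, en at bit 22, inv at bit 23, cmask in bits 24 to 31). The function `encode_core_select` is hypothetical and is not the CounterTools implementation; it only assumes that CoreSelectRegister targets this standard layout.

```julia
# Hypothetical encoder, not part of CounterTools. It mirrors the keyword names
# and the stated defaults of CoreSelectRegister; the cmask default of 0x00 is
# an assumption of this sketch.
function encode_core_select(; event::Integer = 0x00, umask::Integer = 0x00,
                            usr::Bool = true, os::Bool = true, e::Bool = false,
                            en::Bool = true, inv::Bool = false,
                            cmask::Integer = 0x00)
    value = UInt32(event & 0xff)            # bits 0-7: event select
    value |= UInt32(umask & 0xff) << 8      # bits 8-15: unit mask
    value |= UInt32(usr) << 16              # bit 16: count in privilege modes 1-3
    value |= UInt32(os) << 17               # bit 17: count in privilege mode 0
    value |= UInt32(e) << 18                # bit 18: edge detect
    value |= UInt32(en) << 22               # bit 22: enable the counter
    value |= UInt32(inv) << 23              # bit 23: invert the cmask comparison
    value |= UInt32(cmask & 0xff) << 24     # bits 24-31: counter mask
    return value
end

# Retired instructions (event 0xC0, umask 0x00) with the default flags
retired = encode_core_select(event = 0xC0, umask = 0x00)                      # 0x004300c0

# Same event, counting cycles in which fewer than 2 instructions retired
slow_cycles = encode_core_select(event = 0xC0, cmask = 0x02, inv = true)      # 0x02c300c0
```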

CounterTools.UncoreSelectRegister (Type)
UncoreSelectRegister(; kw...)

Construct a bitmask for programming Uncore level counters.

Keywords

  • event::UInt: Select event to be counted. Default: 0x00

  • umask::UInt: Select subevents to be counted within the selected event. Default: 0x00

  • reset::Bool: When set to 1, the corresponding counter will be cleared to 0. Default: false

  • edge_detect::Bool: When set to 1, rather than measuring the event in each cycle it is active, the corresponding counter will increment when a 0 to 1 transition (i.e. rising edge) is detected.

    When 0, the counter will increment in each cycle that the event is asserted.

    NOTE: edge_detect is in series following thresh. Because of this, the thresh field must be set to a non-0 value. For events that increment by no more than 1 per cycle, set thresh to 0x1. Default: false.

  • overflow_enable::Bool: When this bit is set to 1 and the corresponding counter overflows, an overflow message is sent to the UBox’s global logic. The message identifies the unit that sent it. Default: false.

  • en::Bool: Local Counter Enable. Default: true.

  • invert::Bool: Invert comparison against Threshold.

    0 - comparison will be ‘is event increment >= threshold?’.

    1 - comparison is inverted - ‘is event increment < threshold?’

    e.g. for a 64-entry queue, if SW wanted to know how many cycles the queue had fewer than 4 entries, SW should set the threshold to 4 and set the invert bit to 1. Default: false.

  • thresh::UInt: Threshold is used, along with the invert bit, to compare against the counter’s incoming increment value. i.e. the value that will be added to the counter.

    For events that increment by more than 1 per cycle, if the threshold is set to a value greater than 1, the data register will accumulate instances in which the event increment is >= threshold. Default: 0x00.
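
The interaction between thresh, invert and edge_detect described above can be sketched with a small software model. This is only an illustration of the documented semantics, not hardware-accurate code and not part of CounterTools: the function `simulate_counter` is hypothetical, it only models a non-zero threshold, and it assumes the comparison output starts low before the first cycle.

```julia
# Hypothetical model of a single uncore counter; not part of CounterTools.
# `increments` holds the event increment seen in each cycle.
function simulate_counter(increments; thresh::Integer, invert::Bool = false,
                          edge_detect::Bool = false)
    thresh > 0 || throw(ArgumentError("this sketch only models thresh > 0"))
    total = 0
    previous = false  # assumption: the comparison output starts low
    for inc in increments
        # Threshold comparison, optionally inverted
        hit = invert ? (inc < thresh) : (inc >= thresh)
        if edge_detect
            # Edge detection follows the threshold: count 0 -> 1 transitions only
            total += (hit && !previous) ? 1 : 0
        else
            # Otherwise count every cycle in which the comparison holds
            total += hit ? 1 : 0
        end
        previous = hit
    end
    return total
end

# Per-cycle occupancy of a 64-entry queue (made-up example data)
occupancy = [0, 2, 3, 4, 7, 1, 64, 3]

# Cycles in which the queue had fewer than 4 entries
below_four = simulate_counter(occupancy; thresh = 4, invert = true)                        # 5

# Number of times the queue dropped below 4 entries (rising edges of the comparison)
episodes = simulate_counter(occupancy; thresh = 4, invert = true, edge_detect = true)     # 3
```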
