How To Profile Benchmark Hot Paths
Use this guide when benchmark ratios point at a real hotspot and you need call-stack evidence before changing the runtime.
What this workflow does
The repo ships a focused benchmark-runner profile mode plus a perf wrapper script:
|
That command:
- runs one deterministic benchmark operation repeatedly in Release
- captures perf stat counters into perf.stat.txt
- captures sampled stacks into perf.data
- emits both .NET perf maps and JIT dump metadata so managed frames can be symbolized
- renders a text call-stack report into perf.report.txt
Artifacts land under .artifacts/profiling/<operation>-<scenario-or-records>-<iterations>iters/.
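The steps above can be sketched as a small shell pipeline. This is an illustration of the general technique, not the repo's actual wrapper: the function names `artifact_dir` and `profile_once`, the intermediate file `perf.jit.data`, and the choice to run `perf stat` and `perf record` as two separate passes are all assumptions. The environment variable values are the ones this guide's notes say the wrapper sets.

```shell
#!/bin/sh
# Hypothetical helper: compose the artifact directory from the pattern
# .artifacts/profiling/<operation>-<scenario-or-records>-<iterations>iters/
artifact_dir() {
  printf '.artifacts/profiling/%s-%s-%siters/\n' "$1" "$2" "$3"
}

# Hypothetical sketch of one profiling run. Usage:
#   profile_once <out_dir> <benchmark command and its arguments...>
profile_once() {
  out_dir="$1"
  shift
  mkdir -p "$out_dir" || return 1

  # Values taken from the notes in this guide.
  export DOTNET_PerfMapEnabled=3 COMPlus_PerfMapEnabled=3

  # High-level counters.
  perf stat -o "$out_dir/perf.stat.txt" -- "$@" || return 1

  # Sampled call stacks; -k 1 selects the monotonic clock, which
  # perf inject --jit needs to align JIT dump records with samples.
  perf record -g -k 1 -o "$out_dir/perf.data" -- "$@" || return 1

  # Merge JIT metadata so managed frames resolve to symbol names.
  # The output file name perf.jit.data is an assumption.
  perf inject --jit -i "$out_dir/perf.data" -o "$out_dir/perf.jit.data" || return 1

  # Text call-stack report.
  perf report --stdio -i "$out_dir/perf.jit.data" > "$out_dir/perf.report.txt"
}
```

For example, `artifact_dir stj-serialize telemetry-500 1000` yields `.artifacts/profiling/stj-serialize-telemetry-500-1000iters/`; the iteration count of 1000 is only a placeholder.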
Supported operations
- codecmapper-serialize
- codecmapper-deserialize-bytes
- stj-serialize
- stj-deserialize
- newtonsoft-serialize
- newtonsoft-deserialize
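If you drive the profiler from your own scripts, it can help to reject a mistyped operation name before starting a long run. The check below is a minimal sketch; `is_supported_operation` is a hypothetical helper, not something the repo is known to provide, and it only mirrors the list above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the operation names listed in this guide.
is_supported_operation() {
  case "$1" in
    codecmapper-serialize|codecmapper-deserialize-bytes) return 0 ;;
    stj-serialize|stj-deserialize) return 0 ;;
    newtonsoft-serialize|newtonsoft-deserialize) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might write `is_supported_operation "$op" || { echo "unknown operation: $op" >&2; exit 1; }` ahead of the profiling command.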
Typical workflow
Profile the CodecMapper path first:
|
Then capture the System.Text.Json baseline on the same scenario:
|
Read perf.stat.txt for high-level counters and perf.report.txt for the hottest call paths.
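To compare the two runs without eyeballing the files, you can pull a single counter out of each perf.stat.txt. The helper below is a sketch that assumes the default human-readable `perf stat` layout, where the value is the first column and the event name is the second (or third, when a unit such as `msec` sits between them). `stat_counter` is a hypothetical name, and event names with suffixes (for example on hybrid CPUs) would need a different match.

```shell
#!/bin/sh
# Hypothetical helper: read perf stat text on stdin and print the value
# of one named counter with thousands separators removed.
stat_counter() {
  awk -v name="$1" '
    $2 == name || $3 == name {
      gsub(/,/, "", $1)   # 3,456,789 -> 3456789
      print $1
      exit
    }
  '
}

# Usage (fill in the artifact directories from your own runs):
#   stat_counter instructions < <codecmapper-run-dir>/perf.stat.txt
#   stat_counter instructions < <stj-run-dir>/perf.stat.txt
```

Matching only the event-name columns avoids false hits on words that appear in the trailing `#` commentary of other counter lines.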
Notes
- The profile wrapper now defaults to the person-batch-25 scenario from the shared benchmark matrix.
- Pass a scenario name such as telemetry-500 or escaped-articles-20 to profile one of the standard workloads.
- Passing a plain integer as the third argument still uses the legacy nested-record batch with --records <n>.
- The wrapper sets DOTNET_PerfMapEnabled=3 and COMPlus_PerfMapEnabled=3 so perf inject --jit has the metadata it needs for managed symbol names.
- If perf record is blocked by local kernel permissions, the script will fail before writing perf.report.txt. In that case, fix local perf permissions first and rerun the same command.
- Keep comparisons on the same power profile and CPU governor, otherwise the counter deltas are noisy enough to mislead.
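Two of the notes above lend themselves to a quick sketch. The first helper shows how a third argument could be classified as a record count or a scenario name, with the documented default; the second prints the kernel and CPU settings worth checking before a run. Both function names are hypothetical, the real wrapper may parse its arguments differently, and the `/proc` and `/sys` paths are the standard Linux locations, which may be absent in containers or on other systems.

```shell
#!/bin/sh
# Hypothetical helper: classify the third wrapper argument.
# Empty or missing falls back to the default scenario named in this guide.
workload_kind() {
  arg="${1:-person-batch-25}"
  case "$arg" in
    *[!0-9]*) printf 'scenario %s\n' "$arg" ;;  # contains a non-digit
    *)        printf 'records %s\n' "$arg" ;;   # plain integer, legacy path
  esac
}

# Hypothetical helper: print one setting, or note that it cannot be read.
show_setting() {
  if [ -r "$2" ]; then
    printf '%s: %s\n' "$1" "$(cat "$2")"
  else
    printf '%s: unavailable\n' "$1"
  fi
}

# Preflight: inspect perf permissions and the CPU governor before profiling.
show_setting perf_event_paranoid /proc/sys/kernel/perf_event_paranoid
show_setting cpu0_governor /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```

Recording both values alongside each run's artifacts makes it easier to tell later whether two runs were captured under comparable conditions.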