Draft
Pure copy of callgrind/ to tracegrind/ with symbol prefix rename CLG_ → TG_ (expanding to vgTracegrind_), header guards updated, public header renamed to tracegrind.h with TRACEGRIND_* macros. No behavioral changes — output is still identical to callgrind. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace callgrind's accumulated callgraph output with streaming CSV trace data emitted at function ENTER/EXIT boundaries. Each row contains delta counters since the last sample, enabling per-call cost attribution.
Key changes:
- dump.c: Replace callgraph output with CSV trace (trace_open/emit/close)
- callstack.c: Hook push/pop_call_stack to emit ENTER/EXIT samples
- threads.c: Add per-thread last_sample_cost for delta tracking
- global.h: Add trace_output struct and per-thread sample state
- main.c: Open trace at init, close at fini, update copyright
Co-Authored-By: Claude Opus 4.5 <[email protected]>
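The delta scheme in this commit can be pictured with a minimal Python sketch. This is not the tracegrind C code: `DeltaSampler`, its `sample` method, and the counter layout are hypothetical names chosen for illustration. The sketch assumes each thread remembers its counters at the last emitted sample and reports only the difference at the next ENTER/EXIT.

```python
from typing import Dict, List


class DeltaSampler:
    """Hypothetical per-thread delta tracker (illustration only)."""

    def __init__(self, n_counters: int) -> None:
        self.n_counters = n_counters
        # tid -> counter values recorded at that thread's last sample
        self.last_sample: Dict[int, List[int]] = {}

    def sample(self, tid: int, current: List[int]) -> List[int]:
        """Return the cost accumulated since this thread's previous sample."""
        last = self.last_sample.get(tid, [0] * self.n_counters)
        delta = [now - before for now, before in zip(current, last)]
        # Remember a copy so later mutation of `current` cannot affect us.
        self.last_sample[tid] = list(current)
        return delta
```

Because every row carries only the cost since the previous sample, a consumer can attribute cost to a call by summing the deltas between its ENTER and EXIT rows.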
Add --output-format=csv|msgpack option. MsgPack format uses LZ4 block compression achieving ~12x compression vs CSV.
New files:
- tg_msgpack.c/h: MsgPack encoder (write-only)
- tg_lz4.c/h: LZ4 compression wrapper with VG_() adaptations
- lz4.c/h: Vendored LZ4 library (BSD-2-Clause)
- docs/tracegrind-msgpack-format.md: Format specification
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Update msgpack format to version 2 with event_schemas
- Each event type (ENTER, EXIT, FORK) has its own column schema
- FORK events use minimal 4-element format: [seq, tid, event, child_pid]
- Remove CSV output format entirely (msgpack-only now)
- Add decode-trace.py script for debugging trace files
- Add fork detection via post-syscall handler for fork/clone/vfork
Co-Authored-By: Claude Opus 4.5 <[email protected]>
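As a rough illustration of per-event column schemas, the sketch below labels an already-decoded FORK row using the four columns named in this commit. The helper `label_row` and the sample values are hypothetical, and the numeric event code in the usage line is a placeholder, not a value taken from the format specification.

```python
from typing import Any, Dict, List, Sequence

# Column order as stated in the commit message for FORK events.
FORK_COLUMNS: List[str] = ["seq", "tid", "event", "child_pid"]


def label_row(columns: Sequence[str], row: Sequence[Any]) -> Dict[str, Any]:
    """Pair each value in a decoded row with its schema column name."""
    if len(row) != len(columns):
        raise ValueError(
            f"row has {len(row)} fields, schema expects {len(columns)}"
        )
    return dict(zip(columns, row))


# Placeholder values; the event code 2 is an assumption for illustration.
labeled = label_row(FORK_COLUMNS, [7, 1, 2, 4321])
```

Keeping a separate column list per event type lets short events such as FORK omit the fields that only function-call rows need.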
Add tracegrind configurations to the benchmark suite:
- tracegrind/default: basic tracing
- tracegrind/cache-sim: with cache simulation
- tracegrind/cache-sim+systime: with cache sim and syscall timing
This allows direct performance comparison between callgrind and tracegrind.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Detect available tools at startup and only run benchmarks for tools that are present. This fixes CI failures when running against upstream valgrind which doesn't have tracegrind. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add TRACEGRIND_ADD_MARKER client request that emits named marker events (event=0) into the trace stream, renumbering ENTER=1, EXIT=2, FORK=3. Remove the legacy dump_profile/zero_all_cost/dump_every_bb machinery inherited from callgrind, replacing it with the simpler compute_total_cost. Update the analyzer script (renamed from decode-trace.py) to match the new event numbering. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Remove ~240 lines of unused code inherited from callgrind:
- Dead CLI options (combine-dumps, compress-*, dump-*, collect-alloc, etc.)
- Dead struct fields (jCC.creation_seq, BBCC.ret_counter, fn_node.is_malloc/is_realloc/is_free, etc.)
- Dead functions (forall_bbccs, zero_bbcc, cachesim_dump_desc, cachesim_add_icost)
- Dead types and typedefs (OutputFormat, fCC, SimCost, UserCost, AddrPos, AddrCost, FnPos)
- Dead EG_ALLOC event group and its registration
Co-Authored-By: Claude Opus 4.6 <[email protected]>
These callgrind-inherited options are unnecessary for tracegrind's streaming trace model. Removing them simplifies recursion depth tracking, which now increments and decrements unconditionally. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add vg_regtest-based regression tests covering basic tracing, markers, instrumentation toggle, toggle collect, call chains, inlining behavior, and schema validation. Extend CI matrix to run tracegrind tests alongside callgrind on both Ubuntu 22.04 and 24.04. Co-Authored-By: Claude Opus 4.6 <[email protected]>
…nction tracking Track inlined function transitions at the BB level using Valgrind's debug info API. This bumps the trace format to v3 with two new event types (4=ENTER_INLINED, 5=EXIT_INLINED), updates the analyzer script to handle them, and adds regression tests for enter and nested inlined scenarios. Co-Authored-By: Claude Opus 4.6 <[email protected]>
… diffing Replace flat single-pointer inline tracking with a per-BB inline call stack built via Valgrind's InlIPCursor API. BB-to-BB transitions now diff the old and new inline stacks to emit the minimal EXIT/ENTER sequence, producing correct containment (ENTER outer → ENTER inner → EXIT inner → EXIT outer) instead of flat transitions. Co-Authored-By: Claude Opus 4.6 <[email protected]>
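The stack-diffing idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the C implementation: `diff_inline_stacks` is a hypothetical name, stacks are modelled as lists of function names ordered outermost first, and the "ENTER"/"EXIT" labels stand in for whatever inlined event types the trace actually emits.

```python
from typing import List, Tuple


def diff_inline_stacks(old: List[str], new: List[str]) -> List[Tuple[str, str]]:
    """Return the minimal EXIT/ENTER sequence that turns `old` into `new`.

    Both stacks are ordered outermost-first (an assumption of this sketch).
    """
    # Length of the shared prefix: frames that stay active across the transition.
    common = 0
    for a, b in zip(old, new):
        if a != b:
            break
        common += 1

    # Leave the frames that are no longer active, innermost first...
    events = [("EXIT", fn) for fn in reversed(old[common:])]
    # ...then enter the newly active frames, outermost first.
    events += [("ENTER", fn) for fn in new[common:]]
    return events
```

Exiting innermost-first and entering outermost-first is what yields properly nested output such as ENTER outer, ENTER inner, EXIT inner, EXIT outer.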
Add tests for signal handling, C++ exceptions, longjmp, tail calls, and deep recursion (100 levels) to verify call stack correctness across non-trivial control flow. Also fix missing -I include path for tracegrind.h in test Makefile. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Emit THREAD_CREATE (type 6) when new threads are spawned, using VG_(track_pre_thread_ll_create). Suppress spurious FORK events for pthread_create by checking CLONE_THREAD flag in clone/clone3 syscalls. Rename events for consistency: ENTER→ENTER_FN, EXIT→EXIT_FN, ENTER_INLINED→ENTER_INLINED_FN, EXIT_INLINED→EXIT_INLINED_FN. Reorder: ENTER_INLINED_FN=3, EXIT_INLINED_FN=4, FORK=5. Co-Authored-By: Claude Opus 4.6 <[email protected]>
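The CLONE_THREAD check can be illustrated with a small Python sketch. The function `is_fork_like` is hypothetical and assumes the caller has already extracted the flags word (for clone3 the flags live inside the `clone_args` struct); only the `CLONE_THREAD` constant comes from the Linux headers.

```python
CLONE_THREAD = 0x00010000  # from <linux/sched.h>


def is_fork_like(syscall_name: str, clone_flags: int = 0) -> bool:
    """Decide whether a completed syscall should produce a FORK event.

    Hypothetical helper: fork and vfork always create a process, while
    clone/clone3 create a thread when CLONE_THREAD is set.
    """
    if syscall_name in ("fork", "vfork"):
        return True
    if syscall_name in ("clone", "clone3"):
        # A thread in the same thread group is not a fork.
        return (clone_flags & CLONE_THREAD) == 0
    return False
```

With this split, thread creation is reported only through THREAD_CREATE and FORK stays reserved for new processes.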
Verify that syscall instruction counts and timing (sysCount, sysTime, sysCpuTime) are properly attributed to libc wrapper functions (getpid, write) when --collect-systime=nsec is enabled. Nonzero timing values on EXIT_FN events are normalized to T to assert measurement occurred. Co-Authored-By: Claude Opus 4.6 <[email protected]>
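The normalization step can be sketched as follows. The helper `normalize_timing` and the dict-shaped row are hypothetical, and treating sysTime and sysCpuTime as the normalized fields is an assumption; the actual test filter may differ.

```python
from typing import Any, Dict, Tuple


def normalize_timing(
    row: Dict[str, Any],
    timing_keys: Tuple[str, ...] = ("sysTime", "sysCpuTime"),
) -> Dict[str, Any]:
    """Replace nonzero timing values with "T" so expected output is stable.

    Zero values are kept, so a test still fails if nothing was measured.
    """
    out = dict(row)  # do not mutate the caller's row
    for key in timing_keys:
        if key in out and out[key] != 0:
            out[key] = "T"
    return out
```

Timing values vary from run to run, so replacing them with a fixed token keeps the comparison deterministic while still asserting that a measurement occurred.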
Include creator ("valgrind-tracegrind") and creator_version fields in
the schema chunk so consumers can identify which tool and version
produced the trace file.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Spawns 3 threads with distinct noinline call chains at different depths (work_a->depth_a1->depth_a2, work_b->depth_b1, work_c->depth_c1->depth_c2) to verify tracegrind correctly tracks per-thread ENTER_FN/EXIT_FN stacks. Output is sorted by tid for deterministic comparison. Co-Authored-By: Claude Opus 4.6 <[email protected]>
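Sorting by tid makes the comparison deterministic only if each thread's own event order survives the sort. A minimal sketch, assuming rows are dicts with a `tid` key (the helper name and row shape are hypothetical):

```python
from typing import Any, Dict, List


def sort_by_tid(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group interleaved events by thread, keeping per-thread order.

    Python's sorted() is stable, so rows with the same tid stay in
    their original relative order.
    """
    return sorted(rows, key=lambda r: r["tid"])


interleaved = [
    {"tid": 2, "event": "ENTER_FN", "fn": "work_a"},
    {"tid": 3, "event": "ENTER_FN", "fn": "work_b"},
    {"tid": 2, "event": "ENTER_FN", "fn": "depth_a1"},
    {"tid": 3, "event": "EXIT_FN", "fn": "work_b"},
    {"tid": 2, "event": "EXIT_FN", "fn": "depth_a1"},
]
ordered = sort_by_tid(interleaved)
```

A stable sort removes scheduler-dependent interleaving between threads without disturbing the ENTER_FN/EXIT_FN nesting inside each thread.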
…ed file path The expected output had file=??? but debug info now correctly resolves the source file to test_thread_create.c. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Record the unit of time-based event counters (sysTime, sysCpuTime) in the schema chunk so consumers can interpret values without out-of-band knowledge of the --collect-systime setting. The map is extensible for future counters. Co-Authored-By: Claude Opus 4.6 <[email protected]>
… file The output file path is now used exactly as specified by --tracegrind-out-file. The default format includes the extension so the default behavior is unchanged. Co-Authored-By: Claude Opus 4.6 <[email protected]>
…nters array Extract counter column names from inline event schemas into a separate top-level `counters` field and nest counter deltas as a sub-array within fn-call data rows. This makes the schema self-describing for counter layout without repeating counter names in every event type. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Adds a directory for pre-generated tracegrind output files that serve as reference material for trace parser implementations, along with a script to regenerate them. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Perf profiling revealed LZ4 compression (11.4% of runtime) and per-event strlen calls (4.6%) as the top two optimization targets. Switch from LZ4_compress_default to LZ4_compress_fast with acceleration=2 for faster compression at marginal ratio cost. Cache name_len in fn_node, file_node, and obj_node structs so msgpack_write_str receives pre-computed lengths instead of calling VG_(strlen) on every trace event. This eliminates strlen from the perf profile entirely. Benchmarked improvement: 55-78ms saved (10-13% of the TG-CG gap) on ls -lR /usr/share/doc workload. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add .pre-commit-config.yaml with clang-format scoped to tracegrind/ only. Reformat all tracegrind source files to match the repo's .clang-format style. Co-Authored-By: Claude Opus 4.6 <[email protected]>
No description provided.