Merged
2 changes: 1 addition & 1 deletion _posts/2023-03-02-fast-21.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #21 on January 16, 2020

*By [Paul Wankadia](mailto:[email protected]) and [Darryl Gove](mailto:[email protected])*

-Updated 2024-10-21
+Updated 2025-09-03

Quicklink: [abseil.io/fast/21](https://abseil.io/fast/21)

11 changes: 6 additions & 5 deletions _posts/2023-03-02-fast-39.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #39 on January 22, 2021

*By [Chris Kennelly](mailto:[email protected]) and [Alkis Evlogimenos](mailto:[email protected])*

-Updated 2025-03-24
+Updated 2025-09-29

Quicklink: [abseil.io/fast/39](https://abseil.io/fast/39)

@@ -112,10 +112,11 @@ challenging: Microbenchmarks tend to have small working sets that tend to be
cache resident. Real code, particularly Google C++, is not.

In production, the cacheline holding `kMasks` might be evicted, leading to much
-worse stalls (hundreds of cycles to access main memory). Additionally, on x86
-processors since Haswell, this [optimization can be past its prime](/fast/9):
-BMI2's `bzhi` instruction is both faster than loading and masking *and* delivers
-more consistent performance.
+worse stalls
+([hundreds of cycles to access main memory](https://sre.google/static/pdf/rule-of-thumb-latency-numbers-letter.pdf)).
+Additionally, on x86 processors since Haswell, this
+[optimization can be past its prime](/fast/9): BMI2's `bzhi` instruction is both
+faster than loading and masking *and* delivers more consistent performance.
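The trade-off above can be sketched in code. This is a hedged illustration, not the post's actual code: `MaskViaTable` stands in for a `kMasks`-style lookup, and `BzhiEmulated` mirrors the semantics of BMI2's `bzhi` in portable C++ rather than using the `_bzhi_u64` intrinsic (which requires BMI2-enabled compilation).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Lookup-table masking: keep the low n bits of x by ANDing with a mask
// loaded from memory. A cache miss on this table stalls for hundreds of
// cycles in production.
inline const std::array<uint64_t, 65>& Masks() {
  static const std::array<uint64_t, 65> kMasks = [] {
    std::array<uint64_t, 65> m{};
    for (int i = 0; i <= 64; ++i) {
      m[i] = (i >= 64) ? ~uint64_t{0} : ((uint64_t{1} << i) - 1);
    }
    return m;
  }();
  return kMasks;
}

inline uint64_t MaskViaTable(uint64_t x, unsigned n) { return x & Masks()[n]; }

// bzhi computes the same result in a single register-only instruction, with
// no memory access: it zeroes bits [63:n] of the source (and leaves the
// source unchanged when n >= 64). Emulated portably here.
inline uint64_t BzhiEmulated(uint64_t x, unsigned n) {
  return (n >= 64) ? x : (x & ((uint64_t{1} << n) - 1));
}
```

Both paths compute the same value; the difference the post describes is purely in where the mask comes from (memory vs. a register operation).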

When developing benchmarks for
[SwissMap](https://abseil.io/blog/20180927-swisstables), individual operations
12 changes: 7 additions & 5 deletions _posts/2023-03-02-fast-53.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #53 on October 14, 2021

*By [Mircea Trofin](mailto:[email protected])*

-Updated 2024-11-19
+Updated 2025-09-03

Quicklink: [abseil.io/fast/53](https://abseil.io/fast/53)

@@ -73,7 +73,7 @@ the process of writing a benchmark. An example of its use may be seen
[here](https://github.com/llvm/llvm-test-suite/tree/main/MicroBenchmarks/LoopVectorization).

The benchmark harness support for performance counters consists of allowing the
-user to specify up to 3 counters in a comma-separated list, via the
+user to specify counters in a comma-separated list, via the
`--benchmark_perf_counters` flag, to be measured alongside the time measurement.
Just like time measurement, each counter value is captured right before the
benchmarked code is run, and right after. The difference is reported to the user
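The harness internals are not shown in the post; the following is only a sketch of the before/after delta pattern just described, with a fake in-process counter standing in for a hardware perf event (all names here are assumptions for illustration).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Stand-in for reading a hardware performance counter. In the real
// harness this would be a perf-event read; here it is a plain variable
// so the sketch is self-contained.
static uint64_t g_fake_counter = 0;
inline uint64_t ReadFakeCounter() { return g_fake_counter; }

// The pattern described above: snapshot the counter immediately before
// the benchmarked code runs and immediately after; the difference is
// what gets reported alongside the time measurement.
inline uint64_t MeasureDelta(const std::function<void()>& benchmarked_code) {
  const uint64_t before = ReadFakeCounter();
  benchmarked_code();
  const uint64_t after = ReadFakeCounter();
  return after - before;  // reported to the user
}
```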
@@ -131,13 +131,15 @@ instructions, and 6 memory ops per iteration.

- *Number of counters*: At most 32 events may be requested for simultaneous
collection. Note however, that the number of hardware counters available is
-much lower (usually 4-8 on modern CPUs) -- requesting more events than the
+much lower (usually 4-8 on modern CPUs, see
+`PerfCounterValues::kMaxCounters`) -- requesting more events than the
hardware counters will cause
[multiplexing](https://perf.wiki.kernel.org/index.php/Tutorial#multiplexing_and_scaling_events)
and decreased accuracy.
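When events are multiplexed, each one is only counted while it is scheduled on a hardware counter, and tools extrapolate from the fraction of time it ran. A sketch of the standard scaling rule (the function name is an assumption; the `enabled/running` extrapolation is the rule Linux perf-based tooling commonly applies):

```cpp
#include <cassert>
#include <cstdint>

// Extrapolate a multiplexed event's raw count to the full measurement
// window: scaled = raw * time_enabled / time_running. The further
// time_running falls below time_enabled, the more we are guessing --
// this is the "decreased accuracy" mentioned above.
inline double ScaledCount(uint64_t raw, uint64_t time_enabled_ns,
                          uint64_t time_running_ns) {
  if (time_running_ns == 0) return 0.0;  // event never scheduled
  return static_cast<double>(raw) * static_cast<double>(time_enabled_ns) /
         static_cast<double>(time_running_ns);
}
```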

-- *Visualization*: There is no visualization available, so the user needs to
-rely on collecting JSON result files and summarizing the results.
+- *Visualization*: There is no dedicated visualization UI available, so for
+complex analysis, users may need to collect JSON result files and summarize
+the results.

- *Counting vs. Sampling*: The framework only collects counters in "counting"
mode -- it answers how many cycles/cache misses/etc. happened, but does not
2 changes: 1 addition & 1 deletion _posts/2023-03-02-fast-9.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #9 on June 24, 2019

*By [Chris Kennelly](mailto:[email protected])*

-Updated 2025-03-27
+Updated 2025-10-03

Quicklink: [abseil.io/fast/9](https://abseil.io/fast/9)

2 changes: 1 addition & 1 deletion _posts/2023-09-14-fast-7.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #7 on June 6, 2019

*By [Chris Kennelly](mailto:[email protected])*

-Updated 2025-03-25
+Updated 2025-10-03

Quicklink: [abseil.io/fast/7](https://abseil.io/fast/7)

2 changes: 1 addition & 1 deletion _posts/2023-09-30-fast-52.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #52 on September 30, 2021

*By [Chris Kennelly](mailto:[email protected])*

-Updated 2025-03-24
+Updated 2025-10-03

Quicklink: [abseil.io/fast/52](https://abseil.io/fast/52)

4 changes: 2 additions & 2 deletions _posts/2023-10-10-fast-64.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #64 on October 21, 2022

*By [Chris Kennelly](mailto:[email protected])*

-Updated 2025-03-24
+Updated 2025-09-29

Quicklink: [abseil.io/fast/64](https://abseil.io/fast/64)

@@ -192,7 +192,7 @@ that can be returned. This approach has two problems:
variable small string object buffer sizes. Returning `const std::string&`
constrains the implementation to that particular size of buffer.
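The buffer-size coupling just described can be sketched as follows. This is a minimal illustration (the `Config` class and its members are assumed for this sketch, not from the post): a `const std::string&` accessor forces the implementation to hold a `std::string` of one particular layout, while a `std::string_view` accessor presents the same caller-facing type whether the data is an owned string or constant data.

```cpp
#include <cassert>
#include <string>
#include <string_view>

class Config {
 public:
  // Tied to the representation: there must be a std::string member (with
  // its particular small-string buffer size) to return a reference to.
  const std::string& owned_name() const { return name_; }

  // Decoupled: the same signature works for owned and constant data alike.
  std::string_view name_view() const { return name_; }
  static std::string_view default_name() { return "default"; }  // constant data

 private:
  std::string name_ = "gadget";
};
```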

-In contrast, by returning `std::string_view` (or our
+In contrast, by returning [`std::string_view`](/tips/1) (or our
[internal predecessor](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3442.html),
`StringPiece`), we decouple callers from the internal representation. The API is
the same, independent of whether the string is constant data (backed by the
27 changes: 14 additions & 13 deletions _posts/2023-10-15-fast-60.md
@@ -12,14 +12,15 @@ Originally posted as Fast TotW #60 on June 6, 2022

*By [Chris Kennelly](mailto:[email protected])*

-Updated 2025-03-24
+Updated 2025-09-29

Quicklink: [abseil.io/fast/60](https://abseil.io/fast/60)


[Google-Wide Profiling](https://research.google/pubs/pub36575/) collects data
not just from our hardware performance counters, but also from in-process
-profilers.
+profilers. These have been covered in previous episodes covering
+[hashtables](/fast/26).

In-process profilers can give deeper insights about the state of the program
that are hard to observe from the outside, such as lock contention, where memory
@@ -39,8 +40,8 @@ decisions faster, shortening our
The value is in pulling in the area-under-curve and landing in a better spot. An
"imperfect" profiler that can help make a decision is better than a "perfect"
profiler that is unwieldy to collect for performance or privacy reasons. Extra
-information or precision is only useful insofar as it helps us make a *better*
-decision or *changes* the outcome.
+information or precision is only useful insofar as it helps us make a
+[*better* decision or *changes* the outcome](/fast/94).

For example, most new optimizations to
[TCMalloc](https://github.com/google/tcmalloc/blob/master/tcmalloc) start from
@@ -54,7 +55,7 @@ steps didn't directly save any CPU usage or bytes of RAM, but they enabled
better decisions. Capabilities are harder to directly quantify, but they are the
motor of progress.

-## Leveraging existing profilers: the "No build" option
+## Leveraging existing profilers: the "No build" option {#no-build}

Developing a new profiler takes considerable time, both in terms of
implementation and wallclock time to ready the fleet for collection at scale.
@@ -65,19 +66,19 @@ For example, if the case for hashtable profiling was just reporting the capacity
of hashtables, then we could also derive that information from heap profiles,
TCMalloc's heap profiles of the fleet. Even where heap profiles might not be
able to provide precise insights--the actual "size" of the hashtable, rather
-than its capacity--we can make an informed guess from the profile combined with
-knowledge about the typical load factors due to SwissMap's design.
+than its capacity--we can make an [informed guess](/fast/90) from the profile
+combined with knowledge about the typical load factors due to SwissMap's design.
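The informed guess amounts to simple arithmetic on the profiled capacity. A hedged sketch (the function name and the assumed average load factor are illustration only -- SwissMap resizes at a 7/8 maximum load factor, so a live table sits somewhere below that, and the constant here is a placeholder guess, not a measured value):

```cpp
#include <cassert>
#include <cstddef>

// A heap profile reveals a table's capacity; multiplying by an assumed
// typical load factor estimates the number of live elements. 0.5 is a
// placeholder for "partway through a growth cycle", not a measured constant.
constexpr double kAssumedLoadFactor = 0.5;

inline size_t EstimatedSize(size_t capacity) {
  return static_cast<size_t>(static_cast<double>(capacity) *
                             kAssumedLoadFactor);
}
```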

It is important to articulate the value of the new profiler over what is already
provided. A key driver for hashtable-specific profiling is that the CPU profiles
of a hashtable with a
[bad hash function look similar to those](https://youtu.be/JZE3_0qvrMg?t=1864)
-with a good hash function. The added information collected for stuck bits helps
-us drive optimization decisions we wouldn't have been able to make. The capacity
-information collected during hashtable-profiling is incidental to the profiler's
-richer, hashtable-specific details, but wouldn't be a particularly compelling
-reason to collect it on its own given the redundant information available from
-ordinary heap profiles.
+with a good hash function. The [added information collected](/fast/26) for stuck
+bits helps us drive optimization decisions we wouldn't have been able to make.
+The capacity information collected during hashtable-profiling is incidental to
+the profiler's richer, hashtable-specific details, but wouldn't be a
+particularly compelling reason to collect it on its own given the redundant
+information available from ordinary heap profiles.

## Sampling strategies

9 changes: 8 additions & 1 deletion _posts/2023-10-20-fast-70.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #70 on June 26, 2023

*By [Chris Kennelly](mailto:[email protected])*

-Updated 2025-03-25
+Updated 2025-10-03

Quicklink: [abseil.io/fast/70](https://abseil.io/fast/70)

@@ -129,6 +129,13 @@ performance improvements. We still need to measure the impact on application and
service-level performance, but the proxies help us home in on an optimization
that we want to deploy faster.

+When we are considering multiple options for a project, secondary metrics can
+give us confirmation after the fact that our expectations were correct. For
+example, suppose we chose option A over option B because both provided
+comparable performance but A would not impact reliability. We should measure
+both the performance and reliability outcomes to support our engineering
+decision. This lets us close the loop between expectations and reality.

## Aligning with success

The metrics we pick need to align with success. If a metric tells us to do the
24 changes: 12 additions & 12 deletions _posts/2023-11-10-fast-74.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #74 on September 29, 2023

*By [Chris Kennelly](mailto:[email protected]) and [Matt Kulukundis](mailto:[email protected])*

-Updated 2025-03-25
+Updated 2025-10-03

Quicklink: [abseil.io/fast/74](https://abseil.io/fast/74)

@@ -74,12 +74,12 @@ understand, we might be tempted to remove it. TCMalloc's fast path would appear
cheaper, but other code somewhere else would experience a cache miss and
[application productivity](/fast/7) would decline.

-To make matters worse, the cost is partly a profiling artifact. The TLB miss
-blocks instruction retirement, but our processors are superscalar, out-of-order
-behemoths. The processor can continue to execute further instructions in the
-meantime, but this execution is not visible to a sampling profiler like
-Google-Wide Profiling. IPC in the application may be improved, but not in a way
-immediately associated with TCMalloc.
+To make matters worse, the cost is partly [a profiling artifact](/fast/94). The
+TLB miss blocks instruction retirement, but our processors are superscalar,
+out-of-order behemoths. The processor can continue to execute further
+instructions in the meantime, but this execution is not visible to a sampling
+profiler like Google-Wide Profiling. IPC in the application may be improved, but
+not in a way immediately associated with TCMalloc.

### Hidden context switch costs

@@ -104,11 +104,11 @@ increase apparent kernel scheduler latency.

### Sweeping away protocol buffers

-Consider an extreme example. When our hashtable profiler for Abseil's hashtables
-indicates a problematic hashtable, a user could switch the offending table from
-`absl::flat_hash_map` to `std::unordered_map`. Since the profiler doesn't
-collect information about `std` containers, the offending table would no longer
-show up, although the fleet itself would be dramatically worse.
+Consider an extreme example. When [our hashtable profiler](/fast/26) for
+Abseil's hashtables indicates a problematic hashtable, a user could switch the
+offending table from `absl::flat_hash_map` to `std::unordered_map`. Since the
+profiler doesn't collect information about `std` containers, the offending table
+would no longer show up, although the fleet itself would be dramatically worse.

While the above example may seem contrived, an almost entirely analogous
recommendation comes up with some regularity: migrate users from protos to
9 changes: 5 additions & 4 deletions _posts/2023-11-10-fast-75.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #75 on September 29, 2023

*By [Chris Kennelly](mailto:[email protected])*

-Updated 2025-03-25
+Updated 2025-10-03

Quicklink: [abseil.io/fast/75](https://abseil.io/fast/75)

@@ -161,9 +161,10 @@ benchmark does, and that can have some profound effects on what we measure. For
example, there, since we're iterating over the same buffer, and there's no
dependency on the last value, the processor is very likely to be able to
speculatively start the next iteration and won't need to undo the work. This
-converts a benchmark that we'd like to measure as a chain of dependencies into a
-measurement of the number of pipelines that the processor has (or the duration
-of the dependency chain divided by the number of parallel executions).
+converts a benchmark that we'd like to measure as a
+[chain of dependencies](/fast/99) into a measurement of the number of pipelines
+that the processor has (or the duration of the dependency chain divided by the
+number of parallel executions).
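The two benchmark shapes can be sketched in miniature. This is not the varint benchmark itself, just a hedged illustration of the structural difference (function names assumed): in the independent form every iteration reads the same data with no cross-iteration dependency, so the processor can overlap iterations speculatively; in the chained form each iteration's input depends on the previous result, forcing the serial latency we actually wanted to measure.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Independent iterations: nothing carries from one iteration to the next,
// so a superscalar processor can execute several iterations in flight.
inline uint64_t SumIndependent(const std::vector<uint64_t>& buf, int iters) {
  uint64_t total = 0;
  for (int i = 0; i < iters; ++i) {
    total += buf[0];  // same input every time; iterations overlap freely
  }
  return total;
}

// Chained iterations: the next load address depends on the previous
// result, so iterations must execute one after another.
inline uint64_t SumChained(const std::vector<uint64_t>& buf, int iters) {
  uint64_t index = 0;
  for (int i = 0; i < iters; ++i) {
    index = buf[index] % buf.size();  // data-dependent on the prior result
  }
  return index;
}
```

Timed, the first form measures throughput (how many pipelines the processor has); the second measures the latency of the dependency chain.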

To make the benchmark more realistic, we can instead parse from a larger buffer
of varints serialized end-on-end:
2 changes: 1 addition & 1 deletion _posts/2024-09-04-fast-62.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #62 on July 7, 2022

*By [Chris Kennelly](mailto:[email protected]), [Luis Otero](mailto:[email protected]) and [Carlos Villavieja](mailto:[email protected])*

-Updated 2025-03-12
+Updated 2025-09-15

Quicklink: [abseil.io/fast/62](https://abseil.io/fast/62)

16 changes: 9 additions & 7 deletions _posts/2024-09-04-fast-72.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #72 on August 7, 2023

*By [Chris Kennelly](mailto:[email protected])*

-Updated 2025-02-18
+Updated 2025-08-23

Quicklink: [abseil.io/fast/72](https://abseil.io/fast/72)

@@ -37,9 +37,9 @@ estimates to be correct, the primary goal is to have just enough information to
optimization "A" over optimization "B" because "A" has a larger expected ROI.
Oftentimes, we only need a
[single significant figure](https://en.wikipedia.org/wiki/Significant_figures)
-to do so: Spending more time making a better estimate does not make things more
-efficient by itself. When new information arrives, we can update our priors
-accordingly.
+to do so: Spending more time making a better estimate or
+[gathering more data](/fast/94) does not make things more efficient by itself.
+When new information arrives, we can update our priors accordingly.

Once we have identified an area to work in, we can shift to thinking about ways
of tackling problems in that area.
@@ -226,9 +226,11 @@ helps in several ways.

Success in one area brings opportunities for cross-pollination. We can take
the same solution, an
-[algorithm](https://research.google/pubs/pub50370.pdf#page=7) or data
-structure, and apply the idea to a related but different area. Without the
-original landing, though, we might have never realized this.
+[algorithm](https://research.google/pubs/pub50370.pdf#page=7) (pages on huge
+pages) or data structure, and apply the idea to a
+[related but different area](https://storage.googleapis.com/gweb-research2023-media/pubtools/7777.pdf#page=9)
+(objects on pages, or "span" prioritization). Without the original landing,
+though, we might have never realized this.

* Circumstances are continually changing. The assumptions that started a
project years ago may be invalid by the time the project is ready.
14 changes: 8 additions & 6 deletions _posts/2024-09-04-fast-79.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #79 on January 19, 2024

*By [Chris Kennelly](mailto:[email protected]) and [Matt Kulukundis](mailto:[email protected])*

-Updated 2024-12-10
+Updated 2025-06-20

Quicklink: [abseil.io/fast/79](https://abseil.io/fast/79)

@@ -77,8 +77,9 @@ smoothly.

TIP: Prefer switching defaults to migrating code if you can.

-When we introduced hashtable profiling for monitoring tables fleet wide, some
-users were surprised that tables could be sampled (triggering additional system
+When we introduced
+[hashtable profiling for monitoring tables fleet wide](/fast/26), some users
+were surprised that tables could be sampled (triggering additional system
calls). If we had tried to have sampled monitoring from the start, the migration
would have had a new class of issues to debug. This also allowed us to have a
[very clear opt-out for this specific feature](/fast/52) without delaying the
@@ -91,7 +92,8 @@ class of issues at a time.

## Iterative improvement: Deploying TCMalloc's CPU caches

-When TCMalloc was first introduced, it used per-thread caches, hence its name,
+When [TCMalloc](https://github.com/google/tcmalloc/blob/master/docs/index.html)
+was first introduced, it used per-thread caches, hence its name,
"[Thread-Caching Malloc](https://goog-perftools.sourceforge.net/doc/tcmalloc.html)."
As thread counts continued to increase, per-thread caches suffered from two
growing problems: a per-process cache size was divided over more and more
@@ -127,8 +129,8 @@ development of
[several optimizations](https://research.google/pubs/characterizing-a-memory-allocator-at-warehouse-scale/).
TCMalloc includes extensive telemetry that enabled us to calculate the amount of
memory being used for per-vCPU caches which provided estimates of the potential
-opportunity - to motivate the work - and the final impact - for recognising the
-benefit.
+opportunity - to motivate the work - and measure the final impact - for
+recognising the benefit.

TIP: Tracking metrics that we intend to optimize later, even if not right away,
can help identify when an idea is worth pursuing and prioritizing. By monitoring
2 changes: 1 addition & 1 deletion _posts/2024-09-04-fast-83.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #83 on June 17, 2024

*By [Chris Kennelly](mailto:[email protected])*

-Updated 2025-02-18
+Updated 2025-08-23

Quicklink: [abseil.io/fast/83](https://abseil.io/fast/83)
