Skip to content

[perf] Reduce logger hot-path overhead and sync IO pressure#3210

Open
Xekep wants to merge 1 commit intoPryaxis:general-develfrom
TerraZ-Team:pr/micro-perf-log-hotpath
Open

[perf] Reduce logger hot-path overhead and sync IO pressure#3210
Xekep wants to merge 1 commit intoPryaxis:general-develfrom
TerraZ-Team:pr/micro-perf-log-hotpath

Conversation

@Xekep
Copy link
Copy Markdown
Contributor

@Xekep Xekep commented Feb 27, 2026

Summary

  • optimize caller resolution for log writes
  • reduce sync flush pressure by batching text log flushes
  • replace SQL failure-path First() with index access

Scope

  • focused perf-only change in TextLog and SqlLog`n- no unrelated file changes

Batch text log flushes, serialize writes under a lock, and avoid expensive caller stack introspection for info/verbose log events. Also remove LINQ First() allocation in SQL failure replay.
@hakusaro
Copy link
Copy Markdown
Member

@greptile review?

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR implements targeted performance optimizations for TShock's logging subsystem without changing functionality. Key improvements include:

  • Optimized caller resolution by skipping expensive stack traces for high-volume Info/Verbose logs
  • Reduced sync I/O pressure in TextLog by batching flushes (every 32 writes or 1 second) while preserving immediate flushing for Warning/Error messages
  • Added thread safety to TextLog with proper locking
  • Replaced LINQ .First() with direct index access in SQL failure recovery path
  • Correctly adjusted StackFrame depth from 2 to 3 to account for extracted ResolveCaller method

The changes are well-designed, maintain correctness, and deliver meaningful performance gains for logging hot paths.

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • Focused performance-only changes with correct logic throughout, proper thread safety added, and no behavioral changes to existing functionality
  • No files require special attention

Important Files Changed

Filename Overview
TShockAPI/SqlLog.cs Optimized caller resolution with early-exit for high-volume logs and replaced LINQ with index access
TShockAPI/TextLog.cs Added thread safety and intelligent flush batching to reduce sync I/O pressure

Last reviewed commit: 15e4ce8

@hakusaro
Copy link
Copy Markdown
Member

@Xekep do you have any instrumentation that shows the performance gains on these PRs? E.g., before and after profiling, or some indication that these performance gains aren't purely theoretical or, at the very least, within the margin of error for CPU scheduling, etc?

@Xekep
Copy link
Copy Markdown
Contributor Author

Xekep commented Feb 27, 2026

@Xekep do you have any instrumentation that shows the performance gains on these PRs? E.g., before and after profiling, or some indication that these performance gains aren't purely theoretical or, at the very least, within the margin of error for CPU scheduling, etc?

That matches my real-world experience: before this patch series the server became progressively unresponsive under load (chat and general packet handling lagged noticeably), and after applying the changes responsiveness improved enough to make gameplay stable again.

@hakusaro
Copy link
Copy Markdown
Member

That matches my real-world experience: before this patch series the server became progressively unresponsive under load (chat and general packet handling lagged noticeably), and after applying the changes responsiveness improved enough to make gameplay stable again.

You changed like, 9 things at the same time it seems?

@sgkoishi
Copy link
Copy Markdown
Member

This PR probably won’t make a noticeable difference unless:

  • Your hard drive has a response time of 100ms or more (for the Flush change), or
  • You can notice differences at the microsecond level (for the StackFrame vs. StackTrace change).

@Xekep
Copy link
Copy Markdown
Contributor Author

Xekep commented Feb 27, 2026

This PR probably won’t make a noticeable difference unless:

  • Your hard drive has a response time of 100ms or more (for the Flush change), or
  • You can notice differences at the microsecond level (for the StackFrame vs. StackTrace change).

Thanks for the feedback - I agree this won’t matter much on low-log/idle servers.

The two changes are cumulative under sustained log volume:

  • Flush() is still a synchronous call per log write, even if the OS buffers disk I/O, so removing per-line flush pressure reduces contention and syscall frequency.
  • StackTrace vs StackFrame is small per call, but this path executes very frequently, so the CPU/allocation cost adds up under load.

So this is not about needing a 100ms disk or perceiving microseconds directly; it’s about reducing repeated overhead in a hot path at scale.

@hakusaro
Copy link
Copy Markdown
Member

@Xekep do you know what the juice number * 64 / 128 is? You can find it under valid channels

@Xekep
Copy link
Copy Markdown
Contributor Author

Xekep commented Feb 27, 2026

@Xekep do you know what the juice number * 64 / 128 is? You can find it under valid channels

😐TShock is pretty sluggish under load right now, so I’d like to focus on fixes.

@hakusaro
Copy link
Copy Markdown
Member

@Xekep (antiparticles here, stepping in for particles) 😅 You're absolutely right! Let's focus on the performance improvements, rather than worrying about the juice number. How many players are you playing with, and what are the resources allocated to your server?

@Xekep
Copy link
Copy Markdown
Contributor Author

Xekep commented Feb 27, 2026

@Xekep (antiparticles here, stepping in for particles) 😅 You're absolutely right! Let's focus on the performance improvements, rather than worrying about the juice number. How many players are you playing with, and what are the resources allocated to your server?

The slowdown starts even at low player counts, not only at peak load.

Server specs:

  • Windows Server
  • Ryzen 9 5950X
  • 16 GB RAM
  • NVMe storage

@sgkoishi
Copy link
Copy Markdown
Member

Thanks for the feedback - I agree this won’t matter much on low-log/idle servers.
it’s about reducing repeated overhead in a hot path at scale

This method isn’t on a hot path at all, and most of the time in real world it’s called less than once per second. Even at peak load it is still unlikely to be calling it 100 times per second. The StackFrame change saves about 2 microseconds per call, and the similar goes for the syscall, which is a sub-millisecond level difference that human will never notice.

@hakusaro
Copy link
Copy Markdown
Member

@Xekep

So this is not about needing a 100ms disk or perceiving microseconds directly; it’s about reducing repeated overhead in a hot path at scale.

At the scale of how many players? Realistically I'm worried that you've combined and changed so many things that there are superfluous changes that don't impact much conflating things.

The issue here is that your PRs all slightly change performance characteristics, while changing a lot of code in more complex ways. Highly optimized code is typically less understandable because it has to do more acrobatics to do the same thing.

TShock is already a highly complex codebase. A big barrier to entry for new contributors is understanding the control flow. We sincerely want to avoid changing the codebase to make it harder to maintain, especially if the performance overhead is negligible at the cost of code maintainability.

As you may have noticed, we have not had a steady set of releases. We have, now, had real contributors diving in to solve major problems. Changes like this are difficult to accept without hard numbers.

How do we know which PRs you've submitted actually make meaningful improvements and warrant the extra complexity? It is difficult to know without concrete numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants