
[fal.ai/livepeer-staging] TrickleSegmentWriteError mid-session — segment write fails to orch-staging-1 while session is active #912

@livepeer-tessa

Summary

livepeer_gateway.trickle_publisher is raising TrickleSegmentWriteError during active sessions (not on teardown) when it fails to POST trickle segments to the orchestrator. This is distinct from #846, which covers 404 errors on session teardown. Here the stream is confirmed live (582 prior segments completed), and then a segment POST fails mid-session.

cc @mjh1 @emranemran

Error Logs (Grafana Loki, 2026-04-10 ~20:18–20:21 UTC)

2026-04-10 20:18:47,802 - livepeer_gateway.trickle_publisher - ERROR - Trickle POST exception url=https://orch-staging-1.daydream.monster:8935/ai/trickle/d21a61c3-4-out/583 error=Trickle POST exception ...

2026-04-10 20:20:52,724 - livepeer_gateway.trickle_publisher - WARNING - Trickle POST retrying same segment url=.../d21a61c3-4-out/584 (no request body consumed)

2026-04-10 20:21:52,932 - livepeer_gateway.trickle_publisher - ERROR - Trickle POST exception url=.../d21a61c3-4-out/584 error=...

Stack Trace

livepeer_gateway.trickle_publisher.TrickleSegmentWriteError: Trickle POST exception url=https://orch-staging-1.daydream.monster:8935/ai/trickle/d21a61c3-4-out/583
  File "/app/.venv/lib/python3.12/site-packages/livepeer_gateway/media_publish.py", line 856, in _stream_pipe_to_trickle
  File "/app/.venv/lib/python3.12/site-packages/livepeer_gateway/trickle_publisher.py", line 574, in write
  File "/app/.venv/lib/python3.12/site-packages/livepeer_gateway/trickle_publisher.py", line 224, in _run_post

Context

  • Session: d21a61c3 on orch-staging-1.daydream.monster:8935
  • Frequency: 2 segment failures (seq 583 and 584) in same session
  • Session stats at time of error: 583 segments started, 582 completed, 1 failed; the stream had been active for ~25 minutes (elapsed_s=1490)
  • App: github_f1lhgmk5v76a0ev1w0u378by-scope-livepeer-staging
  • Pattern: segment 583 POST failed → retry of 584 also failed with 'no request body consumed' → both end in TrickleSegmentWriteError

Probable Cause

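The orchestrator likely dropped the connection or rejected the write mid-session (not EOF/404). The 'no request body consumed' warning on the retry suggests the orchestrator's HTTP server closed the connection before reading the body, possibly because of a timeout, a transient network issue, or a hiccup on orch-staging-1. The retry of segment 584 failed about 60 seconds after it was logged (20:20:52 to 20:21:52), which would be consistent with a timeout, though the logs alone do not confirm it.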
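To sanity-check the timeout hypothesis, the gaps between the three log lines above can be computed directly. This is a small analysis sketch, not part of livepeer_gateway; the event labels are made up for readability, and the timestamps are copied from the log excerpt in this issue.

```python
from datetime import datetime

# Timestamps copied from the log excerpt in this issue; labels are illustrative.
LOG_TIMES = [
    ("583 POST exception", "2026-04-10 20:18:47,802"),
    ("584 retry warning", "2026-04-10 20:20:52,724"),
    ("584 POST exception", "2026-04-10 20:21:52,932"),
]


def parse(ts: str) -> datetime:
    # Python logging's default timestamp puts a comma before the milliseconds.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def gaps_seconds(events):
    """Return (from_label, to_label, seconds) for each consecutive pair of events."""
    times = [(label, parse(ts)) for label, ts in events]
    return [
        (a_label, b_label, (b - a).total_seconds())
        for (a_label, a), (b_label, b) in zip(times, times[1:])
    ]


for a, b, secs in gaps_seconds(LOG_TIMES):
    print(f"{a} -> {b}: {secs:.1f}s")
```

The two gaps come out at roughly 125 s and 60 s. A gap close to a round 60 s is what a fixed request or idle timeout would produce, but confirming that would require the timeout settings on the publisher and on orch-staging-1.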

Related Issues

  • #846: 404 errors on session teardown

Suggested Fix

  • Distinguish explicitly between mid-session write failures and teardown 404s in error handling
  • Consider retry logic with backoff for TrickleSegmentWriteError (it currently appears to retry once and then fail the segment)
  • Add an alert or metric that fires when segments_failed > 0 in MediaPublishStats
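The first two suggestions could look roughly like the sketch below. It does not use the real livepeer_gateway API: SegmentWriteError, classify, write_with_backoff, the status attribute, and the session_closing callback are all hypothetical names introduced for illustration, and the delay values are arbitrary.

```python
import random
import time
from typing import Callable, Optional


class SegmentWriteError(Exception):
    """Hypothetical stand-in for TrickleSegmentWriteError.

    status is None when no HTTP response arrived (e.g. the connection dropped).
    """

    def __init__(self, message: str, status: Optional[int] = None):
        super().__init__(message)
        self.status = status


def classify(err: SegmentWriteError, session_closing: bool) -> str:
    """Separate expected teardown 404s from genuine mid-session failures."""
    if err.status == 404 and session_closing:
        return "teardown_404"
    return "mid_session_failure"


def write_with_backoff(
    post: Callable[[], None],
    *,
    session_closing: Callable[[], bool],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call post() until it succeeds, backing off exponentially with jitter.

    Returns "ok" or "teardown_404". Re-raises the last error once a
    mid-session failure has exhausted max_attempts.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            post()
            return "ok"
        except SegmentWriteError as err:
            if classify(err, session_closing()) == "teardown_404":
                return "teardown_404"  # expected on shutdown, so do not retry
            if attempt == max_attempts:
                raise
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Two caveats apply. A retry is only safe if the segment bytes can be replayed, which is presumably why the existing code retries only when no request body was consumed. For a live stream, the total backoff budget also needs to stay small relative to segment duration, or the publisher will fall behind.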

Metadata

    Labels

    bug: Something isn't working
