docs(README): add Community Tools section linking cimbar-bigfile #165

Open
xPeiPeix wants to merge 3 commits into sz3:master from xPeiPeix:community-tools-readme

Conversation

@xPeiPeix

@xPeiPeix xPeiPeix commented May 9, 2026

Per discussion in #163.

Closes #163.

Adds a new "Community Tools" section to README.md with a single one-line entry linking to cimbar-bigfile — a pure-frontend wrapper that splits files into multiple parallel encode_id streams (10–15 MB chunks per stream, well below the single-stream wirehair capacity ceiling) and ships a manifest.json for SHA256-verified reassembly. No changes to libcimbar or cfc required.
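
As a rough illustration of the split-and-verify idea described above, here is a minimal Python sketch. The manifest layout (`name`, `size`, `sha256`, `chunks`) and the function names are hypothetical and chosen for this example only; the actual `manifest.json` schema used by cimbar-bigfile is not shown in this thread and may differ.

```python
import hashlib


def build_manifest(data: bytes, chunk_size: int, name: str) -> tuple[list[bytes], dict]:
    """Split data into fixed-size chunks and describe them in a manifest dict.

    The manifest schema here is an assumption made for illustration.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    manifest = {
        "name": name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "chunks": [
            {"index": i, "size": len(c), "sha256": hashlib.sha256(c).hexdigest()}
            for i, c in enumerate(chunks)
        ],
    }
    return chunks, manifest


def reassemble(chunks: list[bytes], manifest: dict) -> bytes:
    """Check each chunk and the joined file against the manifest, then return the file."""
    if len(chunks) != len(manifest["chunks"]):
        raise ValueError("chunk count mismatch")
    for entry, chunk in zip(manifest["chunks"], chunks):
        if hashlib.sha256(chunk).hexdigest() != entry["sha256"]:
            raise ValueError(f"chunk {entry['index']} failed SHA-256 check")
    data = b"".join(chunks)
    if hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        raise ValueError("reassembled file failed SHA-256 check")
    return data
```

Hashing both the individual chunks and the whole file lets the receiver identify which stream needs to be re-scanned instead of only learning that the final result is wrong.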

Placement: between "Room for improvement/next steps" and "Inspiration", since the tool is a natural extension of the TODO note about splitting larger files.

Happy to adjust placement, wording, or formatting per your preference.

Per discussion in sz3#163. Adds a one-line entry under a new
"Community Tools" section, between "Room for improvement/next steps"
and "Inspiration".
@sz3
Owner

sz3 commented May 11, 2026

What's the biggest file you've been able to send this way?

@xPeiPeix
Author

xPeiPeix commented May 11, 2026

Hi @sz3, thanks for asking!

The largest file I've personally pushed end-to-end is ~100 MB. With the default params (Mode B / 15 fps / 10 MB chunks / 1.5x redundancy, scanning a 1080p screen with a Pixel 5), the run looks roughly like:

  • 10 data chunks + 1 manifest → 11 save dialogs on the CFC side
  • ~16–20 minutes wall-clock end-to-end
  • ~106 KB/s effective throughput

(Full numbers in README.en.md → Performance reference.)

The theoretical ceiling is bounded by libcimbar's wirehair fountain decoder: encode_id slots get recycled roughly every 128 transfers (per #149), so a single session's chunk_count should stay under ~120, which works out to about 1.2 GB at 10 MB/chunk as a practical upper bound. Past that, CFC starts dropping frames whose encode_id slot has been reused.
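
The ceiling arithmetic above can be sketched in a few lines of Python. The constant `SAFE_CHUNK_COUNT` and both function names are hypothetical; the 120-chunk limit is the conservative figure quoted in this comment, and treating the manifest as one extra stream outside that limit is an assumption.

```python
import math

# Conservative per-session chunk limit quoted in the thread
# (encode_id slots are said to recycle roughly every 128 transfers).
SAFE_CHUNK_COUNT = 120


def plan_session(file_size_mb: float, chunk_size_mb: float = 10.0) -> dict:
    """Return the chunk count for a file and whether it fits in one session."""
    chunk_count = math.ceil(file_size_mb / chunk_size_mb)
    return {
        "chunk_count": chunk_count,
        # Assumption: the manifest travels as one additional stream.
        "streams_including_manifest": chunk_count + 1,
        "fits_in_one_session": chunk_count <= SAFE_CHUNK_COUNT,
    }


def max_session_size_mb(chunk_size_mb: float = 10.0) -> float:
    """Largest file size (in MB) that stays within the per-session chunk limit."""
    return SAFE_CHUNK_COUNT * chunk_size_mb
```

With the defaults, a 100 MB file yields 10 chunks plus a manifest (11 streams), and the per-session bound is 1200 MB, matching the ~1.2 GB figure.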


And one small UX observation from the multi-chunk side that you may already be aware of:

In CFC's progress UI, switching the on-screen stream doesn't seem to refresh the displayed progress bar. e.g. if I sit on part04 for a while, jump to part05, then come back to part04, the on-screen bar keeps advancing on the part05 progress (the underlying fountain_decoder_sink is bucketing both correctly — cimbard_get_report exposes per-stream state — it's just the rendering layer that doesn't switch which stream it tracks).

For multi-stream workflows it can make users unsure which stream is actually being received. We worked around it on the sender side (a "mark this chunk saved" button + a "lock current chunk" toggle so the on-screen stream stays put), which keeps the UX usable. Just sharing the observation in case it's useful context.

@Anonymous3-a

I thought the max chunk size was 33MB, which would come out to 3.96GB.

@sz3
Owner

sz3 commented May 12, 2026

> The largest file I've personally pushed end-to-end is ~100 MB

Ok, that's the number I was looking for. I think the logic is sound, I'm just wary about claiming infinite transfer sizes. It's all fun and games until we find out there's some mysterious problem at 500 MB... also, I pity whoever tests that. 🙂

A bit of trivia that may be useful for the file size limit, and why lower sizes (e.g. 10MB) tend to be a better choice than the full 33 MB for an individual chunk. wirehair has (IIRC) a uint16_t block_id. The wirehair block size in mode B is 625 bytes (7500/12). (edit: I wrote 600 here. math is hard) Quick math puts that at about ~~39~~ 41 MB of valid blocks, or in other words max 1.2x redundancy for a 33 MB file -- this is a hard cap. The counter will roll over when we hit it, so nothing should break (I hope -- I tested this at one point and it seemed fine), but basically we can ask for as much redundancy as we want but reality limits us.

Trivia: for this reason, at one point early on I had the file size limit at 16 MB, not 33. 🙃

Anyway, the combination of uint16_t and block size also means mode Bu can't actually transfer a 30 MB file. Its block size is smaller, so its cap is also lower. ...Which I should probably document. In general I think 10-15 MB is a good spot as a chunk size. (there's no penalty for redundant blocks, AFAIK. So feel free to use it all, e.g. 3x or 4x for 10 MB chunks)
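
The block-count cap described above can be checked with a short Python sketch. It assumes a 16-bit `block_id` (as recalled in the comment, not verified against the wirehair source), the 625-byte Mode B block size quoted here, and MB meaning 10^6 bytes. The names are hypothetical. Mode Bu would need its own block size, which this thread does not state.

```python
BLOCK_ID_VALUES = 2 ** 16   # assumes a uint16_t block_id, per the comment
MODE_B_BLOCK_BYTES = 625    # 7500 / 12, per the comment


def block_cap_bytes(block_bytes: int = MODE_B_BLOCK_BYTES) -> int:
    """Total bytes addressable before the block counter rolls over."""
    return BLOCK_ID_VALUES * block_bytes


def max_redundancy(chunk_mb: float, block_bytes: int = MODE_B_BLOCK_BYTES) -> float:
    """Upper bound on the useful redundancy multiplier for a chunk of chunk_mb MB."""
    return block_cap_bytes(block_bytes) / (chunk_mb * 1_000_000)
```

Under these assumptions the cap is 40,960,000 bytes, which gives roughly 1.24x for a 33 MB chunk and about 4.1x for a 10 MB chunk. That is consistent with the suggestion that 3x or 4x is available at 10 MB.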

One more bit of commentary I'm sure I've made before. Provided you're finished sending a chunk, you can re-use the encode_id if you slightly vary the chunk size... (e.g. 10.01 MB chunks after the first go-around) So it takes a lot of babysitting, but this approach can probably be extended to some obscenely large file sizes. (you can also restart the decoder to clear its cache of "done" files, which will have the same effect without changing the chunk size)
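
One way to picture the "vary the chunk size per go-around" idea is the planning sketch below. Everything in it is hypothetical: the function name, the 120-chunks-per-pass limit (borrowed from earlier in the thread), and the 10 KB step, which simply mirrors the "10.01 MB" example. Whether a step that small is always enough for the decoder to treat the streams as new is not verified here.

```python
def plan_passes(file_size: int, base_chunk: int = 10_000_000,
                step: int = 10_000, chunks_per_pass: int = 120) -> list[dict]:
    """Split a very large file into passes, each using a slightly larger chunk size.

    Sizes are in bytes. Each pass covers at most chunks_per_pass chunks.
    """
    passes = []
    offset = 0
    while offset < file_size:
        chunk = base_chunk + len(passes) * step          # 10.00 MB, 10.01 MB, ...
        span = min(chunk * chunks_per_pass, file_size - offset)
        passes.append({"pass": len(passes), "chunk_size": chunk,
                       "offset": offset, "length": span})
        offset += span
    return passes
```

For a 3 GB file this yields three passes with chunk sizes of 10.00, 10.01, and 10.02 MB. Each pass would still need to be completed and saved by hand before the next one starts.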

> And one small UX observation from the multi-chunk side that you may already be aware of:
>
> In CFC's progress UI, switching the on-screen stream doesn't seem to refresh the displayed progress bar. e.g. if I sit on part04 for a while, jump to part05, then come back to part04, the on-screen bar keeps advancing on the part05 progress (the underlying fountain_decoder_sink is bucketing both correctly, and cimbard_get_report exposes per-stream state; it's just the rendering layer that doesn't switch which stream it tracks).
>
> For multi-stream workflows it can make users unsure which stream is actually being received. We worked around it on the sender side (a "mark this chunk saved" button + a "lock current chunk" toggle so the on-screen stream stays put), which keeps the UX usable. Just sharing the observation in case it's useful context.

That seems like a bug. I'll take a look and see what I can find out.

xPeiPeix added a commit to xPeiPeix/cimbar-bigfile that referenced this pull request May 12, 2026
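Apply the trivia provided by the libcimbar author in the sz3/libcimbar#165 comments:
- manifest-spec.md: add the uint16_t × 600 bytes = 39 MB Mode B cap derivation + a chunk_size selection table + a Mode Bu warning
- README/README.en: make the main introduction more precise (10 MB sweet spot rather than a "~10 MB limit")
- README/README.en: add a "redundancy no penalty" note to the performance section + a three-tier scenario comparison
- README/README.en: add an "Advanced: very large files" section (encode_id re-use tricks: restart with a different chunk size / restart the decoder in CFC)
- send.html: redundancy default 1.5x → 2.0x, max 3.0 → 5.0
- send.html: shift all PRESETS up (balanced 1.5→2.0, aggressive 1.2→1.5, conservative 2.0→3.0)
- send.standalone.html: rebuilt to match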
@xPeiPeix
Author

Hi @sz3, thanks for the detailed trivia — I've folded everything you mentioned into cimbar-bigfile main (commit 4253901):

  • docs/manifest-spec.md: added the wirehair uint16_t × 600-byte block-size derivation (~39 MB Mode B hard cap, 33 MB → 1.2x max redundancy as a hard ceiling) + a chunk_size selection table + a note that Mode Bu can't transfer ≥30 MB files
  • README (CN + EN): corrected the tagline to reference the ~39 MB single-stream cap rather than a vague "10 MB limit"; added a "Choosing a redundancy multiplier" section that quotes your "no penalty for 3x or 4x" line and explains the fountain-code intuition (surplus frames don't waste receive time)
  • send.html: bumped the default redundancy from 1.5x to 2.0x and raised the input max from 3.0x to 5.0x; shifted the three presets up accordingly (balanced 1.5→2.0, aggressive 1.2→1.5, conservative 2.0→3.0)
  • Added a new "Advanced: files larger than ~1.2 GB" section documenting the two encode_id re-use tricks you mentioned (vary chunk size between sessions / restart CFC to clear the "done" cache), explicitly marked as manual babysitting since neither is automated yet

Thanks again for the trivia — really helped sharpen both the docs and the defaults. Looking forward to whatever you find on the CFC progress-bar bug.

Comment thread README.md Outdated

## Community Tools

* [cimbar-bigfile](https://github.com/xPeiPeix/cimbar-bigfile) — A pure-frontend wrapper that splits files into multiple parallel `encode_id` streams (10–15 MB chunks per stream, well below the single-stream wirehair capacity ceiling) and ships a `manifest.json` for SHA256-verified reassembly. No changes to libcimbar or cfc required.
Owner


Maybe:

parallel encode_id streams (10–15 MB chunks per stream), and ships a manifest.json for SHA256-verified reassembly. Supports 100+ MB files. No changes to libcimbar or cfc required.

Author


Applied — thanks for the cleaner phrasing! Dropped the wirehair jargon and added the "100+ MB" capability anchor in c0a3566.

@sz3
Owner

sz3 commented May 13, 2026

Looks good.

> 600-byte block-size derivation (~39 MB Mode B hard cap, 33 MB → 1.2x max redundancy as a hard ceiling)

Re-reading this, I did the (very basic) math wrong. 🙃

Anyway, the actual numbers are 625 bytes (technically 619 bytes because we have to subtract out the fountain header... but anyway) -> ~40.5 MB.
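
For reference, the corrected figures work out as follows, assuming a 16-bit block_id (so 2^16 distinct blocks) and MB meaning 10^6 bytes. Both assumptions come from the discussion rather than from the wirehair source.

```latex
\[
2^{16} \times 625\,\text{B} = 40{,}960{,}000\,\text{B} \approx 41\,\text{MB},
\qquad
2^{16} \times 619\,\text{B} = 40{,}566{,}784\,\text{B} \approx 40.5\,\text{MB}
\]
\[
\frac{40.57\,\text{MB}}{33\,\text{MB}} \approx 1.23
\]
```

So the "1.2x max redundancy for a 33 MB file" conclusion still holds with the corrected block size.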

@xPeiPeix
Author

Ha, thanks for the correction — applied in 9c67371. Manifest-spec and README tagline now use ~40.5 MB / 625 bytes per block (619 after subtracting the fountain header).



Development

Successfully merging this pull request may close these issues.

A community tool implementing the "split larger files" idea from DETAILS.md

3 participants