
Add gemma3-opus-distill post #3377

Open
vicArc wants to merge 1 commit into huggingface:main from vicArc:Viesar/gemma3-opus-distill

Conversation


@vicArc vicArc commented May 7, 2026

A community/guest post: a negative-results postmortem on QLoRA-distilling Claude Opus 4.6 reasoning traces into Gemma 3 4B on a single RTX 3050. The fine-tune trains cleanly (loss 2.67 → 1.01) but regresses relative to the base model on MATH-500 (-5.0pp) and GSM8K (-15.0pp). The post explains the two contributing causes (eval format mismatch and capability narrowing from a small 1.9k-example dataset), what I'd do differently, and why publishing measured negative results is more useful than the usual celebratory fine-tune writeup.

Preparing the Article

  • Add an entry to _blog.yml
  • Add a thumbnail (assets/gemma3-opus-distill/thumbnail.jpg)
  • Short title and blog path (gemma3-opus-distill)
  • Frontmatter with author (Viesar, guest)
  • Publication date is correct (May 7, 2026)
  • Previewed at https://huggingface.co/new-blog (not published)

Notes

  • All linked models/datasets are already public on the Hub.
  • No additional assets beyond the thumbnail; large images aren't needed for this post.
  • External/community contribution — happy to take review feedback on tone or length.

