Add benchmark results for x-ai/grok-code-fast-1 (concurrency 20) and in-progress checkpoints #337
Open
mentatbot[bot] wants to merge 6 commits into main
…ncurrency 20
- Committing generated results under `locodiff-250425/results/*/x-ai_grok-code-fast-1/2025*`
- This is an in-progress checkpoint while the benchmark continues to run
- Will commit final results and summary once the run completes
Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/d3923fac-c209-4613-87c8-ddbd1b18c581
Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>

- Committing additional generated results under `locodiff-250425/results/*/x-ai_grok-code-fast-1/2025*`
- Benchmark is still running (concurrency 20)
- Will rerun API-error cases at lower concurrency after completion
Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/10d593ae-86e5-431e-85ed-d819bf55ee11
Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>

- Committing additional generated results produced during the ongoing run
- Run continues at concurrency 20; will rerun API-error cases at lower concurrency after completion
Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/7a0dbfac-5d05-47a6-8c8b-762bc79cc4fd
Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>

- Saving additional generated results while the run continues (concurrency 20)
- Will rerun API-error cases at lower concurrency after completion and provide a summary
Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/e1c6bc10-5526-4d29-9091-7b28e4dc7b55
Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>

- Persisting additional results generated since last checkpoint
- Run continues; will rerun API-error cases with lower concurrency after completion and provide a full summary
Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/df1bb5d9-7b95-48bc-ae50-c183d5a18c03
Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>

- Saving latest generated results while the benchmark continues (concurrency 20)
- I will rerun API-error cases at lower concurrency after completion and provide a full summary
Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/0ddf5b37-b238-45da-949e-e787679e51c3
Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>
This PR runs the LoCoDiff-bench step 2 benchmark for model `x-ai/grok-code-fast-1` at concurrency 20 against the existing prompt set (`locodiff-250425`), and commits in-progress checkpoints of generated results for traceability and to avoid data loss mid-run.

What’s included:
- `locodiff-250425/results/*/x-ai_grok-code-fast-1/<timestamp>/`:
  - `metadata.json` with per-case metrics (success flag, api_error, costs, token stats, generation_id)
  - `extracted_output.txt`, `output.diff`, and `raw_response.txt`

Run configuration:
- Model: `x-ai/grok-code-fast-1`
- Prompt set: `locodiff-250425`
- Script: `benchmark_pipeline/2_run_benchmark.py`

Notes:
- Cases are marked `api_error: true` when applicable and will be re-run later at lower concurrency.

Next steps:
Closes # (optional — add if there’s a tracking issue)
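For the planned rerun of API-error cases, a scan along the following lines could list the affected run directories. This is only a sketch, not part of the benchmark pipeline: the function name `find_api_error_runs` is hypothetical, and it assumes the layout `<results_root>/<case>/<model_dir>/<timestamp>/metadata.json` with a top-level boolean `api_error` key, both inferred from the description above rather than checked against the repository.

```python
import json
from pathlib import Path


def find_api_error_runs(results_root, model_dir):
    """Return run directories whose metadata.json has api_error set to true.

    Assumed layout (inferred from the PR text, not verified):
        <results_root>/<case>/<model_dir>/<timestamp>/metadata.json
    with a top-level boolean "api_error" key in each metadata.json.
    """
    flagged = []
    pattern = f"*/{model_dir}/*/metadata.json"
    for meta_path in sorted(Path(results_root).glob(pattern)):
        try:
            metadata = json.loads(meta_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Unreadable or partially written file (e.g. mid-run): skip it.
            continue
        if isinstance(metadata, dict) and metadata.get("api_error") is True:
            flagged.append(meta_path.parent)
    return flagged
```

Calling `find_api_error_runs("locodiff-250425/results", "x-ai_grok-code-fast-1")` from the repository root would then return the run directories to revisit at lower concurrency, assuming the layout above holds.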
🤖 This PR was created with Mentat. See my steps and cost here ✨