Open
Distinguish the models used in the executor and evaluator
Signed-off-by: Tarek <tareknaser360@gmail.com>
…s/sysmobench/sysmobench_core'
- Add gpt-4o model configuration to models.yaml
- Fix setup_tools.py to use shutil.move instead of os.rename

This resolves the 'Invalid cross-device link' error when /tmp is on a different filesystem.
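The cross-device fix in the commit above can be illustrated with a minimal sketch. The helper name `move_into_place` is hypothetical and is not taken from setup_tools.py, whose actual code is not shown here. The sketch relies only on documented standard-library behavior: `os.rename` fails with `OSError` (errno `EXDEV`) when source and destination are on different filesystems, while `shutil.move` uses a rename when possible and otherwise falls back to copying and then deleting the source.

```python
import os
import shutil
import tempfile


def move_into_place(src: str, dst: str) -> str:
    """Move src to dst, even when they live on different filesystems.

    Hypothetical helper for illustration only.
    """
    # Make sure the destination directory exists before moving.
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # os.rename(src, dst) would raise OSError (errno.EXDEV,
    # "Invalid cross-device link") if /tmp and the target are on
    # different filesystems. shutil.move handles that case by
    # copying the data and then removing the source.
    return shutil.move(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "download.bin")
        target = os.path.join(workdir, "tools", "download.bin")
        with open(source, "w") as handle:
            handle.write("payload")
        print(move_into_place(source, target))
```

On a single filesystem the two calls behave the same for a plain file, so the change should only matter in setups where the temporary directory is on a separate mount.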
* Add UIUC CS423 Fall 2025 Exams
* Fix json parsing
* Fix backtick
* Fix backtick 2
* Update benchmarks/courseexam_bench/data/cs_423_operating_systems_design_fall_2025_midterm/MP1.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update benchmarks/courseexam_bench/data/cs_423_operating_systems_design_fall_2025_final/exam.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Xuan Feng <xfeng9209@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add ostep processes-shell lab
* fix
* Update benchmarks/courselab_bench/data/cs537-projects-spring-2019/processes_shell/task.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Tarek Elsayed <60650661+tareknaser@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* added cmu15-213 data lab
* docs(courselab): add note about infrastructure restrictions

Signed-off-by: Tarek <tareknaser360@gmail.com>

---------

Signed-off-by: Tarek <tareknaser360@gmail.com>
Co-authored-by: Tarek <tareknaser360@gmail.com>
* add cs537 fall 2021 final exam
* add institution
* fix
* add solutions
* update metadata
* add choice array
* avoid extra restrictions on LLM output

Signed-off-by: Tarek <tareknaser360@gmail.com>

---------

Signed-off-by: Tarek <tareknaser360@gmail.com>
Co-authored-by: Tarek <tareknaser360@gmail.com>
[Bug-fix] Patching ArtEvalBench's Acto (SOSP'23) entry
Open
tareknaser (Collaborator) reviewed on Feb 3, 2026 and left a comment:
Please add a reference solution script
Collaborator
Is this file part of the task?
Description
Added attack lab for CMU 15-213
Changes
Added attack lab
Testing
Tested with the qwen3-coder-plus agent
Checklist