Course Lab Benchmark: Add a Check for Reference Solution Script#115
Course Lab Benchmark: Add a Check for Reference Solution Script#115tareknaser wants to merge 216 commits intomainfrom
Conversation
Distinguish the models used in the executor and evaluator
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
…s/sysmobench/sysmobench_core'
- Add gpt-4o model configuration to models.yaml - Fix setup_tools.py to use shutil.move instead of os.rename This resolves 'Invalid cross-device link' error when /tmp is on different filesystem
[Bug-fix] Patching ArtEvalBench's Acto (SOSP'23) entry
[lab] Add CS537 Spring 2018 Project 4a
[lab] CS537 Spring 2018 Project 5
There was a problem hiding this comment.
Pull request overview
Adds an explicit requirement that every Course Lab task includes a reference solution script (sol.sh), and documents this requirement in the benchmark README.
Changes:
- Add a schema test that asserts each task folder contains
sol.sh - Update the benchmark README to list
sol.shas a required task artifact and update “Adding New Labs” instructions
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| benchmarks/courselab_bench/tests/test_data_schema.py | Adds a new test to enforce presence of sol.sh in every task directory. |
| benchmarks/courselab_bench/README.md | Documents sol.sh as a required part of each lab task and updates authoring instructions accordingly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| for task_folder in task_folders: | ||
| sol_path = task_folder / "sol.sh" | ||
| assert sol_path.exists(), f"{task_folder.name} missing sol.sh reference solution" |
There was a problem hiding this comment.
sol.sh is later executed/read as a script; checking only exists() can pass for a directory, broken symlink, or an empty placeholder file. Consider asserting sol_path.is_file() and that it has non-empty content (and optionally that it’s readable) to avoid false positives.
| assert sol_path.exists(), f"{task_folder.name} missing sol.sh reference solution" | |
| assert sol_path.is_file(), f"{task_folder.name}: sol.sh must exist and be a regular file" | |
| assert sol_path.stat().st_size > 0, f"{task_folder.name}: sol.sh must not be empty" |
| def test_reference_solution_exists(self): | ||
| task_folders = get_task_folders(DATA_DIR) | ||
|
|
||
| for task_folder in task_folders: | ||
| sol_path = task_folder / "sol.sh" | ||
| assert sol_path.exists(), f"{task_folder.name} missing sol.sh reference solution" | ||
|
|
||
|
|
There was a problem hiding this comment.
This requirement overlaps with test_required_files_exist (which maintains the list of required task files). To keep a single source of truth and avoid re-traversing all task folders, add sol.sh to the existing required_files list and drop this separate test.
| def test_reference_solution_exists(self): | |
| task_folders = get_task_folders(DATA_DIR) | |
| for task_folder in task_folders: | |
| sol_path = task_folder / "sol.sh" | |
| assert sol_path.exists(), f"{task_folder.name} missing sol.sh reference solution" |
There was a problem hiding this comment.
That's actually a very good point 😃
As discussed in today’s meeting, we now require a reference solution script for all labs. This PR is a draft and should not be merged yet since it will break CI because one task (
cmu_15-213__data_lab) is missing a reference solution. PR #98 should be merged first to address this.