Course Lab Benchmark: Add a Check for Reference Solution Script by tareknaser · Pull Request #115 · sys-intelligence/system-intelligence-benchmark

tareknaser · 2026-02-03T00:46:50Z

As discussed in today’s meeting, we now require a reference solution script for all labs. This PR is a draft and should not be merged yet: it will break CI because one task (cmu_15-213__data_lab) is missing a reference solution. PR #98 should be merged first to address this.

Copilot

Pull request overview

Adds an explicit requirement that every Course Lab task includes a reference solution script (sol.sh), and documents this requirement in the benchmark README.

Changes:

Add a schema test that asserts each task folder contains sol.sh
Update the benchmark README to list sol.sh as a required task artifact and update “Adding New Labs” instructions

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
benchmarks/courselab_bench/tests/test_data_schema.py	Adds a new test to enforce presence of `sol.sh` in every task directory.
benchmarks/courselab_bench/README.md	Documents `sol.sh` as a required part of each lab task and updates authoring instructions accordingly.

Copilot · 2026-02-03T01:43:29Z

benchmarks/courselab_bench/tests/test_data_schema.py

+
+        for task_folder in task_folders:
+            sol_path = task_folder / "sol.sh"
+            assert sol_path.exists(), f"{task_folder.name} missing sol.sh reference solution"


sol.sh is later executed or read as a script, but exists() alone also passes for a directory or an empty placeholder file. Consider asserting sol_path.is_file() and that the file has non-empty content (and optionally that it is readable) to avoid false positives.

Suggested change

-            assert sol_path.exists(), f"{task_folder.name} missing sol.sh reference solution"
+            assert sol_path.is_file(), f"{task_folder.name}: sol.sh must exist and be a regular file"
+            assert sol_path.stat().st_size > 0, f"{task_folder.name}: sol.sh must not be empty"
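
A minimal sketch of the stricter check suggested above, written as a standalone helper so it can be tried outside the test suite. The helper name `check_reference_solution`, the `demo` function, and the temporary task folders are illustrative assumptions and are not part of the repository.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def check_reference_solution(task_folder: Path) -> list[str]:
    """Return a list of problems with task_folder/sol.sh (empty list means OK).

    Hypothetical helper for illustration; the PR itself uses plain asserts.
    """
    sol_path = task_folder / "sol.sh"
    problems = []
    if not sol_path.is_file():
        # is_file() is False for missing paths, directories, and broken symlinks.
        problems.append(f"{task_folder.name}: sol.sh must exist and be a regular file")
    elif sol_path.stat().st_size == 0:
        problems.append(f"{task_folder.name}: sol.sh must not be empty")
    return problems


def demo() -> dict[str, list[str]]:
    """Build four throwaway task folders and check each one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("good_task", "dir_task", "empty_task", "missing_task"):
            (root / name).mkdir()
        (root / "good_task" / "sol.sh").write_text("#!/bin/bash\necho ok\n")
        (root / "dir_task" / "sol.sh").mkdir()    # a directory named sol.sh
        (root / "empty_task" / "sol.sh").touch()  # an empty placeholder file
        return {p.name: check_reference_solution(p) for p in sorted(root.iterdir())}


if __name__ == "__main__":
    for task, problems in demo().items():
        print(task, problems or "OK")
```

Only `good_task` comes back with no problems. A plain `exists()` check would also have accepted `dir_task` and `empty_task`.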

Copilot · 2026-02-03T01:43:29Z

benchmarks/courselab_bench/tests/test_data_schema.py

+    def test_reference_solution_exists(self):
+        task_folders = get_task_folders(DATA_DIR)
+
+        for task_folder in task_folders:
+            sol_path = task_folder / "sol.sh"
+            assert sol_path.exists(), f"{task_folder.name} missing sol.sh reference solution"
+



This requirement overlaps with test_required_files_exist (which maintains the list of required task files). To keep a single source of truth and avoid re-traversing all task folders, add sol.sh to the existing required_files list and drop this separate test.

Suggested change

-    def test_reference_solution_exists(self):
-        task_folders = get_task_folders(DATA_DIR)
-
-        for task_folder in task_folders:
-            sol_path = task_folder / "sol.sh"
-            assert sol_path.exists(), f"{task_folder.name} missing sol.sh reference solution"

tareknaser · 2026-02-03

That's actually a very good point 😃
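
The single-source-of-truth approach from the review comment could look roughly like the sketch below. The contents of the existing `required_files` list are not shown in this PR, so `task.md` and `config.json` are placeholder names. Only `sol.sh` is confirmed by the discussion, and `missing_required_files` and `demo` are hypothetical helpers.

```python
import tempfile
from pathlib import Path

# Hypothetical list: only "sol.sh" is confirmed by this PR. The other entries
# stand in for whatever test_required_files_exist already checks.
REQUIRED_FILES = ["task.md", "config.json", "sol.sh"]


def missing_required_files(task_folder: Path, required=REQUIRED_FILES) -> list:
    """Return the required file names that are absent from task_folder."""
    return [name for name in required if not (task_folder / name).is_file()]


def demo():
    """Show the check before and after a reference solution is added."""
    with tempfile.TemporaryDirectory() as tmp:
        task = Path(tmp) / "example_task"
        task.mkdir()
        (task / "task.md").write_text("# Example\n")
        (task / "config.json").write_text("{}\n")
        before = missing_required_files(task)
        (task / "sol.sh").write_text("#!/bin/bash\necho solved\n")
        after = missing_required_files(task)
    return before, after


if __name__ == "__main__":
    print(demo())
```

With one list, adding a new required artifact is a one-line change and each task folder is traversed once.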

xuafeng and others added 30 commits November 5, 2025 18:10

Rename it "System Intelligence Benchmark"

6d24e69

Init: Initialize SysMoBench benchmark integration

87db1e9

feat: Add gitigore

69f4cb5

feat: Add prototype for phase 1&2

843f031

feat: Distinguish evaluator and model API keys in env.toml

0d2b38f

feat: Add validation for required evaluator API keys

b2acaa7

doc: update README.md

ca7e72e

initial ArtEval commit

ec7b57f

Merge pull request #2 from systemintelligence/feat/distinguish-api-keys

a607e73

Distinguish the models used in the executor and evaluator

feat: Add test

60d30e0

featr: Add install.sh

ff96313

adding overview and contributor's guide

1799370

skeleton ArtEval agent implementation

2054314

adding sosp24 wasabi

6303aa5

docs: add arteval to main README

a5358dc

Signed-off-by: Tarek <tareknaser360@gmail.com>

feat(ci): add GH Actions workflow for running benchmarks tests

904374e

Signed-off-by: Tarek <tareknaser360@gmail.com>

feat: add issue and pull request templates

40ccf1f

Signed-off-by: Tarek <tareknaser360@gmail.com>

fix(ci): add a test for example_bench

4130c7a

Signed-off-by: Tarek <tareknaser360@gmail.com>

fix: shell scripts to be executable

3af5b70

Signed-off-by: Tarek <tareknaser360@gmail.com>

docs: update README with instructions for running a single benchmark

156c77c

Signed-off-by: Tarek <tareknaser360@gmail.com>

docs: a note on docker image arch support

a0557f9

Signed-off-by: Tarek <tareknaser360@gmail.com>

meta: add outputs directories to gitignore

868da59

Signed-off-by: Tarek <tareknaser360@gmail.com>

feat(ci): add release trigger to workflow

ea9b54d

Signed-off-by: Tarek <tareknaser360@gmail.com>

fix: Use tla_specification instead of generated_text to adapt upstream changes

5ef835f

Merge commit '04900168e10834f3aa5eef4d13b318e1efcdac24' as 'benchmarks/sysmobench/sysmobench_core'

c5dfbb1

fix: Add gpt-4o config and fix cross-device link issue in setup_tools

a68e171

- Add gpt-4o model configuration to models.yaml
- Fix setup_tools.py to use shutil.move instead of os.rename

This resolves the 'Invalid cross-device link' error when /tmp is on a different filesystem
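
The fix described in this commit message can be illustrated with a short sketch. The actual code in setup_tools.py is not shown here, so the `relocate` helper, the `demo` function, and the file names are assumptions made for illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def relocate(src: Path, dst: Path) -> Path:
    """Move src to dst, including across filesystems.

    os.rename() raises OSError (errno EXDEV, "Invalid cross-device link")
    when src and dst live on different filesystems. shutil.move() tries a
    rename first and falls back to copy-then-delete when that fails.
    """
    return Path(shutil.move(str(src), str(dst)))


def demo():
    """Move a file between two temporary directories and report the result."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        src = Path(a) / "tool.bin"
        src.write_text("payload")
        dst = relocate(src, Path(b) / "tool.bin")
        return dst.read_text(), src.exists()


if __name__ == "__main__":
    print(demo())
```

Both temporary directories usually sit on the same filesystem, so this demo does not trigger the cross-device error itself. It only shows that the call shape stays the same either way.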

fix: Convert GenerationOutput to GenerationResult for evaluators

984336a

docs: Update README and install script for Git Subtree integration

68025ca

feat: Add docker file

25c6af8

fix: Add env.toml

97c1c3c

qimcis and others added 24 commits January 26, 2026 15:06

expand mapreduce tests

59847be

sync mapreduce task with upstream README

d997dcd

feat(courselab): add solution validation using a reference script

0ce3ef8

feat(courselab): add reference solutions for existing labs

5b08284

feat(courselab): add more verbose tags

1ef35f4

feat(courselab): specify git commit when cloning

fe752b8

feat(courselab): specify go binary path in task description

1fff943

docs(courselab): add a section in the README for best practices

40b1061

docs: mention the leaderboard for courseexam and courselab benchmarks

c3801f9

Add CS537 Spring 2018 filesystem checker lab

b3ab078

Fix mkfs build for fs checker setup

ad33b40

Normalize fs checker output whitespace

ea9da86

expand filesystem checker tests

0d534be

add remaining filesystem checker error cases

bcc1e52

sync filesystem checker task with upstream README

0ae5d11

feat(courselab): add ref solution to cs537

155ace1

Merge branch 'sys-intelligence:main' into main

8b05322

Merge pull request #45 from bastoica/main

29f114d

[Bug-fix] Patching ArtEvalBench's Acto (SOSP'23) entry

Merge remote-tracking branch 'origin/main' into cs537-spring-2018-mapreduce

fd14034

Add reference solution for CS537 mapreduce

757fb81

Merge pull request #94 from qimcis/cs537-spring-2018-mapreduce

4f95253

[lab] Add CS537 Spring 2018 Project 4a

Merge pull request #96 from qimcis/cs537-spring-2018-fs-checker

ee57852

[lab] CS537 Spring 2018 Project 5

docs: update PR template to include title convention guidelines

6b6c809

test(courselab): add check for reference solution script

e6af9f8

xuafeng requested a review from Copilot February 3, 2026 01:40

Copilot started reviewing on behalf of xuafeng February 3, 2026 01:40 View session

Copilot AI reviewed Feb 3, 2026

View reviewed changes

tareknaser force-pushed the main branch from 57b962d to a1780ed Compare February 5, 2026 16:46

tareknaser closed this Feb 5, 2026

tareknaser deleted the courselab_ref_sol branch February 5, 2026 19:03

tareknaser wants to merge 216 commits into main from courselab_ref_sol

9 participants
