| title | code-debug-env |
|---|---|
| emoji | 🧪 |
| colorFrom | blue |
| colorTo | green |
| sdk | docker |
| app_port | 7860 |
| pinned | false |

An OpenEnv-compatible RL environment where an LLM agent diagnoses and fixes buggy Python code across three difficulty levels.

| Property | Value |
|---|---|
| Domain | Real-world Python code debugging |
| Tasks | 45 total (15 easy + 15 medium + 15 hard) |
| Difficulties | easy → medium → hard |
| Reward Range | 0.0 – 1.0 (partial, proportional) |
| Max Steps/Episode | 3 |
| API | OpenEnv standard: /reset, /step, /state |

The agent receives a buggy Python function and must fix it. Tasks come from real-world domains: data processing, string algorithms, API validation, sorting, dynamic programming, and graph algorithms.

- Easy: One bug (wrong operator, off-by-one, incorrect return). Reward proportional to test pass rate.
- Medium: Two bugs (logic bug + edge case). Reward proportional to test pass rate.
- Hard: One algorithmic bug + agent must explain what was wrong. Reward = 0.7 × test score + 0.3 × explanation quality.

The agent submits an action to /step:

```json
{
  "fixed_code": "string: the corrected Python function (required)",
  "explanation": "string: explanation of what was wrong (required for hard tasks)"
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `fixed_code` | str | Always | Complete corrected Python function as a string |
| `explanation` | str | Hard tasks | Describe the bug and why your fix is correct |
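
The action rules in the table above can be checked on the client side before anything is sent. The following is a minimal sketch, not the environment's own validation: `build_action` and its `difficulty` parameter are hypothetical names introduced here for illustration.

```python
import json
from typing import Optional


def build_action(fixed_code: str,
                 explanation: Optional[str] = None,
                 difficulty: str = "easy") -> dict:
    """Build the JSON body for POST /step (hypothetical helper, not part of the repo)."""
    # fixed_code is required for every task.
    if not fixed_code.strip():
        raise ValueError("fixed_code is always required")
    # explanation is required only for hard tasks.
    if difficulty == "hard" and not (explanation and explanation.strip()):
        raise ValueError("explanation is required for hard tasks")
    action = {"fixed_code": fixed_code}
    if explanation:
        action["explanation"] = explanation
    return action


# Example: an easy-task action carries only the corrected function.
payload = build_action("def find_max(nums):\n    return max(nums)")
print(json.dumps(payload))
```

Failing early on a missing explanation avoids spending one of the episode's limited steps on a request that cannot earn the explanation share of the reward.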

Returned by /reset and /step:

```json
{
  "task_id": "easy_003",
  "difficulty": "easy",
  "buggy_code": "def find_max(nums):\n ...",
  "instructions": "The function has exactly one bug. Fix it.",
  "test_cases_description": "Finds max value in a list without IndexError",
  "reward": 0.67,
  "passed_tests": 2,
  "total_tests": 3,
  "feedback": "Test 1: ✅ ...\nTest 2: ✅ ...\nTest 3: ❌ ...",
  "done": false
}
```

| Field | Type | Description |
|---|---|---|
| `task_id` | str | Unique task identifier |
| `difficulty` | str | easy / medium / hard |
| `buggy_code` | str | Buggy Python function to fix |
| `instructions` | str | Task instructions |
| `test_cases_description` | str | What the test cases check |
| `reward` | float or null | Score from last step (null on reset) |
| `passed_tests` | int or null | Tests passed (null on reset) |
| `total_tests` | int | Total number of test cases |
| `feedback` | str or null | Detailed per-test feedback |
| `done` | bool | True when episode is complete |
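
Because `reward`, `passed_tests`, and `feedback` are null right after a reset, client code needs to treat them as optional. The sketch below mirrors the field table with a plain dataclass. It is an illustrative stand-in, not the Pydantic model in `models.py`, and `parse_observation` is a hypothetical helper name.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Observation:
    task_id: str
    difficulty: str
    buggy_code: str
    instructions: str
    test_cases_description: str
    total_tests: int
    done: bool
    # These three are null in the response to /reset.
    reward: Optional[float] = None
    passed_tests: Optional[int] = None
    feedback: Optional[str] = None


def parse_observation(data: dict) -> Observation:
    """Convert a decoded JSON response into an Observation, ignoring unknown keys."""
    known = {f.name for f in fields(Observation)}
    return Observation(**{k: v for k, v in data.items() if k in known})


# Example: the shape of a response right after a reset (values are illustrative).
obs = parse_observation({
    "task_id": "easy_003",
    "difficulty": "easy",
    "buggy_code": "def find_max(nums):\n ...",
    "instructions": "The function has exactly one bug. Fix it.",
    "test_cases_description": "Finds max value in a list without IndexError",
    "reward": None,
    "passed_tests": None,
    "total_tests": 3,
    "feedback": None,
    "done": False,
})
print(obs.task_id, obs.reward, obs.done)
```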

Easy and medium tasks: `reward = passed_tests / total_tests`

- 3/3 tests → 1.0
- 2/3 tests → 0.67
- 1/3 tests → 0.33
- 0/3 tests → 0.0

Hard tasks: `reward = 0.7 × test_score + 0.3 × explanation_score`

The explanation is scored by matching key algorithmic concepts, and partial credit is given.
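
The two reward formulas above can be written out in a few lines. This is a sketch under assumptions: the source says only that explanations are scored by matching key concepts with partial credit, so the case-insensitive substring matching and the `key_concepts` list here are hypothetical, and the actual graders in `server/graders/` may work differently.

```python
from typing import Sequence


def pass_rate(passed_tests: int, total_tests: int) -> float:
    """Easy/medium reward: the fraction of test cases that pass."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return passed_tests / total_tests


def explanation_score(explanation: str, key_concepts: Sequence[str]) -> float:
    """Assumed scoring: fraction of key concepts mentioned (case-insensitive substring)."""
    if not key_concepts:
        return 0.0
    text = explanation.lower()
    hits = sum(1 for concept in key_concepts if concept.lower() in text)
    return hits / len(key_concepts)


def hard_reward(passed_tests: int, total_tests: int,
                explanation: str, key_concepts: Sequence[str]) -> float:
    """Hard reward: 0.7 x test score + 0.3 x explanation score."""
    return (0.7 * pass_rate(passed_tests, total_tests)
            + 0.3 * explanation_score(explanation, key_concepts))


# Example: 2 of 3 tests pass and the explanation mentions one of two concepts.
print(round(pass_rate(2, 3), 2))
print(round(hard_reward(2, 3, "The loop had an off-by-one error",
                        ["off-by-one", "memoization"]), 2))
```

With this weighting, a hard-task fix that passes every test but explains nothing tops out at 0.7.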

Prerequisites:

- Python 3.10+
- Docker
- Hugging Face CLI

Install:

```bash
git clone https://github.com/YOUR_USERNAME/code-debug-env
cd code-debug-env
pip install -e .
# Also clone OpenEnv for PYTHONPATH
git clone https://github.com/meta-pytorch/OpenEnv.git
export PYTHONPATH=$PYTHONPATH:OpenEnv:OpenEnv/src:.
```

Run the server locally:

```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
```

Or build and run it with Docker:

```bash
docker build -f server/Dockerfile -t code-debug-env .
docker run -p 7860:7860 code-debug-env
```

Check that the server is up:

```bash
# Health check
curl http://localhost:7860/health
```

```bash
# Reset (easy task)
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"difficulty": "easy"}'
# Submit a fix
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"fixed_code": "def find_max(nums):\n    return max(nums)"}'
# Check state
curl http://localhost:7860/state
```

Configure the baseline agent:

```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your-api-key"
```
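
The reset and step requests shown above form one episode: a single /reset, then /step calls until `done` is true or the three-step limit is reached. The sketch below shows that loop with the transport passed in as plain callables so it can be run without a server. `run_episode`, `post_json`, and `propose_fix` are hypothetical names, and this is not the logic of `inference.py`.

```python
import json
import urllib.request
from typing import Callable, Optional

MAX_STEPS = 3  # "Max Steps/Episode" from the property table


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_episode(reset: Callable[[], dict],
                step: Callable[[dict], dict],
                propose_fix: Callable[[dict], dict]) -> Optional[float]:
    """Run one episode and return the last reward (None if no step was taken)."""
    observation = reset()
    reward = None
    for _ in range(MAX_STEPS):
        if observation.get("done"):
            break
        # propose_fix maps an observation to an action dict with "fixed_code".
        observation = step(propose_fix(observation))
        reward = observation.get("reward")
    return reward


# Wiring against a local server (assumes it is running on port 7860):
# base = "http://localhost:7860"
# reward = run_episode(
#     reset=lambda: post_json(base + "/reset", {"difficulty": "easy"}),
#     step=lambda action: post_json(base + "/step", action),
#     propose_fix=lambda obs: {"fixed_code": obs["buggy_code"]},
# )
```

Injecting `reset` and `step` keeps the loop independent of HTTP, so the same function can be exercised with stubbed responses.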

```bash
# Run all 3 difficulties
python inference.py --url http://localhost:7860
# Run specific difficulty
python inference.py --url http://localhost:7860 --difficulty hard
```

Run the validator before submitting to catch any disqualifying issues:

```bash
# Start the environment first, then:
python validator/pre_submit_check.py --url http://localhost:7860
# Or against your HF Space:
python validator/pre_submit_check.py --url https://YOUR_SPACE.hf.space
```

Deploy to a Hugging Face Space:

```bash
# Login
huggingface-cli login
```

```bash
# Create space and push
huggingface-cli repo create code-debug-env --type space --space_sdk docker
cd code-debug-env
git init
git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/code-debug-env
git add .
git commit -m "Initial commit"
git push origin main
```

Project layout:

```
code-debug-env/
├── openenv.yaml              ← OpenEnv manifest
├── inference.py              ← Baseline agent (root, required)
├── pyproject.toml            ← Dependencies
├── README.md
├── models.py                 ← Pydantic Action/Observation/State
├── client.py                 ← EnvClient for training loops
├── __init__.py
├── server/
│   ├── app.py                ← FastAPI: /reset /step /state /health
│   ├── environment.py        ← Core episode logic
│   ├── tasks/
│   │   ├── task_easy.py      ← 15 single-bug tasks
│   │   ├── task_medium.py    ← 15 two-bug tasks
│   │   └── task_hard.py      ← 15 algorithmic tasks
│   ├── graders/
│   │   ├── grader_easy.py
│   │   ├── grader_medium.py
│   │   └── grader_hard.py
│   ├── requirements.txt
│   └── Dockerfile
└── validator/
    └── pre_submit_check.py
```