@AdnanQureshi3
Contributor

#470
Solved the issue: Implement a New Workflow
feat: add SudokuWorkflow example implementation and register in default_mapping

Checklist

Added sudoku_workflow.py with full docstrings and inline comments

  • Implemented simple single-turn Sudoku solving workflow
  • Added reward calculation based on exact match
  • Registered workflow in trinity.common.workflows.__init__ in alphabetical order
  • Ensured code follows style and passes pre-commit checks

If you need more custom workflows implemented, please let me know.

Collaborator

Thanks for your contribution. Sudoku has some similarities to Frozen Lake, and the current version has significant room for improvement.

A qualified Sudoku workflow should include three parts:

1. A Sudoku generator: automatically generates solvable Sudoku puzzles with a configurable difficulty level.
2. An agentic workflow to solve the Sudoku: some puzzles are hard to solve in a single step, so the workflow should solve the game over multiple steps.
3. A general judge function: some Sudoku puzzles may have multiple valid solutions, so the judge function should correctly parse the model's output and determine correctness according to the Sudoku rules, not just by exact match.
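To illustrate the third point, here is a minimal sketch of a rule-based judge. The name `is_valid_solution` and the grid representation (a list of lists with 0 for an empty cell) are assumptions made for this example and are not taken from the PR. It accepts any completion that keeps the given digits and satisfies the row, column, and box rules, so a puzzle with several solutions is not penalized. It assumes square boxes (4x4, 9x9).

```python
import math
from typing import List

Grid = List[List[int]]  # assumed representation: 0 marks an empty cell


def is_valid_solution(puzzle: Grid, candidate: Grid) -> bool:
    """Return True if `candidate` is a rule-consistent completion of `puzzle`."""
    n = len(puzzle)
    box = math.isqrt(n)
    if box * box != n:
        return False  # this sketch only handles square boxes
    if len(candidate) != n or any(len(row) != n for row in candidate):
        return False

    # The digits given in the puzzle must be preserved.
    for r in range(n):
        for c in range(n):
            if puzzle[r][c] != 0 and puzzle[r][c] != candidate[r][c]:
                return False

    expected = set(range(1, n + 1))
    # Every row and every column must contain each digit exactly once.
    for i in range(n):
        if set(candidate[i]) != expected:
            return False
        if {candidate[r][i] for r in range(n)} != expected:
            return False
    # Every box must contain each digit exactly once.
    for br in range(0, n, box):
        for bc in range(0, n, box):
            cells = {
                candidate[r][c]
                for r in range(br, br + box)
                for c in range(bc, bc + box)
            }
            if cells != expected:
                return False
    return True
```

Because the check never compares against a stored answer, two different completions of the same under-constrained puzzle are both accepted.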

Contributor Author

Hi @pan-x-c,
Thanks for the detailed feedback earlier.

I’ve implemented all the requested changes:

  • Added a SudokuGenerator that produces solvable puzzles (with adjustable difficulty via hole count)
  • Reworked the workflow into a multi-step agentic loop, similar in structure to FrozenLakeWorkflow
  • Added a SudokuJudge that validates rows, columns, and 3×3 blocks instead of exact string matching
  • Integrated generator + judge inside the workflow
  • Updated workflow registry

Please have a look and let me know if you’d like further improvements or additional refinements.

- Removes 'holes' positions to create a puzzle
"""

BASE_SOLUTION = [
Collaborator

Relying on a single standard answer to generate Sudoku puzzles can easily lead to overfitting. Existing projects (e.g., python-sudoku-generator-solver) can serve as references for the generation and evaluation parts.
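One way to avoid a fixed base solution is to build a fresh solved grid for every puzzle with randomized backtracking and then blank out cells. The sketch below is illustrative only: `generate_puzzle`, its parameters, and the helper names are assumptions for this example, not the PR's code or the API of python-sudoku-generator-solver. It assumes square boxes (4x4, 9x9) and does not guarantee a unique solution after cells are removed, which is why a rule-based judge is needed.

```python
import math
import random
from typing import List, Optional, Tuple

Grid = List[List[int]]  # assumed representation: 0 marks an empty cell


def _fits(grid: Grid, r: int, c: int, d: int, box: int) -> bool:
    """Check that digit d can go at (r, c) without breaking a Sudoku rule."""
    n = len(grid)
    if d in grid[r]:
        return False
    if any(grid[i][c] == d for i in range(n)):
        return False
    br, bc = r - r % box, c - c % box
    return all(
        grid[i][j] != d
        for i in range(br, br + box)
        for j in range(bc, bc + box)
    )


def _fill(grid: Grid, rng: random.Random, box: int, pos: int = 0) -> bool:
    """Fill cells in row-major order, trying digits in random order."""
    n = len(grid)
    if pos == n * n:
        return True
    r, c = divmod(pos, n)
    digits = list(range(1, n + 1))
    rng.shuffle(digits)
    for d in digits:
        if _fits(grid, r, c, d, box):
            grid[r][c] = d
            if _fill(grid, rng, box, pos + 1):
                return True
            grid[r][c] = 0  # undo and try the next digit
    return False


def generate_puzzle(
    holes: int, size: int = 9, seed: Optional[int] = None
) -> Tuple[Grid, Grid]:
    """Return (puzzle, solution); `holes` cells of the puzzle are set to 0."""
    box = math.isqrt(size)
    if box * box != size:
        raise ValueError("this sketch only supports square boxes, e.g. 4 or 9")
    if not 0 <= holes <= size * size:
        raise ValueError("holes must be between 0 and size * size")
    rng = random.Random(seed)
    solution = [[0] * size for _ in range(size)]
    _fill(solution, rng, box)
    puzzle = [row[:] for row in solution]
    for pos in rng.sample(range(size * size), holes):
        r, c = divmod(pos, size)
        puzzle[r][c] = 0
    return puzzle, solution
```

Calling `generate_puzzle(holes=40)` with different seeds starts from different solved grids, so the training set is not derived from one canonical answer.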


for step in range(self.max_steps):
prompt = f"""
Solve Sudoku by giving moves one at a time.
Collaborator

@pan-x-c pan-x-c Jan 17, 2026

Prompts are important for an agentic workflow. They should precisely describe the game rules, the task to perform at each step, and the output format. In some cases, a few-shot example may also be necessary. The prompt design can draw inspiration from the Frozen Lake example.
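As a rough illustration of that advice, a per-step prompt could state the rules, the current task, and a strict output format with one example. The function name `build_step_prompt`, its parameters, and the wording are hypothetical; this is not the Frozen Lake prompt or the prompt used in the PR.

```python
def build_step_prompt(board_text: str, step: int, max_steps: int) -> str:
    """Build the prompt for one step of a 9x9 game (hypothetical wording)."""
    return (
        "You are solving a 9x9 Sudoku puzzle.\n"
        "Rules: every row, every column, and every 3x3 box must contain "
        "the digits 1-9 exactly once.\n"
        "Cells shown as '.' are empty. Given digits must not be changed.\n\n"
        f"Step {step + 1} of at most {max_steps}. Current board:\n"
        f"{board_text}\n\n"
        "Task: choose one empty cell whose value you can deduce and fill it.\n"
        "Output format: exactly one line `row col value`, each a number "
        "from 1 to 9, and nothing else.\n"
        "Example: `3 7 5` places 5 in row 3, column 7.\n"
    )
```

Keeping the output format this narrow makes the model's reply easy to parse and makes malformed replies easy to detect.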

Contributor Author

I’ve updated the Sudoku workflow by improving the generator to avoid a single canonical solution and refining the prompt to clearly describe the rules, step-wise task, and strict output format, inspired by the Frozen Lake example.

Please let me know if any further refinements are needed.

Collaborator

pan-x-c commented Jan 19, 2026

Sorry for the late reply. The current workflow code structure basically meets the requirements, but some details can still be improved. For example:

  1. self.board is currently passed to the prompt as a raw array rather than as a rendered string, as the Frozen Lake example does. This may affect the model's understanding of the board.
  2. The generate function can be further improved. Although the difficulty (number of empty cells) can be adjusted, all puzzles are generated from a single standard answer, which may lead to overfitting.
  3. The current design fills only one cell per step. This is simple, but because a Sudoku has many empty cells, it may result in too many interaction rounds, an overly long context, and increased training costs. You might consider allowing the model to fill in multiple numbers per step. This is just a personal suggestion, and the actual effect needs to be verified in practice.
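Points 1 and 3 could be approached along these lines: render the board as text for the prompt, and accept several `row col value` lines in one reply. The helper names (`render_board`, `parse_moves`, `apply_moves`), the '.' placeholder, and the 1-based reply format are assumptions for this sketch rather than anything defined in the PR or in the Frozen Lake example.

```python
import math
import re
from typing import List, Tuple

Grid = List[List[int]]  # assumed representation: 0 marks an empty cell


def render_board(board: Grid) -> str:
    """Render the board as text, with '.' for empty cells and box separators."""
    n = len(board)
    box = math.isqrt(n)
    rows = []
    for row in board:
        cells = [str(v) if v else "." for v in row]
        groups = [" ".join(cells[i:i + box]) for i in range(0, n, box)]
        rows.append(" | ".join(groups))
    separator = "-" * len(rows[0])
    lines = []
    for r, text in enumerate(rows):
        if r and r % box == 0:
            lines.append(separator)
        lines.append(text)
    return "\n".join(lines)


_MOVE = re.compile(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def parse_moves(reply: str, size: int) -> List[Tuple[int, int, int]]:
    """Extract every well-formed `row col value` line as 0-based (r, c, v)."""
    moves = []
    for row, col, value in _MOVE.findall(reply):
        r, c, v = int(row), int(col), int(value)
        if 1 <= r <= size and 1 <= c <= size and 1 <= v <= size:
            moves.append((r - 1, c - 1, v))
    return moves


def apply_moves(board: Grid, moves: List[Tuple[int, int, int]]) -> int:
    """Write moves into empty cells only; return how many were applied."""
    applied = 0
    for r, c, v in moves:
        if board[r][c] == 0:
            board[r][c] = v
            applied += 1
    return applied
```

Lines that do not match the format or that fall outside the grid are ignored here; a real workflow might instead count them as errors when computing the reward.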

If resources permit, I recommend running the workflow locally in debug mode to observe:

  1. Whether the workflow can complete the game without errors (regardless of correctness).
  2. Whether it can solve the Sudoku correctly with a certain probability (if all answers are wrong, RL training cannot proceed).

Additionally, since this example is relatively complex, I suggest converting some samples into unit tests to ensure the correctness of each module.
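A sample-based unit test for one module might look like the sketch below. The function under test, `has_conflict`, is a hypothetical stand-in defined inline so that the example is self-contained; in the real PR the tests would import the generator and judge from the workflow module instead.

```python
import math
import unittest
from typing import List

Grid = List[List[int]]  # assumed representation: 0 marks an empty cell


def has_conflict(board: Grid) -> bool:
    """True if any row, column, or box repeats a non-zero digit."""
    n = len(board)
    box = math.isqrt(n)
    groups = [list(row) for row in board]
    groups += [[board[r][c] for r in range(n)] for c in range(n)]
    groups += [
        [board[r][c] for r in range(br, br + box) for c in range(bc, bc + box)]
        for br in range(0, n, box)
        for bc in range(0, n, box)
    ]
    for group in groups:
        filled = [v for v in group if v != 0]
        if len(filled) != len(set(filled)):
            return True
    return False


class HasConflictTest(unittest.TestCase):
    def test_empty_board_has_no_conflict(self):
        self.assertFalse(has_conflict([[0] * 4 for _ in range(4)]))

    def test_row_duplicate_is_detected(self):
        board = [[1, 0, 0, 1], [0] * 4, [0] * 4, [0] * 4]
        self.assertTrue(has_conflict(board))

    def test_column_duplicate_is_detected(self):
        board = [[2, 0, 0, 0], [0] * 4, [0] * 4, [2, 0, 0, 0]]
        self.assertTrue(has_conflict(board))

    def test_box_duplicate_is_detected(self):
        # Same 2x2 box, but different row and column.
        board = [[3, 0, 0, 0], [0, 3, 0, 0], [0] * 4, [0] * 4]
        self.assertTrue(has_conflict(board))

    def test_solved_board_has_no_conflict(self):
        board = [[1, 2, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3], [4, 3, 2, 1]]
        self.assertFalse(has_conflict(board))
```

Saved as a test file, this can be run with `python -m unittest`. Small 4x4 samples keep each case easy to verify by hand.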

Collaborator

pan-x-c commented Jan 19, 2026

If you find the 9x9 setting too difficult, you can try a 4x4 or 6x6 setting instead.

This Leaderboard may help you build the workflow.


2 participants