
Transformer Hangman Solver

A neural network-based solution to the classic game of Hangman using a Transformer architecture. This project trains a transformer model to predict missing letters in partially revealed words, achieving a 70.40% win rate on out-of-sample words across various word lengths.

Overview

This project implements a transformer-based approach to solve Hangman by:

  1. Training a transformer model on millions of masked word examples
  2. Using the model to predict the most likely letters for each position
  3. Implementing an intelligent guessing strategy that considers letter probabilities across all masked positions

Architecture

The model uses a Transformer Encoder architecture with the following specifications:

  • Model Size: 25.2M parameters
  • Embedding Dimension: 512
  • Number of Heads: 8
  • Number of Layers: 4
  • Vocabulary Size: 29 characters (a-z, underscore, period, hyphen)
  • Max Sequence Length: 25 characters

Key Components

  • Character Embeddings: Each character is embedded into a 512-dimensional space
  • Positional Encoding: Learned positional embeddings for sequence position
  • Transformer Encoder: Multi-head self-attention layers for contextual understanding
  • Output Layer: Linear projection to vocabulary size for character prediction
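
As a concrete reference, here is a minimal PyTorch sketch of this architecture. The feed-forward width is not listed above, so the default of 2048 is assumed, and the resulting parameter count may not land exactly on 25.2M.

    import torch
    import torch.nn as nn

    # Hyperparameters from the specification above; dim_feedforward is an assumption.
    VOCAB_SIZE = 29   # a-z, underscore, period, hyphen
    EMBED_DIM = 512
    NUM_HEADS = 8
    NUM_LAYERS = 4
    MAX_LEN = 25

    class HangmanTransformer(nn.Module):
        def __init__(self, dim_feedforward=2048):
            super().__init__()
            self.char_embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
            self.pos_embed = nn.Embedding(MAX_LEN, EMBED_DIM)  # learned positional embeddings
            encoder_layer = nn.TransformerEncoderLayer(
                d_model=EMBED_DIM,
                nhead=NUM_HEADS,
                dim_feedforward=dim_feedforward,
                batch_first=True,
            )
            self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=NUM_LAYERS)
            self.output = nn.Linear(EMBED_DIM, VOCAB_SIZE)  # per-position character logits

        def forward(self, tokens, padding_mask=None):
            # tokens: (batch, seq_len) integer-encoded characters
            positions = torch.arange(tokens.size(1), device=tokens.device)
            x = self.char_embed(tokens) + self.pos_embed(positions)
            x = self.encoder(x, src_key_padding_mask=padding_mask)
            return self.output(x)  # (batch, seq_len, VOCAB_SIZE)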

Training Process

Dataset Generation

The training dataset is created by:

  1. Loading 224,377 words from an English word list (training_dictionary.txt)
  2. Generating all possible masked combinations for each word (2^n combinations for n unique characters)
  3. Creating 50M training examples and 1.4M validation examples
  4. Converting to integer sequences for model training
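
The masking step can be sketched as below, assuming one training example per (masked word, original word) pair; the exact character-to-index mapping used in the notebook may differ.

    from itertools import combinations

    # Hypothetical encoding: 26 letters plus underscore, period, and hyphen (29 total).
    CHARS = "abcdefghijklmnopqrstuvwxyz_.-"
    CHAR_TO_ID = {c: i for i, c in enumerate(CHARS)}

    def masked_variants(word):
        """Yield every masked version of `word`: 2^n variants for n unique letters."""
        unique = sorted(set(word))
        for k in range(len(unique) + 1):
            for hidden in combinations(unique, k):
                yield "".join("_" if c in hidden else c for c in word), word

    def encode(text):
        """Convert a string into the integer sequence the model consumes."""
        return [CHAR_TO_ID[c] for c in text]

For example, "cat" has 3 unique letters, so it expands into 2^3 = 8 masked variants: "cat", "_at", "c_t", "ca_", "__t", "_a_", "c__", and "___".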

Training Details

  • Training Examples: 50,000,000
  • Validation Examples: 1,406,556
  • Batch Size: 4,096
  • Learning Rate: 6e-4 with cosine annealing scheduler
  • Optimizer: AdamW with weight decay 0.1
  • Loss Function: Cross-Entropy Loss
  • Training Time: ~26 minutes on H100 GPU (1 epoch)
  • Training Steps: 12,208 total steps
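
A condensed training loop with these settings might look as follows; `train_loader` and the exact target layout are assumptions, and `HangmanTransformer` refers to the sketch in the Architecture section.

    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = HangmanTransformer().to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, weight_decay=0.1)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=12_208)
    criterion = nn.CrossEntropyLoss()

    for inputs, targets in train_loader:              # batches of 4,096 examples
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)                        # (batch, seq_len, vocab)
        # Targets are assumed to be the integer-encoded original words,
        # so every position contributes to the cross-entropy loss.
        loss = criterion(logits.view(-1, logits.size(-1)), targets.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()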

Performance Metrics

Overall Performance

  • Training Loss: 0.2624 (final epoch)
  • Validation Loss: 0.6247 (final epoch)
  • Validation Accuracy: 67.98% (final epoch)
  • In-Sample Win Rate: 76.07%
  • Out-of-Sample Win Rate: 70.40%

Key Observations

  • Strong generalization: the win rate drops only ~5.7 points from 76.07% in-sample to 70.40% out-of-sample
  • Consistent performance: the model maintains high win rates across different word sets
  • Good convergence: the low final training loss and high win rates indicate the model converges well within a single epoch
  • Training efficiency: one epoch trains in ~26 minutes on an H100 GPU

How It Works

Inference Process

  1. Input: Partially revealed word with underscores for missing letters
  2. Model Prediction: Transformer outputs probability distributions for each position
  3. Ensemble Strategy: Averages probabilities across all masked positions
  4. Letter Selection: Chooses the most likely unguessed letter
  5. Iteration: Continues until word is solved or max guesses (6) reached
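
The guessing strategy can be sketched as follows, reusing the hypothetical `encode` and `CHARS` helpers from the dataset section; the notebook's actual implementation may differ in detail.

    import torch

    def next_guess(model, masked_word, guessed):
        """Average the model's probabilities over all masked positions and
        return the highest-scoring letter that has not been guessed yet."""
        tokens = torch.tensor([encode(masked_word)])
        with torch.no_grad():
            probs = torch.softmax(model(tokens), dim=-1)[0]    # (seq_len, vocab)
        blanks = [i for i, c in enumerate(masked_word) if c == "_"]
        avg = probs[blanks].mean(dim=0)                        # ensemble over blanks
        for idx in avg.argsort(descending=True).tolist():
            letter = CHARS[idx]
            if letter.isalpha() and letter not in guessed:     # skip _, ., - and repeats
                return letter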

Example Game

Word: "reperplex"
Masked: "_________"
Guesses: ['e', 'r', 'd', 'v', 'a', 't', 'l', 'p', 'x']
Result: SOLVED in 9 guesses

Files

  • hangman.ipynb: Main training and evaluation notebook
  • hangman.py: Game implementation and baseline solver
  • model.pth: Trained transformer model (25.2M parameters)
  • training_dictionary.txt: 224,377 English words for training
  • test_dictionary.txt: Test words for evaluation
  • generate_dictionaries.py: Script to generate training data
  • test.py: Utility functions for testing
  • train_dataset.pkl: Pre-processed training dataset (2.0GB)
  • val_dataset.pkl: Pre-processed validation dataset (148MB)
  • pyproject.toml: Project configuration and dependencies
  • uv.lock: Lock file for reproducible dependency management

Requirements

  • Python: >=3.12
  • PyTorch: >=2.7.1
  • NumPy: <2
  • Matplotlib: >=3.10.3
  • tqdm: >=4.67.1
  • pandas: >=2.3.0
  • Jupyter: >=1.1.1

Setup and Usage

Installation

This project uses uv for dependency management. To set up:

  1. Install uv (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Install dependencies:

    uv sync
  3. Activate the virtual environment:

    source .venv/bin/activate

Usage

  1. Training: Run the hangman.ipynb notebook to train the model
  2. Inference: Use the trained model for interactive hangman solving
  3. Evaluation: Test performance on custom word lists
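
For a rough idea of interactive use, the sketch below loads the trained weights and plays one game with the `next_guess` helper above; whether model.pth stores a state dict or a full model is an assumption, so adjust the loading call accordingly.

    import torch

    model = HangmanTransformer()
    model.load_state_dict(torch.load("model.pth", map_location="cpu"))
    model.eval()

    word = "example"
    masked, guessed, wrong = "_" * len(word), set(), 0
    while "_" in masked and wrong < 6:                # six incorrect guesses allowed
        letter = next_guess(model, masked, guessed)
        guessed.add(letter)
        if letter in word:
            masked = "".join(c if c in guessed else "_" for c in word)
        else:
            wrong += 1
    print("SOLVED" if "_" not in masked else "FAILED", masked)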

The model demonstrates that transformer architectures can effectively learn character-level patterns in English words and apply this knowledge to solve word-guessing games with competitive performance.
