A neural network-based solution to the classic game of Hangman using a Transformer architecture. This project trains a transformer model to predict missing letters in partially revealed words, achieving competitive performance across various word lengths.
This project implements a transformer-based approach to solve Hangman by:
- Training a transformer model on millions of masked word examples
- Using the model to predict the most likely letters for each position
- Implementing an intelligent guessing strategy that considers letter probabilities across all masked positions
The model uses a Transformer Encoder architecture with the following specifications (a PyTorch sketch follows the list):
- Model Size: 25.2M parameters
- Embedding Dimension: 512
- Number of Heads: 8
- Number of Layers: 4
- Vocabulary Size: 29 characters (a-z, underscore, period, hyphen)
- Max Sequence Length: 25 characters
- Character Embeddings: Each character is embedded into a 512-dimensional space
- Positional Encoding: Learned positional embeddings for sequence position
- Transformer Encoder: Multi-head self-attention layers for contextual understanding
- Output Layer: Linear projection to vocabulary size for character prediction
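A minimal PyTorch sketch of this architecture, assuming a standard `nn.TransformerEncoder` stack. The feed-forward width and dropout are not listed above and are assumed here, so the exact parameter count may differ from 25.2M; the actual implementation lives in `hangman.ipynb`.

```python
import torch
import torch.nn as nn

class HangmanTransformer(nn.Module):
    """Character-level Transformer encoder following the specs listed above."""

    def __init__(self, vocab_size=29, d_model=512, n_heads=8, n_layers=4,
                 max_len=25, dim_feedforward=2048, dropout=0.1):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, d_model)   # character embeddings
        self.pos_emb = nn.Embedding(max_len, d_model)       # learned positional embeddings
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=dim_feedforward,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(d_model, vocab_size)           # per-position character logits

    def forward(self, x):                                   # x: (batch, seq_len) character ids
        pos = torch.arange(x.size(1), device=x.device)
        h = self.char_emb(x) + self.pos_emb(pos)
        h = self.encoder(h)
        return self.out(h)                                  # (batch, seq_len, vocab_size)
```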
The training dataset is created by the following steps (the masking step is sketched in code after this list):
- Loading 224,377 words from the English dictionary
- Generating all possible masked combinations for each word (2^n combinations for n unique characters)
- Creating 50M training examples and 1.4M validation examples
- Converting to integer sequences for model training
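A sketch of the masking and encoding steps, assuming every subset of a word's unique letters is replaced with underscores and the original word is kept as the target; the function and variable names here are illustrative, not the project's actual API.

```python
from itertools import combinations

CHARS = "abcdefghijklmnopqrstuvwxyz_.-"            # 29-character vocabulary
CHAR2ID = {c: i for i, c in enumerate(CHARS)}

def masked_variants(word):
    """Yield (masked, target) pairs for every non-empty subset of the word's unique letters."""
    unique = sorted(set(word))
    for r in range(1, len(unique) + 1):            # ~2^n subsets for n unique characters
        for hidden in combinations(unique, r):
            masked = "".join("_" if c in hidden else c for c in word)
            yield masked, word

def encode(s):
    """Convert a (masked) word into an integer sequence; padding to length 25 is omitted here."""
    return [CHAR2ID[c] for c in s]

# masked_variants("cat") starts with ("c_t", "cat"), ("_at", "cat"), ("ca_", "cat"), ...
```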
- Training Examples: 50,000,000
- Validation Examples: 1,406,556
- Batch Size: 4,096
- Learning Rate: 6e-4 with cosine annealing scheduler
- Optimizer: AdamW with weight decay 0.1
- Loss Function: Cross-Entropy Loss
- Training Time: ~26 minutes on H100 GPU (1 epoch)
- Training Steps: 12,208 total steps
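A sketch of the corresponding training setup, reusing the `HangmanTransformer` sketch above. `train_dataset` is assumed to yield `(masked_ids, target_ids)` tensor pairs, and whether the loss covers all positions or only the masked ones is an assumption.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = HangmanTransformer().to(device)
loader = DataLoader(train_dataset, batch_size=4096, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=12_208)
criterion = nn.CrossEntropyLoss()

for masked_ids, target_ids in loader:                      # one epoch ≈ 12,208 steps
    masked_ids, target_ids = masked_ids.to(device), target_ids.to(device)
    logits = model(masked_ids)                              # (batch, seq_len, vocab)
    loss = criterion(logits.reshape(-1, logits.size(-1)),   # flatten positions
                     target_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```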
- Training Loss: 0.2624 (final epoch)
- Validation Loss: 0.6247 (final epoch)
- Validation Accuracy: 67.98% (final epoch)
- In-Sample Win Rate: 76.07%
- Out-of-Sample Win Rate: 70.40%
- Strong generalization: 76.07% in-sample vs 70.40% out-of-sample performance
- Consistent performance: Model maintains high win rates across different word sets
- Improved training: Lower training loss and higher win rates indicate better model convergence
- Training efficiency: Model trained in ~26 minutes on H100 GPU (1 epoch)
- Input: Partially revealed word with underscores for missing letters
- Model Prediction: Transformer outputs probability distributions for each position
- Ensemble Strategy: Averages probabilities across all masked positions
- Letter Selection: Chooses the most likely unguessed letter
- Iteration: Continues until the word is solved or the maximum of 6 incorrect guesses is reached (see the sketch below)
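A sketch of this guessing step, under the assumption that the model's per-position distributions are softmaxed and averaged over the blank positions; `char2id`/`id2char` are the vocabulary mappings from the dataset sketch above, and the real solver lives in hangman.py.

```python
import torch

def next_guess(model, masked_word, guessed, char2id, id2char):
    """Return the most likely not-yet-guessed letter, averaged over all blank positions."""
    ids = torch.tensor([[char2id[c] for c in masked_word]])
    with torch.no_grad():
        probs = torch.softmax(model(ids), dim=-1)[0]       # (seq_len, vocab)
    blanks = [i for i, c in enumerate(masked_word) if c == "_"]
    avg = probs[blanks].mean(dim=0)                        # ensemble over masked positions
    for ch, i in char2id.items():                          # rule out guessed letters and symbols
        if ch in guessed or not ch.isalpha():
            avg[i] = 0.0
    return id2char[int(avg.argmax())]
```

Each returned letter is applied to the board: correct guesses reveal positions, while incorrect ones count toward the six-miss limit.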
Word: "reperplex"
Masked: "_________"
Guesses: ['e', 'r', 'd', 'v', 'a', 't', 'l', 'p', 'x']
Result: SOLVED in 9 guesses (4 incorrect)
- `hangman.ipynb`: Main training and evaluation notebook
- `hangman.py`: Game implementation and baseline solver
- `model.pth`: Trained transformer model (25.2M parameters)
- `training_dictionary.txt`: 224,377 English words for training
- `test_dictionary.txt`: Test words for evaluation
- `generate_dictionaries.py`: Script to generate training data
- `test.py`: Utility functions for testing
- `train_dataset.pkl`: Pre-processed training dataset (2.0 GB)
- `val_dataset.pkl`: Pre-processed validation dataset (148 MB)
- `pyproject.toml`: Project configuration and dependencies
- `uv.lock`: Lock file for reproducible dependency management
- Python: >=3.12
- PyTorch: >=2.7.1
- NumPy: <2
- Matplotlib: >=3.10.3
- tqdm: >=4.67.1
- pandas: >=2.3.0
- Jupyter: >=1.1.1
This project uses uv for dependency management. To set up:
1. Install uv (if not already installed): `curl -LsSf https://astral.sh/uv/install.sh | sh`
2. Install dependencies: `uv sync`
3. Activate the virtual environment: `source .venv/bin/activate`
- Training: Run the `hangman.ipynb` notebook to train the model
- Inference: Use the trained model for interactive hangman solving
- Evaluation: Test performance on custom word lists (a win-rate sketch follows this list)
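A sketch of such an evaluation loop, assuming the six-guess limit counts incorrect guesses (consistent with the worked example above). `play_hangman` is an illustrative stand-in for the game logic in hangman.py and reuses `next_guess` from the sketch above.

```python
def play_hangman(model, word, char2id, id2char, max_misses=6):
    """Simulate one game; return True if the word is fully revealed within the miss limit."""
    guessed, misses = set(), 0
    masked = ["_"] * len(word)
    while misses < max_misses and "_" in masked:
        letter = next_guess(model, "".join(masked), guessed, char2id, id2char)
        guessed.add(letter)
        if letter in word:
            masked = [c if c in guessed else "_" for c in word]
        else:
            misses += 1
    return "_" not in masked

def win_rate(model, words, char2id, id2char):
    """Fraction of words solved, e.g. over the words in test_dictionary.txt."""
    wins = sum(play_hangman(model, w, char2id, id2char) for w in words)
    return wins / len(words)
```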
The model demonstrates that transformer architectures can effectively learn character-level patterns in English words and apply this knowledge to solve word-guessing games with competitive performance.