A deep learning project for automatic image colorization — converting grayscale images to color using multiple neural network architectures, trained and evaluated on Kaggle.
- What This Project Does
- Project Structure
- Setup & Installation
- Dataset
- Running the Notebooks
- Models
- Data Augmentation
- Results
- Saved Models
Given a grayscale image as input, the models learn to predict and output a plausible colorized (RGB) version of that image. Three different architectures were explored and compared:
| Notebook | Architecture | Approach |
|---|---|---|
| `unet-model.ipynb` | U-Net | Encoder-decoder with skip connections |
| `ResNet.ipynb` | ResNet-34 Autoencoder | Pretrained ResNet-34 encoder + custom decoder |
| `base-gan-pix2pix.ipynb` | Pix2Pix GAN | FCN-ResNet50 generator + CNN discriminator |
| `pix2pix-without-finetune.ipynb` | Pix2Pix (no fine-tune) | Pix2Pix without pretrained weights |
| `resnet-without-finetuning.ipynb` | ResNet (no fine-tune) | ResNet autoencoder without pretrained weights |
| `dataloader_code.ipynb` | N/A | Shared dataset/dataloader utilities |
```
colorify/
├── unet-model.ipynb                  ← U-Net colorization
├── ResNet.ipynb                      ← ResNet-34 autoencoder colorization
├── base-gan-pix2pix.ipynb            ← Pix2Pix GAN (with pretrained weights)
├── pix2pix-without-finetune.ipynb    ← Pix2Pix GAN (no pretrained weights)
├── resnet-without-finetuning.ipynb   ← ResNet autoencoder (no pretrained weights)
└── dataloader_code.ipynb             ← Shared dataloader utilities
```
- Python 3.8+
- CUDA-capable GPU (strongly recommended; CPU training will be very slow)
```bash
git clone https://github.com/<your-username>/colorify.git
cd colorify
```

```bash
pip install torch torchvision segmentation-models-pytorch Pillow matplotlib tqdm
```

Note: For a specific CUDA version of PyTorch, visit pytorch.org and use the appropriate install command, e.g.:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```

To run the notebooks locally, also install Jupyter:

```bash
pip install notebook
```

The project uses a custom dataset (`genai-colorify-dataset-version-1`) hosted on Kaggle.
Dataset structure expected:
```
dataset/
├── train/
│   ├── gray/    ← grayscale input images
│   └── color/   ← ground-truth RGB images
└── test/
    ├── gray/
    └── color/
```
Each grayscale image must have a corresponding color image with the same filename.
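Because training depends on this one-to-one filename match, it can help to check a split before you train. The helper below is a hypothetical sketch and is not part of `dataloader_code.ipynb`. It pairs files under `gray/` and `color/` by name and reports any file that is missing its counterpart. It assumes the directory layout shown above and that every file in those folders is an image.

```python
from pathlib import Path


def pair_images(split_dir):
    """Pair files in <split_dir>/gray and <split_dir>/color by identical filename.

    Returns (pairs, missing_color, missing_gray):
      pairs         -- sorted list of (gray_path, color_path) tuples
      missing_color -- gray filenames with no matching color target
      missing_gray  -- color filenames with no matching gray input
    """
    split_dir = Path(split_dir)
    gray = {p.name: p for p in (split_dir / "gray").iterdir() if p.is_file()}
    color = {p.name: p for p in (split_dir / "color").iterdir() if p.is_file()}

    shared = sorted(gray.keys() & color.keys())
    pairs = [(gray[name], color[name]) for name in shared]
    missing_color = sorted(gray.keys() - color.keys())
    missing_gray = sorted(color.keys() - gray.keys())
    return pairs, missing_color, missing_gray
```

For example, `pair_images("dataset/train")` on a complete split should return empty `missing_color` and `missing_gray` lists.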
To download the dataset from Kaggle:
- Install the Kaggle CLI:
  ```bash
  pip install kaggle
  ```
- Place your `kaggle.json` API token in `~/.kaggle/`.
- Run:
  ```bash
  kaggle datasets download -d <dataset-slug>
  unzip <dataset-slug>.zip -d dataset/
  ```

All notebooks are designed to run on Kaggle with GPU acceleration, so no local setup is needed there. To run a notebook on Kaggle:
- Go to kaggle.com and create an account
- Upload or fork the notebook
- Attach the dataset `genai-colorify-dataset-version-1` under Add Data
- Enable GPU: Settings → Accelerator → GPU
- Click Run All
Dataset paths in the notebooks are already set to `/kaggle/input/genai-colorify-dataset-version-1/`.

To run locally instead:
- Complete the Setup & Installation steps above
- Download the dataset and place it somewhere on your machine
- Open a notebook:
  ```bash
  jupyter notebook ResNet.ipynb
  ```
- Update the dataset paths in the dataloader cell. Find lines like:
  ```python
  root_dir="/kaggle/input/genai-colorify-dataset-version-1/train"
  ```
  and change them to your local path, e.g.:
  ```python
  root_dir="./dataset/train"
  ```
- Run all cells top to bottom.
Tip: Start with `ResNet.ipynb`; it's the simplest model and trains fastest.
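Instead of editing every `root_dir` by hand, you can choose the dataset root at runtime. The sketch below is a hypothetical helper, and `dataset_root` is not defined in the notebooks. It uses the Kaggle input path when that path exists and otherwise falls back to `./dataset`, which assumes you unzipped the data there.

```python
import os

KAGGLE_ROOT = "/kaggle/input/genai-colorify-dataset-version-1"
LOCAL_ROOT = "./dataset"  # assumption: where the dataset was unzipped locally


def dataset_root(kaggle_root=KAGGLE_ROOT, local_root=LOCAL_ROOT):
    """Return the Kaggle dataset directory if it exists, else the local one."""
    return kaggle_root if os.path.isdir(kaggle_root) else local_root


# Paths that could then be passed as root_dir in the dataloader cell.
train_dir = os.path.join(dataset_root(), "train")
test_dir = os.path.join(dataset_root(), "test")
```

With this in place, the same notebook runs unchanged on Kaggle and locally.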
U-Net (`unet-model.ipynb`):
- Classic encoder-decoder architecture with skip connections
- Input: `[B, 1, 256, 256]` grayscale → Output: `[B, 3, 256, 256]` RGB
- Uses the `segmentation-models-pytorch` library
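Here is why a 256×256 input fits this architecture. The `Unet` in `segmentation-models-pytorch` uses a five-stage encoder by default (`encoder_depth=5`), and each stage halves the spatial size. The decoder then upsamples back, concatenating the encoder map of matching size at each step. The pure-Python sketch below does only that size bookkeeping, with no tensors. `unet_feature_sizes` is a hypothetical helper, and it assumes H and W must be divisible by 2**depth so that every skip connection lines up.

```python
def unet_feature_sizes(height, width, depth=5):
    """Spatial size (H, W) after each encoder stage of a depth-`depth` U-Net.

    Each stage halves H and W; the decoder retraces these sizes in reverse,
    so H and W must divide evenly by 2**depth for the skip connections to match.
    """
    factor = 2 ** depth
    if height % factor or width % factor:
        raise ValueError(f"H and W must be divisible by {factor}, got {height}x{width}")
    return [(height >> i, width >> i) for i in range(1, depth + 1)]


print(unet_feature_sizes(256, 256))
# [(128, 128), (64, 64), (32, 32), (16, 16), (8, 8)]
```

A 256×256 image therefore reaches an 8×8 bottleneck before being upsampled back to the full `[B, 3, 256, 256]` output.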
ResNet-34 Autoencoder (`ResNet.ipynb`):
- Encoder: Pretrained ResNet-34 (all layers up to the last conv block)
- Decoder: Custom upsampling layers (bilinear interpolation + Conv2d)
- Loss: MSE (L2) | Optimizer: Adam (lr=1e-4) | Scheduler: StepLR
- 30 epochs, batch size 128
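StepLR lowers the learning rate in discrete steps. In PyTorch it is created as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=..., gamma=...)`, and it multiplies the rate by `gamma` every `step_size` epochs. The notebook's `step_size` and `gamma` are not listed here. The pure-Python sketch below therefore uses placeholder values (`step_size=10`, `gamma=0.1`) to show the resulting schedule over the 30 training epochs.

```python
def step_lr(epoch, base_lr=1e-4, step_size=10, gamma=0.1):
    """Learning rate that StepLR would produce at a 0-indexed `epoch`.

    Closed form: base_lr * gamma ** (epoch // step_size).
    step_size=10 and gamma=0.1 are placeholders, not the notebook's values.
    """
    return base_lr * gamma ** (epoch // step_size)


# Print the schedule every 5 epochs across a 30-epoch run.
for epoch in range(0, 30, 5):
    print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.0e}")
```

With these placeholders the rate would drop tenfold at epochs 10 and 20. Substitute the notebook's actual arguments to see its real schedule.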
Pix2Pix GAN (`base-gan-pix2pix.ipynb`):
- Generator: FCN-ResNet50 (pretrained, modified for 1-channel input and 3-channel output)
- Discriminator: Simple CNN with LeakyReLU activations
- Loss: Adversarial (BCE) + Pixel-wise (L1)
- Optimizer: Adam (lr=0.0002, β=(0.5, 0.999))
- 20 epochs, batch size 16
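The combined objective above can be written out on plain numbers to show how the two terms interact. This is a minimal pure-Python sketch, not the notebook's code, and it rests on three assumptions:

- The discriminator outputs probabilities.
- The L1 weight is 100, the value used in the original Pix2Pix paper. It stands in as a placeholder because the notebook's weight is not stated here.
- The discriminator's real and fake terms are averaged, which is one common convention.

With tensors, the same pieces correspond to `nn.BCELoss` (or `nn.BCEWithLogitsLoss` on raw logits) and `nn.L1Loss`.

```python
import math


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for a single predicted probability."""
    prob = min(max(prob, eps), 1 - eps)  # clamp so log() never sees 0
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def generator_loss(d_fake_probs, fake_pixels, real_pixels, l1_weight=100.0):
    """Adversarial term (D should call fakes real, label 1) + weighted pixel-wise L1."""
    adv = sum(bce(p, 1.0) for p in d_fake_probs) / len(d_fake_probs)
    l1 = sum(abs(f - r) for f, r in zip(fake_pixels, real_pixels)) / len(real_pixels)
    return adv + l1_weight * l1  # l1_weight=100 is an assumed placeholder


def discriminator_loss(d_real_probs, d_fake_probs):
    """D wants real -> 1 and fake -> 0; the two BCE terms are averaged."""
    real = sum(bce(p, 1.0) for p in d_real_probs) / len(d_real_probs)
    fake = sum(bce(p, 0.0) for p in d_fake_probs) / len(d_fake_probs)
    return 0.5 * (real + fake)
```

Under this weighting, even a small mean pixel error (say 0.1) contributes 10 to the generator loss. That is far more than the adversarial term contributes when D outputs 0.5, which is about 0.69.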
Variants without pretrained weights:
- `resnet-without-finetuning.ipynb`: same as the ResNet-34 autoencoder but with randomly initialized weights
- `pix2pix-without-finetune.ipynb`: same as Pix2Pix but with randomly initialized weights
Applied consistently across all models:
Grayscale transforms:
- `RandomResizedCrop(256×256, scale=(0.8, 1.0))`
- `RandomHorizontalFlip(p=0.5)`
- `RandomRotation(10°)`
- `Normalize(mean=0.5, std=0.5)`

Color transforms (the same crop, flip, and rotation as above, plus):
- `ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)`
- `Normalize(mean=(0.5,0.5,0.5), std=(0.5,0.5,0.5))` (the 3-channel counterpart of the grayscale normalization)
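Both `Normalize` steps map pixel values from [0, 1] to [-1, 1] via (x - mean) / std, assuming the usual `ToTensor` conversion runs first. If the model is trained against these normalized targets, its predictions need the inverse mapping before they can be displayed or saved. Below is a minimal pure-Python sketch of both directions for a single value. The 3-channel color version applies the same formula per channel.

```python
def normalize(x, mean=0.5, std=0.5):
    """What Normalize(mean=0.5, std=0.5) does to one value in [0, 1]."""
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping back to [0, 1] for display; clamps because outputs may overshoot."""
    return min(max(y * std + mean, 0.0), 1.0)


for x in (0.0, 0.5, 1.0):
    print(x, "->", normalize(x))  # prints 0.0 -> -1.0, then 0.5 -> 0.0, then 1.0 -> 1.0
```

The clamp matters when plotting with matplotlib, which expects float RGB values in [0, 1].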
| Model | Epochs | Final Loss |
|---|---|---|
| ResNet-34 Autoencoder | 30 | ~0.163 (MSE) |
| Pix2Pix GAN | 20 | G: ~100, D: ~0.03 |
Loss curves are plotted at the end of each notebook.
Each notebook saves the trained model weights after training:
| Notebook | Saved File(s) |
|---|---|
| `ResNet.ipynb` | `vgg16_colorization.pth` |
| `base-gan-pix2pix.ipynb` | `Pix2Pix_Generator.pth`, `Pix2Pix_Discriminator.pth` |
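To reuse these weights for inference, a sketch along these lines should work under two assumptions. First, the `.pth` files hold a `state_dict` (saved with `torch.save(model.state_dict(), path)`) rather than a whole pickled model. Second, you first instantiate the same model class the notebook defines. `load_weights` is a hypothetical helper.

```python
import torch
import torch.nn as nn


def load_weights(model: nn.Module, path: str, device: str = "cpu") -> nn.Module:
    """Load a state_dict saved with torch.save(model.state_dict(), path) into `model`."""
    state = torch.load(path, map_location=device)  # remap GPU-saved tensors if needed
    model.load_state_dict(state)
    model.to(device)
    return model.eval()  # switch BatchNorm/Dropout to inference behaviour


# Hypothetical usage: `model` must be an instance of the notebook's own model class.
# model = load_weights(model, "vgg16_colorization.pth", device="cuda")
```

If `load_state_dict` reports missing or unexpected keys, the model class you instantiated does not match the one that was trained.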