Welcome to Week 2 of the LLM Engineering & Deployment Certification Program by Ready Tensor.
This week focuses on the foundational concepts of LLM fine-tuning, covering everything from next-token prediction to dataset preparation, tokenization, and parameter-efficient training techniques.
This repository contains code examples, demonstrations, and exercises for the following lessons:
Understanding Next-Token Prediction
- How LLMs work as massive classifiers predicting the next token
- Understanding probability distributions over vocabulary
- The autoregressive loop: how single predictions become full responses
- Why models produce different outputs with the same prompt
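The ideas in this lesson can be seen in a few lines of code. Below is a minimal, illustrative sketch that uses GPT-2 as a small stand-in model (any causal LM on the Hugging Face Hub behaves the same way): it inspects the probability distribution over the vocabulary and then samples a continuation.

```python
# Minimal sketch of next-token prediction (GPT-2 used as a small stand-in model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# The model acts as a classifier over the vocabulary: softmax turns the
# final position's logits into a probability distribution over all tokens.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r}: {p.item():.3f}")

# Sampling (do_sample=True) is why the same prompt can yield different outputs.
out = model.generate(**inputs, max_new_tokens=10, do_sample=True, top_p=0.9,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0]))
```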
Loss, Masking, and Next-Token Prediction
- The learning loop: prediction → loss → update
- Cross-entropy loss: measuring prediction quality
- From single-token to sequence-level loss calculation
- Causal masking: ensuring left-to-right prediction
- Selective scoring: controlling which tokens contribute to learning
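As a concrete illustration of the points above, the toy snippet below (using made-up logits rather than a real model) shows the shift between inputs and labels and how positions labeled -100 are excluded from the cross-entropy loss.

```python
# Toy example: sequence-level cross-entropy with shifted labels and -100 masking.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50, 6
logits = torch.randn(1, seq_len, vocab_size)            # pretend model outputs
input_ids = torch.randint(0, vocab_size, (1, seq_len))  # pretend token ids

# Predict token t+1 from positions <= t: drop the last logit, drop the first label.
shift_logits = logits[:, :-1, :]
shift_labels = input_ids[:, 1:].clone()

# Selective scoring: positions labeled -100 do not contribute to the loss.
shift_labels[:, :2] = -100

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab_size),
    shift_labels.reshape(-1),
    ignore_index=-100,
)
print(f"mean cross-entropy over scored positions: {loss.item():.3f}")
```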
Core Concepts for Customizing Large Language Models
- What supervised fine-tuning (SFT) actually is
- How SFT differs from pretraining (and why it's still the same mechanism)
- The three-stage LLM pipeline: pretraining → SFT → preference optimization
- Roadmap of foundational concepts needed before fine-tuning with LoRA/QLoRA
- Why understanding these foundations transforms trial-and-error into engineering
Formats and Best Practices for LLM Fine-Tuning
- Understanding dataset sources: human-labeled, synthetic, and hybrid approaches
- Dataset formats: instruction, conversation (chat), and preference structures
- Creating datasets with LLM-assisted pipelines (e.g., using Distilabel)
- Validating and cleaning data before training
- Loading, exploring, and publishing datasets with Hugging Face
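For reference, the snippet below sketches what records in the three formats typically look like and how a dataset can be loaded and published with the Hugging Face datasets library. The field names and the repository id are illustrative conventions, not requirements of any specific tool.

```python
# Illustrative records for the three dataset formats described above,
# plus loading/publishing with the Hugging Face datasets library.
from datasets import Dataset

instruction_example = {
    "instruction": "Summarize the text.",
    "input": "LLMs are trained to predict the next token in a sequence...",
    "output": "LLMs learn by predicting the next token.",
}

conversation_example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is fine-tuning?"},
        {"role": "assistant", "content": "Adapting a pretrained model to a narrower task."},
    ]
}

preference_example = {
    "prompt": "Explain LoRA in one sentence.",
    "chosen": "LoRA trains small low-rank matrices on top of frozen base weights.",
    "rejected": "LoRA is a type of GPU.",
}

ds = Dataset.from_list([conversation_example])
print(ds)
# ds.push_to_hub("your-username/my-sft-dataset")  # placeholder repo id; needs `huggingface-cli login`
```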
Preparing Text Data for LLM Training
- How tokenization converts text into subword units
- Comparing tokenizers: why different models tokenize text differently
- Special tokens: BOS, EOS, PAD, UNK, and chat-specific markers
- Padding strategies: making variable-length sequences uniform for batching
- Attention masks: telling the model which tokens are real vs. padding
- Chat templates: formatting conversations for instruct models
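The short sketch below ties these ideas together: padding plus attention masks for batching, and a chat template for an instruct model. GPT-2 and Zephyr are used only as examples; any Hugging Face tokenizer exposes the same methods, and a chat template is available only for models that ship one.

```python
# Tokenization, padding, attention masks, and chat templates in one sketch.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # GPT-2 has no PAD token, so reuse EOS

batch = tok(["Hello world", "A longer sentence to tokenize"],
            padding=True, return_tensors="pt")
print(batch["input_ids"])       # sequences padded to the same length
print(batch["attention_mask"])  # 1 = real token, 0 = padding

# Chat templates: instruct models expect conversations in a specific layout.
chat_tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [{"role": "user", "content": "What is a token?"}]
print(chat_tok.apply_chat_template(messages, tokenize=False,
                                   add_generation_prompt=True))
```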
Assistant-Only Masking Explained
- The selective learning challenge: training only on assistant responses
- How assistant-only masking works with -100 labels in PyTorch
- Multi-turn conversations: masking user and system messages
- Implementing masking in practice with chat templates
- Debugging common masking issues (echoing inputs, loss not decreasing)
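The simplified sketch below shows the core trick: build labels that copy the assistant tokens and set everything else to -100. The plain-text "User:/Assistant:" framing is only for illustration; real pipelines derive the prompt/response boundary from the model's chat template.

```python
# Simplified assistant-only masking: only the assistant's reply gets real labels.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

prompt = "User: What is SFT?\nAssistant:"
answer = " Supervised fine-tuning on labeled input-output examples."

prompt_ids = tok(prompt, add_special_tokens=False)["input_ids"]
answer_ids = tok(answer, add_special_tokens=False)["input_ids"]

input_ids = prompt_ids + answer_ids
labels = [-100] * len(prompt_ids) + answer_ids  # mask every prompt token

assert len(input_ids) == len(labels)
for token, label in zip(tok.convert_ids_to_tokens(input_ids), labels):
    print(f"{token:>15s} -> {label}")
```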
FP32, FP16, BF16, INT8, INT4 Explained
- Understanding floating-point formats: sign, exponent, and mantissa
- FP32 (full precision), FP16 (half precision), BF16 (brain float)
- Why BF16 is the modern training standard (same range as FP32, half the memory)
- Quantization: how INT8 and INT4 compress models for inference
- Calculating model memory requirements across different data types
- When to use each format: training vs. fine-tuning vs. inference
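A quick back-of-the-envelope calculation makes the memory differences concrete. The sketch below counts only the weights of a hypothetical 7B-parameter model; training adds further memory for gradients, optimizer states, and activations.

```python
# Rough weight-memory math for a 7B-parameter model in different formats.
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1024**3

n_params = 7e9
for name, nbytes in [("FP32", 4), ("FP16/BF16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name:>9s}: {weight_memory_gb(n_params, nbytes):5.1f} GB")
# FP32 ~26 GB, FP16/BF16 ~13 GB, INT8 ~6.5 GB, INT4 ~3.3 GB (weights only).
```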
Parameter-Efficient Fine-Tuning with LoRA and QLoRA
- The accessibility problem: why full fine-tuning is impractical for large models
- LoRA: low-rank adaptation using frozen weights + small trainable matrices
- Understanding LoRA hyperparameters: rank (r), alpha (α), and target modules
- QLoRA: adding 4-bit quantization (NF4, double quantization, paged optimizers)
- When to use LoRA vs. QLoRA based on your GPU memory
- Implementation with Hugging Face PEFT and bitsandbytes
- Best practices and common pitfalls in PEFT workflows
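The snippet below is a sketch of a typical QLoRA-style setup with PEFT and bitsandbytes. The model id, rank, alpha, and target modules are illustrative choices, and 4-bit loading requires a CUDA GPU; see the Lesson 8 code for complete examples.

```python
# Sketch of a QLoRA-style setup with Hugging Face PEFT and bitsandbytes.
# Model id and hyperparameters are illustrative; 4-bit loading needs a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 quantization
    bnb_4bit_use_double_quant=True,      # double quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # example base model (gated on the Hub)
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                # rank of the low-rank update matrices
    lora_alpha=32,                       # scaling factor (alpha)
    target_modules=["q_proj", "v_proj"], # which projections receive adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # typically well under 1% of all weights
```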
rt-llm-eng-cert-week2/
├── code/
│ ├── lesson1/ # Next-token prediction demos
│ ├── lesson2/ # Loss and masking examples
│ ├── lesson3/ # Dataset creation scripts
│ ├── lesson4/ # Assistant-only masking
│ ├── lesson5/ # Data types demonstrations
│ ├── lesson6/ # Hugging Face dataset workflows
│ └── lesson8/ # LoRA/QLoRA examples
├── lessons/ # Lesson materials and markdown files
├── requirements.txt # Python dependencies
└── README.md # This file
- Python 3.8 or higher
- Basic understanding of Python and machine learning concepts
- Familiarity with PyTorch and Hugging Face libraries
- Clone this repository:

  git clone https://github.com/your-username/rt-llm-eng-cert-week2.git
  cd rt-llm-eng-cert-week2

- Create a virtual environment:

  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

  pip install -r requirements.txt

Navigate to any lesson directory and open the Jupyter notebooks:

  cd code/lesson1
  jupyter notebook

Or run Python scripts directly:

  python code/lesson3/create_dataset.py

Each lesson contains interactive Jupyter notebooks demonstrating key concepts:
- Lesson 1: Classification and autoregressive generation visualizations
- Lesson 2: Cross-entropy loss, label shifting, and masking demonstrations
- Lesson 3: Tokenization comparisons across different models
- Lesson 4: Dataset exploration and manipulation
- Lesson 5: Padding and attention mask examples
- Lesson 6: Assistant-only masking implementation, pushing datasets to Hugging Face
- Lesson 7: Data type memory calculations and precision trade-offs
- Lesson 8: LoRA/QLoRA implementation examples
This week's materials use the following libraries and tools:
- Transformers - Hugging Face's model library
- Datasets - Dataset loading and processing
- PyTorch - Deep learning framework
- PEFT - Parameter-Efficient Fine-Tuning
- bitsandbytes - 8-bit and 4-bit quantization
- tiktoken - OpenAI's tokenizer
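As a small illustration of why tokenizer choice matters (see Lesson 3's tokenizer comparisons), the snippet below tokenizes the same sentence with tiktoken and with a Hugging Face tokenizer; the specific encodings chosen here are just examples.

```python
# Same text, two tokenizers: token counts and splits differ between model families.
import tiktoken
from transformers import AutoTokenizer

text = "Fine-tuning LLMs is fun!"

enc = tiktoken.get_encoding("cl100k_base")    # an OpenAI tokenizer encoding
hf_tok = AutoTokenizer.from_pretrained("gpt2")

tiktoken_ids = enc.encode(text)
print(len(tiktoken_ids), [enc.decode([t]) for t in tiktoken_ids])
print(len(hf_tok(text)["input_ids"]), hf_tok.tokenize(text))
```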
We recommend following the lessons in order, as each builds on concepts from previous lessons:
- Start with Lesson 1 to understand how LLMs generate text
- Progress through Lessons 2-3 to learn the training fundamentals
- Work through Lessons 4-6 for practical dataset preparation and formatting
- Complete Lessons 7-8 to learn optimization techniques for efficient fine-tuning
By the end of Week 2, you will be able to:
✅ Explain how LLMs perform next-token prediction
✅ Calculate and interpret cross-entropy loss for language models
✅ Prepare and format datasets for instruction fine-tuning
✅ Compare tokenizers and understand their impact on training
✅ Apply assistant-only masking for chat-based models
✅ Calculate memory requirements for different data types
✅ Implement LoRA and QLoRA for parameter-efficient fine-tuning
- Program Homepage: LLM Engineering & Deployment Certification
- Hugging Face Documentation: https://huggingface.co/docs
- PyTorch Tutorials: https://pytorch.org/tutorials/
- Ready Tensor Platform: https://app.readytensor.ai