A complete educational implementation of GPT (Generative Pre-trained Transformer) in pure Python
Features • Quick Start • Documentation • Examples • Study Guide
This repository contains three progressive versions of a GPT model implementation, each designed for different learning needs:
| Version | Lines | Description | Best For |
|---|---|---|---|
| Original | 243 | Andrej Karpathy's minimal implementation | Quick reference |
| Refactored | 850 | Well-structured with Mermaid diagrams | Understanding architecture |
| Educational | 1,200 | Professor-style teaching with detailed prints | Learning from scratch |
All versions maintain 100% functional equivalence while progressively improving readability and educational value.
By studying this code, you will understand:
```bash
# Clone the repository
git clone https://github.com/andresveraf/Build-GPT-model-with-Python.git
cd Build-GPT-model-with-Python

# No additional dependencies needed! (Pure Python)
python3 script_gpt_educational.py
```
Expected Output:
```
================================================================================
DEEP LEARNING FROM SCRATCH: GPT Implementation
================================================================================
Welcome! Let's build a GPT model step by step, understanding every detail.
✓ Random seed set to 42 for reproducibility
================================================================================
PART 2: CONFIGURING THE MODEL - HYPERPARAMETERS
================================================================================
MODEL ARCHITECTURE:
  • Embedding dimension: 16
  • Attention heads: 4 (each with 4 dimensions)
  • Transformer layers: 1
  • Context window: 16 tokens
...
GENERATING 20 SAMPLES:
  Sample 1: kamon
  Sample 2: ann
  Sample 3: karai
  Sample 4: jaire
  Sample 5: vialan
...
```
```bash
# Refactored version (clean, documented)
python3 script_gpt_refactored.py

# Original version (compact)
python3 script_gpt.py
```
- REFACTORING_SUMMARY.md - Complete study guide
- This README - Quick start and overview
- Inline Documentation - Each Python file contains extensive comments
Each component is thoroughly explained:
```python
# Multi-head attention allows the model to focus on different aspects:
#   Head 0: Previous character dependency
#   Head 1: Position-based patterns
#   Head 2: Consonant clusters
#   Head 3: Vowel patterns
```
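The mechanism behind those heads can be sketched in pure Python, in the same dependency-free spirit as the scripts here. This is a minimal single-head causal self-attention, not the repo's exact code; the function and weight names are illustrative:

```python
import math

def causal_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention with a causal mask, in pure Python.

    x: list of T token vectors (each a list of floats)
    wq, wk, wv: square weight matrices (lists of rows) projecting to Q, K, V
    """
    def matvec(w, v):
        return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

    q = [matvec(wq, t) for t in x]
    k = [matvec(wk, t) for t in x]
    v = [matvec(wv, t) for t in x]
    d = len(x[0])
    out = []
    for i in range(len(x)):
        # Causal mask: position i may only attend to positions 0..i.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(d)])
    return out
```

A multi-head layer runs several of these in parallel on smaller slices of the embedding and concatenates the results, which is how each head can specialize as the comments above describe.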
```
Step    1 / 1000 | Loss: 3.3660 | Perplexity: 28.94
Step  100 / 1000 | Loss: 2.8945 | Perplexity: 18.07
Step  200 / 1000 | Loss: 2.7123 | Perplexity: 15.07
Step  500 / 1000 | Loss: 2.6543 | Perplexity: 14.22
Step 1000 / 1000 | Loss: 2.6501 | Perplexity: 14.16
```
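The perplexity column is derived directly from the loss: it is simply e raised to the cross-entropy loss, so it matches the logged values up to rounding of the printed loss. A minimal sketch:

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity is the exponential of the cross-entropy loss.

    Intuition: a perplexity of ~14 means the model is, on average, as
    uncertain as if it were choosing uniformly among ~14 characters.
    """
    return math.exp(loss)
```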
After training on 32,033 names, the model generates realistic names:
kamon, ann, karai, jaire, vialan, mari, jalen, etc.
Level 1: Beginner (1-2 days)
Level 2: Intermediate (1 week)
Level 3: Advanced (2-3 weeks)
Level 4: Expert (ongoing)
| Concept | Importance | Difficulty |
|---|---|---|
| Tokenization | ★★★ | Easy |
| Embeddings | ★★★★ | Medium |
| Attention | ★★★★★ | Hard |
| Normalization | ★★★★ | Medium |
| Backpropagation | ★★★★★ | Hard |
| Optimization | ★★★★ | Medium |
```
Build-GPT-model-with-Python/
│
├── README.md                     # This file
├── REFACTORING_SUMMARY.md        # Complete study guide
├── input.txt                     # Training data (names)
│
├── script_gpt.py                 # Original (243 lines)
├── script_gpt_refactored.py      # Refactored (850 lines)
└── script_gpt_educational.py     # Educational (1,200 lines)
```
```python
# In script_gpt_educational.py, modify:
N_EMBD = 16           # Try: 32, 64, 128
N_HEAD = 4            # Try: 2, 8
N_LAYER = 1           # Try: 2, 3, 4
BLOCK_SIZE = 16       # Try: 32, 64
LEARNING_RATE = 0.01  # Try: 0.001, 0.005, 0.02
NUM_STEPS = 1000      # Try: 500, 2000, 5000
TEMPERATURE = 0.5     # Try: 0.3 (conservative), 0.8 (creative)
```
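TEMPERATURE controls how the model's output distribution is sharpened before sampling: logits are divided by the temperature, so values below 1 concentrate probability on likely characters and values above 1 flatten it. A minimal sampling sketch under that convention (illustrative, not the scripts' exact code):

```python
import math
import random

def sample(logits, temperature=0.5, rng=random):
    """Sample an index from logits after temperature scaling.

    Lower temperature sharpens the distribution (conservative output);
    higher temperature flattens it (more creative output).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the categorical distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```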
Replace input.txt with your own text file (one item per line):
```
word1
word2
word3
...
```
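Whatever file you supply, the model sees it through a character-level tokenizer: every distinct character becomes an integer ID. A minimal sketch of that vocabulary-building step, with a small literal list standing in for the file read and ID 0 reserved for an end-of-word token (a common convention in character-level models; the scripts' exact scheme may differ):

```python
# Stand-in for: names = open("input.txt").read().splitlines()
names = ["emma", "olivia", "ava"]

# One integer ID per distinct character, with 0 reserved for the end token.
chars = sorted(set("".join(names)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(word):
    """Map a string to its list of character IDs."""
    return [stoi[ch] for ch in word]

def decode(ids):
    """Map a list of character IDs back to a string."""
    return "".join(itos[i] for i in ids)
```

Any dataset works as long as each line is one training item; the vocabulary adapts automatically to the characters it contains.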
```
Input Token
     ↓
┌─────────────────────────────────────┐
│           Embedding Layer           │
│   • Token Embedding                 │
│   • Position Embedding              │
│   • RMS Normalization               │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│          Transformer Layer          │
│   • Multi-Head Self-Attention       │
│   • Residual Connection             │
│   • Feed-Forward Network (MLP)      │
│   • Residual Connection             │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│          Output Projection          │
│   • Linear to Vocabulary Size       │
└─────────────────────────────────────┘
     ↓
Logits → Softmax → Probabilities
```
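The final step of the diagram, turning raw logits into a probability distribution over the vocabulary, is the softmax function. A pure-Python sketch:

```python
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Larger logits map to larger probabilities, and the ordering is preserved; generation then samples the next character from this distribution.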
This is an educational project. Contributions are welcome!
```bash
# Run tests (if you add them)
python3 -m pytest tests/

# Format code
black script_gpt_*.py

# Check style
flake8 script_gpt_*.py
```
This project is open source and available under the MIT License.
Have questions? Feel free to:
Made with ❤️ for educational purposes