Normalization vs Denormalization
When to use each technique? Advantages, disadvantages and practical examples
When designing a database, one of the most important decisions is whether to normalize (reduce redundancy) or denormalize (improve read performance). Let's understand the difference and when to apply each technique.
Normalization
What is Normalization?
Normalization is the process of organizing data to eliminate redundancy and avoid anomalies when inserting, updating, or deleting data.
Example (normalized):
-- Table: users ┌────┬────────────┬─────────────┐ │ id │ name │ email │ ├────┼────────────┼─────────────┤ │ 1 │ Juan Pérez │ juan@mail.com│ └────┴────────────┴─────────────┘ -- Table: orders ┌────┬─────────┬────────────┐ │ id │ user_id │ product │ ├────┼─────────┼────────────┤ │ 1 │ 1 │ Laptop │ │ 2 │ 1 │ Mouse │ └────┴─────────┴────────────┘ -- User data appears only ONCE in users table
Denormalization
What is Denormalization?
Denormalization is the process of intentionally adding redundancy to a database to improve read performance. It's the opposite of normalization.
Example (denormalized):
-- Single table with redundant data ┌────┬────────────┬─────────────────┬────────────┐ │ id │ user_name │ user_email │ product │ ├────┼────────────┼─────────────────┼────────────┤ │ 1 │ Juan Pérez │ juan@mail.com │ Laptop │ │ 2 │ Juan Pérez │ juan@mail.com │ Mouse │ └────┴────────────┴─────────────────┴────────────┘ -- User data is REPEATED (Juan Pérez appears twice)
The Normal Forms
1NF - First Normal Form
Each cell must contain a single value (no lists or arrays).
❌ BEFORE: orders { products: "Laptop,Mouse,Keyboard" }
✅ AFTER: orders { product: "Laptop" }, orders { product: "Mouse" }
2NF - Second Normal Form
Must be in 1NF and all non-key columns depend on the entire primary key.
3NF - Third Normal Form
Must be in 2NF and no transitive dependencies (non-key columns depend only on the key).
BCNF - Boyce-Codd Normal Form
Stricter version of 3NF. Every determinant must be a candidate key.
Normalization vs Denormalization
| Aspect | Normalization | Denormalization |
|---|---|---|
| Storage space | Minimal | More space (redundancy) |
| Write performance | Fast | Slower (multiple updates) |
| Read performance | Slower (JOINs) | Very fast |
| Data integrity | High (no anomalies) | Lower (possible inconsistencies) |
| Complexity | More complex queries | Simple queries |
When to use each technique?
Use Normalization
- OLTP systems (many writes)
- When data integrity is critical
- Limited storage space
- Frequent updates to the same data
- When starting a new project (start normalized)
Use Denormalization
- OLAP systems (many reads, few writes)
- Data warehouses and reports
- When read performance is critical
- Caches and materialized views
- Mobile apps with local databases
Practical Example
-- SCENARIO: E-commerce with products and categories -- NORMALIZED (3NF) - Best for administration CREATE TABLE categories ( id INT PRIMARY KEY, name VARCHAR(100) ); CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(100), price DECIMAL(10,2), category_id INT REFERENCES categories(id) ); -- Query needs JOIN (slower but consistent) SELECT p.name, c.name FROM products p JOIN categories c ON p.category_id = c.id WHERE p.price > 100; -- DENORMALIZED - Best for fast display CREATE TABLE products_denormalized ( id INT PRIMARY KEY, name VARCHAR(100), price DECIMAL(10,2), category_name VARCHAR(100) -- redundant, but faster ); -- Query is simpler (no JOIN) SELECT name, category_name FROM products_denormalized WHERE price > 100; -- HYBRID APPROACH (often the best) -- Keep normalized for writes, add cache layer for reads
Best practice: Start normalized (3NF). Only denormalize when you have a proven performance problem.
Advantages of Normalization
- Reduces data redundancy
- Saves storage space
- Avoids update/insert/delete anomalies
- Ensures data consistency
Advantages of Denormalization
- Faster query performance
- Simpler queries (fewer JOINs)
- Easier to understand for analysts
- Better for reporting systems