Blog Databases

Normalization vs Denormalization

When to use each technique? Advantages, disadvantages and practical examples

When designing a database, one of the most important decisions is whether to normalize (reduce redundancy) or denormalize (improve read performance). Let's understand the difference and when to apply each technique.

Normalization

1NF · 2NF · 3NF · BCNF

What is Normalization?

Normalization is the process of organizing data to eliminate redundancy and avoid anomalies when inserting, updating, or deleting data.

Example (normalized):

-- Table: users
┌────┬────────────┬─────────────┐
│ id │ name       │ email       │
├────┼────────────┼─────────────┤
│ 1  │ Juan Pérez │ juan@mail.com│
└────┴────────────┴─────────────┘

-- Table: orders
┌────┬─────────┬────────────┐
│ id │ user_id │ product    │
├────┼─────────┼────────────┤
│ 1  │ 1       │ Laptop     │
│ 2  │ 1       │ Mouse      │
└────┴─────────┴────────────┘

-- User data appears only ONCE in users table
Saves space
Avoids anomalies
Easier updates
More JOINs

Denormalization

Read-Optimized

What is Denormalization?

Denormalization is the process of intentionally adding redundancy to a database to improve read performance. It's the opposite of normalization.

Example (denormalized):

-- Single table with redundant data
┌────┬────────────┬─────────────────┬────────────┐
│ id │ user_name  │ user_email      │ product    │
├────┼────────────┼─────────────────┼────────────┤
│ 1  │ Juan Pérez │ juan@mail.com   │ Laptop     │
│ 2  │ Juan Pérez │ juan@mail.com   │ Mouse      │
└────┴────────────┴─────────────────┴────────────┘

-- User data is REPEATED (Juan Pérez appears twice)
Faster reads
Fewer JOINs
More space
Update anomalies

The Normal Forms

1NF - First Normal Form

Each cell must contain a single value (no lists or arrays).

❌ BEFORE: orders { products: "Laptop,Mouse,Keyboard" }
✅ AFTER:  orders { product: "Laptop" }, orders { product: "Mouse" }

2NF - Second Normal Form

Must be in 1NF and all non-key columns depend on the entire primary key.

3NF - Third Normal Form

Must be in 2NF and no transitive dependencies (non-key columns depend only on the key).

BCNF - Boyce-Codd Normal Form

Stricter version of 3NF. Every determinant must be a candidate key.

Normalization vs Denormalization

Aspect Normalization Denormalization
Storage space Minimal More space (redundancy)
Write performance Fast Slower (multiple updates)
Read performance Slower (JOINs) Very fast
Data integrity High (no anomalies) Lower (possible inconsistencies)
Complexity More complex queries Simple queries

When to use each technique?

Use Normalization

  • OLTP systems (many writes)
  • When data integrity is critical
  • Limited storage space
  • Frequent updates to the same data
  • When starting a new project (start normalized)

Use Denormalization

  • OLAP systems (many reads, few writes)
  • Data warehouses and reports
  • When read performance is critical
  • Caches and materialized views
  • Mobile apps with local databases

Practical Example

-- SCENARIO: E-commerce with products and categories

-- NORMALIZED (3NF) - Best for administration
CREATE TABLE categories (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2),
    category_id INT REFERENCES categories(id)
);

-- Query needs JOIN (slower but consistent)
SELECT p.name, c.name 
FROM products p 
JOIN categories c ON p.category_id = c.id
WHERE p.price > 100;

-- DENORMALIZED - Best for fast display
CREATE TABLE products_denormalized (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2),
    category_name VARCHAR(100)  -- redundant, but faster
);

-- Query is simpler (no JOIN)
SELECT name, category_name 
FROM products_denormalized 
WHERE price > 100;

-- HYBRID APPROACH (often the best)
-- Keep normalized for writes, add cache layer for reads

Best practice: Start normalized (3NF). Only denormalize when you have a proven performance problem.

Advantages of Normalization

  • Reduces data redundancy
  • Saves storage space
  • Avoids update/insert/delete anomalies
  • Ensures data consistency

Advantages of Denormalization

  • Faster query performance
  • Simpler queries (fewer JOINs)
  • Easier to understand for analysts
  • Better for reporting systems

Share this article