Database Indexing Done Right: What Most Developers Get Wrong

Why Most Developers Get Indexing Wrong

The typical developer relationship with indexes goes like this:

Query is slow
Add an index on the column in the WHERE clause
Query gets faster
Move on

This works — until it doesn't. At low data volumes, almost anything works. At 5 million rows, wrong indexing decisions compound into real production incidents.

The root cause is almost always the same: treating indexes as a magic speed-up rather than understanding what they actually do.

What an Index Actually Is

An index is a separate data structure maintained alongside your table. It stores a sorted copy of one or more column values, along with a pointer back to the corresponding row in the table.

PostgreSQL's default index type is a B-tree (Balanced Tree). Here's the mental model:

Imagine your table is a book. Without an index, finding every row where status = 'active' means reading every single page — a full table scan. With an index on status, the database goes to the index (the book's index in the back), finds the exact page numbers for status = 'active', and jumps directly there.

The B-tree keeps entries sorted. This makes it extremely efficient for:

Exact matches (WHERE user_id = 123)
Range queries (WHERE created_at > '2024-01-01')
Sorting (ORDER BY last_name)
Prefix matches (WHERE email LIKE 'john%')

What it's not efficient for:

LIKE '%something' (wildcard at the start — can't use sorting)
Functions on columns (WHERE LOWER(email) = 'john@example.com' — bypasses the index)

The Index Trade-Off Nobody Talks About

Every index you create is a write tax.

On every INSERT, UPDATE, or DELETE, the database must update every index on that table. A table with 10 indexes has 10 data structures to maintain on every write.

This is invisible at low volume. At high write throughput — event logs, audit trails, real-time data ingestion — over-indexing becomes a genuine performance problem.

The rule: index for your reads, but account for your writes.

Before adding any index, ask:

How often is this query executed?
How many rows does it scan without the index?
What's the write-to-read ratio on this table?

A table that's 95% reads can handle many indexes. A high-write table needs surgical precision.

Composite Indexes: The Order Matters More Than You Think

A composite index covers multiple columns. This is where most indexing decisions go wrong.

sql

-- This index: CREATE INDEX idx_orders_user_status ON orders (user_id, status);

-- Supports these queries:

SELECT * FROM orders WHERE user_id = 123;

SELECT * FROM orders WHERE user_id = 123 AND status = 'active';

-- But NOT this query efficiently:

SELECT * FROM orders WHERE status = 'active';

The rule is called the left-prefix rule: a composite index on (A, B, C) can be used for queries filtering on A, A + B, or A + B + C — but not on B alone or C alone.

How to order columns in a composite index:

Put the most selective column first (the one that eliminates the most rows)
Put columns used in equality conditions before range conditions
If a column appears in ORDER BY, consider putting it last

-- Query: find active orders for a user, sorted by date

SELECT * FROM orders

WHERE user_id = 123

AND status = 'active'

ORDER BY created_at DESC;

-- Optimal index: CREATE INDEX idx_orders_user_status_date

ON orders (user_id, status, created_at DESC);

This index handles the filter AND the sort in a single pass — no additional sort step needed.

Partial Indexes: The Underused Power Tool

A partial index only indexes rows that match a condition. Smaller index, faster lookups, less write overhead.

-- Instead of indexing all orders:

CREATE INDEX idx_orders_status

ON orders (status);

-- Index only the ones that matter for your queries:

CREATE INDEX idx_active_orders ON orders (user_id, created_at)

WHERE status = 'active';

If 90% of your queries filter for status = 'active', why maintain an index entry for every completed, cancelled, and refunded order? A partial index on active orders is smaller, faster, and cheaper to maintain.

Real-world use cases for partial indexes:

Unprocessed jobs in a queue table (WHERE processed = false)
Soft-deleted records (WHERE deleted_at IS NULL)
Premium users in a multi-tier SaaS (WHERE plan = 'enterprise')

Expression Indexes: When You Can't Change the Query

Sometimes you inherit queries you can't modify. The query uses LOWER(email) or DATE(created_at) — and because it wraps the column in a function, the standard index is bypassed.

The fix: create an index on the expression itself.

-- Query that ignores a regular email index:

SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- Index the expression instead:

CREATE INDEX idx_users_email_lower ON users (LOWER(email));

Now the query planner can use it. This is especially useful for case-insensitive searches without forcing application-level normalization.

How to Know If Your Index Is Being Used

Adding an index doesn't guarantee it will be used. The query planner makes its own decisions based on statistics and cost estimates.

Use EXPLAIN ANALYZE to verify:

EXPLAIN ANALYZE

SELECT * FROM orders

WHERE user_id = 123 AND status = 'active';

What to look for:

Index Scan or Index Only Scan → index is being used ✓
Seq Scan → full table scan, index is not being used
Bitmap Heap Scan → index used, then rows fetched from table (common for range queries)

Why the planner might ignore your index:

Table is too small — below ~1,000 rows, a seq scan is often faster. The planner knows this.
Low selectivity — if status = 'active' covers 80% of rows, scanning the index + fetching rows is slower than just reading the table.
Outdated statistics — run ANALYZE table_name to refresh the planner's row estimates.
Column wrapped in a function — WHERE UPPER(name) = 'JOHN' bypasses idx_name. Use expression indexes.

Finding Indexes That Aren't Pulling Their Weight

PostgreSQL tracks index usage. Query this to find indexes that are never used:

SELECT

schemaname,

tablename,

indexname,

idx_scan,

idx_tup_read,

idx_tup_fetch

FROM pg_stat_user_indexes

WHERE idx_scan = 0

ORDER BY pg_relation_size(indexrelid) DESC;

idx_scan = 0 means the index has never been used since the last stats reset. These are candidates for removal — they're costing you write overhead for zero read benefit.

Before dropping: confirm with your team that no seasonal queries or batch jobs rely on them. Reset stats and monitor for a full business cycle if unsure.

The Indexing Checklist

Before creating any index:

Is the query slow because of missing index, or because of bad query logic?
How many rows does this table have? (Under 10k, indexes rarely help)
What's the write-to-read ratio on this table?
Is this a one-column filter or a multi-column filter? (Composite candidate?)
Does the query filter a subset of rows consistently? (Partial index candidate?)
Is a function applied to the column in WHERE? (Expression index candidate?)
After creating the index, did EXPLAIN ANALYZE confirm it's being used?

Before keeping an existing index:

Check pg_stat_user_indexes — is idx_scan > 0?
Does this index overlap with a composite index that already covers it?
Is the table high-write? Is this index worth the write cost?

Key Takeaways

Indexes are a trade-off: faster reads, slower writes. Index deliberately.
B-trees work best for equality, range, and sort queries — not for leading wildcards or functions on columns.
Composite index column order is critical. Left-prefix rule. Equality before range.
Partial indexes are underused and highly effective for filtered datasets.
Expression indexes solve the function-on-column problem.
Always verify with EXPLAIN ANALYZE. Assumptions about index usage are often wrong.
Audit unused indexes with pg_stat_user_indexes. Dead indexes are a hidden write tax.

For teams scaling SaaS databases past their first performance wall, getting indexing right is usually the highest-ROI move before reaching for read replicas or caching layers.

Database Indexing Done Right: What Most Developers Get Wrong

Why Most Developers Get Indexing Wrong

What an Index Actually Is

The Index Trade-Off Nobody Talks About

Composite Indexes: The Order Matters More Than You Think

Partial Indexes: The Underused Power Tool

Expression Indexes: When You Can't Change the Query

How to Know If Your Index Is Being Used

Finding Indexes That Aren't Pulling Their Weight

The Indexing Checklist

Key Takeaways

Comments

More from this blog

Building AI-Powered Dashboards for Decision Making

Event-Driven Architecture Explained Simply

Microservices vs Monolith: Which Architecture Should You Choose?

From Developer to Systems Thinker: Lessons in Building for Scale

Command Palette

Why Most Developers Get Indexing Wrong

What an Index Actually Is

The Index Trade-Off Nobody Talks About

Composite Indexes: The Order Matters More Than You Think

Partial Indexes: The Underused Power Tool

Expression Indexes: When You Can't Change the Query

How to Know If Your Index Is Being Used

Finding Indexes That Aren't Pulling Their Weight

The Indexing Checklist

Key Takeaways

Comments

More from this blog