Database Indexing Done Right: What Most Developers Get Wrong
Indexes are the single highest-leverage performance tool available to a backend engineer. They're also one of the most misunderstood. Here's what's actually happening under the hood — and how to use that knowledge to write faster queries.

We’re a digital engineering team focused on building secure, AI-driven, and scalable systems. From intelligent automation to cloud-native development, we turn complex challenges into powerful, future-ready solutions — one line of code at a time.
Why Most Developers Get Indexing Wrong
The typical developer relationship with indexes goes like this:
Query is slow
Add an index on the column in the WHERE clause
Query gets faster
Move on
This works — until it doesn't. At low data volumes, almost anything works. At 5 million rows, wrong indexing decisions compound into real production incidents.
The root cause is almost always the same: treating indexes as a magic speed-up rather than understanding what they actually do.
What an Index Actually Is
An index is a separate data structure maintained alongside your table. It stores a sorted copy of one or more column values, along with a pointer back to the corresponding row in the table.
PostgreSQL's default index type is a B-tree (Balanced Tree). Here's the mental model:
Imagine your table is a book. Without an index, finding every row where status = 'active' means reading every single page — a full table scan. With an index on status, the database goes to the index (the book's index in the back), finds the exact page numbers for status = 'active', and jumps directly there.
The B-tree keeps entries sorted. This makes it extremely efficient for:
Exact matches (
WHERE user_id = 123)Range queries (
WHERE created_at > '2024-01-01')Sorting (
ORDER BY last_name)Prefix matches (
WHERE email LIKE 'john%')
What it's not efficient for:
LIKE '%something'(wildcard at the start — can't use sorting)Functions on columns (
WHERE LOWER(email) = 'john@example.com'— bypasses the index)
The Index Trade-Off Nobody Talks About
Every index you create is a write tax.
On every INSERT, UPDATE, or DELETE, the database must update every index on that table. A table with 10 indexes has 10 data structures to maintain on every write.
This is invisible at low volume. At high write throughput — event logs, audit trails, real-time data ingestion — over-indexing becomes a genuine performance problem.
The rule: index for your reads, but account for your writes.
Before adding any index, ask:
How often is this query executed?
How many rows does it scan without the index?
What's the write-to-read ratio on this table?
A table that's 95% reads can handle many indexes. A high-write table needs surgical precision.
Composite Indexes: The Order Matters More Than You Think
A composite index covers multiple columns. This is where most indexing decisions go wrong.
sql
-- This index: CREATE INDEX idx_orders_user_status ON orders (user_id, status);
-- Supports these queries:
SELECT * FROM orders WHERE user_id = 123;
SELECT * FROM orders WHERE user_id = 123 AND status = 'active';
-- But NOT this query efficiently:
SELECT * FROM orders WHERE status = 'active';
The rule is called the left-prefix rule: a composite index on (A, B, C) can be used for queries filtering on A, A + B, or A + B + C — but not on B alone or C alone.
How to order columns in a composite index:
Put the most selective column first (the one that eliminates the most rows)
Put columns used in equality conditions before range conditions
If a column appears in
ORDER BY, consider putting it last
-- Query: find active orders for a user, sorted by date
SELECT * FROM orders
WHERE user_id = 123
AND status = 'active'
ORDER BY created_at DESC;
-- Optimal index: CREATE INDEX idx_orders_user_status_date
ON orders (user_id, status, created_at DESC);
This index handles the filter AND the sort in a single pass — no additional sort step needed.
Partial Indexes: The Underused Power Tool
A partial index only indexes rows that match a condition. Smaller index, faster lookups, less write overhead.
-- Instead of indexing all orders:
CREATE INDEX idx_orders_status
ON orders (status);
-- Index only the ones that matter for your queries:
CREATE INDEX idx_active_orders ON orders (user_id, created_at)
WHERE status = 'active';
If 90% of your queries filter for status = 'active', why maintain an index entry for every completed, cancelled, and refunded order? A partial index on active orders is smaller, faster, and cheaper to maintain.
Real-world use cases for partial indexes:
Unprocessed jobs in a queue table (
WHERE processed = false)Soft-deleted records (
WHERE deleted_at IS NULL)Premium users in a multi-tier SaaS (
WHERE plan = 'enterprise')
Expression Indexes: When You Can't Change the Query
Sometimes you inherit queries you can't modify. The query uses LOWER(email) or DATE(created_at) — and because it wraps the column in a function, the standard index is bypassed.
The fix: create an index on the expression itself.
-- Query that ignores a regular email index:
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Index the expression instead:
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
Now the query planner can use it. This is especially useful for case-insensitive searches without forcing application-level normalization.
How to Know If Your Index Is Being Used
Adding an index doesn't guarantee it will be used. The query planner makes its own decisions based on statistics and cost estimates.
Use EXPLAIN ANALYZE to verify:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123 AND status = 'active';
What to look for:
Index ScanorIndex Only Scan→ index is being used ✓Seq Scan→ full table scan, index is not being usedBitmap Heap Scan→ index used, then rows fetched from table (common for range queries)
Why the planner might ignore your index:
Table is too small — below ~1,000 rows, a seq scan is often faster. The planner knows this.
Low selectivity — if
status = 'active'covers 80% of rows, scanning the index + fetching rows is slower than just reading the table.Outdated statistics — run
ANALYZE table_nameto refresh the planner's row estimates.Column wrapped in a function —
WHERE UPPER(name) = 'JOHN'bypassesidx_name. Use expression indexes.
Finding Indexes That Aren't Pulling Their Weight
PostgreSQL tracks index usage. Query this to find indexes that are never used:
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
idx_scan = 0 means the index has never been used since the last stats reset. These are candidates for removal — they're costing you write overhead for zero read benefit.
Before dropping: confirm with your team that no seasonal queries or batch jobs rely on them. Reset stats and monitor for a full business cycle if unsure.
The Indexing Checklist
Before creating any index:
Is the query slow because of missing index, or because of bad query logic?
How many rows does this table have? (Under 10k, indexes rarely help)
What's the write-to-read ratio on this table?
Is this a one-column filter or a multi-column filter? (Composite candidate?)
Does the query filter a subset of rows consistently? (Partial index candidate?)
Is a function applied to the column in WHERE? (Expression index candidate?)
After creating the index, did
EXPLAIN ANALYZEconfirm it's being used?
Before keeping an existing index:
Check
pg_stat_user_indexes— isidx_scan> 0?Does this index overlap with a composite index that already covers it?
Is the table high-write? Is this index worth the write cost?
Key Takeaways
Indexes are a trade-off: faster reads, slower writes. Index deliberately.
B-trees work best for equality, range, and sort queries — not for leading wildcards or functions on columns.
Composite index column order is critical. Left-prefix rule. Equality before range.
Partial indexes are underused and highly effective for filtered datasets.
Expression indexes solve the function-on-column problem.
Always verify with
EXPLAIN ANALYZE. Assumptions about index usage are often wrong.Audit unused indexes with
pg_stat_user_indexes. Dead indexes are a hidden write tax.
For teams scaling SaaS databases past their first performance wall, getting indexing right is usually the highest-ROI move before reaching for read replicas or caching layers.



