The EXPLAIN
command is a powerful tool that helps you understand how your SQL queries are executed by the database engine. By using EXPLAIN
, you can gain insights into the execution plan of your queries, which can help you identify potential performance bottlenecks and optimize your SQL code. Common problems like not using proper indexes, performing full table scans or file sorts, or inefficient joins can usually be identified with EXPLAIN
command.
I have always used the EXPLAIN
command as the starting point for any SQL query optimization.
However, recently, I found a particularly challenging query, where the EXPLAIN
output was not giving enough information.
This is when I discovered the EXPLAIN ANALYZE
command, which provides even more detailed information about query execution.
Since then, EXPLAIN ANALYZE
has been a game changer on how I approach SQL query optimization, and I want to share its power with you.
What is EXPLAIN ANALYZE?
EXPLAIN ANALYZE
is an extension of the EXPLAIN command that not only shows the execution plan of a query but also executes the query and provides actual runtime statistics. This includes information about the time taken for each step of the execution plan, the number of rows processed, and other performance metrics.
This additional information can be invaluable for identifying performance issues that may not be apparent from the execution plan alone.
How to use EXPLAIN ANALYZE
Using EXPLAIN ANALYZE
is straightforward. You simply prepend the command to your SQL query.
For example, consider the following SQL query:
EXPLAIN ANALYZE
SELECT u.id, u.name, o.id AS order_id, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.age > 30 AND o.status = 'completed';
This query joins the users
and orders
tables, filtering users by age and orders by status. When you run this command, the database will execute the query and return the full execution plan along with runtime statistics.
Interpreting the Output
Let’s walk through a sample output from EXPLAIN ANALYZE
for the query above:
Nested Loop (cost=0.43..120.85 rows=100 width=48) (actual time=0.025..1.234 rows=50 loops=1)
-> Index Scan using idx_users_age on users u (cost=0.29..8.50 rows=200 width=24) (actual time=0.015..0.200 rows=100 loops=1)
Index Cond: (age > 30)
-> Index Scan using idx_orders_user_id_status on orders o (cost=0.14..0.56 rows=1 width=24) (actual time=0.008..0.010 rows=1 loops=100)
Index Cond: ((user_id = u.id) AND (status = 'completed'::text))
Planning Time: 0.300 ms
Execution Time: 1.400 ms
These outputs can be quite detailed, especially for complex queries. Here’s how to break down the most important parts:
Execution Plan
The execution plan is like a roadmap for your query. It’s usually shown as a tree: each node is an operation (scan, join, sort, etc.), and the edges show how data flows between them.
In this example, the root node is a Nested Loop
, which means the database is joining the users
and orders
tables using a nested loop join. Each child node is an Index Scan
:
- The first scans the
users
table using an index onage
, efficiently filtering users. - The second scans the
orders
table using an index onuser_id
andstatus
, efficiently finding matching orders.
Seeing index scans is a good sign: it means the database is using indexes to avoid slow full table scans.
The loops
value shows how many times each operation ran. For example, loops=1
for the outer scan means it ran once, while loops=100
for the inner scan means it ran 100 times—once for each row from the outer scan. High loop counts, especially in nested loops, can signal inefficiency if the inner operation is costly or returns many rows.
Common Operation Types in Execution Plans
Here are some common operations you’ll see in execution plans:
- Seq Scan (Sequential Scan): Reads every row in a table. Used when no suitable index exists or when the planner estimates that scanning the whole table is faster.
- Index Scan: Uses an index to quickly find matching rows. Much faster than a sequential scan for selective queries.
- Bitmap Index Scan / Bitmap Heap Scan: Efficient for queries that match many rows; combines multiple indexes or conditions before fetching table data.
- Nested Loop: Joins two sets of rows by iterating over one and searching for matches in the other. Efficient for small datasets or when indexes are available.
- Hash Join: Builds a hash table in memory for one input and probes it with the other. Good for equality joins on large, unsorted datasets.
- Merge Join: Joins two sorted inputs by scanning them in order. Efficient when both inputs are already sorted.
- Sort: Orders rows based on specified columns. Can be expensive for large datasets.
- Aggregate: Performs calculations like COUNT, SUM, AVG, etc., over groups of rows.
Understanding these operations—and how they’re combined—helps you spot bottlenecks and find optimization opportunities.
Cost Estimates
The cost
values are estimates of resource usage (CPU, I/O, etc.) for each operation. The first number is the startup cost (to get the first row), and the second is the total cost (to get all rows). These numbers are unitless and only meaningful relative to each other—they help the planner choose the most efficient plan.
A very large number on the right side of the cost range suggests that the operation is expected to be resource-intensive and could be a performance bottleneck. What counts as “large” depends on your database and query, but if a cost is much higher than others in the plan, it’s worth investigating.
Costs accumulate up the tree, so the root node’s cost is the total estimated cost for the whole query.
Actual Time
The actual time
values show how long each operation really took: the first number is the time to get the first row, the second is the total time for all rows. Times are usually in milliseconds (ms). Look for operations with high actual times—they’re often the best targets for optimization.
Planning Time and Execution Time
Planning Time
is how long the database spent figuring out the execution plan. Execution Time
is the total time to actually run the query. Both are useful for understanding overall performance, especially for complex queries where planning can take a while.
Rows
The rows
value is the planner’s estimate of how many rows each operation will process. This helps you understand the scale of data at each step.
Conclusion
EXPLAIN ANALYZE
is a very useful tool for SQL query optimization, offering deep insights into how your queries are executed and where time and resources are spent. By learning to interpret its output, you can pinpoint performance bottlenecks and make informed decisions to improve your queries.
The output can be a bit overwhelming at first. Focus on steps with high actual time and cost, as these are usually the best opportunities for optimization.