SQL Performance Optimization: Writing Faster Queries

Slow SQL queries are one of the most common causes of application performance problems. A single inefficient query can bring a system to its knees. This guide covers proven techniques to identify and fix SQL performance issues, from indexing strategies to query rewriting.

## Understanding Query Performance

Before optimizing, understand why queries are slow:

- Full table scans instead of index lookups
- Missing or unused indexes
- Inefficient JOIN operations
- Suboptimal WHERE clauses
- Locking and contention issues
- Memory pressure

## The EXPLAIN Command

Always start with EXPLAIN to understand query execution:

```sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
```

Key columns to check:

- **type**: ALL (bad) vs ref/range (good)
- **key**: Which index is used
- **rows**: Estimated rows scanned
- **Extra**: Using filesort or temporary (usually bad)

## Indexing Strategies

### Create Indexes on WHERE Columns

```sql
CREATE INDEX idx_customer_id ON orders(customer_id);
```

### Composite Indexes

Order matters in composite indexes. Put most selective columns first:

```sql
-- Good for queries filtering by status AND date
CREATE INDEX idx_status_date ON orders(status, order_date);
```

### Covering Indexes

Include all columns needed by the query:

```sql
CREATE INDEX idx_covering ON orders(customer_id, status, total);

-- Query can be satisfied entirely from index
SELECT status, total FROM orders WHERE customer_id = 123;
```

### Avoid Over-Indexing

Each index has costs:

- Slower INSERT, UPDATE, DELETE
- More disk space
- Query optimizer overhead

## Common Anti-Patterns

### SELECT *

```sql
-- Bad: Fetches unnecessary columns
SELECT * FROM users WHERE id = 1;

-- Good: Only fetch what you need
SELECT name, email FROM users WHERE id = 1;
```

### Functions on Indexed Columns

```sql
-- Bad: Index cannot be used
SELECT * FROM orders WHERE YEAR(order_date) = 2024;

-- Good: Range query can use index
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
```

### Leading Wildcards

```sql
-- Bad: Cannot use index
SELECT * FROM users WHERE email LIKE '%@gmail.com';

-- Good: Can use index
SELECT * FROM users WHERE email LIKE 'john%';
```

### OR Conditions

```sql
-- Bad: May not use indexes effectively
SELECT * FROM orders WHERE customer_id = 1 OR product_id = 2;

-- Good: UNION can use separate indexes
SELECT * FROM orders WHERE customer_id = 1
UNION
SELECT * FROM orders WHERE product_id = 2;
```

## JOIN Optimization

### Use Appropriate JOIN Types

- INNER JOIN when you need matching rows only
- LEFT JOIN when you need all rows from left table
- Avoid CROSS JOIN unless intentional

### Join Order

In most databases, put smaller tables first:

```sql
SELECT * FROM small_table s
JOIN large_table l ON s.id = l.small_id;
```

### Index Foreign Keys

```sql
CREATE INDEX idx_order_customer ON orders(customer_id);
```

## Subquery Optimization

### Replace Subqueries with JOINs

```sql
-- Slower: Subquery
SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE country = 'US');

-- Faster: JOIN
SELECT DISTINCT o.* FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'US';
```

### EXISTS vs IN

For existence checks, EXISTS is often faster:

```sql
SELECT * FROM customers c
WHERE EXISTS (
 SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
```

## Pagination Optimization

### OFFSET Issues

Large offsets are slow:

```sql
-- Gets slower as offset increases
SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 100000;
```

### Keyset Pagination

```sql
-- Much faster for large datasets
SELECT * FROM orders
WHERE id > 100000
ORDER BY id LIMIT 10;
```

## Aggregation Optimization

### Index for GROUP BY

```sql
CREATE INDEX idx_status ON orders(status);

SELECT status, COUNT(*) FROM orders GROUP BY status;
```

### Filter Early

```sql
-- Bad: Aggregates all rows first
SELECT status, COUNT(*) FROM orders
GROUP BY status
HAVING status = 'completed';

-- Good: Filters before aggregation
SELECT status, COUNT(*) FROM orders
WHERE status = 'completed'
GROUP BY status;
```

## Monitoring Performance

### Slow Query Log

Enable and check slow query logs:

```sql
-- MySQL
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
```

### Query Profiling

```sql
-- MySQL
SET profiling = 1;
SELECT * FROM orders WHERE customer_id = 123;
SHOW PROFILE;
```

## Best Practices Summary

1. **Use EXPLAIN** before optimizing
2. **Create appropriate indexes** on WHERE, JOIN, ORDER BY columns
3. **Avoid SELECT ***
4. **Use covering indexes** for frequently accessed columns
5. **Rewrite queries** to use indexes effectively
6. **Monitor slow queries** regularly
7. **Test with production-like data volumes**

## Conclusion

SQL optimization is both art and science. Start with understanding your data and query patterns. Use EXPLAIN to diagnose issues, create strategic indexes, and rewrite queries to leverage those indexes. Remember: premature optimization is wasteful, but ignoring performance is dangerous. Always measure before and after changes.

评论
暂无评论