Performance Guide

This guide covers performance tuning, optimization strategies, and best practices for getting the best performance from rapsqlite.

Connection Pooling

Understanding Connection Pools

rapsqlite uses connection pooling internally. The default pool size is 1, but you can configure it:

from rapsqlite import Connection

# Create connection with custom pool size
conn = Connection("example.db")
conn.pool_size = 5  # Allow up to 5 concurrent connections
conn.connection_timeout = 30  # 30 second timeout for acquiring connections

When to Increase Pool Size

  • Concurrent operations: If you have many concurrent database operations, increase pool size

  • Long-running queries: If queries take a long time, more connections allow better concurrency

  • Mixed read/write workloads: Separate connections for reads and writes

Pool Size Guidelines

  • Default (1): Good for single-threaded async applications or low concurrency

  • Small (2-5): Good for moderate concurrency, most web applications

  • Large (10+): Only for high-concurrency scenarios, be mindful of SQLite’s write serialization

Note: SQLite serializes writes, so increasing pool size mainly helps with concurrent reads.

Single connection vs larger pool

  • Single connection (pool_size=1, default): Use when you have one logical worker (e.g. one async task doing DB work at a time), or when concurrency is low. Session-connection reuse means one connection handles many sequential queries efficiently.

  • Larger pool (2–5 or more): Use when many concurrent coroutines perform database operations at the same time (e.g. many concurrent HTTP requests each hitting the DB). Increase pool_size before the first operation; it cannot be changed after the pool is created.

Prepared Statement Caching

rapsqlite automatically benefits from prepared statement caching provided by sqlx (the underlying database library). sqlx caches prepared statements per connection, meaning:

  • Automatic caching: Prepared statements are cached automatically - no configuration needed

  • Per-connection cache: Each connection in the pool maintains its own cache

  • Query normalization: rapsqlite normalizes queries (removes extra whitespace) to maximize cache hits

  • Performance benefit: Repeated queries with the same structure reuse prepared statements

To maximize cache hits:

# Good: Same query structure, different parameters
for user_id in user_ids:
    await conn.fetch_all("SELECT * FROM users WHERE id = ?", [user_id])
# sqlx caches the prepared statement after first execution

# Also good: Query normalization handles whitespace differences
await conn.execute("INSERT INTO test (value) VALUES (?)", ["a"])
await conn.execute("INSERT  INTO  test  (value)  VALUES  (?)", ["b"])
# Both queries are normalized and benefit from the same prepared statement

Best practices:

  • Use consistent query formatting (normalization happens automatically)

  • Reuse the same query strings with different parameters

  • Keep connections alive for repeated queries (connection pooling helps)

  • Use parameterized queries (required for prepared statements)

  • Avoid dynamic query building when possible (reduces cache hits)

Performance impact:

Prepared statement caching provides significant performance benefits for repeated queries:

  • First execution: Statement is prepared and cached

  • Subsequent executions: Reuses cached prepared statement (much faster)

  • Typical improvement: 2-5x faster for repeated queries vs. unique queries

PRAGMA Optimization

Configure SQLite for your workload:

# For read-heavy workloads
async with connect("example.db", pragmas={
    "journal_mode": "WAL",  # Write-Ahead Logging
    "synchronous": "NORMAL",  # Balance safety and performance
    "cache_size": "-64000",  # 64MB cache (negative = KB)
}) as conn:
    # Your operations
    pass

# For write-heavy workloads
async with connect("example.db", pragmas={
    "journal_mode": "WAL",
    "synchronous": "FULL",  # Maximum safety
    "wal_autocheckpoint": "1000",  # Checkpoint every 1000 pages
}) as conn:
    # Your operations
    pass

Batch Operations

Use execute_many() for batch inserts:

# Good: Single transaction, prepared statement reuse
params = [[f"user_{i}"] for i in range(1000)]
await conn.execute_many("INSERT INTO users (name) VALUES (?)", params)

# Bad: Many individual transactions
for i in range(1000):
    await conn.execute("INSERT INTO users (name) VALUES (?)", [f"user_{i}"])

Connection Reuse

Reuse connections when possible:

# Good: Reuse connection
async with connect("example.db") as conn:
    for i in range(100):
        await conn.execute("INSERT INTO test (value) VALUES (?)", [i])

# Bad: Create new connection for each operation
for i in range(100):
    async with connect("example.db") as conn:
        await conn.execute("INSERT INTO test (value) VALUES (?)", [i])

Use Appropriate Fetch Methods

  • fetch_all(): When you need all rows

  • fetch_one(): When you expect exactly one row

  • fetch_optional(): When you might have zero or one row

  • Cursor.fetchmany(): When processing large result sets in chunks

Performance Monitoring

Query Timing

import time

start = time.perf_counter()
rows = await conn.fetch_all("SELECT * FROM users")
elapsed = time.perf_counter() - start
print(f"Query took {elapsed * 1000:.2f}ms")

Connection Pool Metrics

Monitor connection pool usage by tracking:

  • Number of concurrent operations

  • Connection acquisition timeouts

  • Pool size vs. actual usage

Troubleshooting Performance Issues

Slow Queries

  1. Check if using parameterized queries: String formatting prevents prepared statement caching

  2. Verify indexes: Use get_indexes() to check table indexes

  3. Analyze query plans: Use EXPLAIN QUERY PLAN to understand query execution

  4. Check PRAGMA settings: Ensure appropriate settings for your workload

High Memory Usage

  1. Reduce pool size: Smaller pool uses less memory

  2. Use fetchmany(): Process large result sets in chunks

  3. Clear query cache: Connection close clears prepared statement cache

Database Locked Errors

  1. Increase timeout: Set connection_timeout higher

  2. Use WAL mode: journal_mode = "WAL" allows concurrent reads

  3. Reduce transaction duration: Keep transactions short

  4. Implement retry logic: See error handling section in Advanced Usage Guide

Measuring performance and regression testing

  • Query timing: Use timed_fetch_all(conn, sql, parameters, on_timing=...) from rapsqlite to record query duration, or wrap calls with time.perf_counter(). See Advanced Usage Guide for monitoring and trace callbacks.

  • Regression tests: The test suite includes performance regression tests in tests/test_performance.py. Run with pytest tests/ -m performance to validate performance characteristics after changes.

Further Reading