8 Python Database Optimization Techniques to Boost Application Performance 10x

Published: December 19, 2025, 09:55 PM GMT+9

13 min read

Source: Dev.to

📚 About the Author

As a best-selling author, I invite you to explore my books on Amazon.
Don't forget to follow me on Medium and show your support. Thank you! Your support means the world!

🚀 Speeding Up Python Database Access

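When you work with databases in Python, speed isn't a nice-to-have; it's essential. When an application slows down, database calls are usually the cause. Over time I've collected practical methods that make a real difference. These aren't theory; they're techniques I use regularly to keep applications responsive as they grow. Here are the eight most effective.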

1️⃣ Inspect the Query Plan

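First, see what the database is actually doing. In PostgreSQL, prefix the query with EXPLAIN ANALYZE. This actually runs the query and reports the execution plan along with estimated costs and the real row counts and timings. Plain EXPLAIN (without ANALYZE) shows only the plan and estimates without executing anything, which is the safer choice for INSERT, UPDATE, or DELETE statements, since EXPLAIN ANALYZE really performs those writes.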

import psycopg2

# Connect to your database
conn = psycopg2.connect(database="myapp", user="app_user", password="secret")
cur = conn.cursor()

# Ask the database to explain its plan for a query
query = "SELECT * FROM user_orders WHERE user_id = 456;"
cur.execute(f"EXPLAIN ANALYZE {query}")
execution_plan = cur.fetchall()

for line in execution_plan:
    print(line[0])

# "Seq Scan" means every row is read; "Index Scan" / "Index Only Scan" use an index.
# Compare the estimated cost and rows with the actual time and rows from ANALYZE.

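If you see “Seq Scan on user_orders”, the database is reading every row in the table, which becomes very slow on large tables. For a selective filter such as user_id = 456 you usually want an “Index Scan” instead. A sequential scan is not automatically a problem: on small tables, or when a query returns a large share of the rows, the planner may rightly prefer it. This simple check is the starting point for any performance investigation.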
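If you want to check plans automatically, for example in a CI job, PostgreSQL can return the plan as JSON via `EXPLAIN (ANALYZE, FORMAT JSON)`. The sketch below walks that tree and lists the tables read by a Seq Scan. It assumes PostgreSQL's JSON layout (a list whose first element has a `"Plan"` key, with child nodes under `"Plans"`); the sample plan is hand-written to mimic that shape rather than captured from a real database, and `find_seq_scans` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterator, List


def iter_plan_nodes(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield a plan node and all of its descendants, depth-first."""
    yield node
    for child in node.get("Plans", []):
        yield from iter_plan_nodes(child)


def find_seq_scans(explain_json: List[Dict[str, Any]]) -> List[str]:
    """Return the tables read with a sequential scan in an EXPLAIN (FORMAT JSON) result."""
    root = explain_json[0]["Plan"]
    return [
        node.get("Relation Name", "?")
        for node in iter_plan_nodes(root)
        if node.get("Node Type") == "Seq Scan"
    ]


# Hand-written sample shaped like PostgreSQL's JSON plan output (not real output)
sample_plan = [{
    "Plan": {
        "Node Type": "Hash Join",
        "Plans": [
            {"Node Type": "Seq Scan", "Relation Name": "user_orders"},
            {"Node Type": "Hash", "Plans": [
                {"Node Type": "Index Scan", "Relation Name": "users"},
            ]},
        ],
    },
}]

print(find_seq_scans(sample_plan))  # ['user_orders']

# Against a live database (assumes the psycopg2 cursor from the example above;
# psycopg2 normally decodes the json result column into Python lists and dicts):
# cur.execute("EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM user_orders WHERE user_id = %s", (456,))
# print(find_seq_scans(cur.fetchone()[0]))
```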

2️⃣ Add the Right Indexes

The most common fix for a slow query is adding an index. An index works like the index at the back of a book: instead of flipping through every page, you jump straight to the one you need.

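from sqlalchemy import create_engine, text

engine = create_engine('postgresql://user:pass@localhost/myapp')

# engine.begin() opens a transaction and commits it on exit; with a plain
# engine.connect() in SQLAlchemy 2.x, the CREATE INDEX would be rolled back
with engine.begin() as conn:
    # Single-column index for lookups by email
    conn.execute(text("CREATE INDEX IF NOT EXISTS idx_user_email ON users(email);"))

    # Partial composite index: serves queries filtering by city and status,
    # but only stores rows where account_status = 'active'
    conn.execute(
        text(
            "CREATE INDEX IF NOT EXISTS idx_city_active "
            "ON customers(city, account_status) "
            "WHERE account_status = 'active';"
        )
    )

print("Indexes created.")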

Note: indexes speed up reads but slow down writes, because every INSERT/UPDATE must also update the index. Add indexes only to columns that appear frequently in WHERE, ORDER BY, and JOIN clauses.
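You can watch an index change the plan without a PostgreSQL server. The sketch below swaps in Python's built-in sqlite3 module (SQLite's `EXPLAIN QUERY PLAN` has its own output format, unlike PostgreSQL's `EXPLAIN`) and uses a hypothetical `users` table; the exact plan wording varies between SQLite versions.

```python
import sqlite3


def plan_for(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text, one step per line."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    return "\n".join(row[3] for row in rows)  # column 3 holds the readable detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

query = "SELECT * FROM users WHERE email = ?"
before = plan_for(conn, query, ("user42@example.com",))

conn.execute("CREATE INDEX idx_user_email ON users(email)")
after = plan_for(conn, query, ("user42@example.com",))

print("before:", before)  # a full table scan, e.g. "SCAN users"
print("after: ", after)   # e.g. "SEARCH users USING INDEX idx_user_email (email=?)"
```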

3️⃣ Use a Connection Pool

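When many users open and close connections, you can run out of connections or pay the setup cost over and over (in PostgreSQL, each new connection means a network handshake, authentication, and a new server process). A connection pool keeps a set of open connections that can be reused.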

from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

engine = create_engine(
    'postgresql://user:pass@localhost/myapp',
    poolclass=QueuePool,  # already the default for PostgreSQL; shown explicitly
    pool_size=10,         # keep up to 10 connections open for reuse
    max_overflow=20,      # allow up to 20 extra temporary connections under load
    pool_timeout=30,      # wait up to 30 s for a free connection, then raise
    pool_recycle=1800     # replace connections older than 30 minutes
)

# Check out a connection; it returns to the pool (not closed) when the block exits
with engine.connect() as conn:
    result = conn.execute(text("SELECT name FROM products"))
    for row in result:
        print(row[0])

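Configure the engine once at application startup and share it. The pool caps each process at pool_size + max_overflow connections (30 here), which helps prevent “too many connections” errors in high-traffic web apps; if you run several worker processes, keep that total times the number of processes below PostgreSQL's max_connections.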
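To make the reuse idea concrete, here is a toy pool built on the standard library's `queue.Queue`. It is an illustration of the concept, not how SQLAlchemy's `QueuePool` is implemented (for one thing, `QueuePool` opens connections lazily, while this sketch opens them all up front). `TinyPool` is a hypothetical name, and sqlite3 in-memory connections stand in for PostgreSQL ones.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator, List


class TinyPool:
    """Illustrative fixed-size pool: opens `size` connections up front and reuses them."""

    def __init__(self, factory: Callable[[], sqlite3.Connection], size: int) -> None:
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout: float = 30.0) -> Iterator[sqlite3.Connection]:
        # Block until a connection is free; raises queue.Empty after `timeout` seconds
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for reuse instead of closing it


created: List[sqlite3.Connection] = []


def make_conn() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    created.append(conn)
    return conn


pool = TinyPool(make_conn, size=2)
for _ in range(100):
    with pool.connection() as conn:
        conn.execute("SELECT 1").fetchone()

print(len(created))  # 2: a hundred checkouts served by two connections
```

The timeout mirrors `pool_timeout`: when every connection is busy, callers wait instead of opening yet another connection, which is what keeps the database below its connection limit.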

4️⃣ Batch Inserts / Updates

Inserting rows one at a time is a major performance hit, because every insert costs a round trip to the database. Use batch operations instead.

import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect(database="myapp", user="app_user", password="secret")
cur = conn.cursor()

# Data to insert
new_logs = [
    ('error',   '2023-10-26 10:00:00', 'Payment failed'),
    ('info',    '2023-10-26 10:00:01', 'User logged in'),
    ('warning', '2023-10-26 10:00:02', 'Cache nearly full'),
]

# psycopg2's cur.executemany() still runs one INSERT per row.
# execute_values() packs the rows into multi-row INSERT ... VALUES statements
# (page_size rows per statement, 100 by default), so these 3 rows take one round trip.
execute_values(
    cur,
    "INSERT INTO app_logs (level, timestamp, message) VALUES %s",
    new_logs,
)
conn.commit()
print(f"Inserted {len(new_logs)} log entries in one batch.")

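For large loads, batching can turn an operation that takes minutes into one that takes seconds. The same idea works for bulk updates (e.g., a single UPDATE with a CASE expression). For very large imports, PostgreSQL's COPY command is usually faster still.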
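The CASE approach for bulk updates can be sketched as a helper that builds one parameterized UPDATE for many rows. This sketch runs against Python's built-in sqlite3 (placeholders are `?`; with psycopg2 they would be `%s`), and `build_case_update` plus the `products` table are hypothetical. Very large batches are best split into chunks so a single statement doesn't grow without bound.

```python
import sqlite3
from typing import Dict, List, Tuple


def build_case_update(
    table: str, key: str, column: str, updates: Dict[int, object]
) -> Tuple[str, List[object]]:
    """Build one UPDATE that sets `column` row by row via CASE, plus its parameters.

    `table`, `key`, and `column` are pasted into the SQL text, so they must be
    trusted identifiers, never user input; the values go through placeholders.
    """
    if not updates:
        raise ValueError("updates must not be empty")
    whens = " ".join("WHEN ? THEN ?" for _ in updates)
    in_list = ", ".join("?" for _ in updates)
    sql = f"UPDATE {table} SET {column} = CASE {key} {whens} END WHERE {key} IN ({in_list})"
    params: List[object] = []
    for row_id, value in updates.items():
        params.extend([row_id, value])  # one WHEN/THEN pair per row
    params.extend(updates.keys())       # the ids for the WHERE ... IN (...) list
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, 9.99), (2, 19.99), (3, 4.50)])

sql, params = build_case_update("products", "id", "price", {1: 8.99, 3: 3.99})
conn.execute(sql, params)  # one statement, one round trip, however many rows change
conn.commit()

print(conn.execute("SELECT id, price FROM products ORDER BY id").fetchall())
# [(1, 8.99), (2, 19.99), (3, 3.99)]
```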

5️⃣ Materialized Views for Heavy Queries

If a complex query joins many tables and performs heavy calculations, but the underlying data doesn't change every second, a materialized view is a good fit. It stores the query result as a real table that you refresh periodically, so reads between refreshes see the data as of the last refresh.

from sqlalchemy import create_engine, text

engine = create_engine('postgresql://user:pass@localhost/myapp')

# engine.begin() commits when the block exits; a plain engine.connect()
# in SQLAlchemy 2.x would roll the CREATE back instead
with engine.begin() as conn:
    # Create a materialized view for a weekly sales report
    conn.execute(text("""
        CREATE MATERIALIZED VIEW weekly_sales_report AS
        SELECT
            o.order_id,
            o.order_date,
            c.customer_name,
            SUM(oi.quantity * oi.unit_price) AS total_amount
        FROM orders o
        JOIN order_items oi ON o.order_id = oi.order_id
        JOIN customers c ON o.customer_id = c.customer_id
        WHERE o.order_date >= (CURRENT_DATE - INTERVAL '7 days')
        GROUP BY o.order_id, o.order_date, c.customer_name
    """))

print("Materialized view created.")

Refresh it when the source data changes:

REFRESH MATERIALIZED VIEW weekly_sales_report;
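A cron job or your existing task scheduler is usually the simplest way to run that refresh on a schedule. If you'd rather keep it inside a long-running Python process, here is a minimal standard-library sketch; `start_periodic_refresh` and `refresh_weekly_report` are hypothetical names, and in a deployment with several worker processes you would want only one of them running it.

```python
import threading
from typing import Callable


def start_periodic_refresh(refresh: Callable[[], None], interval_s: float) -> threading.Event:
    """Run `refresh` every `interval_s` seconds on a daemon thread until the returned event is set."""
    stop = threading.Event()

    def loop() -> None:
        # Event.wait() returns False when the timeout expires and True once stop.set() is called
        while not stop.wait(interval_s):
            try:
                refresh()
            except Exception as exc:  # keep the schedule alive if one refresh fails
                print(f"refresh failed: {exc}")

    threading.Thread(target=loop, daemon=True).start()
    return stop


# Hypothetical wiring with the SQLAlchemy engine from the example above:
# def refresh_weekly_report() -> None:
#     with engine.begin() as conn:
#         conn.execute(text("REFRESH MATERIALIZED VIEW weekly_sales_report;"))
#
# stop = start_periodic_refresh(refresh_weekly_report, interval_s=3600)
# ...
# stop.set()  # on shutdown
```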

6️⃣ `SELECT` Only the Columns You Need

Fetching unnecessary columns or rows wastes bandwidth and memory. Always limit the result set to what the application actually uses.

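import psycopg2

conn = psycopg2.connect(database="myapp", user="app_user", password="secret")
cur = conn.cursor()
user_id = 42  # example id

# Bad: fetches every column, including ones the app never reads
cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# Good: fetch only the columns you actually use
cur.execute(
    "SELECT username, email, created_at FROM users WHERE id = %s",
    (user_id,)
)
row = cur.fetchone()  # None if no such user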
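Narrow column lists can also let the database answer from an index alone, without touching the table (PostgreSQL reports this as an “Index Only Scan”). A small sketch using Python's built-in sqlite3 as a stand-in, where the same effect shows up as `COVERING INDEX` in `EXPLAIN QUERY PLAN`; the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT, bio TEXT)")
# Hypothetical index holding both the filter column and the selected column
conn.execute("CREATE INDEX idx_users_email_username ON users(email, username)")


def plan(sql: str) -> str:
    """Return SQLite's readable plan text for `sql` (which takes one bound parameter)."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}", ("a@example.com",)).fetchall()
    return " | ".join(row[3] for row in rows)


wide = plan("SELECT * FROM users WHERE email = ?")
narrow = plan("SELECT username FROM users WHERE email = ?")

print(wide)    # index lookup, then a read of the table row for the remaining columns
print(narrow)  # e.g. "SEARCH users USING COVERING INDEX idx_users_email_username (email=?)"
```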

7️⃣ Use Server-Side Cursors for Large Result Sets

When you need to process millions of rows, pulling them all into Python at once can exhaust memory. Server‑side (named) cursors stream rows incrementally.

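import psycopg2


def process(row):
    print(row)  # placeholder for your per-row logic


conn = psycopg2.connect(database="myapp", user="app_user", password="secret")

# A named cursor is server-side: PostgreSQL keeps the result set and psycopg2
# fetches it in chunks of `itersize` rows (default 2000). `with conn` wraps the
# transaction the named cursor needs and commits it; the cursor closes on exit.
with conn, conn.cursor(name="large_fetch") as cur:
    cur.itersize = 5000
    cur.execute("SELECT id, data FROM big_table")
    for row in cur:
        process(row)   # only one chunk is held in memory at a time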
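The same chunked idea works with the standard DB-API `fetchmany()` call when you want to handle rows in batches, for example to write results out in bulk. With psycopg2, pair it with a named cursor: an ordinary client-side cursor has already downloaded the full result by the time `execute()` returns, so `fetchmany()` alone saves no memory there. The sketch uses sqlite3 as a stand-in database, and `iter_in_batches` is a hypothetical helper.

```python
import sqlite3
from typing import Any, Iterator, List


def iter_in_batches(cursor: Any, batch_size: int = 1000) -> Iterator[List[Any]]:
    """Yield lists of up to `batch_size` rows using the DB-API fetchmany() call."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            return
        yield batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO big_table (data) VALUES (?)", [(f"row {i}",) for i in range(2500)])

cur = conn.execute("SELECT id, data FROM big_table ORDER BY id")
sizes = [len(batch) for batch in iter_in_batches(cur, batch_size=1000)]
print(sizes)  # [1000, 1000, 500]
```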

8️⃣ Cache Frequently Used Data

For data that rarely changes (e.g., lookup tables, configuration), cache it in memory or an external cache (Redis, Memcached). This eliminates repeated DB hits.

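import psycopg2
import redis

# decode_responses=True makes redis-py return str instead of bytes
r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)
conn = psycopg2.connect(database="myapp", user="app_user", password="secret")
cur = conn.cursor()

def get_country_name(country_code):
    key = f"country:{country_code}"

    # Try the cache first
    cached = r.get(key)
    if cached is not None:
        return cached

    # Cache miss: fall back to the database
    cur.execute("SELECT name FROM countries WHERE code = %s", (country_code,))
    row = cur.fetchone()
    if row is None:
        return None  # unknown code; nothing to cache

    # Store in cache for next time (TTL = 1 hour)
    r.setex(key, 3600, row[0])
    return row[0]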
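For the in-memory option, a small cache-aside helper is often enough when the data is tiny and rarely changes. A minimal sketch; `TTLCache` and `load_country` are hypothetical names. Each process keeps its own copy (unlike Redis, nothing is shared between workers), and the helper does no locking, so concurrent misses may each call the loader.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Tuple


class TTLCache:
    """In-process cache-aside helper: an entry expires `ttl_s` seconds after it was stored."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self._clock = clock  # injectable so the demo below can fake the passage of time
        self._data: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                 # fresh hit: no database call
        value = loader()                    # miss or expired: ask the database
        self._data[key] = (now + self.ttl_s, value)
        return value


# Demo with a fake clock and a stand-in loader (replace with a real query)
fake_now = [0.0]
cache = TTLCache(ttl_s=3600, clock=lambda: fake_now[0])
db_calls: List[str] = []


def load_country() -> str:
    db_calls.append("KR")
    return "South Korea"


cache.get_or_load("country:KR", load_country)
cache.get_or_load("country:KR", load_country)  # served from memory
fake_now[0] = 3601.0                           # jump past the TTL
cache.get_or_load("country:KR", load_country)  # expired, so reloaded
print(len(db_calls))  # 2
```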

🎯 Key Takeaways

Performance tuning is an iterative process: measure → identify → fix → re‑measure. By regularly inspecting query plans, adding appropriate indexes, pooling connections, batching operations, using materialized views, limiting SELECTs, streaming large results, and caching static data, you’ll keep your Python applications fast and scalable.

Happy coding! 🚀

Optimizing Database Queries & Caching

Materialized View Example

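from sqlalchemy import create_engine, text

engine = create_engine('postgresql://user:pass@localhost/myapp')

with engine.begin() as conn:
    # Product-level variant of weekly_sales_report (units and revenue per product);
    # it uses the same name as the order-level view above, so create only one of them
    conn.execute(text("""
        CREATE MATERIALIZED VIEW weekly_sales_report AS
        SELECT
            product_id,
            SUM(quantity) AS total_units,
            SUM(quantity * unit_price) AS total_revenue
        FROM order_details
        WHERE order_date > CURRENT_DATE - 7
        GROUP BY product_id;
    """))

    # Refresh the view (e.g., hourly via a scheduler) to rerun the query
    conn.execute(text("REFRESH MATERIALIZED VIEW weekly_sales_report;"))

    # Query the view: reads cost the same as from a regular table.
    # Sort at query time; row order stored in the view is not guaranteed on read.
    result = conn.execute(text(
        "SELECT product_id, total_units, total_revenue "
        "FROM weekly_sales_report ORDER BY total_revenue DESC LIMIT 5;"
    ))
    for row in result:
        print(f"Product {row.product_id}: ${row.total_revenue:.2f} revenue")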

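The initial creation and every refresh run the slow query, but each SELECT against the materialized view is as fast as reading a regular table. I use this for dashboards and reports. Keep in mind that a plain REFRESH blocks reads of the view until it finishes; REFRESH MATERIALIZED VIEW CONCURRENTLY keeps the view readable but requires a unique index on it.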

Simple Redis Cache for Frequently‑Read Data

import redis
import json
import hashlib
import psycopg2

# Connect to Redis
cache = redis.Redis(host='localhost', port=6379, db=0)

# Connect to PostgreSQL
db_conn = psycopg2.connect(database="myapp", user="user", password="pass")
db_cur = db_conn.cursor()

def get_top_products(limit=10, cache_seconds=300):
    """Return top‑selling products, cached for `cache_seconds`."""
    # 1️⃣ Build a unique cache key
    query_signature = f"top_products_{limit}"
    cache_key = hashlib.md5(query_signature.encode()).hexdigest()

    # 2️⃣ Try the cache first
    cached_result = cache.get(cache_key)
    if cached_result is not None:
        print("Result loaded from cache.")
        return json.loads(cached_result)

    # 3️⃣ Cache miss → query the DB
    db_cur.execute("""
        SELECT product_id, product_name, COUNT(*) AS order_count
        FROM order_items
        GROUP BY product_id, product_name
        ORDER BY order_count DESC
        LIMIT %s
    """, (limit,))
    result = db_cur.fetchall()

    # 4️⃣ Store the fresh result in Redis
    cache.setex(cache_key, cache_seconds, json.dumps(result))
    print("Result queried from database and cached.")
    return result

# Usage
products = get_top_products(limit=5)
for prod_id, name, count in products:
    print(f"{name}: ordered {count} times")

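Setting a TTL (time to live) keeps stale data from lingering in the cache forever. This pattern suits home-page listings, leaderboards, and other public data that doesn't need to reflect changes instantly. If a change must show up immediately, delete the relevant key when you write to the database instead of waiting for the TTL to expire.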
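The example above hashes only `limit` into its key. If you cache several queries, deriving the key from the SQL text plus its parameters keeps different queries or arguments from colliding, and a readable namespace prefix makes related keys easy to find when the underlying data changes (for example, with redis-py's `scan_iter(match="top_products:*")` followed by `delete()`). A sketch; `make_cache_key` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any, Tuple


def make_cache_key(namespace: str, sql: str, params: Tuple[Any, ...] = ()) -> str:
    """Build a stable cache key from a query and its parameters."""
    # Collapse whitespace so re-indenting the SQL doesn't change the key.
    # Assumes no meaningful whitespace inside string literals in the query.
    normalized_sql = " ".join(sql.split())
    payload = json.dumps([normalized_sql, list(params)], default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{namespace}:{digest}"


TOP_PRODUCTS_SQL = """
    SELECT product_id, product_name, COUNT(*) AS order_count
    FROM order_items
    GROUP BY product_id, product_name
    ORDER BY order_count DESC
    LIMIT %s
"""

key_5 = make_cache_key("top_products", TOP_PRODUCTS_SQL, (5,))
key_10 = make_cache_key("top_products", TOP_PRODUCTS_SQL, (10,))
print(key_5 != key_10)                    # True: different parameters, different keys
print(key_5.startswith("top_products:"))  # True
```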

Query‑Rewrite Tips

Write clearer SQL so the database does less unnecessary work.

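-- IN subquery
SELECT *
FROM employees
WHERE department_id IN (
    SELECT id FROM departments WHERE location = 'NYC'
);

-- Equivalent JOIN (only while departments.id is unique; otherwise rows repeat).
-- Some databases optimize this form better, but modern PostgreSQL usually plans
-- both as the same semi-join, so confirm any gain with EXPLAIN ANALYZE.
SELECT e.*
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.location = 'NYC';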

-- Be specific in SELECT
SELECT id, first_name, email
FROM users;
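Before swapping one form for another, it's worth confirming the rewrite returns the same rows, then comparing plans. A small self-check using sqlite3 as a stand-in with made-up sample rows; it also includes the EXISTS form, a third common way to express the same filter. The JOIN only matches the other two because `departments.id` is unique here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'NYC'), (2, 'LA'), (3, 'NYC');
    INSERT INTO employees VALUES (10, 'Ana', 1), (11, 'Ben', 2), (12, 'Cho', 3), (13, 'Dee', NULL);
""")

queries = {
    "in": """SELECT e.id FROM employees e
             WHERE e.department_id IN (SELECT id FROM departments WHERE location = 'NYC')""",
    "join": """SELECT e.id FROM employees e
               JOIN departments d ON e.department_id = d.id
               WHERE d.location = 'NYC'""",
    "exists": """SELECT e.id FROM employees e
                 WHERE EXISTS (SELECT 1 FROM departments d
                               WHERE d.id = e.department_id AND d.location = 'NYC')""",
}

# Sort the ids so the comparison doesn't depend on each plan's row order
results = {name: sorted(row[0] for row in conn.execute(sql)) for name, sql in queries.items()}
print(results)  # {'in': [10, 12], 'join': [10, 12], 'exists': [10, 12]}
```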

Monitoring Query Performance

import logging
import time
from contextlib import contextmanager

import psycopg2

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@contextmanager
def monitor_query(query_tag, slow_threshold_s=0.5):
    """Time a database operation and log the duration."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("Query '%s' took %.4f seconds", query_tag, elapsed)
        if elapsed > slow_threshold_s:  # warn on slow queries
            logger.warning("Slow query alert: '%s' (%.4f s)", query_tag, elapsed)

conn = psycopg2.connect(database="myapp", user="app_user", password="secret")
cur = conn.cursor()

# Example usage
with monitor_query("fetch_recent_orders"):
    cur.execute(
        "SELECT * FROM orders WHERE order_date > NOW() - INTERVAL '1 day'"
    )
    orders = cur.fetchall()

print(f"Fetched {len(orders)} orders.")

Write these timings to a file or a monitoring system. Over time you can spot trends, catch regressions early, and turn performance from a mystery into a manageable discipline.
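Logging each timing is the first step; aggregating them is what reveals trends. A minimal in-process sketch that keeps per-tag samples and reports the median and a nearest-rank 95th percentile. `QueryStats` is a hypothetical helper; in production you would usually ship these numbers to your monitoring system rather than keep them in memory, and PostgreSQL's `pg_stat_statements` extension collects similar per-query statistics on the server side.

```python
import statistics
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List


class QueryStats:
    """Collect durations per query tag so trends and regressions become visible."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, tag: str, seconds: float) -> None:
        self.samples[tag].append(seconds)

    @contextmanager
    def track(self, tag: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(tag, time.perf_counter() - start)

    def summary(self, tag: str) -> Dict[str, float]:
        values = sorted(self.samples.get(tag, []))
        if not values:
            return {"count": 0}
        # Nearest-rank 95th percentile, using integer math to avoid float rounding
        p95_index = max(0, -(-95 * len(values) // 100) - 1)
        return {
            "count": len(values),
            "median": statistics.median(values),
            "p95": values[p95_index],
        }


stats = QueryStats()
for _ in range(3):
    with stats.track("fetch_recent_orders"):
        time.sleep(0.01)  # stand-in for cur.execute(...) and cur.fetchall()
print(stats.summary("fetch_recent_orders"))
```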

Putting It All Together

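Database performance comes down to a deliberate approach:

1. Measure – Find the bottlenecks.
2. Target – Apply fixes such as indexes, materialized views, and caching.
3. Scale – Use pooling, sharding, or other patterns as needed.
4. Watch – Monitor continuously so performance holds up.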

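You don't need to apply every technique to every project, but having them in your toolbox lets you handle almost any slowdown. Start small: pick one slow query today, run EXPLAIN on it, then test an index or rewrite the query. Your first win will show you how powerful these methods are.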

📘 Check out my latest ebook for free on my channel!
👍 Like, share, comment, and subscribe to stay updated.

101 Books

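101 Books is an AI-driven publishing company co-founded by author Aarav Joshi. By leveraging advanced AI technology, we keep our publishing costs incredibly low (some books are priced as low as $4), making quality knowledge accessible to everyone.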

Explore our catalog, starting with Golang Clean Code, **available on Amazon**: [w.amazon.com/dp/B0DQQF9K3Z](https://w.amazon.com/dp/B0DQQF9K3Z)

Stay tuned for updates and exciting news. When shopping for books, search for **Aarav Joshi** to find more of our titles. Use the provided link to enjoy **special discounts**!

Our Creations

Be sure to check these out:

We are on Medium
