Cache Efficiency in SQLite
Published: January 19, 2026, 07:23 GMT+8
10 min read
Source: Dev.to

Overview
- Goal: compare the read speed of a NoSQL cache (Valkey) against SQLite's internal cache.
- Application: a REST JSON web service built with Python + Flask.
- Endpoints:
  - Shorten URL – receives a long URL and returns a short version.
  - Resolve short URL – returns the original URL for a given short version.
Why does caching matter?
The cache serves reads of previously shortened URLs. The web server can respond more efficiently when the database that stores the URLs in a structured way does not have to go to the file on every request.
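The read path just described is the classic cache-aside pattern: check the cache first, fall back to the database on a miss, and populate the cache on the way out. A minimal sketch with plain dicts standing in for Valkey and SQLite (names and data are illustrative only, not part of the original code):

```python
# Hypothetical in-memory stand-ins for Valkey (cache) and SQLite (database).
cache: dict[str, str] = {}
database: dict[str, str] = {"abc123": "https://example.com/some/page"}

def resolve(short_hash: str):
    """Cache-aside read: try the cache, fall back to the database."""
    if short_hash in cache:              # cache hit: no database access
        return cache[short_hash]
    url = database.get(short_hash)       # cache miss: query the database
    if url is not None:
        cache[short_hash] = url          # populate the cache for next time
    return url

resolve("abc123")   # miss: reads the database and fills the cache
resolve("abc123")   # hit: served from the cache
```

The real application follows this same shape, with Valkey as the cache layer and SQLite as the database.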
Technologies Used
| Layer | Technology |
|---|---|
| Web framework | Python + Flask |
| NoSQL cache | Valkey (an open-source fork of Redis) |
| Database | SQLite3 (with its internal cache) |
Server Routes (Endpoints)
from flask import Flask, request, jsonify
from business.link_shortener import generate_new_link, get_link_by_hash_string

app = Flask(__name__)

# ----------------------------------------------------------------------
# Health-check
# ----------------------------------------------------------------------
@app.route("/heart-beat", methods=["GET"])
def heart_beat() -> dict:
    """Simple endpoint to verify that the application is running."""
    return {"OK": True}

# ----------------------------------------------------------------------
# Short-link creation
# ----------------------------------------------------------------------
@app.route("/link", methods=["POST"])
def create_link() -> dict:
    """
    Receives a JSON body of the form:
        { "url": "https://exemplo.com/alguma/pagina" }
    Returns:
        { "short_link": "<hash>" }
    """
    data = request.get_json(silent=True) or {}
    url = data.get("url")
    if not url:
        return jsonify({"error": "Field 'url' is required."}), 400
    short_link = generate_new_link(url)
    return {"short_link": short_link}, 201

# ----------------------------------------------------------------------
# Original-link lookup by hash
# ----------------------------------------------------------------------
@app.route("/link/<string:short_link>", methods=["GET"])
def get_original_link(short_link: str) -> dict:
    """
    Looks up the original URL associated with the given hash.
    Example call:
        GET /link/abc123
    Returns:
        { "link": "https://exemplo.com/alguma/pagina" }
    """
    original_url = get_link_by_hash_string(short_link)
    if not original_url:
        return jsonify({"error": "Link not found."}), 404
    return {"link": original_url}, 200
Notes
| Item | Remark |
|---|---|
| Import | jsonify was added to make it easier to build JSON responses with status codes. |
| Routes | Every route has a docstring describing its purpose, expected parameters, and response format. |
| Validation | Link creation (POST /link) checks that the url field is present; otherwise it returns 400. |
| Path parameter | The read endpoint (GET /link/<short_link>) receives the hash as part of the URL, avoiding a query string or request body. |
| Status codes | 201 on successful creation, 400 for a bad request, 404 when the hash does not exist. |
| Function names | Functions were renamed (heart_beat, create_link, get_original_link) to avoid conflicts and improve readability. |
| Separators | Comment lines (# ----------------------------------------------------------------------) help organize the code visually. |
| Typing | Type annotations (-> dict, short_link: str) were added to improve autocompletion and documentation. |
| Safety | request.get_json(silent=True) prevents an exception when the request body is not valid JSON. |
| Consistency | All responses are returned as dicts (or via jsonify), letting Flask convert them to JSON automatically. |
| Quick test | curl -X POST -H "Content-Type: application/json" -d '{"url":"https://exemplo.com"}' http://localhost:5000/link |
| Execution | Save the file as app.py and run flask run (or python app.py if an if __name__ == "__main__": app.run() block is present). |
Application Core (Main File)
import sqlite3
import random
import string
import threading
import valkey
import json

# Thread-local storage for database connections
_thread_local = threading.local()

# Flag that activates the key-value database cache
kv_cache = True

# Global connection pool for Valkey
kv_pool = valkey.ConnectionPool(
    host='localhost',
    port=6379,
    db=0,
    max_connections=50,        # Match the number of concurrent threads
    socket_keepalive=True,
    socket_connect_timeout=5,
    retry_on_timeout=True,
    decode_responses=True      # Automatically decode responses to strings
)

# ----------------------------------------------------------------------
# Initialize database schema
# ----------------------------------------------------------------------
conn = sqlite3.connect('database.db')
cur = conn.cursor()
cur.execute('''
    CREATE TABLE IF NOT EXISTS links (
        id INTEGER PRIMARY KEY,
        link_hash TEXT,
        link_target TEXT
    )
''')
cur.execute('CREATE INDEX IF NOT EXISTS idx_links_id ON links (id)')
cur.execute('CREATE INDEX IF NOT EXISTS idx_links_link_hash ON links (link_hash)')
conn.commit()
conn.close()

def get_db_connection():
    """Return a thread-local SQLite connection."""
    if not hasattr(_thread_local, 'conn'):
        _thread_local.conn = sqlite3.connect(
            'database.db',
            check_same_thread=False
        )
        # Shrink SQLite's page cache to zero pages; temp_store = 0
        # selects the compile-time default for temporary storage
        _thread_local.conn.execute('PRAGMA cache_size = 0')
        _thread_local.conn.execute('PRAGMA temp_store = 0')
    return _thread_local.conn

def get_kv_db():
    """Return a thread-local Valkey client."""
    if not hasattr(_thread_local, 'kv_db'):
        _thread_local.kv_db = valkey.Valkey(connection_pool=kv_pool)
    return _thread_local.kv_db

def generate_hash():
    """Generate a random 10-character hash."""
    return ''.join(
        random.choices(string.ascii_letters + string.digits, k=10)
    )

def create(link):
    """Insert a new link into SQLite and return its hash."""
    conn = get_db_connection()
    cur = conn.cursor()
    link_hash = generate_hash()
    cur.execute(
        'INSERT INTO links (link_hash, link_target) VALUES (?, ?)',
        (link_hash, link)
    )
    conn.commit()
    return link_hash

def get_link_by_hash(hash):
    """Retrieve a link by its hash, using Valkey as a cache when enabled."""
    if kv_cache:
        cached = get_kv_db().get(hash)
        if cached:
            return cached  # Already decoded (decode_responses=True)
    conn = get_db_connection()
    cur = conn.cursor()
    cur.execute(
        'SELECT link_target FROM links WHERE link_hash = ?',
        (hash,)
    )
    result = cur.fetchone()
    if result and result[0]:
        if kv_cache:
            get_kv_db().set(hash, result[0])
        return result[0]
    return None
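generate_hash draws 10 characters from a 62-symbol alphabet (letters plus digits), so there are 62**10 possible hashes. A quick birthday-bound sketch of the collision risk (illustrative only, not part of the original code):

```python
import math

ALPHABET = 62    # upper- and lowercase letters plus digits
LENGTH = 10      # characters per hash
space = ALPHABET ** LENGTH

def collision_probability(n: int) -> float:
    """Birthday approximation: P(collision among n random hashes)."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))

print(f"hash space: {space:.3e}")
print(f"P(collision) with 1 million links: {collision_probability(10**6):.2e}")
```

Even at a million stored links the collision probability stays well below one in a million, which is why the code can get away with not checking for duplicate hashes on insert.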
Notes:
- The kv_cache = True flag controls whether the Valkey cache is enabled.
- The statements that disable SQLite's internal cache (PRAGMA cache_size = 0 and PRAGMA temp_store = 0) keep SQLite from using its page cache, so the effect of the external cache can be observed.
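The effect of these PRAGMAs can be checked directly. A small sketch using the standard sqlite3 module against an in-memory database to confirm the settings take effect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Read back the default page-cache setting first (commonly -2000, ~2 MB)
default_cache = conn.execute("PRAGMA cache_size").fetchone()[0]
print("default cache_size:", default_cache)

# Apply the same PRAGMAs as get_db_connection()
conn.execute("PRAGMA cache_size = 0")
conn.execute("PRAGMA temp_store = 0")

print("cache_size after PRAGMA:", conn.execute("PRAGMA cache_size").fetchone()[0])  # 0
print("temp_store after PRAGMA:", conn.execute("PRAGMA temp_store").fetchone()[0])  # 0 = default
conn.close()
```

Note that even with cache_size = 0, reads can still be served from the operating system's filesystem cache, a point the observations at the end of this article return to.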
Stress Test Script
The script below generates a large number of requests across multiple threads. It creates a short link and then reads it many times to measure latency.
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean, median

# -------------------------------------------------
# Configuration
# -------------------------------------------------
BASE_URL = "http://localhost:5000"
NUM_TASKS = 100       # Total number of concurrent tasks
NUM_THREADS = 30      # Number of worker threads
NUM_LINK_GETS = 80    # GET requests per task

def create_test_link() -> str | None:
    """Create a short link via the API."""
    try:
        resp = requests.post(f"{BASE_URL}/link", json={"url": "https://example.com"})
        if resp.status_code == 201:  # The API returns 201 on creation
            return resp.json().get("short_link")
    except Exception as exc:
        print(f"Error creating test link: {exc}")
    return None

def test_link_retrieval() -> list[dict]:
    """Perform `NUM_LINK_GETS` GET requests for a single short link."""
    start = time.time()
    short_link = create_test_link()
    results = []
    if short_link is None:
        return [{"success": False, "time": time.time() - start,
                 "error": "link creation failed"}]
    for _ in range(NUM_LINK_GETS):
        try:
            resp = requests.get(f"{BASE_URL}/link/{short_link}")
            # NOTE: elapsed is measured from the start of the whole task,
            # so it accumulates across the task's requests; the statistics
            # below therefore reflect cumulative task progress, not
            # per-request latency
            elapsed = time.time() - start
            results.append(
                {
                    "success": resp.status_code == 200,
                    "time": elapsed,
                    "status_code": resp.status_code,
                }
            )
        except Exception as exc:
            results.append(
                {
                    "success": False,
                    "time": time.time() - start,
                    "error": str(exc),
                }
            )
    return results

def run_stress_test() -> None:
    print(f"Total tasks: {NUM_TASKS}")
    print(f"Concurrent threads: {NUM_THREADS}")
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
        futures = [executor.submit(test_link_retrieval) for _ in range(NUM_TASKS)]
        all_results = []
        for future in as_completed(futures):
            all_results.extend(future.result())
    # -------------------------------------------------
    # Statistics
    # -------------------------------------------------
    times = [r["time"] for r in all_results if r["success"]]
    if times:
        print(f"Requests: {len(times)}")
        print(f"Mean latency: {mean(times):.4f}s")
        print(f"Median latency: {median(times):.4f}s")
        print(f"Min latency: {min(times):.4f}s")
        print(f"Max latency: {max(times):.4f}s")
    else:
        print("No successful requests recorded.")

if __name__ == "__main__":
    run_stress_test()
Usage
- Start the Flask server (e.g. python app.py).
- Run the script above.
- Compare the output with kv_cache enabled and disabled to see the performance impact of the Valkey cache.
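Comparing the two modes means editing kv_cache in the source between runs. As a small convenience (an assumption on my part, not part of the original code), the flag could instead be read from an environment variable:

```python
import os

# Hypothetical replacement for the hard-coded `kv_cache = True`:
# KV_CACHE=0 disables the Valkey cache; anything else (or unset) enables it.
kv_cache = os.environ.get("KV_CACHE", "1") != "0"

print("Valkey cache enabled:", kv_cache)
```

With this in place, `KV_CACHE=0 python app.py` would give the uncached run and a plain `python app.py` the cached one, with no source edits in between.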
Stress Test Script (Python Template)
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean, median
from typing import List, Dict, Any

NUM_THREADS = 8     # Number of concurrent threads
NUM_TASKS = 1000    # Number of tasks to run

def test_link_retrieval() -> List[Dict[str, Any]]:
    """
    Simulate a request to the API endpoint that retrieves a link.
    Replace this stub with the actual request logic (e.g., using ``requests``).

    Returns
    -------
    list of dict
        Each dict should contain at least:
        - ``success`` (bool): whether the request succeeded
        - ``time`` (float): response time in seconds
        - optionally ``error`` (str) when ``success`` is ``False``
    """
    # TODO: implement real request
    # Example return value:
    # return [{'success': True, 'time': 0.123}, ...]
    return []

def run_stress_test() -> None:
    """Execute the stress test and print a summary of the results."""
    print(f"Current threads: {NUM_THREADS}\n")
    results: List[Dict[str, Any]] = []
    start_time = time.time()
    # ------------------------------------------------------------------
    # Execute the tasks concurrently
    # ------------------------------------------------------------------
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
        futures = [executor.submit(test_link_retrieval) for _ in range(NUM_TASKS)]
        for i, future in enumerate(as_completed(futures), start=1):
            subresults = future.result()
            results.extend(subresults)
            if i % 100 == 0:
                print(f"Completed: {i}/{NUM_TASKS}")
    total_time = time.time() - start_time
    # ------------------------------------------------------------------
    # Analysis
    # ------------------------------------------------------------------
    successful = [r for r in results if r.get('success')]
    failed = [r for r in results if not r.get('success')]
    response_times = [r['time'] for r in successful]
    print("\n" + "=" * 60)
    print("STRESS TEST RESULTS")
    print("=" * 60)
    print(f"Total tasks: {NUM_TASKS}")
    print(f"Successful: {len(successful)}")
    print(f"Failed: {len(failed)}")
    print(f"Total time: {total_time:.2f}s")
    print(f"Tasks/second: {NUM_TASKS / total_time:.2f}")
    if response_times:
        print("\nResponse times:")
        print(f"  Min: {min(response_times) * 1000:.2f} ms")
        print(f"  Max: {max(response_times) * 1000:.2f} ms")
        print(f"  Mean: {mean(response_times) * 1000:.2f} ms")
        print(f"  Median: {median(response_times) * 1000:.2f} ms")
    if failed:
        print("\nErrors (first 5):")
        for r in failed[:5]:
            print(f"  - {r.get('error', 'Unknown error')}")
    print("=" * 60)

if __name__ == "__main__":
    run_stress_test()
Test Results (Local Environment)
Note: the tests were run on a single local server without disabling SQLite's in-memory cache. Even so, SQLite responded almost equally fast with and without the Valkey (Redis) cache.
SQLite 3 – Without Valkey (Redis) Cache
============================================================
STRESS TEST RESULTS
============================================================
Total tasks: 100
Successful: 8000
Failed: 0
Total time: 4.57s
Tasks/second: 21.89
Response times:
Min: 10.92ms
Max: 1534.86ms
Mean: 664.45ms
Median: 627.76ms
============================================================
SQLite 3 – With Valkey (Redis) Cache
============================================================
STRESS TEST RESULTS
============================================================
Total tasks: 100
Successful: 8000
Failed: 0
Total time: 5.24s
Tasks/second: 19.08
Response times:
Min: 25.95ms
Max: 1896.03ms
Mean: 783.95ms
Median: 747.06ms
============================================================
Even accounting for the network round-trip to Valkey (Redis), SQLite's performance remains impressive.
Observations
- SQLite is not typically used for large-scale production workloads, so these local results are not fully representative of real-world scenarios that need a dedicated caching layer.
- Disabling SQLite's internal cache still leaves the operating system's filesystem cache in place, which can mask the true impact of an external cache like Valkey.
- The most interesting takeaway is how much OS-level caching and SQLite's own optimizations affect performance, rather than the proof of concept of caching with Valkey.
Source Code
The full proof-of-concept source code is available at: