Cache Efficiency in SQLite

Published: January 19, 2026, 07:23 (GMT+8)
10 min read

Source: Dev.to


Overview

  • Goal: compare the read speed of a NoSQL cache (Valkey) against SQLite's internal cache.
  • Application: a REST JSON web service built with Python + Flask.
  • Endpoints:
    1. Shorten URL – receives a long URL and returns a short version.
    2. Resolve short URL – returns the original URL for a given short version.

Why does caching matter?

The cache stores previously shortened URLs for fast reads. When the database that holds the URLs does not have to go to its file on every request, the web server can respond much more efficiently.
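The read path described above is the classic cache-aside pattern. As a minimal sketch (both the dict standing in for Valkey and the dict standing in for SQLite are assumptions for illustration):

```python
# Cache-aside sketch: one dict stands in for Valkey, another for SQLite.
cache: dict = {}
db = {"abc123": "https://example.com/some/page"}  # pre-populated "database"

def resolve(short: str):
    if short in cache:            # cache hit: the database is never touched
        return cache[short]
    target = db.get(short)        # cache miss: fall back to the database
    if target is not None:
        cache[short] = target     # populate the cache for subsequent reads
    return target

print(resolve("abc123"))  # first call misses the cache and reads the "database"
print("abc123" in cache)  # the second call would be served from the cache
```

The real service follows the same shape in get_link_by_hash, with Valkey as the cache tier and SQLite as the source of truth.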

Technologies used

  • Web framework: Python + Flask
  • NoSQL cache: Valkey (an open-source fork of Redis)
  • Database: SQLite3 (with its internal cache)

Server routes (endpoints)

from flask import Flask, request, jsonify
from business.link_shortener import generate_new_link, get_link_by_hash_string

app = Flask(__name__)

# ----------------------------------------------------------------------
#  Health‑check
# ----------------------------------------------------------------------
@app.route("/heart-beat", methods=["GET"])
def heart_beat() -> dict:
    """Endpoint simples para verificar se a aplicação está rodando."""
    return {"OK": True}


# ----------------------------------------------------------------------
#  Create a shortened link
# ----------------------------------------------------------------------
@app.route("/link", methods=["POST"])
def create_link() -> dict:
    """
    Receives a JSON body of the form:
        { "url": "https://exemplo.com/alguma/pagina" }

    Returns:
        { "short_link": "<hash>" }
    """
    data = request.get_json(silent=True) or {}
    url = data.get("url")

    if not url:
        return jsonify({"error": "Field 'url' is required."}), 400

    short_link = generate_new_link(url)
    return {"short_link": short_link}, 201


# ----------------------------------------------------------------------
#  Resolve the original link from its hash
# ----------------------------------------------------------------------
@app.route("/link/<string:short_link>", methods=["GET"])
def get_original_link(short_link: str) -> dict:
    """
    Looks up the original URL associated with the given hash.

    Example call:
        GET /link/abc123

    Returns:
        { "link": "https://exemplo.com/alguma/pagina" }
    """
    original_url = get_link_by_hash_string(short_link)

    if not original_url:
        return jsonify({"error": "Link not found."}), 404

    return {"link": original_url}, 200

Notes

  • Imports: jsonify was added to make it easier to build JSON responses with status codes.
  • Routes: each route has a docstring describing its purpose, expected parameters, and response format.
  • Validation: link creation (POST /link) checks that the url field is present; otherwise it returns 400.
  • Path parameter: the read endpoint (GET /link/<short_link>) receives the hash as part of the URL, avoiding the need for a query string or request body.
  • Status codes: 201 on successful creation, 400 for a bad request, 404 when the hash does not exist.
  • Function names: the functions were renamed (heart_beat, create_link, get_original_link) to avoid collisions and improve readability.
  • Separators: comment lines (# ----------------------------------------------------------------------) help organize the code visually.
  • Typing: type annotations were added (-> dict, short_link: str) to improve autocompletion and documentation.
  • Safety: request.get_json(silent=True) prevents an exception when the request body is not valid JSON.
  • Consistency: all responses are returned as dictionaries (or via jsonify), letting Flask convert them to JSON automatically.
  • Quick test: curl -X POST -H "Content-Type: application/json" -d '{"url":"https://exemplo.com"}' http://localhost:5000/link
  • Execution: save the file as app.py and run flask run (or python app.py when an if __name__ == "__main__": app.run() guard is present).

Application core (main file)

import sqlite3
import random
import string
import threading
import valkey
import json

# Thread‑local storage for database connections
_thread_local = threading.local()

# Flag that activates the key‑value database cache
kv_cache = True

# Global connection pool for Valkey
kv_pool = valkey.ConnectionPool(
    host='localhost',
    port=6379,
    db=0,
    max_connections=50,          # Match the number of concurrent threads
    socket_keepalive=True,
    socket_connect_timeout=5,
    retry_on_timeout=True,
    decode_responses=True       # Automatically decode responses to strings
)

# ----------------------------------------------------------------------
# Initialize database schema
# ----------------------------------------------------------------------
conn = sqlite3.connect('database.db')
cur = conn.cursor()
cur.execute('''
    CREATE TABLE IF NOT EXISTS links (
        id INTEGER PRIMARY KEY,
        link_hash TEXT,
        link_target TEXT
    )
''')
cur.execute('CREATE INDEX IF NOT EXISTS idx_links_id ON links (id)')  # note: redundant, INTEGER PRIMARY KEY is already indexed as the rowid
cur.execute('CREATE INDEX IF NOT EXISTS idx_links_link_hash ON links (link_hash)')
conn.commit()
conn.close()

def get_db_connection():
    """Return a thread‑local SQLite connection."""
    if not hasattr(_thread_local, 'conn'):
        _thread_local.conn = sqlite3.connect(
            'database.db',
            check_same_thread=False
        )
        # PRAGMA cache_size = 0 disables SQLite's page cache;
        # temp_store = 0 resets temp storage to its compile-time default
        _thread_local.conn.execute('PRAGMA cache_size = 0')
        _thread_local.conn.execute('PRAGMA temp_store = 0')
    return _thread_local.conn

def get_kv_db():
    """Return a thread‑local Valkey client."""
    if not hasattr(_thread_local, 'kv_db'):
        _thread_local.kv_db = valkey.Valkey(connection_pool=kv_pool)
    return _thread_local.kv_db

def generate_hash():
    """Generate a random 10‑character hash."""
    return ''.join(
        random.choices(string.ascii_letters + string.digits, k=10)
    )

def create(link):
    """Insert a new link into SQLite and return its hash."""
    conn = get_db_connection()
    cur = conn.cursor()
    link_hash = generate_hash()
    cur.execute(
        'INSERT INTO links (link_hash, link_target) VALUES (?, ?)',
        (link_hash, link)
    )
    conn.commit()
    return link_hash

def get_link_by_hash(hash):
    """Retrieve a link by its hash, using Valkey as a cache when enabled."""
    if kv_cache:
        cached = get_kv_db().get(hash)
        if cached:
            return cached                     # Already decoded (decode_responses=True)

    conn = get_db_connection()
    cur = conn.cursor()
    cur.execute(
        'SELECT link_target FROM links WHERE link_hash = ?',
        (hash,)
    )
    result = cur.fetchone()
    if result and result[0]:
        if kv_cache:
            get_kv_db().set(hash, result[0])
        return result[0]
    return None
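For context on collision risk, generate_hash draws 10 symbols from a 62-character alphabet (letters plus digits), giving 62¹⁰ ≈ 8.4 × 10¹⁷ possible hashes. A standalone sketch of the same scheme:

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols

def generate_hash(k: int = 10) -> str:
    """Same scheme as the application: k random symbols from the alphabet."""
    return ''.join(random.choices(ALPHABET, k=k))

h = generate_hash()
print(len(h), h.isalnum())   # 10 True
print(len(ALPHABET) ** 10)   # 839299365868340224 possible hashes
```

Note that the application does not check for collisions on insert; at this keyspace size that is a reasonable simplification for a proof of concept, but a production shortener would enforce uniqueness (e.g. a UNIQUE constraint on link_hash).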

Notes:

  • The kv_cache = True flag near the top of the file controls whether the Valkey cache is enabled.
  • The statements that disable SQLite's internal cache (PRAGMA cache_size = 0 and PRAGMA temp_store = 0) stop SQLite from using its page cache, making the impact of the external cache observable.
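To confirm what those pragmas actually do, here is a small standalone check (the in-memory database is an assumption for illustration). Note that temp_store = 0 means "compile-time default", not "temp files on disk":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
print(conn.execute("PRAGMA cache_size").fetchone())   # SQLite's default (negative value = size in KiB)
conn.execute("PRAGMA cache_size = 0")
print(conn.execute("PRAGMA cache_size").fetchone())   # (0,) – page cache disabled
conn.execute("PRAGMA temp_store = 0")
print(conn.execute("PRAGMA temp_store").fetchone())   # (0,) – compile-time default
```

Because these pragmas are per-connection, the application has to re-issue them on every new thread-local connection, which is exactly what get_db_connection does.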

Stress-test script

The script below uses multiple threads to generate a large number of requests. It creates one short link and then reads it many times to measure latency.

import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean, median

# -------------------------------------------------
# Configuration
# -------------------------------------------------
BASE_URL = "http://localhost:5000"
NUM_TASKS = 100          # Total number of concurrent tasks
NUM_THREADS = 30         # Number of worker threads
NUM_LINK_GETS = 80       # GET requests per task


def create_test_link() -> str | None:
    """Create a short link via the API."""
    try:
        resp = requests.post(f"{BASE_URL}/link", json={"url": "https://example.com"})
        if resp.status_code == 201:  # the create endpoint returns 201 on success
            return resp.json().get("short_link")
    except Exception as exc:
        print(f"Error creating test link: {exc}")
    return None


def test_link_retrieval() -> list[dict]:
    """Perform `NUM_LINK_GETS` GET requests for a single short link."""
    short_link = create_test_link()
    if short_link is None:
        return [{"success": False, "time": 0.0, "error": "link creation failed"}]
    results = []

    for _ in range(NUM_LINK_GETS):
        start = time.time()  # per-request timer, so each sample is a latency, not a cumulative total
        try:
            resp = requests.get(f"{BASE_URL}/link/{short_link}")
            elapsed = time.time() - start
            results.append(
                {
                    "success": resp.status_code == 200,
                    "time": elapsed,
                    "status_code": resp.status_code,
                }
            )
        except Exception as exc:
            results.append(
                {
                    "success": False,
                    "time": time.time() - start,
                    "error": str(exc),
                }
            )
    return results


def run_stress_test() -> None:
    print(f"Total tasks: {NUM_TASKS}")
    print(f"Concurrent threads: {NUM_THREADS}")

    with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
        futures = [executor.submit(test_link_retrieval) for _ in range(NUM_TASKS)]

        all_results = []
        for future in as_completed(futures):
            all_results.extend(future.result())

    # -------------------------------------------------
    # Statistics
    # -------------------------------------------------
    times = [r["time"] for r in all_results if r["success"]]
    if times:
        print(f"Requests: {len(times)}")
        print(f"Mean latency: {mean(times):.4f}s")
        print(f"Median latency: {median(times):.4f}s")
        print(f"Min latency: {min(times):.4f}s")
        print(f"Max latency: {max(times):.4f}s")
    else:
        print("No successful requests recorded.")


if __name__ == "__main__":
    run_stress_test()

Usage

  1. Start the Flask server (e.g. python app.py).
  2. Run the script above.
  3. Compare the output with kv_cache enabled and disabled to see the performance impact of the Valkey cache.
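To see what step 3 is measuring, the effect of a cache hit versus a miss can be simulated without any server at all; the sleep below is an assumed stand-in for database latency:

```python
import time

def slow_lookup(key: str) -> str:
    time.sleep(0.001)                  # pretend this is a database read
    return "https://example.com"

cache: dict = {}

def cached_lookup(key: str) -> str:
    if key not in cache:               # only the first call pays the full cost
        cache[key] = slow_lookup(key)
    return cache[key]

def bench(fn, n: int = 50) -> float:
    """Mean seconds per call over n calls."""
    t0 = time.time()
    for _ in range(n):
        fn("abc123")
    return (time.time() - t0) / n

print(f"uncached: {bench(slow_lookup) * 1000:.3f} ms/call")
print(f"cached:   {bench(cached_lookup) * 1000:.3f} ms/call")
```

In the real test the gap is much smaller, because SQLite reads are already fast and the OS file-system cache absorbs most of the "disk" latency this sketch simulates.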

Stress-test harness (Python skeleton)

import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean, median
from typing import List, Dict, Any

NUM_THREADS = 8   # Number of concurrent threads
NUM_TASKS = 1000   # Number of tasks to run


def test_link_retrieval() -> List[Dict[str, Any]]:
    """
    Simulate a request to the API endpoint that retrieves a link.

    Replace this stub with the actual request logic (e.g., using ``requests``).

    Returns
    -------
    list of dict
        Each dict should contain at least:
        - ``success`` (bool): whether the request succeeded
        - ``time`` (float): response time in seconds
        - optionally ``error`` (str) when ``success`` is ``False``
    """
    # TODO: implement real request
    # Example return value:
    # return [{'success': True, 'time': 0.123}, ...]
    return []


def run_stress_test() -> None:
    """Execute the stress test and print a summary of the results."""
    print(f"Current threads: {NUM_THREADS}\n")

    results: List[Dict[str, Any]] = []
    start_time = time.time()

    # ------------------------------------------------------------------
    # Execute the tasks concurrently
    # ------------------------------------------------------------------
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
        futures = [executor.submit(test_link_retrieval) for _ in range(NUM_TASKS)]

        for i, future in enumerate(as_completed(futures), start=1):
            subresults = future.result()
            results.extend(subresults)

            if i % 100 == 0:
                print(f"Completed: {i}/{NUM_TASKS}")

    total_time = time.time() - start_time

    # ------------------------------------------------------------------
    # Analysis
    # ------------------------------------------------------------------
    successful = [r for r in results if r.get('success')]
    failed = [r for r in results if not r.get('success')]
    response_times = [r['time'] for r in successful]

    print("\n" + "=" * 60)
    print("STRESS TEST RESULTS")
    print("=" * 60)
    print(f"Total tasks: {NUM_TASKS}")
    print(f"Successful: {len(successful)}")
    print(f"Failed: {len(failed)}")
    print(f"Total time: {total_time:.2f}s")
    print(f"Tasks/second: {NUM_TASKS / total_time:.2f}")

    if response_times:
        print("\nResponse times:")
        print(f"  Min:    {min(response_times) * 1000:.2f} ms")
        print(f"  Max:    {max(response_times) * 1000:.2f} ms")
        print(f"  Mean:   {mean(response_times) * 1000:.2f} ms")
        print(f"  Median: {median(response_times) * 1000:.2f} ms")

    if failed:
        print("\nErrors (first 5):")
        for r in failed[:5]:
            print(f"  - {r.get('error', 'Unknown error')}")

    print("=" * 60)


if __name__ == "__main__":
    run_stress_test()

Test results (local environment)

Note: the tests were run on a single local server without disabling SQLite's in-memory cache. Even so, SQLite responded at virtually the same speed with and without the Valkey (Redis) cache.

SQLite 3 – Without Valkey (Redis) Cache

============================================================
STRESS TEST RESULTS
============================================================
Total tasks: 100
Successful: 8000
Failed: 0
Total time: 4.57s
Tasks/second: 21.89

Response times:
  Min: 10.92ms
  Max: 1534.86ms
  Mean: 664.45ms
  Median: 627.76ms
============================================================

SQLite 3 – With Valkey (Redis) Cache

============================================================
STRESS TEST RESULTS
============================================================
Total tasks: 100
Successful: 8000
Failed: 0
Total time: 5.24s
Tasks/second: 19.08

Response times:
  Min: 25.95ms
  Max: 1896.03ms
  Mean: 783.95ms
  Median: 747.06ms
============================================================

Even accounting for the network round-trip to Valkey (Redis), SQLite's performance remains impressive.

Observations

  • SQLite is not typically used for large-scale production workloads, so these local results do not fully represent real-world scenarios that call for a dedicated caching layer.
  • Disabling SQLite's internal cache still leaves the operating system's file-system cache in place, which can mask the true impact of an external cache such as Valkey.
  • The most interesting takeaway is how much OS-level caching and SQLite's own optimizations affect performance, rather than the proof of concept of caching with Valkey.

Source Code

The full proof-of-concept source code is available at:

memcache‑poc on GitHub
