Cache Efficiency in SQLite
Published: January 19, 2026, 07:23 GMT+8
10 min read
Source: Dev.to

Overview
- Goal: compare the read speed of a NoSQL cache (Valkey) against SQLite's internal cache.
- Application: a REST JSON web service built with Python + Flask.
- Endpoints:
  - Shorten URL – receives a long URL and returns a short version.
  - Resolve short URL – returns the original URL for a given short version.
Why does caching matter?
The cache serves reads of previously shortened URLs. The web server can respond more efficiently when the database that stores the URLs in a structured way does not have to go to the file on every request.
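The read path just described is the classic cache-aside pattern: check the cache first, fall back to the database on a miss, and populate the cache on the way out. A minimal sketch with plain dicts standing in for Valkey and SQLite (names and data are illustrative only, not part of the original code):

```python
# Hypothetical in-memory stand-ins for Valkey (cache) and SQLite (database).
cache: dict[str, str] = {}
database: dict[str, str] = {"abc123": "https://example.com/some/page"}

def resolve(short_hash: str):
    """Cache-aside read: try the cache, fall back to the database."""
    if short_hash in cache:              # cache hit: no database access
        return cache[short_hash]
    url = database.get(short_hash)       # cache miss: query the database
    if url is not None:
        cache[short_hash] = url          # populate the cache for next time
    return url

resolve("abc123")   # miss: reads the database and fills the cache
resolve("abc123")   # hit: served from the cache
```

The real application follows this same shape, with Valkey as the cache layer and SQLite as the database.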
Technologies Used
| Layer | Technology |
|---|---|
| Web framework | Python + Flask |
| NoSQL cache | Valkey (an open-source fork of Redis) |
| Database | SQLite3 (with its internal cache) |
Server Routes (Endpoints)
from flask import Flask, request, jsonify
from business.link_shortener import generate_new_link, get_link_by_hash_string

app = Flask(__name__)

# ----------------------------------------------------------------------
# Health-check
# ----------------------------------------------------------------------
@app.route("/heart-beat", methods=["GET"])
def heart_beat() -> dict:
    """Simple endpoint to verify that the application is running."""
    return {"OK": True}

# ----------------------------------------------------------------------
# Short-link creation
# ----------------------------------------------------------------------
@app.route("/link", methods=["POST"])
def create_link() -> dict:
    """
    Receives a JSON body of the form:
        { "url": "https://exemplo.com/alguma/pagina" }
    Returns:
        { "short_link": "<hash>" }
    """
    data = request.get_json(silent=True) or {}
    url = data.get("url")
    if not url:
        return jsonify({"error": "Field 'url' is required."}), 400
    short_link = generate_new_link(url)
    return {"short_link": short_link}, 201

# ----------------------------------------------------------------------
# Original-link lookup by hash
# ----------------------------------------------------------------------
@app.route("/link/<string:short_link>", methods=["GET"])
def get_original_link(short_link: str) -> dict:
    """
    Looks up the original URL associated with the given hash.
    Example call:
        GET /link/abc123
    Returns:
        { "link": "https://exemplo.com/alguma/pagina" }
    """
    original_url = get_link_by_hash_string(short_link)
    if not original_url:
        return jsonify({"error": "Link not found."}), 404
    return {"link": original_url}, 200
Notes
| Item | Remark |
|---|---|
| Import | jsonify was added to make it easier to build JSON responses with status codes. |
| Routes | Every route has a docstring describing its purpose, expected parameters, and response format. |
| Validation | Link creation (POST /link) checks that the url field is present; otherwise it returns 400. |
| Path parameter | The read endpoint (GET /link/<short_link>) receives the hash as part of the URL, avoiding a query string or request body. |
| Status codes | 201 on successful creation, 400 for a bad request, 404 when the hash does not exist. |
| Function names | Functions were renamed (heart_beat, create_link, get_original_link) to avoid conflicts and improve readability. |
| Separators | Comment lines (# ----------------------------------------------------------------------) help organize the code visually. |
| Typing | Type annotations (-> dict, short_link: str) were added to improve autocompletion and documentation. |
| Safety | request.get_json(silent=True) prevents an exception when the request body is not valid JSON. |
| Consistency | All responses are returned as dicts (or via jsonify), letting Flask convert them to JSON automatically. |
| Quick test | curl -X POST -H "Content-Type: application/json" -d '{"url":"https://exemplo.com"}' http://localhost:5000/link |
| Execution | Save the file as app.py and run flask run (or python app.py if an if __name__ == "__main__": app.run() block is present). |
Application Core (Main File)
import sqlite3
import random
import string
import threading
import valkey
import json

# Thread-local storage for database connections
_thread_local = threading.local()

# Flag that activates the key-value database cache
kv_cache = True

# Global connection pool for Valkey
kv_pool = valkey.ConnectionPool(
    host='localhost',
    port=6379,
    db=0,
    max_connections=50,        # Match the number of concurrent threads
    socket_keepalive=True,
    socket_connect_timeout=5,
    retry_on_timeout=True,
    decode_responses=True      # Automatically decode responses to strings
)

# ----------------------------------------------------------------------
# Initialize database schema
# ----------------------------------------------------------------------
conn = sqlite3.connect('database.db')
cur = conn.cursor()
cur.execute('''
    CREATE TABLE IF NOT EXISTS links (
        id INTEGER PRIMARY KEY,
        link_hash TEXT,
        link_target TEXT
    )
''')
cur.execute('CREATE INDEX IF NOT EXISTS idx_links_id ON links (id)')
cur.execute('CREATE INDEX IF NOT EXISTS idx_links_link_hash ON links (link_hash)')
conn.commit()
conn.close()

def get_db_connection():
    """Return a thread-local SQLite connection."""
    if not hasattr(_thread_local, 'conn'):
        _thread_local.conn = sqlite3.connect(
            'database.db',
            check_same_thread=False
        )
        # Shrink SQLite's page cache to zero pages; temp_store = 0
        # selects the compile-time default for temporary storage
        _thread_local.conn.execute('PRAGMA cache_size = 0')
        _thread_local.conn.execute('PRAGMA temp_store = 0')
    return _thread_local.conn

def get_kv_db():
    """Return a thread-local Valkey client."""
    if not hasattr(_thread_local, 'kv_db'):
        _thread_local.kv_db = valkey.Valkey(connection_pool=kv_pool)
    return _thread_local.kv_db

def generate_hash():
    """Generate a random 10-character hash."""
    return ''.join(
        random.choices(string.ascii_letters + string.digits, k=10)
    )

def create(link):
    """Insert a new link into SQLite and return its hash."""
    conn = get_db_connection()
    cur = conn.cursor()
    link_hash = generate_hash()
    cur.execute(
        'INSERT INTO links (link_hash, link_target) VALUES (?, ?)',
        (link_hash, link)
    )
    conn.commit()
    return link_hash

def get_link_by_hash(hash):
    """Retrieve a link by its hash, using Valkey as a cache when enabled."""
    if kv_cache:
        cached = get_kv_db().get(hash)
        if cached:
            return cached  # Already decoded (decode_responses=True)
    conn = get_db_connection()
    cur = conn.cursor()
    cur.execute(
        'SELECT link_target FROM links WHERE link_hash = ?',
        (hash,)
    )
    result = cur.fetchone()
    if result and result[0]:
        if kv_cache:
            get_kv_db().set(hash, result[0])
        return result[0]
    return None
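generate_hash draws 10 characters from a 62-symbol alphabet (letters plus digits), so there are 62**10 possible hashes. A quick birthday-bound sketch of the collision risk (illustrative only, not part of the original code):

```python
import math

ALPHABET = 62    # upper- and lowercase letters plus digits
LENGTH = 10      # characters per hash
space = ALPHABET ** LENGTH

def collision_probability(n: int) -> float:
    """Birthday approximation: P(collision among n random hashes)."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))

print(f"hash space: {space:.3e}")
print(f"P(collision) with 1 million links: {collision_probability(10**6):.2e}")
```

Even at a million stored links the collision probability stays well below one in a million, which is why the code can get away with not checking for duplicate hashes on insert.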
Notes:
- The kv_cache = True flag controls whether the Valkey cache is enabled.
- The statements that disable SQLite's internal cache (PRAGMA cache_size = 0 and PRAGMA temp_store = 0) keep SQLite from using its page cache, so the effect of the external cache can be observed.
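The effect of these PRAGMAs can be checked directly. A small sketch using the standard sqlite3 module against an in-memory database to confirm the settings take effect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Read back the default page-cache setting first (commonly -2000, ~2 MB)
default_cache = conn.execute("PRAGMA cache_size").fetchone()[0]
print("default cache_size:", default_cache)

# Apply the same PRAGMAs as get_db_connection()
conn.execute("PRAGMA cache_size = 0")
conn.execute("PRAGMA temp_store = 0")

print("cache_size after PRAGMA:", conn.execute("PRAGMA cache_size").fetchone()[0])  # 0
print("temp_store after PRAGMA:", conn.execute("PRAGMA temp_store").fetchone()[0])  # 0 = default
conn.close()
```

Note that even with cache_size = 0, reads can still be served from the operating system's filesystem cache, a point the observations at the end of this article return to.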
Stress Test Script
The script below generates a large number of requests across multiple threads. It creates a short link and then reads it many times to measure latency.
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean, median

# -------------------------------------------------
# Configuration
# -------------------------------------------------
BASE_URL = "http://localhost:5000"
NUM_TASKS = 100       # Total number of concurrent tasks
NUM_THREADS = 30      # Number of worker threads
NUM_LINK_GETS = 80    # GET requests per task

def create_test_link() -> str | None:
    """Create a short link via the API."""
    try:
        resp = requests.post(f"{BASE_URL}/link", json={"url": "https://example.com"})
        if resp.status_code == 201:  # The API returns 201 on creation
            return resp.json().get("short_link")
    except Exception as exc:
        print(f"Error creating test link: {exc}")
    return None

def test_link_retrieval() -> list[dict]:
    """Perform `NUM_LINK_GETS` GET requests for a single short link."""
    start = time.time()
    short_link = create_test_link()
    results = []
    if short_link is None:
        return [{"success": False, "time": time.time() - start,
                 "error": "link creation failed"}]
    for _ in range(NUM_LINK_GETS):
        try:
            resp = requests.get(f"{BASE_URL}/link/{short_link}")
            # NOTE: elapsed is measured from the start of the whole task,
            # so it accumulates across the task's requests; the statistics
            # below therefore reflect cumulative task progress, not
            # per-request latency
            elapsed = time.time() - start
            results.append(
                {
                    "success": resp.status_code == 200,
                    "time": elapsed,
                    "status_code": resp.status_code,
                }
            )
        except Exception as exc:
            results.append(
                {
                    "success": False,
                    "time": time.time() - start,
                    "error": str(exc),
                }
            )
    return results

def run_stress_test() -> None:
    print(f"Total tasks: {NUM_TASKS}")
    print(f"Concurrent threads: {NUM_THREADS}")
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
        futures = [executor.submit(test_link_retrieval) for _ in range(NUM_TASKS)]
        all_results = []
        for future in as_completed(futures):
            all_results.extend(future.result())
    # -------------------------------------------------
    # Statistics
    # -------------------------------------------------
    times = [r["time"] for r in all_results if r["success"]]
    if times:
        print(f"Requests: {len(times)}")
        print(f"Mean latency: {mean(times):.4f}s")
        print(f"Median latency: {median(times):.4f}s")
        print(f"Min latency: {min(times):.4f}s")
        print(f"Max latency: {max(times):.4f}s")
    else:
        print("No successful requests recorded.")

if __name__ == "__main__":
    run_stress_test()
Usage
- Start the Flask server (e.g. python app.py).
- Run the script above.
- Compare the output with kv_cache enabled and disabled to see the performance impact of the Valkey cache.
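Comparing the two modes means editing kv_cache in the source between runs. As a small convenience (an assumption on my part, not part of the original code), the flag could instead be read from an environment variable:

```python
import os

# Hypothetical replacement for the hard-coded `kv_cache = True`:
# KV_CACHE=0 disables the Valkey cache; anything else (or unset) enables it.
kv_cache = os.environ.get("KV_CACHE", "1") != "0"

print("Valkey cache enabled:", kv_cache)
```

With this in place, `KV_CACHE=0 python app.py` would give the uncached run and a plain `python app.py` the cached one, with no source edits in between.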
Stress Test Script (Python Template)
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean, median
from typing import List, Dict, Any

NUM_THREADS = 8     # Number of concurrent threads
NUM_TASKS = 1000    # Number of tasks to run

def test_link_retrieval() -> List[Dict[str, Any]]:
    """
    Simulate a request to the API endpoint that retrieves a link.
    Replace this stub with the actual request logic (e.g., using ``requests``).

    Returns
    -------
    list of dict
        Each dict should contain at least:
        - ``success`` (bool): whether the request succeeded
        - ``time`` (float): response time in seconds
        - optionally ``error`` (str) when ``success`` is ``False``
    """
    # TODO: implement real request
    # Example return value:
    # return [{'success': True, 'time': 0.123}, ...]
    return []

def run_stress_test() -> None:
    """Execute the stress test and print a summary of the results."""
    print(f"Current threads: {NUM_THREADS}\n")
    results: List[Dict[str, Any]] = []
    start_time = time.time()
    # ------------------------------------------------------------------
    # Execute the tasks concurrently
    # ------------------------------------------------------------------
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
        futures = [executor.submit(test_link_retrieval) for _ in range(NUM_TASKS)]
        for i, future in enumerate(as_completed(futures), start=1):
            subresults = future.result()
            results.extend(subresults)
            if i % 100 == 0:
                print(f"Completed: {i}/{NUM_TASKS}")
    total_time = time.time() - start_time
    # ------------------------------------------------------------------
    # Analysis
    # ------------------------------------------------------------------
    successful = [r for r in results if r.get('success')]
    failed = [r for r in results if not r.get('success')]
    response_times = [r['time'] for r in successful]
    print("\n" + "=" * 60)
    print("STRESS TEST RESULTS")
    print("=" * 60)
    print(f"Total tasks: {NUM_TASKS}")
    print(f"Successful: {len(successful)}")
    print(f"Failed: {len(failed)}")
    print(f"Total time: {total_time:.2f}s")
    print(f"Tasks/second: {NUM_TASKS / total_time:.2f}")
    if response_times:
        print("\nResponse times:")
        print(f"  Min: {min(response_times) * 1000:.2f} ms")
        print(f"  Max: {max(response_times) * 1000:.2f} ms")
        print(f"  Mean: {mean(response_times) * 1000:.2f} ms")
        print(f"  Median: {median(response_times) * 1000:.2f} ms")
    if failed:
        print("\nErrors (first 5):")
        for r in failed[:5]:
            print(f"  - {r.get('error', 'Unknown error')}")
    print("=" * 60)

if __name__ == "__main__":
    run_stress_test()
Test Results (Local Environment)
Note: the tests were run on a single local server without disabling SQLite's in-memory cache. Even so, SQLite responded almost equally fast with and without the Valkey (Redis) cache.
SQLite 3 – Without Valkey (Redis) Cache
============================================================
STRESS TEST RESULTS
============================================================
Total tasks: 100
Successful: 8000
Failed: 0
Total time: 4.57s
Tasks/second: 21.89
Response times:
Min: 10.92ms
Max: 1534.86ms
Mean: 664.45ms
Median: 627.76ms
============================================================
SQLite 3 – With Valkey (Redis) Cache
============================================================
STRESS TEST RESULTS
============================================================
Total tasks: 100
Successful: 8000
Failed: 0
Total time: 5.24s
Tasks/second: 19.08
Response times:
Min: 25.95ms
Max: 1896.03ms
Mean: 783.95ms
Median: 747.06ms
============================================================
Even accounting for the network round-trip to Valkey (Redis), SQLite's performance remains impressive.
Observations
- SQLite is not typically used for large-scale production workloads, so these local results are not fully representative of real-world scenarios that need a dedicated caching layer.
- Disabling SQLite's internal cache still leaves the operating system's filesystem cache in place, which can mask the true impact of an external cache like Valkey.
- The most interesting takeaway is how much OS-level caching and SQLite's own optimizations affect performance, rather than the proof of concept of caching with Valkey.
Source Code
The full proof-of-concept source code is available at: