Cache Efficiency in SQLite
Source: Dev.to

Overview
- Goal: compare read speed between a NoSQL cache (Valkey) and SQLite's internal cache.
- Application: a JSON REST web service built with Python + Flask.
- Endpoints:
  - Shorten URL – receives a long URL and returns a short version.
  - Resolve short URL – returns the original URL from its short version.
Why does the cache matter?
The cache serves reads of previously shortened URLs. When the database that stores the URLs does not have to hit its files on every request, the web server can respond far more efficiently.
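This read path follows the classic cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache for the next reader. A minimal sketch of the idea (the names `cache` and `db_lookup` are illustrative; the article's real implementation appears later):

```python
def resolve(short_hash: str, cache: dict, db_lookup) -> str | None:
    """Cache-aside read: try the cache, fall back to the database."""
    url = cache.get(short_hash)      # 1. cache hit? return immediately
    if url is not None:
        return url
    url = db_lookup(short_hash)      # 2. miss: query the database
    if url is not None:
        cache[short_hash] = url      # 3. populate the cache for next time
    return url
```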
Technologies used
| Layer | Technology |
|---|---|
| Web framework | Python + Flask |
| NoSQL cache | Valkey (an open-source fork of Redis) |
| Database | SQLite3 (with its internal cache feature) |
Server routes (endpoints)
```python
from flask import Flask, request, jsonify
from business.link_shortener import generate_new_link, get_link_by_hash_string

app = Flask(__name__)

# ----------------------------------------------------------------------
# Health-check
# ----------------------------------------------------------------------
@app.route("/heart-beat", methods=["GET"])
def heart_beat() -> dict:
    """Simple endpoint to verify that the application is running."""
    return {"OK": True}

# ----------------------------------------------------------------------
# Short-link creation
# ----------------------------------------------------------------------
@app.route("/link", methods=["POST"])
def create_link() -> tuple:
    """
    Receives JSON in the format:
        { "url": "https://exemplo.com/alguma/pagina" }
    Returns:
        { "short_link": "<hash>" }
    """
    data = request.get_json(silent=True) or {}
    url = data.get("url")
    if not url:
        return jsonify({"error": "Field 'url' is required."}), 400
    short_link = generate_new_link(url)
    return {"short_link": short_link}, 201

# ----------------------------------------------------------------------
# Resolve the original link from its hash
# ----------------------------------------------------------------------
@app.route("/link/<string:short_link>", methods=["GET"])
def get_original_link(short_link: str) -> tuple:
    """
    Looks up the original URL associated with the given hash.
    Example call:
        GET /link/abc123
    Returns:
        { "link": "https://exemplo.com/alguma/pagina" }
    """
    original_url = get_link_by_hash_string(short_link)
    if not original_url:
        return jsonify({"error": "Link not found."}), 404
    return {"link": original_url}, 200
```
Observations
| Item | Comment |
|---|---|
| Imports | `jsonify` was added to make it easy to build JSON responses with status codes. |
| Routes | Each route has a docstring describing its purpose, expected parameters, and response format. |
| Validation | Link creation (`POST /link`) checks whether the `url` field was sent; otherwise it returns 400. |
| Path parameter | The read endpoint (`GET /link/<short_link>`) receives the hash as part of the URL, avoiding the need for a query string or request body. |
| Status codes | 201 for successful creation, 400 for an invalid request, and 404 when the hash does not exist. |
| Function names | Functions were named (`heart_beat`, `create_link`, `get_original_link`) to avoid collisions and improve readability. |
| Separators | Comment lines (`# ---`) help organize the code visually. |
| Typing | Type annotations (e.g., `-> dict`, `short_link: str`) were included to improve autocompletion and documentation. |
| Safety | `request.get_json(silent=True)` prevents exceptions when the request body is not valid JSON. |
| Consistency | All responses are returned as dictionaries (or via `jsonify`) so that Flask converts them to JSON automatically. |
| Quick test | `curl -X POST -H "Content-Type: application/json" -d '{"url":"https://exemplo.com"}' http://localhost:5000/link` |
| Running | Save the file as `app.py` and run `flask run` (or `python app.py` if there is an `if __name__ == "__main__": app.run()` block). |
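For the `python app.py` option, a minimal entry point appended to the file would look like this (the host/port values are illustrative defaults, not taken from the article):

```python
# app.py entry point (illustrative; the article only mentions this option).
if __name__ == "__main__":
    # Flask's built-in server: fine for local benchmarking, not production.
    app.run(host="127.0.0.1", port=5000, threaded=True)
```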
Application core (main module)
```python
import sqlite3
import random
import string
import threading
import valkey
import json

# Thread-local storage for database connections
_thread_local = threading.local()

# Flag that activates the key-value database cache
kv_cache = True

# Global connection pool for Valkey
kv_pool = valkey.ConnectionPool(
    host='localhost',
    port=6379,
    db=0,
    max_connections=50,        # Match the number of concurrent threads
    socket_keepalive=True,
    socket_connect_timeout=5,
    retry_on_timeout=True,
    decode_responses=True      # Automatically decode responses to strings
)

# ----------------------------------------------------------------------
# Initialize database schema
# ----------------------------------------------------------------------
conn = sqlite3.connect('database.db')
cur = conn.cursor()
cur.execute('''
    CREATE TABLE IF NOT EXISTS links (
        id INTEGER PRIMARY KEY,
        link_hash TEXT,
        link_target TEXT
    )
''')
cur.execute('CREATE INDEX IF NOT EXISTS idx_links_id ON links (id)')
cur.execute('CREATE INDEX IF NOT EXISTS idx_links_link_hash ON links (link_hash)')
conn.commit()
conn.close()

def get_db_connection():
    """Return a thread-local SQLite connection."""
    if not hasattr(_thread_local, 'conn'):
        _thread_local.conn = sqlite3.connect(
            'database.db',
            check_same_thread=False
        )
        # cache_size = 0 disables SQLite's page cache; temp_store = 0
        # selects the default (typically file-based) temp storage.
        _thread_local.conn.execute('PRAGMA cache_size = 0')
        _thread_local.conn.execute('PRAGMA temp_store = 0')
    return _thread_local.conn

def get_kv_db():
    """Return a thread-local Valkey client."""
    if not hasattr(_thread_local, 'kv_db'):
        _thread_local.kv_db = valkey.Valkey(connection_pool=kv_pool)
    return _thread_local.kv_db

def generate_hash():
    """Generate a random 10-character hash."""
    return ''.join(
        random.choices(string.ascii_letters + string.digits, k=10)
    )

def create(link):
    """Insert a new link into SQLite and return its hash."""
    conn = get_db_connection()
    cur = conn.cursor()
    link_hash = generate_hash()
    cur.execute(
        'INSERT INTO links (link_hash, link_target) VALUES (?, ?)',
        (link_hash, link)
    )
    conn.commit()
    return link_hash

def get_link_by_hash(hash_string):
    """Retrieve a link by its hash, using Valkey as a cache when enabled."""
    if kv_cache:
        cached = get_kv_db().get(hash_string)
        if cached:
            return cached  # Already decoded (decode_responses=True)
    conn = get_db_connection()
    cur = conn.cursor()
    cur.execute(
        'SELECT link_target FROM links WHERE link_hash = ?',
        (hash_string,)
    )
    result = cur.fetchone()
    if result and result[0]:
        if kv_cache:
            get_kv_db().set(hash_string, result[0])
        return result[0]
    return None
```
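A quick smoke test for this module, assuming it is importable and a Valkey instance is running on `localhost:6379` (the module name `shortener` is illustrative; the article does not state the file name):

```python
# Smoke test: run with a local Valkey instance on localhost:6379.
from shortener import create, get_link_by_hash

h = create("https://example.com/some/page")
print("hash:", h)
print("first read (SQLite, then cached):", get_link_by_hash(h))
print("second read (served from Valkey):", get_link_by_hash(h))
```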
Note:
- The line `kv_cache = True` controls whether the Valkey cache is active.
- The lines that disable SQLite's internal cache (`PRAGMA cache_size = 0` and `PRAGMA temp_store = 0`) keep SQLite from using its own page cache, making it possible to observe the impact of the external cache.
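To confirm the pragmas took effect on a given connection, you can read them back: SQLite returns the current value when a pragma is queried without an argument. A small sketch, independent of the application code:

```python
import sqlite3

conn = sqlite3.connect('database.db')
conn.execute('PRAGMA cache_size = 0')
conn.execute('PRAGMA temp_store = 0')

# Querying a pragma without a value returns its current setting.
print(conn.execute('PRAGMA cache_size').fetchone())  # (0,)
print(conn.execute('PRAGMA temp_store').fetchone())  # (0,)
conn.close()
```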
Stress Test Script
The script below generates a heavy request load using multiple threads. Each task creates a short link and then performs several reads of that link to measure latency.
```python
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean, median

# -------------------------------------------------
# Configuration
# -------------------------------------------------
BASE_URL = "http://localhost:5000"
NUM_TASKS = 100        # Total number of concurrent tasks
NUM_THREADS = 30       # Number of worker threads
NUM_LINK_GETS = 80     # GET requests per task

def create_test_link() -> str | None:
    """Create a short link via the API."""
    try:
        resp = requests.post(f"{BASE_URL}/link", json={"url": "https://example.com"})
        if resp.status_code == 201:  # the API returns 201 on creation
            return resp.json().get("short_link")
    except Exception as exc:
        print(f"Error creating test link: {exc}")
    return None

def test_link_retrieval() -> list[dict]:
    """Perform `NUM_LINK_GETS` GET requests for a single short link."""
    short_link = create_test_link()
    if short_link is None:
        return [{"success": False, "time": 0.0, "error": "link creation failed"}]
    results = []
    for _ in range(NUM_LINK_GETS):
        start = time.time()  # per-request timer
        try:
            resp = requests.get(f"{BASE_URL}/link/{short_link}")
            results.append(
                {
                    "success": resp.status_code == 200,
                    "time": time.time() - start,
                    "status_code": resp.status_code,
                }
            )
        except Exception as exc:
            results.append(
                {
                    "success": False,
                    "time": time.time() - start,
                    "error": str(exc),
                }
            )
    return results

def run_stress_test() -> None:
    print(f"Total tasks: {NUM_TASKS}")
    print(f"Concurrent threads: {NUM_THREADS}")
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
        futures = [executor.submit(test_link_retrieval) for _ in range(NUM_TASKS)]
        all_results = []
        for future in as_completed(futures):
            all_results.extend(future.result())
    # -------------------------------------------------
    # Statistics
    # -------------------------------------------------
    times = [r["time"] for r in all_results if r["success"]]
    if times:
        print(f"Requests: {len(times)}")
        print(f"Mean latency: {mean(times):.4f}s")
        print(f"Median latency: {median(times):.4f}s")
        print(f"Min latency: {min(times):.4f}s")
        print(f"Max latency: {max(times):.4f}s")
    else:
        print("No successful requests recorded.")

if __name__ == "__main__":
    run_stress_test()
```
How to use
- Start the Flask server (e.g., `python app.py`).
- Run the script above.
- Compare the output with `kv_cache` enabled and disabled to see the performance impact of the Valkey cache (one way to toggle the flag is sketched below).
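The core module hard-codes `kv_cache = True`, so toggling it means editing the source between runs. A convenient alternative, purely a suggestion and not part of the original code, is to drive the flag from an environment variable:

```python
import os

# Hypothetical replacement for the hard-coded flag in the core module.
# Run with KV_CACHE=0 to benchmark SQLite alone, KV_CACHE=1 to use Valkey.
kv_cache = os.environ.get("KV_CACHE", "1") == "1"
```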
Stress Test Script (reporting skeleton)
A second variant of the test, with the request logic left as a stub and more detailed reporting; its output format matches the results shown below.
```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean, median
from typing import List, Dict, Any

NUM_THREADS = 8     # Number of concurrent threads
NUM_TASKS = 1000    # Number of tasks to run

def test_link_retrieval() -> List[Dict[str, Any]]:
    """
    Simulate a request to the API endpoint that retrieves a link.
    Replace this stub with the actual request logic (e.g., using ``requests``).

    Returns
    -------
    list of dict
        Each dict should contain at least:
        - ``success`` (bool): whether the request succeeded
        - ``time`` (float): response time in seconds
        - optionally ``error`` (str) when ``success`` is ``False``
    """
    # TODO: implement real request
    # Example return value:
    # return [{'success': True, 'time': 0.123}, ...]
    return []

def run_stress_test() -> None:
    """Execute the stress test and print a summary of the results."""
    print(f"Concurrent threads: {NUM_THREADS}\n")
    results: List[Dict[str, Any]] = []
    start_time = time.time()
    # ------------------------------------------------------------------
    # Execute the tasks concurrently
    # ------------------------------------------------------------------
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
        futures = [executor.submit(test_link_retrieval) for _ in range(NUM_TASKS)]
        for i, future in enumerate(as_completed(futures), start=1):
            results.extend(future.result())
            if i % 100 == 0:
                print(f"Completed: {i}/{NUM_TASKS}")
    total_time = time.time() - start_time
    # ------------------------------------------------------------------
    # Analysis
    # ------------------------------------------------------------------
    successful = [r for r in results if r.get('success')]
    failed = [r for r in results if not r.get('success')]
    response_times = [r['time'] for r in successful]
    print("\n" + "=" * 60)
    print("STRESS TEST RESULTS")
    print("=" * 60)
    print(f"Total tasks: {NUM_TASKS}")
    print(f"Successful: {len(successful)}")
    print(f"Failed: {len(failed)}")
    print(f"Total time: {total_time:.2f}s")
    print(f"Tasks/second: {NUM_TASKS / total_time:.2f}")
    if response_times:
        print("\nResponse times:")
        print(f"  Min: {min(response_times) * 1000:.2f} ms")
        print(f"  Max: {max(response_times) * 1000:.2f} ms")
        print(f"  Mean: {mean(response_times) * 1000:.2f} ms")
        print(f"  Median: {median(response_times) * 1000:.2f} ms")
    if failed:
        print("\nErrors (first 5):")
        for r in failed[:5]:
            print(f"  - {r.get('error', 'Unknown error')}")
    print("=" * 60)

if __name__ == "__main__":
    run_stress_test()
```
Test Results (Local Environment)
Note: The tests were performed on a single local server without disabling SQLite’s in‑memory cache. Even so, SQLite responded almost as fast with or without Valkey (Redis) caching.
SQLite 3 – Without Valkey (Redis) Cache
```
============================================================
STRESS TEST RESULTS
============================================================
Total tasks: 100
Successful: 8000
Failed: 0
Total time: 4.57s
Tasks/second: 21.89

Response times:
  Min: 10.92ms
  Max: 1534.86ms
  Mean: 664.45ms
  Median: 627.76ms
============================================================
```
SQLite 3 – With Valkey (Redis) Cache
```
============================================================
STRESS TEST RESULTS
============================================================
Total tasks: 100
Successful: 8000
Failed: 0
Total time: 5.24s
Tasks/second: 19.08

Response times:
  Min: 25.95ms
  Max: 1896.03ms
  Mean: 783.95ms
  Median: 747.06ms
============================================================
```
Even accounting for the network overhead required to reach Valkey (Redis), SQLite’s performance remains impressive.
Observations
- SQLite is not typically used for large‑scale production workloads, so these local results don’t fully represent a real‑world scenario where a dedicated cache layer would be beneficial.
- Disabling SQLite's internal cache still leaves the operating system's file-system cache active, which can mask the true impact of an external cache like Valkey (a quick way to observe this warm-cache effect is sketched after this list).
- The most interesting takeaway is how OS‑level caching and SQLite’s own optimizations affect performance, rather than the proof‑of‑concept of using Valkey for caching.
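One way to see the OS file-system cache at work is to time the same SQLite read twice on fresh connections: even with `cache_size = 0`, the second read is usually served from memory. A minimal sketch (the hash `abc123` is a placeholder; it assumes the `database.db` file created earlier, and timings will vary by machine):

```python
import sqlite3
import time

def timed_read(hash_string: str) -> float:
    """Open a fresh connection, disable SQLite's page cache, time one read."""
    conn = sqlite3.connect('database.db')
    conn.execute('PRAGMA cache_size = 0')
    start = time.time()
    conn.execute(
        'SELECT link_target FROM links WHERE link_hash = ?',
        (hash_string,)
    ).fetchone()
    elapsed = time.time() - start
    conn.close()
    return elapsed

# The first read may touch disk; the repeat is usually served by the
# OS file-system cache, even though SQLite's own cache is disabled.
print(f"cold-ish read: {timed_read('abc123') * 1000:.3f} ms")
print(f"warm read:     {timed_read('abc123') * 1000:.3f} ms")
```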
Source Code
The complete source code for this proof‑of‑concept can be found at: