Individual Developer's Portfolio Strategy: Running 13 Projects on a Single RTX 5090

Published: March 8, 2026 at 05:17 AM EDT
3 min read
Source: Dev.to

13 Project List

Legal Tech

  • Contract Auto-Generation Tool (Clause suggestion with Streamlit + Gemini API)
  • Case Law Search System (Fast search of case law documents with SQLite FTS5)
  • Legal Compliance Chatbot (Article interpretation support with Gemini)

Chemical Simulation

  • Molecular Structure Prediction Model (FP8 Quantized ResNet)
  • Reaction Rate Calculation Engine (CUDA kernel optimized)

Shogi AI

  • Fuka40B (FP8 Quantized ResNet40x384, 80 layers)
  • Fuka2025Q2-20b (FP8 Policy Evaluation Model)
  • Floodgate Strategy Engine
  • ttzl-ex (TensorRT Inference Optimization)
  • Shogi Data Analysis Pipeline

Others

  • Minecraft AI Assistant (vLLM Resident)
  • Stock Data Visualization Dashboard
  • Research Note Management System

Standardizing the Technology Stack

Search Infrastructure: SQLite FTS5

To standardize search functionality across all projects, SQLite FTS5 is adopted. BM25 ranking provides fast, highly relevant searches over patent documents and case-law data.
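The article does not show the FTS5 setup itself; a minimal self-contained sketch of BM25-ranked search (the schema and sample rows here are illustrative, not from the actual projects) looks like this:

```python
import sqlite3

# Build an in-memory FTS5 index over sample case-law documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs(title, body) VALUES (?, ?)",
    [
        ("Case A", "contract breach and damages in a licensing dispute"),
        ("Case B", "patent infringement claim over a chemical process"),
        ("Case C", "contract interpretation of an arbitration clause"),
    ],
)

def search(query: str, limit: int = 5):
    # bm25() returns a relevance score where lower is better,
    # so results are ordered ascending.
    return conn.execute(
        "SELECT title, bm25(docs) FROM docs WHERE docs MATCH ? "
        "ORDER BY bm25(docs) LIMIT ?",
        (query, limit),
    ).fetchall()

for title, score in search("contract"):
    print(title, round(score, 3))
```

Because FTS5 ships inside SQLite itself, this gives every project the same search backend with zero extra infrastructure.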

Common UI: Streamlit

Streamlit is used for the frontend of all applications, standardizing the display of responses when integrating with the Gemini API.

import streamlit as st
from google import genai

# The client reads the API key from the GEMINI_API_KEY environment variable.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Extract clauses from the patent document",
)
st.markdown(f"**Proposed clauses**:\n{response.text}")

GPU Sharing Strategy

vLLM Resident Architecture

To maximize the utilization of the RTX 5090’s 32 GB VRAM, vLLM is launched as a resident process. The inference engine is switched according to the model size for each project.
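The switching logic itself is not shown in the article; one way to sketch the idea (the model names, VRAM figures, and the 12 GB threshold below are all hypothetical) is a small dispatcher that routes large LLMs to the resident vLLM server and small evaluation nets to a TensorRT engine:

```python
from dataclasses import dataclass

@dataclass
class ModelSpec:
    name: str
    vram_gb: float  # approximate VRAM footprint, illustrative values

# Hypothetical per-project registry; entries are not from the article.
MODELS = {
    "minecraft-assistant": ModelSpec("chat-llm-7b", 16.0),
    "fuka-eval": ModelSpec("Fuka40B-fp8", 10.0),
}

VRAM_BUDGET_GB = 32.0  # RTX 5090

def choose_engine(project: str) -> str:
    """Large models go to the resident vLLM server; small,
    latency-sensitive models run as TensorRT engines."""
    spec = MODELS[project]
    return "vllm" if spec.vram_gb >= 12.0 else "tensorrt"
```

The point of the split is that vLLM keeps its KV cache resident while TensorRT engines load and unload quickly, so small models can share whatever VRAM the resident server leaves free.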

TensorRT Switching Logic

In Shogi AI, models are optimized with TensorRT.

trtexec \
  --onnx=models/eval/model_fp8.onnx \
  --fp8 \
  --minShapes=input1:1x62x9x9,input2:1x57x9x9 \
  --optShapes=input1:256x62x9x9,input2:256x57x9x9 \
  --maxShapes=input1:256x62x9x9,input2:256x57x9x9 \
  --saveEngine=model_fp8_trt

GPU Usage Monitoring

#!/usr/bin/env bash
# Poll GPU utilization once a minute and free VRAM for other projects
# by stopping the resident vLLM service when utilization exceeds 80%.
while true; do
  usage=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | tr -d ' ')
  if [ "$usage" -gt 80 ]; then
    systemctl --user stop vllm.service
  fi
  sleep 60
done

Cloudflare + Caddy Publishing Infrastructure

All web projects are published using Cloudflare Tunnel + Caddy. Caddy functions as a reverse proxy, handling HTTPS termination and routing.
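The article does not include the Caddy configuration; a minimal Caddyfile sketch of that setup (the hostnames and upstream ports are hypothetical) terminates HTTPS and proxies each subdomain to its Streamlit app:

```caddyfile
# Hypothetical hostnames and ports; each Streamlit app listens
# on its own local port and Caddy routes by subdomain.
contracts.example.com {
    reverse_proxy localhost:8501
}
caselaw.example.com {
    reverse_proxy localhost:8502
}
```

Cloudflare Tunnel then carries traffic to Caddy without exposing any ports on the home network.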

Horizontal Security Deployment

A common security policy is applied across all projects.

  • API keys are managed via environment variables and are not hard‑coded.
  • Branch protection is configured to require pull requests.
  • Periodic log-auditing scripts run automatically.
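For the first point, a fail-fast helper keeps keys out of the source tree (the helper itself is illustrative; the google-genai SDK already reads GEMINI_API_KEY from the environment on its own):

```python
import os

def require_api_key(name: str = "GEMINI_API_KEY") -> str:
    """Return the key from the environment, failing fast if it is
    missing instead of falling back to a hard-coded value."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set {name} in the environment (never commit it).")
    return key
```

Failing at startup makes a missing key obvious immediately, rather than surfacing as an opaque API error later.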

Operational Tips

  • Standardized on CUDA 12.8 to resolve version conflicts between projects.
  • Managed per‑project library paths using environment variables.
  • Automatically stops services when GPU utilization exceeds a threshold.

Summary

To maximize the utilization of the RTX 5090’s 32 GB VRAM, three points were prioritized:

  1. Building a Common Infrastructure – Standardized search and UI with SQLite FTS5 and Streamlit.
  2. Dynamic Resource Management – Optimized based on model load with vLLM + TensorRT switching.
  3. Horizontal Security Deployment – Standardized authentication processes.

In the Shogi AI project, the combination of FP8 quantization and TensorRT achieved significant inference speed improvements compared to FP16. Balancing “freedom in technology selection” with “the importance of a common infrastructure” is key to personal development success.
