From Manual Chaos to Workflow Engineering: Building a Local-First AI Automation Pipeline (and Rethinking Cloud LLMs Like Gemini)
Source: Dev.to

This is a submission for the Built with Google Gemini: Writing Challenge.
What I Built with Google Gemini
Over the past few months I’ve maintained a daily LeetCode practice routine. While solving problems was straightforward, maintaining documentation wasn’t. Every solution required:
- Creating a new file
- Updating the README (sorted by difficulty)
- Writing structured explanations
- Formatting Markdown properly
- Pushing everything to GitHub
It was repetitive, manual, and error‑prone.
To solve this, I built LeetCode AutoSync — a CLI automation tool that streamlines the entire workflow.
What the Tool Does
- Adds new solutions locally with automatic file structuring
- Updates and sorts README sections by difficulty
- Generates structured solution write‑ups using an LLM
- Logs token usage and performance metrics to Excel
- Manages background inference using a producer‑consumer queue
- Gracefully shuts down the model to free system resources
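To give a feel for one of these steps, here is a rough sketch of how the README-sorting feature could work. The section names, file layout, and function names below are illustrative assumptions, not the tool's actual code:

```python
from collections import defaultdict

# Assumed difficulty ordering; the real tool's conventions may differ.
DIFFICULTY_ORDER = {"Easy": 0, "Medium": 1, "Hard": 2}

def build_readme_sections(solutions):
    """Group (title, difficulty, path) tuples into difficulty-sorted
    Markdown sections for the README."""
    groups = defaultdict(list)
    for title, difficulty, path in solutions:
        groups[difficulty].append((title, path))

    lines = []
    for difficulty in sorted(groups, key=DIFFICULTY_ORDER.get):
        lines.append(f"## {difficulty}")
        for title, path in sorted(groups[difficulty]):
            lines.append(f"- [{title}]({path})")
    return "\n".join(lines)
```

The key idea is that the README is regenerated from structured data on every run, so ordering never drifts out of sync with the files on disk.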
Where Gemini Fits In
While building this, I explored cloud‑based LLM solutions like Google Gemini. I was particularly interested in:
- Structured content generation
- Reliable latency
- Deployment potential via Cloud Run
- API‑based inference scalability
However, due to cost considerations and experimentation goals, I opted for a local‑first architecture using Ollama and Mistral.
This decision itself became part of my learning journey — understanding the trade‑offs between:
- Local inference vs. cloud APIs
- Cost vs. scalability
- Latency vs. resource consumption
- Infrastructure ownership vs. convenience
Although this version of the project uses a locally hosted model, the architectural design was influenced by how cloud LLM APIs (like Gemini) structure prompts and manage inference workflows.
Demo
Here’s a short walkthrough of the CLI automation tool in action:
🎥
The demo shows:
- Adding a new solution interactively through the CLI
- Queue‑based background LLM generation using a producer‑consumer model
- Automatic README updates sorted by difficulty
- Structured Markdown solution generation
- Token usage and performance logging to Excel
- Thread‑safe state management
- Graceful model shutdown after queue completion
Architecture Highlights
- A background worker thread handling inference
- A synchronized task queue
- Local LLM inference via Ollama
- Structured file‑system updates
- Git integration for repository management
- Telemetry logging for token and performance metrics
Source code and implementation details are available here:
🔗
What I Learned
1. Concurrency in Python
Implemented a producer‑consumer queue pattern:
- Main thread enqueues generation tasks
- Background worker processes LLM calls
- Thread‑safe tracking using locks
- Graceful shutdown logic to avoid race conditions
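The steps above can be sketched with Python's standard `queue` and `threading` modules. This is a minimal illustration of the pattern, not the tool's actual implementation; `generate_writeup` stands in for the real LLM call:

```python
import queue
import threading

task_queue = queue.Queue()
_SENTINEL = object()  # signals the worker to stop after the queue drains

def generate_writeup(problem):
    # Placeholder for the actual LLM call (e.g. via Ollama).
    return f"Write-up for {problem}"

results = []
results_lock = threading.Lock()  # thread-safe tracking of completed work

def worker():
    while True:
        task = task_queue.get()
        if task is _SENTINEL:
            task_queue.task_done()
            break
        output = generate_writeup(task)
        with results_lock:
            results.append(output)
        task_queue.task_done()

# Main thread is the producer; the worker consumes in the background.
t = threading.Thread(target=worker, daemon=True)
t.start()
for problem in ["two-sum", "lru-cache"]:
    task_queue.put(problem)
task_queue.put(_SENTINEL)  # graceful shutdown: finish queued work, then exit
task_queue.join()
t.join()
```

The sentinel keeps shutdown race-free: the worker only exits after every queued task has been processed, so no generation is dropped mid-flight.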
2. Resource Lifecycle Management
Running local LLMs is RAM‑intensive. I added:
- Queue monitoring
- Automatic shutdown when the queue empties
- Explicit model stop calls to free memory
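One way to implement that explicit stop is to ask Ollama to unload the model once the queue empties. The sketch below assumes the `ollama stop` CLI subcommand is available (present in recent Ollama releases; sending an API request with `keep_alive: 0` is an alternative):

```python
import subprocess

def shutdown_model(model: str = "mistral") -> bool:
    """Ask Ollama to unload the model and free its RAM.

    Returns True if the stop command succeeded, False if Ollama
    is missing or unresponsive.
    """
    try:
        result = subprocess.run(
            ["ollama", "stop", model],
            capture_output=True,
            text=True,
            timeout=30,
        )
        return result.returncode == 0
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Ollama not installed or not responding; nothing to free.
        return False
```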
3. Telemetry & Observability
Extended the system to log:
- Prompt tokens
- Response tokens
- Total tokens
- Load duration
- Generation duration
- Tokens per second
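These metrics map closely onto the fields Ollama returns with each generation response (`prompt_eval_count`, `eval_count`, `load_duration`, `eval_duration`, with durations in nanoseconds). A sketch of how such a record could be derived before writing it to Excel; the helper name and output keys are my own:

```python
def telemetry_record(resp: dict) -> dict:
    """Derive logged metrics from an Ollama generate response.

    Field names follow Ollama's API; duration fields are nanoseconds.
    """
    prompt_tokens = resp.get("prompt_eval_count", 0)
    response_tokens = resp.get("eval_count", 0)
    eval_seconds = resp.get("eval_duration", 0) / 1e9
    return {
        "prompt_tokens": prompt_tokens,
        "response_tokens": response_tokens,
        "total_tokens": prompt_tokens + response_tokens,
        "load_duration_s": resp.get("load_duration", 0) / 1e9,
        "generation_duration_s": eval_seconds,
        "tokens_per_second": (
            response_tokens / eval_seconds if eval_seconds else 0.0
        ),
    }
```

Comparing `load_duration_s` across runs is also what exposes the cold-vs-warm load difference mentioned below.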
This gave insight into:
- Cold vs. warm model loads
- Throughput efficiency
- Cost proxies for API alternatives
4. Trade‑offs: Local vs. Cloud
Using a local model made me appreciate what cloud services like Gemini abstract away:
- No need to manage memory
- No manual lifecycle control
- Potentially better consistency
- Scalable deployment options
At the same time, local inference gave me:
- Cost control
- Full ownership
- Deep visibility into model behavior
That trade‑off analysis was one of the most valuable parts of this project.
Google Gemini Feedback
Although I did not integrate Gemini directly into this tool, I explored its ecosystem and documentation while evaluating architectural options.
What stands out
- Clean API design
- Strong structured‑output capability
- Integration potential with Cloud Run
- Reduced infrastructure overhead
Where I would want more
- Transparent token‑usage insights at a granular level
- Clear cost‑comparison tooling
- More documentation around performance benchmarking
Future ideas
- Swap local inference with the Gemini API
- Measure performance differences
- Deploy the tool as a serverless service using Cloud Run
Feel free to check out the repo, try the tool, or share your thoughts on local‑first vs. cloud‑first AI workflows!
What’s Next
This project started as workflow automation, but it evolved into something deeper:
- Understanding LLM systems design
- Learning concurrency patterns
- Measuring performance metrics
- Thinking like an infrastructure engineer
My next goals are to:
- Add multi‑model comparison support
- Explore deployment options
- Integrate structured logging
- Experiment with cloud‑hosted inference services
This project transformed my LeetCode practice from manual chaos into an engineered workflow — and, more importantly, it transformed how I think about AI systems.
Thanks for reading 🚀
