From Manual Chaos to Workflow Engineering: Building a Local-First AI Automation Pipeline (and Rethinking Cloud LLMs Like Gemini)
Source: Dev.to

This is a submission for the Built with Google Gemini: Writing Challenge.
What I Built with Google Gemini
Over the past few months I’ve maintained a daily LeetCode practice routine. While solving problems was straightforward, maintaining documentation wasn’t. Every solution required:
- Creating a new file
- Updating the README (sorted by difficulty)
- Writing structured explanations
- Formatting Markdown properly
- Pushing everything to GitHub
It was repetitive, manual, and error‑prone.
To solve this, I built LeetCode AutoSync — a CLI automation tool that streamlines the entire workflow.
What the Tool Does
- Adds new solutions locally with automatic file structuring
- Updates and sorts README sections by difficulty
- Generates structured solution write‑ups using an LLM
- Logs token usage and performance metrics to Excel
- Manages background inference using a producer‑consumer queue
- Gracefully shuts down the model to free system resources
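To give a feel for one of these steps, here is a rough sketch of how the README-sorting feature could work. The section names, file layout, and function names below are illustrative assumptions, not the tool's actual code:

```python
from collections import defaultdict

# Assumed difficulty ordering; the real tool's conventions may differ.
DIFFICULTY_ORDER = {"Easy": 0, "Medium": 1, "Hard": 2}

def build_readme_sections(solutions):
    """Group (title, difficulty, path) tuples into difficulty-sorted
    Markdown sections for the README."""
    groups = defaultdict(list)
    for title, difficulty, path in solutions:
        groups[difficulty].append((title, path))

    lines = []
    for difficulty in sorted(groups, key=DIFFICULTY_ORDER.get):
        lines.append(f"## {difficulty}")
        for title, path in sorted(groups[difficulty]):
            lines.append(f"- [{title}]({path})")
    return "\n".join(lines)
```

The key idea is that the README is regenerated from structured data on every run, so ordering never drifts out of sync with the files on disk.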
Where Gemini Fits In
While building this, I explored cloud‑based LLM solutions like Google Gemini. I was particularly interested in:
- Structured content generation
- Reliable latency
- Deployment potential via Cloud Run
- API‑based inference scalability
However, due to cost considerations and experimentation goals, I opted for a local‑first architecture using Ollama and Mistral.
This decision itself became part of my learning journey — understanding the trade‑offs between:
- Local inference vs. cloud APIs
- Cost vs. scalability
- Latency vs. resource consumption
- Infrastructure ownership vs. convenience
Although this version of the project uses a locally hosted model, the architectural design was influenced by how cloud LLM APIs (like Gemini) structure prompts and manage inference workflows.
Demo
Here’s a short walkthrough of the CLI automation tool in action:
🎥
The demo shows:
- Adding a new solution interactively through the CLI
- Queue‑based background LLM generation using a producer‑consumer model
- Automatic README updates sorted by difficulty
- Structured Markdown solution generation
- Token usage and performance logging to Excel
- Thread‑safe state management
- Graceful model shutdown after queue completion
Architecture Highlights
- A background worker thread handling inference
- A synchronized task queue
- Local LLM inference via Ollama
- Structured file‑system updates
- Git integration for repository management
- Telemetry logging for token and performance metrics
Source code and implementation details are available here:
🔗
What I Learned
1. Concurrency in Python
Implemented a producer‑consumer queue pattern:
- Main thread enqueues generation tasks
- Background worker processes LLM calls
- Thread‑safe tracking using locks
- Graceful shutdown logic to avoid race conditions
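The steps above can be sketched with Python's standard `queue` and `threading` modules. This is a minimal illustration of the pattern, not the tool's actual implementation; `generate_writeup` stands in for the real LLM call:

```python
import queue
import threading

task_queue = queue.Queue()
_SENTINEL = object()  # signals the worker to stop after the queue drains

def generate_writeup(problem):
    # Placeholder for the actual LLM call (e.g. via Ollama).
    return f"Write-up for {problem}"

results = []
results_lock = threading.Lock()  # thread-safe tracking of completed work

def worker():
    while True:
        task = task_queue.get()
        if task is _SENTINEL:
            task_queue.task_done()
            break
        output = generate_writeup(task)
        with results_lock:
            results.append(output)
        task_queue.task_done()

# Main thread is the producer; the worker consumes in the background.
t = threading.Thread(target=worker, daemon=True)
t.start()
for problem in ["two-sum", "lru-cache"]:
    task_queue.put(problem)
task_queue.put(_SENTINEL)  # graceful shutdown: finish queued work, then exit
task_queue.join()
t.join()
```

The sentinel keeps shutdown race-free: the worker only exits after every queued task has been processed, so no generation is dropped mid-flight.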
2. Resource Lifecycle Management
Running local LLMs is RAM‑intensive. I added:
- Queue monitoring
- Automatic shutdown when the queue empties
- Explicit model stop calls to free memory
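One way to implement that explicit stop is to ask Ollama to unload the model once the queue empties. The sketch below assumes the `ollama stop` CLI subcommand is available (present in recent Ollama releases; sending an API request with `keep_alive: 0` is an alternative):

```python
import subprocess

def shutdown_model(model: str = "mistral") -> bool:
    """Ask Ollama to unload the model and free its RAM.

    Returns True if the stop command succeeded, False if Ollama
    is missing or unresponsive.
    """
    try:
        result = subprocess.run(
            ["ollama", "stop", model],
            capture_output=True,
            text=True,
            timeout=30,
        )
        return result.returncode == 0
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Ollama not installed or not responding; nothing to free.
        return False
```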
3. Telemetry & Observability
Extended the system to log:
- Prompt tokens
- Response tokens
- Total tokens
- Load duration
- Generation duration
- Tokens per second
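These metrics map closely onto the fields Ollama returns with each generation response (`prompt_eval_count`, `eval_count`, `load_duration`, `eval_duration`, with durations in nanoseconds). A sketch of how such a record could be derived before writing it to Excel; the helper name and output keys are my own:

```python
def telemetry_record(resp: dict) -> dict:
    """Derive logged metrics from an Ollama generate response.

    Field names follow Ollama's API; duration fields are nanoseconds.
    """
    prompt_tokens = resp.get("prompt_eval_count", 0)
    response_tokens = resp.get("eval_count", 0)
    eval_seconds = resp.get("eval_duration", 0) / 1e9
    return {
        "prompt_tokens": prompt_tokens,
        "response_tokens": response_tokens,
        "total_tokens": prompt_tokens + response_tokens,
        "load_duration_s": resp.get("load_duration", 0) / 1e9,
        "generation_duration_s": eval_seconds,
        "tokens_per_second": (
            response_tokens / eval_seconds if eval_seconds else 0.0
        ),
    }
```

Comparing `load_duration_s` across runs is also what exposes the cold-vs-warm load difference mentioned below.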
This gave insight into:
- Cold vs. warm model loads
- Throughput efficiency
- Cost proxies for API alternatives
4. Trade‑offs: Local vs. Cloud
Using a local model made me appreciate what cloud services like Gemini abstract away:
- No need to manage memory
- No manual lifecycle control
- Potentially better consistency
- Scalable deployment options
At the same time, local inference gave me:
- Cost control
- Full ownership
- Deep visibility into model behavior
That trade‑off analysis was one of the most valuable parts of this project.
Google Gemini Feedback
Although I did not integrate Gemini directly into this tool, I explored its ecosystem and documentation while evaluating architectural options.
What stands out
- Clean API design
- Strong structured‑output capability
- Integration potential with Cloud Run
- Reduced infrastructure overhead
Where I would want more
- Transparent token‑usage insights at a granular level
- Clear cost‑comparison tooling
- More documentation around performance benchmarking
Future ideas
- Swap local inference with the Gemini API
- Measure performance differences
- Deploy the tool as a serverless service using Cloud Run
Feel free to check out the repo, try the tool, or share your thoughts on local‑first vs. cloud‑first AI workflows!
What’s Next
This project started as workflow automation, but it evolved into something deeper:
- Understanding LLM systems design
- Learning concurrency patterns
- Measuring performance metrics
- Thinking like an infrastructure engineer
My next goals are to:
- Add multi‑model comparison support
- Explore deployment options
- Integrate structured logging
- Experiment with cloud‑hosted inference services
This project transformed my LeetCode practice from manual chaos into an engineered workflow — and, more importantly, it transformed how I think about AI systems.
Thanks for reading 🚀
