From Manual Chaos to Workflow Engineering: Building a Local-First AI Automation Pipeline (and Rethinking Cloud LLMs Like Gemini)

Published: February 28, 2026 at 02:17 PM EST
4 min read
Source: Dev.to


Micheal Angelo

This is a submission for the Built with Google Gemini: Writing Challenge.

What I Built with Google Gemini

Over the past few months I’ve maintained a daily LeetCode practice routine. While solving problems was straightforward, maintaining documentation wasn’t. Every solution required:

  • Creating a new file
  • Updating the README (sorted by difficulty)
  • Writing structured explanations
  • Formatting Markdown properly
  • Pushing everything to GitHub

It was repetitive, manual, and error‑prone.

To solve this, I built LeetCode AutoSync — a CLI automation tool that streamlines the entire workflow.

What the Tool Does

  • Adds new solutions locally with automatic file structuring
  • Updates and sorts README sections by difficulty
  • Generates structured solution write‑ups using an LLM
  • Logs token usage and performance metrics to Excel
  • Manages background inference using a producer‑consumer queue
  • Gracefully shuts down the model to free system resources
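The README step above can be sketched as a small pure function: group solutions by difficulty and emit Markdown sections in a fixed order. The function name and solution-dict shape here are illustrative, not the tool's actual API.

```python
# Illustrative sketch of the "sort README by difficulty" step.
DIFFICULTY_ORDER = ["Easy", "Medium", "Hard"]

def render_readme_sections(solutions):
    """solutions: list of dicts like {"title": ..., "difficulty": ..., "path": ...}."""
    lines = []
    for difficulty in DIFFICULTY_ORDER:
        group = sorted(
            (s for s in solutions if s["difficulty"] == difficulty),
            key=lambda s: s["title"],
        )
        if not group:
            continue
        lines.append(f"## {difficulty}")
        for s in group:
            lines.append(f"- [{s['title']}]({s['path']})")
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```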

Where Gemini Fits In

While building this, I explored cloud‑based LLM solutions like Google Gemini. I was particularly interested in:

  • Structured content generation
  • Reliable latency
  • Deployment potential via Cloud Run
  • API‑based inference scalability

However, due to cost considerations and experimentation goals, I opted for a local‑first architecture using Ollama and Mistral.
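A local-first call of this kind goes to Ollama's HTTP API. The sketch below targets the real `/api/generate` endpoint and payload fields; the function names and prompt template are my assumptions, not the project's code.

```python
# Minimal sketch of a write-up request against a locally running Ollama server.
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(problem_title, code):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    prompt = (
        f"Write a structured Markdown explanation for the LeetCode "
        f"problem '{problem_title}' given this solution:\n\n{code}"
    )
    return {"model": "mistral", "prompt": prompt, "stream": False}

def generate_writeup(problem_title, code):
    data = json.dumps(build_payload(problem_title, code)).encode()
    req = request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:  # requires `ollama serve` to be running
        return json.loads(resp.read())["response"]
```

Swapping this backend for a cloud API like Gemini would mostly mean replacing the URL, payload shape, and authentication; the surrounding workflow stays the same.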

This decision itself became part of my learning journey — understanding the trade‑offs between:

  • Local inference vs. cloud APIs
  • Cost vs. scalability
  • Latency vs. resource consumption
  • Infrastructure ownership vs. convenience

Although this version of the project uses a locally hosted model, the architectural design was influenced by how cloud LLM APIs (like Gemini) structure prompts and manage inference workflows.

Demo

Here’s a short walkthrough of the CLI automation tool in action:

🎥

The demo shows:

  • Adding a new solution interactively through the CLI
  • Queue‑based background LLM generation using a producer‑consumer model
  • Automatic README updates sorted by difficulty
  • Structured Markdown solution generation
  • Token usage and performance logging to Excel
  • Thread‑safe state management
  • Graceful model shutdown after queue completion

Architecture Highlights

  • A background worker thread handling inference
  • A synchronized task queue
  • Local LLM inference via Ollama
  • Structured file‑system updates
  • Git integration for repository management
  • Telemetry logging for token and performance metrics

Source code and implementation details are available here:

🔗

What I Learned

1. Concurrency in Python

Implemented a producer‑consumer queue pattern:

  • Main thread enqueues generation tasks
  • Background worker processes LLM calls
  • Thread‑safe tracking using locks
  • Graceful shutdown logic to avoid race conditions
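The four points above compress into a few lines with `queue.Queue` and a sentinel value; the `generate()` stub below stands in for the real LLM call.

```python
# Condensed producer-consumer sketch: main thread enqueues, a background
# worker drains the queue, and a sentinel triggers a clean shutdown.
import queue
import threading

SENTINEL = None
results = []
lock = threading.Lock()
tasks = queue.Queue()

def generate(task):                 # placeholder for the real LLM call
    return f"write-up for {task}"

def worker():
    while True:
        task = tasks.get()
        if task is SENTINEL:        # shutdown signal
            tasks.task_done()
            break
        result = generate(task)
        with lock:                  # thread-safe tracking of shared state
            results.append(result)
        tasks.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()
for problem in ["two-sum", "lru-cache"]:
    tasks.put(problem)              # main thread is the producer
tasks.put(SENTINEL)
tasks.join()                        # block until every task is processed
t.join()
```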

2. Resource Lifecycle Management

Running local LLMs is RAM‑intensive. I added:

  • Queue monitoring
  • Automatic shutdown when the queue empties
  • Explicit model stop calls to free memory
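The shutdown step can be sketched like this: wait for the queue to drain, then explicitly stop the model. `ollama stop mistral` is the real CLI command; injecting the stop action as a callback (my choice here, not necessarily the project's) keeps the logic testable without a running model.

```python
# Sketch: free model memory once the work queue is empty.
import subprocess

def shutdown_model(task_queue, stop=None):
    task_queue.join()  # block until every enqueued task is marked done
    if stop is None:
        # Ask Ollama to unload the model and release its RAM.
        stop = lambda: subprocess.run(["ollama", "stop", "mistral"], check=False)
    stop()
```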

3. Telemetry & Observability

Extended the system to log:

  • Prompt tokens
  • Response tokens
  • Total tokens
  • Load duration
  • Generation duration
  • Tokens per second

This gave insight into:

  • Cold vs. warm model loads
  • Throughput efficiency
  • Cost proxies for API alternatives
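All of these metrics fall out of a single Ollama response: `prompt_eval_count`, `eval_count`, `load_duration`, and `eval_duration` are real fields of the `/api/generate` response (durations in nanoseconds), and the derived values are simple arithmetic. The row shape is my assumption.

```python
# Build one telemetry row (e.g. an Excel row) from an Ollama response dict.
def telemetry_row(resp):
    prompt_tokens = resp.get("prompt_eval_count", 0)
    response_tokens = resp.get("eval_count", 0)
    gen_seconds = resp.get("eval_duration", 0) / 1e9   # ns -> s
    return {
        "prompt_tokens": prompt_tokens,
        "response_tokens": response_tokens,
        "total_tokens": prompt_tokens + response_tokens,
        "load_seconds": resp.get("load_duration", 0) / 1e9,
        "generation_seconds": gen_seconds,
        "tokens_per_second": response_tokens / gen_seconds if gen_seconds else 0.0,
    }
```

A near-zero `load_seconds` on consecutive calls is what distinguishes a warm model load from a cold one.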

4. Trade‑offs: Local vs. Cloud

Using a local model made me appreciate what cloud services like Gemini abstract away:

  • No need to manage memory
  • No manual lifecycle control
  • Potentially better consistency
  • Scalable deployment options

At the same time, local inference gave me:

  • Cost control
  • Full ownership
  • Deep visibility into model behavior

That trade‑off analysis was one of the most valuable parts of this project.

Google Gemini Feedback

Although I did not integrate Gemini directly into this tool, I explored its ecosystem and documentation while evaluating architectural options.

What stands out

  • Clean API design
  • Strong structured‑output capability
  • Integration potential with Cloud Run
  • Reduced infrastructure overhead

Where I would want more

  • Transparent token‑usage insights at a granular level
  • Clear cost‑comparison tooling
  • More documentation around performance benchmarking

Future ideas

  • Swap local inference with the Gemini API
  • Measure performance differences
  • Deploy the tool as a serverless service using Cloud Run

Feel free to check out the repo, try the tool, or share your thoughts on local‑first vs. cloud‑first AI workflows!

What’s Next

This project started as workflow automation, but it evolved into something deeper:

  • Understanding LLM systems design
  • Learning concurrency patterns
  • Measuring performance metrics
  • Thinking like an infrastructure engineer

My next goal is to:

  • Add multi‑model comparison support
  • Explore deployment options
  • Integrate structured logging
  • Experiment with cloud‑hosted inference services

This project transformed my LeetCode practice from manual chaos into an engineered workflow — and, more importantly, it transformed how I think about AI systems.

Thanks for reading 🚀
