How I Automated Python Documentation Using AST Parsing and Multi-Provider LLMs
Source: Dev.to
Introduction
We’ve all been there. You just spent three intense days crafting a highly optimized, beautifully architected new feature. The code is elegant. The tests are passing. The linter is perfectly silent. You push your branch, open a Pull Request, and then reality hits you like a truck:
“Oh right. I need to update the documentation.”
Let’s be honest: writing documentation is the chore that developers love to hate. In an ideal world, documentation evolves alongside the code. In reality, it stays stuck in 2023 while your application code races toward 2025.
For the longest time, the solution has been either drudgery (doing it manually) or using brittle, regex‑based parsers that break the moment you introduce a slightly complex Python decorator or a nested asynchronous function.
I decided I was done with both options. So, I spent the last few weeks building AutoDocGen (pypiautodocgen on PyPI).
Instead of searching for strings like a glorified grep command, AutoDocGen parses your Python code into an Abstract Syntax Tree (AST). It knows what’s a class, what’s a private method, and how your modules are intrinsically linked. It takes that blueprint and feeds it to the Large Language Model of your choice to generate human‑readable, perfectly formatted Markdown documentation.
Here is the story of how I built it, the technical hurdles I faced, and why I believe AST parsing combined with AI is the future of code documentation.
1. The Problem with Regex‑Based Documentation
Historically, many lightweight documentation tools have relied on regular expressions. They scan a file line‑by‑line looking for def or class, extract the following string, and try to grab the docstring block below it.
This approach is fundamentally flawed for modern Python development. Why? Because Python syntax is incredibly expressive.
    @cache(ttl=3600)
    @validate_schema(UserSchema)
    async def fetch_user_data(
        user_id: uuid.UUID,
        include_history: bool = False
    ) -> Dict[str, Any]:
        """Fetches user data from the primary replica."""
        pass

A regex parser has to somehow know that the decorators belong to the function, correctly identify it as asynchronous, handle the multi‑line signature, parse the type hints, and extract the docstring. Add in nested classes, closures, and complex return types, and your regex quickly devolves into an unmaintainable nightmare.
Regex doesn’t understand code; it only recognizes patterns in text. I needed a tool that understood the structure of Python itself.
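To make the failure concrete, here is a small sketch of two regex attempts. The patterns and the `find_functions` helper are illustrative assumptions, not taken from any particular documentation tool. Both patterns find the simple one-line function and silently miss the decorated async one.

```python
import re

SOURCE = '''\
@cache(ttl=3600)
async def fetch_user_data(
    user_id: uuid.UUID,
    include_history: bool = False
) -> Dict[str, Any]:
    """Fetches user data from the primary replica."""
    pass

def ping(host):
    pass
'''

# First attempt: "def", a name, and a parameter list on a single line.
NAIVE = re.compile(r"^def\s+(\w+)\((.*)\):", re.MULTILINE)

# Second attempt: also allow an optional "async" prefix.
PATCHED = re.compile(r"^(?:async\s+)?def\s+(\w+)\((.*)\):", re.MULTILINE)


def find_functions(pattern: re.Pattern, source: str) -> list[str]:
    """Return the function names a pattern manages to match."""
    return [match.group(1) for match in pattern.finditer(source)]


print(find_functions(NAIVE, SOURCE))    # ['ping']
print(find_functions(PATCHED, SOURCE))  # ['ping']
```

Even the patched pattern fails, because `.` does not cross line breaks and the return annotation sits between the closing parenthesis and the colon. Each further patch fixes one case and tends to expose another.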
2. Enter the Abstract Syntax Tree (AST)
Python includes a built‑in module called ast. It allows you to parse Python source code into a tree of nodes representing the syntactic structure of the program.
Instead of reading lines of text, AutoDocGen uses ast.parse() to read the “DNA” of your code.
When you feed the above snippet into an AST parser, it doesn’t see a string of text. It sees an AsyncFunctionDef node. It knows that this node has a decorator_list containing Call nodes. It maps out the arguments (complete with their type annotations) and gracefully extracts the exact docstring using ast.get_docstring().
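A minimal sketch of that inspection, using only the standard library, might look like this. The `info` dictionary layout is an assumption made for illustration, and `ast.unparse` requires Python 3.9 or newer.

```python
import ast

SOURCE = '''
@cache(ttl=3600)
@validate_schema(UserSchema)
async def fetch_user_data(
    user_id: uuid.UUID,
    include_history: bool = False
) -> Dict[str, Any]:
    """Fetches user data from the primary replica."""
    pass
'''

# ast.parse only checks syntax, so the undefined names (cache, uuid, ...)
# are not a problem: nothing in SOURCE is imported or executed.
tree = ast.parse(SOURCE)
func = tree.body[0]

info = {
    "kind": type(func).__name__,
    "name": func.name,
    # Each decorator is a Call node; unparse turns it back into source text.
    "decorators": [ast.unparse(d) for d in func.decorator_list],
    # Map each parameter name to its annotation (None if unannotated).
    "args": {
        a.arg: ast.unparse(a.annotation) if a.annotation else None
        for a in func.args.args
    },
    "docstring": ast.get_docstring(func),
}

print(info["kind"])        # AsyncFunctionDef
print(info["decorators"])  # ['cache(ttl=3600)', 'validate_schema(UserSchema)']
print(info["docstring"])   # Fetches user data from the primary replica.
```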
By extracting this structured data, AutoDocGen builds a high‑fidelity “blueprint” of your codebase. We extract:
- Module‑level variables and logic
- Class definitions, their base classes (inheritance), and methods
- Standalone functions (sync and async)
- Exact signatures and type hints
We then serialize this blueprint into a structured format (JSON or YAML representation of the AST summary).
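The extraction and serialization step can be sketched roughly as follows. This is not AutoDocGen's actual schema: `build_blueprint`, `describe_function`, and the key names are hypothetical, chosen only to show the idea of turning AST nodes into plain JSON-friendly data.

```python
import ast
import json

FUNCS = (ast.FunctionDef, ast.AsyncFunctionDef)


def describe_function(fn) -> dict:
    """Summarize one function node as plain data."""
    returns = f" -> {ast.unparse(fn.returns)}" if fn.returns else ""
    return {
        "name": fn.name,
        "async": isinstance(fn, ast.AsyncFunctionDef),
        "signature": f"({ast.unparse(fn.args)}){returns}",
        "docstring": ast.get_docstring(fn),
    }


def build_blueprint(source: str) -> dict:
    """Collect top-level variables, classes, and functions of one module."""
    tree = ast.parse(source)
    blueprint = {
        "docstring": ast.get_docstring(tree),
        "variables": [],
        "classes": [],
        "functions": [],
    }
    for node in tree.body:
        if isinstance(node, FUNCS):
            blueprint["functions"].append(describe_function(node))
        elif isinstance(node, ast.ClassDef):
            blueprint["classes"].append({
                "name": node.name,
                "bases": [ast.unparse(base) for base in node.bases],
                "docstring": ast.get_docstring(node),
                "methods": [describe_function(n) for n in node.body
                            if isinstance(n, FUNCS)],
            })
        elif isinstance(node, ast.Assign):
            # Only simple "NAME = value" targets are recorded in this sketch.
            blueprint["variables"] += [t.id for t in node.targets
                                       if isinstance(t, ast.Name)]
    return blueprint


SAMPLE = '''
"""Toy module."""
TIMEOUT = 30

class Client(BaseClient):
    """Talks to the service."""
    async def ping(self, host: str) -> bool:
        """Check reachability."""
        return True

def main() -> None:
    pass
'''

print(json.dumps(build_blueprint(SAMPLE), indent=2))
```

Because the result contains only strings, booleans, lists, and dictionaries, it serializes to JSON without custom encoders and can be passed to a prompt as-is.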
This is the secret sauce. We aren’t asking the AI to read your code from scratch and guess what it does. We are giving the AI a structural map and asking it to explain the map. Because the model no longer has to infer signatures, decorators, or inheritance on its own, it has far less room to hallucinate, and the generated documentation is noticeably more accurate.
3. Breaking Free from Vendor Lock‑in: Multi‑Provider Support
When I started building the AI generation step, I realized a major frustration with the current landscape of AI developer tools: almost all of them hard‑code OpenAI’s API.
While GPT‑4o is incredible, we are living in a golden age of open‑weight models and blistering‑fast inference APIs. I didn’t want users to be locked into OpenAI if they preferred Google’s tools, or if they wanted the incredible speed of Groq.
So, I built an abstraction layer within AutoDocGen to support multiple LLM providers:
| Provider | Why Use It |
|---|---|
| OpenAI | The standard fallback |
| Groq | Documentation generated in literally 2 seconds per file using Llama‑3 on LPUs |
| Google Gemini | Excellent context windows for deeply understanding complex module interdependencies |
| OpenRouter | Ultimate freedom – route requests to dozens of models (including free tiers like Stepfun) without changing core integration |
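An abstraction layer of this kind could be sketched as a small interface plus a registry. Everything below is hypothetical: `LLMProvider`, `EchoProvider`, `PROVIDERS`, and `get_provider` are illustrative names rather than AutoDocGen's real API, and the stand-in provider never touches the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class LLMProvider(Protocol):
    """Anything with a generate() method can act as a provider."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Stand-in provider; a real one would call the vendor's API here."""

    name: str
    model: str

    def generate(self, prompt: str) -> str:
        return f"[{self.name}/{self.model}] {prompt[:40]}"


# Registry mapping a provider name to a factory that takes a model name.
# name=name binds the current loop value into each lambda.
PROVIDERS: Dict[str, Callable[[str], LLMProvider]] = {
    name: (lambda model, name=name: EchoProvider(name, model))
    for name in ("openai", "groq", "gemini", "openrouter")
}


def get_provider(name: str, model: str) -> LLMProvider:
    """Look up a provider by name, failing loudly on typos."""
    try:
        factory = PROVIDERS[name]
    except KeyError:
        raise ValueError(
            f"unknown provider {name!r}; expected one of {sorted(PROVIDERS)}"
        ) from None
    return factory(model)


provider = get_provider("groq", "llama3-70b-8192")
print(provider.generate("Explain this module"))
# [groq/llama3-70b-8192] Explain this module
```

The rest of the pipeline depends only on `generate()`, so switching vendors becomes a configuration change instead of a code change.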
The configuration hierarchy is flexible. You can set everything via environment variables (GROQ_API_KEY), a local .env file, an autodocgen.yaml config, or directly in your pyproject.toml.
    # autodocgen.yaml
    version: 1
    ai:
      provider: groq
      model: llama3-70b-8192
    output:
      dir: ./docs
      format: markdown

4. Templating the Output: Jinja2 for Premium Style
The final piece of the puzzle was the output format. Most automated documentation tools generate dull, uninspired text blocks. I wanted documentation that looked like it was handcrafted by a technical writer.
Instead of relying on the LLM to format the Markdown (which often leads to inconsistent headings and broken tables), AutoDocGen strictly separates generation from presentation.
The LLM returns structured data (a summary of the module, bullet points of functionality, etc.). AutoDocGen then injects this data into Jinja2 templates, giving you full control over the final look and feel of the docs.
    {# example_template.md.j2 #}
    # {{ module.name }}
    {{ module.summary }}
    {% for cls in module.classes %}
    ## Class `{{ cls.name }}`
    {{ cls.docstring }}
    {% for method in cls.methods %}
    ### Method `{{ method.name }}{{ method.signature }}`
    {{ method.docstring }}
    {% endfor %}
    {% endfor %}

By keeping generation and rendering separate, you can:
- Enforce a consistent style guide across all docs
- Add custom sections (e.g., usage examples, changelogs) without touching the AI code
- Swap out the template for different output formats (HTML, PDF, etc.)
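One way to enforce that separation is to validate the model's reply before any template sees it. The sketch below assumes the LLM is asked to answer in JSON with a `summary` string and an optional `highlights` list. Those field names, `ModuleDoc`, and `parse_llm_reply` are all hypothetical and not taken from AutoDocGen.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ModuleDoc:
    """Typed view of the fields a template is allowed to use."""

    summary: str
    highlights: list[str] = field(default_factory=list)


def parse_llm_reply(raw: str) -> ModuleDoc:
    """Turn a raw JSON reply into typed data, rejecting malformed replies."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(data, dict) or not isinstance(data.get("summary"), str):
        raise ValueError("reply must be an object with a string 'summary'")

    highlights = data.get("highlights", [])
    if not isinstance(highlights, list) or not all(
        isinstance(item, str) for item in highlights
    ):
        raise ValueError("'highlights' must be a list of strings")

    return ModuleDoc(summary=data["summary"].strip(), highlights=highlights)


doc = parse_llm_reply('{"summary": " Parses files. ", "highlights": ["fast"]}')
print(doc.summary)     # Parses files.
print(doc.highlights)  # ['fast']
```

With a gate like this, a reply that drifts from the expected shape fails early with a clear error instead of producing a half-rendered page.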
5. Putting It All Together
Running AutoDocGen is as simple as:
    autodocgen path/to/your/package

Behind the scenes it:
- Walks the source tree and parses each .py file with ast.
- Builds a unified JSON/YAML blueprint of the entire package.
- Sends the blueprint (or relevant chunks) to the configured LLM.
- Receives structured documentation fragments.
- Renders the fragments through your Jinja2 templates.
- Writes the final Markdown files to the output.dir you specified.
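The first of these steps, walking the tree and parsing every file, can be sketched with the standard library alone. `collect_modules` and its return shape are assumptions for illustration. The sketch records syntax errors per file instead of aborting the whole run, which is one reasonable policy among several.

```python
import ast
import tempfile
from pathlib import Path

DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def collect_modules(root: Path) -> dict:
    """Parse every .py file under root; map relative path to its findings."""
    result = {}
    for path in sorted(root.rglob("*.py")):
        key = path.relative_to(root).as_posix()
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            # Keep going: one broken file should not block the other docs.
            result[key] = {"error": f"line {exc.lineno}: {exc.msg}"}
            continue
        result[key] = {
            "definitions": [n.name for n in tree.body if isinstance(n, DEFINITIONS)]
        }
    return result


# Small demonstration on a throwaway package.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "pkg" / "core.py").write_text("def run():\n    pass\n\nclass Job:\n    pass\n")
    (root / "pkg" / "broken.py").write_text("def (:\n")
    demo = collect_modules(root)

print(demo["pkg/core.py"])  # {'definitions': ['run', 'Job']}
```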
The result is a set of clean, human‑readable docs that stay in sync with the codebase, require virtually no manual upkeep, and can be regenerated on every CI run.
6. Why This Matters
- Reliability: AST parsing guarantees you’re looking at the real code structure, not a fragile text pattern.
- Speed: No more waiting minutes for a regex‑based tool to choke on a large repo.
- Flexibility: Multi‑provider LLM support means you can pick the model that fits your budget, latency, or privacy requirements.
- Quality: Structured prompts + Jinja2 rendering produce consistent, professional‑grade documentation.
In short, AutoDocGen shows that combining language‑aware parsing with modern LLMs is a practical, scalable path to keeping documentation alive.
Try It Out
    pip install pypiautodocgen
    autodocgen ./my_project --config autodocgen.yaml

Feel free to open an issue, submit a PR, or share how you’ve customized the Jinja2 templates for your own style guide. Happy documenting!
Consistent, Premium Aesthetic
By using Jinja2 (module.md.j2 and index.md.j2), the CLI guarantees a consistent, premium aesthetic across your entire documentation site. It perfectly formats function signatures, builds an automatic Table of Contents, and cross‑links related modules.
If you don’t like the default template, you can easily fork the templates/ directory and build your own.
7. Security First: The “Zero‑Trust” QA Audit
Because I was releasing an AI tool that reads source code, I knew security and stability had to be paramount. I didn’t just write some unit tests and call it a day.
Before hitting v0.1.0, the project underwent what I call a Zero‑Trust Forensic QA Audit. I assumed the initial proof‑of‑concept code was entirely broken and built a test suite from scratch.
We utilized:
- pytest for comprehensive unit and integration testing.
- bandit for security scanning to ensure API keys are never leaked in logs and file I/O operations are secure.
- Extensive mocking of all LLM providers so the CLI could be tested deeply in CI/CD without burning API credits.
- Edge‑case testing, including handling of exotic Unicode identifiers (yes, def grüne_äpfel() parses perfectly).
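A test in this spirit might look like the sketch below. It is not taken from the AutoDocGen test suite: `document_source` is a hypothetical stand-in for the pipeline, and the provider is a `unittest.mock.Mock`, so no API call is ever made.

```python
import ast
from unittest.mock import Mock


def document_source(source: str, provider) -> dict:
    """Hypothetical pipeline step: one provider call per top-level function."""
    tree = ast.parse(source)
    return {
        node.name: provider.generate(f"Describe function {node.name}")
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def test_unicode_identifier_with_mocked_provider():
    # The mock replaces the real LLM client, so no credits are spent.
    provider = Mock()
    provider.generate.return_value = "stub description"

    docs = document_source("def grüne_äpfel():\n    return 1\n", provider)

    assert docs == {"grüne_äpfel": "stub description"}
    provider.generate.assert_called_once_with("Describe function grüne_äpfel")


test_unicode_identifier_with_mocked_provider()
```

Under pytest the final call is unnecessary, since test functions are discovered by name. It is included here only so the snippet runs as a plain script.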
The repository is now fully integrated with Codecov, maintaining a strict baseline for any future pull requests.
How to Get Started
If you are tired of your README files falling out of sync with your codebase, I highly encourage you to give AutoDocGen a spin. It’s live now on PyPI.
Install

    pip install pypiautodocgen

Run

    autodocgen -o ./docs --provider groq  # Or openai, gemini, openrouter

This will generate documentation for the current directory and output it to ./docs.
The Roadmap
Currently, AutoDocGen creates fantastic Markdown files perfectly suited for static site generators like MkDocs or direct consumption on GitHub.
Looking forward, I want to explore:
- Framework‑specific parsing – specialized templates for FastAPI endpoints or Django models.
- Diff‑based updating – only regenerating documentation for the functions that changed in a commit, rather than full‑file regeneration.
- Mermaid diagram generation – automatically creating architecture flowcharts based on AST imports.
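As a rough idea of how the Mermaid item could work, the sketch below derives an import graph from AST import nodes. It is speculative: the function names are hypothetical, relative imports are ignored, and only edges between modules of the same project are kept.

```python
import ast


def imported_modules(source: str) -> list[str]:
    """Return the sorted, de-duplicated top-level modules a file imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return sorted(found)


def mermaid_import_graph(modules: dict[str, str]) -> str:
    """modules maps module name to source text; returns Mermaid flowchart text."""
    lines = ["graph TD"]
    for name in sorted(modules):
        for dep in imported_modules(modules[name]):
            if dep in modules and dep != name:  # keep only intra-project edges
                lines.append(f"    {name} --> {dep}")
    return "\n".join(lines)


PROJECT = {
    "cli": "import os\nimport scanner\nfrom render import write\n",
    "scanner": "import ast\n",
    "render": "from scanner import Node\n",
}

print(mermaid_import_graph(PROJECT))
# graph TD
#     cli --> render
#     cli --> scanner
#     render --> scanner
```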
Let’s Connect!
I built AutoDocGen to solve my own pain point, but I know the community has incredible ideas on how to push it further.
Check out the source code on GitHub (and drop a star if you find it useful!):
https://github.com/shifulegend/autodocgen
I would love to hear your feedback in the comments. Are you still writing documentation by hand? What has been your biggest frustration with existing auto‑generated documentation tools? Let me know!