Skill Seekers v3.0.0: The Universal Data Preprocessor for AI Systems
Source: Dev.to
Overview
Skill Seekers v3.0.0 is a universal documentation pre‑processor that turns any docs source into structured knowledge ready for AI systems. It converts documentation to any of 16 output formats with one command, generates CI/CD‑ready packages, and uploads results to cloud storage.
Installation
pip install skill-seekers langchain-chroma langchain-openai
Quick start
Scrape a documentation set with a single command:
skill-seekers scrape --config configs/react.json
Output formats
| Format | Example command |
|---|---|
| LangChain Documents | skill-seekers scrape --format langchain --config configs/react.json |
| LlamaIndex TextNodes | skill-seekers scrape --format llama-index --config configs/vue.json |
| Pinecone‑ready Markdown | skill-seekers scrape --target markdown --config configs/django.json |
| Claude skill | skill-seekers scrape --target claude --config configs/react.json |
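When these commands are driven from a script, it can help to build the argument list in one place. The sketch below does only that, using the flags exactly as they appear in the table above (`--format` for `langchain` and `llama-index`, `--target` for `markdown` and `claude`). The helper name `build_scrape_command` and the `FLAG_FOR_OUTPUT` mapping are illustrative assumptions, not part of Skill Seekers.

```python
import shlex

# Flag used for each output name, copied from the table above.
# The mapping itself is an illustrative assumption, not a Skill Seekers API.
FLAG_FOR_OUTPUT = {
    "langchain": "--format",
    "llama-index": "--format",
    "markdown": "--target",
    "claude": "--target",
}


def build_scrape_command(output: str, config_path: str) -> list[str]:
    """Return the argv list for one scrape run (hypothetical helper)."""
    try:
        flag = FLAG_FOR_OUTPUT[output]
    except KeyError:
        raise ValueError(f"output not listed in the table: {output!r}") from None
    return ["skill-seekers", "scrape", flag, output, "--config", config_path]


if __name__ == "__main__":
    argv = build_scrape_command("langchain", "configs/react.json")
    print(shlex.join(argv))
```

The returned list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems when a config path contains spaces.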
Using the generated documents in Python
from skill_seekers.cli.adaptors import get_adaptor
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
# Load the LangChain adaptor and documents
adaptor = get_adaptor('langchain')
documents = adaptor.load_documents("output/react/")
# Create a vector store
vectorstore = Chroma.from_documents(documents, OpenAIEmbeddings())
# Build a RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever(),
)
# Query the knowledge base
result = qa_chain.invoke({"query": "What are React Hooks?"})
print(result["result"])
The same approach works with LlamaIndex, FAISS, Qdrant, Weaviate, etc.
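To make the retrieval step less opaque, here is a toy stand-in for what the vector store and retriever do: each document becomes a vector, and the query is matched against those vectors by cosine similarity. It swaps the OpenAI embeddings for plain word counts so that it runs with the standard library alone. The sample texts and the `embed`, `cosine`, and `retrieve` names are made up for illustration and are not part of Skill Seekers or LangChain.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real pipeline uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up sample texts standing in for scraped documentation chunks.
    docs = [
        "Hooks let you use state and other React features in function components.",
        "JSX is a syntax extension that looks like HTML.",
        "The useEffect hook runs side effects after render.",
    ]
    for doc in retrieve("What are React hooks?", docs, k=1):
        print(doc)
```

Word counts only match exact tokens, so "hook" and "hooks" count as different words here. Learned embeddings are used in the real pipeline precisely because they capture that kind of similarity.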
Cloud storage upload
# AWS S3
skill-seekers cloud upload output/ --provider s3 --bucket my-bucket
# Google Cloud Storage
skill-seekers cloud upload output/ --provider gcs --bucket my-bucket
# Azure Blob Storage
skill-seekers cloud upload output/ --provider azure --container my-container
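Note that the destination flag differs by provider: `--bucket` for S3 and GCS, `--container` for Azure. A script that uploads to more than one provider has to account for that. The sketch below builds the argument list from the three commands shown above; `build_upload_command` and `DESTINATION_FLAG` are hypothetical names, not part of Skill Seekers.

```python
# Name of the destination flag per provider, as shown in the commands above.
DESTINATION_FLAG = {
    "s3": "--bucket",
    "gcs": "--bucket",
    "azure": "--container",
}


def build_upload_command(
    provider: str, destination: str, path: str = "output/"
) -> list[str]:
    """Return the argv list for one upload (hypothetical helper)."""
    if provider not in DESTINATION_FLAG:
        raise ValueError(f"unknown provider: {provider!r}")
    return [
        "skill-seekers", "cloud", "upload", path,
        "--provider", provider,
        DESTINATION_FLAG[provider], destination,
    ]


if __name__ == "__main__":
    print(" ".join(build_upload_command("azure", "my-container")))
```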
CI/CD integration
Add the official GitHub Action to keep your AI knowledge up‑to‑date:
# .github/workflows/update-docs.yml
- uses: skill-seekers/action@v1
  with:
    config: configs/react.json
    format: langchain
Feature summary
- One‑command conversion to any of 16 formats (LangChain, LlamaIndex, Markdown, Claude, etc.)
- 26 MCP tools for AI agents (scraping, packaging, cloud upload, vector‑DB export)
- CI/CD ready with a dedicated GitHub Action
- Cloud storage support: S3, GCS, Azure
- Extensive test suite: 1,852 passing tests across 100 files, 58k lines of Python code
- Multi‑platform: Ubuntu, macOS, Python 3.10‑3.13
Full workflow
- Install the package (see above).
- Scrape the target documentation:
  skill-seekers scrape --format langchain --config configs/react.json
- Load the generated documents with the appropriate adaptor.
- Create a vector store (e.g., Chroma) and build your RAG pipeline.
- Optional: upload the output to cloud storage or publish as a Claude skill.
Going from raw docs to a working RAG pipeline takes about 15 minutes.
Resources
- Website:
- GitHub repository:
- Documentation:
- PyPI:
Skill Seekers v3.0.0 was released on February 10, 2026.