Skill Seekers v3.0.0: The Universal Data Preprocessor for AI Systems

Published: February 8, 2026 at 05:30 PM EST

Source: Dev.to

Overview

Skill Seekers v3.0.0 is a universal documentation preprocessor that turns any documentation source into structured knowledge ready for AI systems. A single command converts a source into any of 16 output formats, and the tool can also generate CI/CD‑ready packages and upload results to cloud storage.

Installation

pip install skill-seekers langchain-chroma langchain-openai

Quick start

Scrape a documentation set with a single command:

skill-seekers scrape --config configs/react.json

Output formats

Format	Example command
LangChain Documents	`skill-seekers scrape --format langchain --config configs/react.json`
LlamaIndex TextNodes	`skill-seekers scrape --format llama-index --config configs/vue.json`
Pinecone‑ready Markdown	`skill-seekers scrape --target markdown --config configs/django.json`
Claude skill	`skill-seekers scrape --target claude --config configs/react.json`
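
For scripts that need to build these commands programmatically, the table can be captured as a small lookup. This is a minimal sketch: the flag and value pairs are copied from the table as published (including its mix of `--format` and `--target`), while `SCRAPE_FLAGS` and `scrape_command` are hypothetical names for illustration and not part of Skill Seekers.

```python
import shlex

# Flag/value pairs copied from the table above. The dictionary and the helper
# below are hypothetical conveniences, not part of the Skill Seekers package.
SCRAPE_FLAGS = {
    "LangChain Documents": ("--format", "langchain"),
    "LlamaIndex TextNodes": ("--format", "llama-index"),
    "Pinecone-ready Markdown": ("--target", "markdown"),
    "Claude skill": ("--target", "claude"),
}


def scrape_command(output_name: str, config_path: str) -> list[str]:
    """Build the argument list for one row of the table."""
    flag, value = SCRAPE_FLAGS[output_name]
    return ["skill-seekers", "scrape", flag, value, "--config", config_path]


if __name__ == "__main__":
    for name in SCRAPE_FLAGS:
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(scrape_command(name, "configs/react.json")))
```

The returned list can be passed directly to `subprocess.run(..., check=True)` once the CLI is installed.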

Using the generated documents in Python

from skill_seekers.cli.adaptors import get_adaptor
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
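# RetrievalQA comes from the `langchain` package, which the install command above
# does not list explicitly. In LangChain 1.x the legacy chains moved to the
# `langchain-classic` package (`langchain_classic.chains`).
from langchain.chains import RetrievalQA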

# Load the LangChain adaptor and documents
adaptor = get_adaptor('langchain')
documents = adaptor.load_documents("output/react/")

# Create a vector store
vectorstore = Chroma.from_documents(documents, OpenAIEmbeddings())

# Build a RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever()
)

# Query the knowledge base
result = qa_chain.invoke({"query": "What are React Hooks?"})
print(result["result"])

The same approach works with LlamaIndex, FAISS, Qdrant, Weaviate, etc.
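
To make the retrieval step concrete without an API key, here is a toy sketch in plain Python. It stands in for the vector store with word-count vectors and cosine similarity. It only illustrates the idea and is not how Chroma or OpenAI embeddings work internally. The sample chunks and the `vectorize`, `cosine`, and `retrieve` helpers are invented for this example.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words count vector (a stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


# Invented sample chunks standing in for scraped documentation.
chunks = [
    "Hooks let you use state and other React features in function components.",
    "Django models define the structure of your database tables.",
    "Vue components are reusable instances with a name.",
]

print(retrieve("What are React Hooks?", chunks))
```

A real pipeline replaces `vectorize` with an embedding model and the sorted scan with an indexed vector store, but the shape is the same: embed the query, rank stored chunks by similarity, and pass the top results to the LLM.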

Cloud storage upload

# AWS S3
skill-seekers cloud upload output/ --provider s3 --bucket my-bucket

# Google Cloud Storage
skill-seekers cloud upload output/ --provider gcs --bucket my-bucket

# Azure Blob Storage
skill-seekers cloud upload output/ --provider azure --container my-container
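
The three commands differ only in the provider name and the destination option. A minimal sketch of building them from Python follows. The option names are taken from the commands above, while `DESTINATION_FLAG` and `upload_command` are hypothetical helpers, not part of Skill Seekers.

```python
# Option names copied from the commands above: S3 and GCS take --bucket,
# Azure takes --container. The helper itself is a hypothetical convenience.
DESTINATION_FLAG = {"s3": "--bucket", "gcs": "--bucket", "azure": "--container"}


def upload_command(provider: str, destination: str, path: str = "output/") -> list[str]:
    """Build the `skill-seekers cloud upload` argument list for one provider."""
    if provider not in DESTINATION_FLAG:
        raise ValueError(f"unsupported provider: {provider!r}")
    return [
        "skill-seekers", "cloud", "upload", path,
        "--provider", provider,
        DESTINATION_FLAG[provider], destination,
    ]


if __name__ == "__main__":
    print(" ".join(upload_command("s3", "my-bucket")))
    print(" ".join(upload_command("azure", "my-container")))
```

Rejecting unknown providers early keeps a typo from reaching the CLI, where the failure would surface later in a pipeline run.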

CI/CD integration

Add the official GitHub Action to keep your AI knowledge up‑to‑date:

# .github/workflows/update-docs.yml
- uses: skill-seekers/action@v1
  with:
    config: configs/react.json
    format: langchain

Feature summary

One‑command conversion to any of 16 formats (LangChain, LlamaIndex, Markdown, Claude, etc.)
26 MCP tools for AI agents (scraping, packaging, cloud upload, vector‑DB export)
CI/CD ready with a dedicated GitHub Action
Cloud storage support: S3, GCS, Azure
Extensive test suite: 1,852 passing tests across 100 files, 58k lines of Python code
Multi‑platform: Ubuntu and macOS, with Python 3.10 to 3.13

Full workflow

Install the package (see above).
Scrape the target documentation: `skill-seekers scrape --format langchain --config configs/react.json`
Load the generated documents with the appropriate adaptor.
Create a vector store (e.g., Chroma) and build your RAG pipeline.
Optional: upload the output to cloud storage or publish as a Claude skill.

Going from raw docs to a working RAG pipeline takes about 15 minutes in total.

Resources

Website:
GitHub repository:
Documentation:
PyPI:

Skill Seekers v3.0.0 was released on February 10, 2026.
