Skill Seekers v3.0.0: The Universal Data Preprocessor for AI Systems

Published: February 8, 2026 at 05:30 PM EST
3 min read
Source: Dev.to

Overview

Skill Seekers v3.0.0 is a universal documentation pre‑processor that turns any docs source into structured knowledge ready for AI systems. A single scrape command converts a source into any of 16 output formats; the tool can also generate CI/CD‑ready packages and upload results to cloud storage.

Installation

pip install skill-seekers langchain-chroma langchain-openai

Quick start

Scrape a documentation set with a single command:

skill-seekers scrape --config configs/react.json

Output formats

Format                   Example command
LangChain Documents      skill-seekers scrape --format langchain --config configs/react.json
LlamaIndex TextNodes     skill-seekers scrape --format llama-index --config configs/vue.json
Pinecone‑ready Markdown  skill-seekers scrape --target markdown --config configs/django.json
Claude skill             skill-seekers scrape --target claude --config configs/react.json
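
When several outputs are needed from the same pipeline, the commands in the table can be assembled programmatically. The sketch below is illustrative only: build_scrape_command and FLAG_FOR_OUTPUT are hypothetical helpers, and the flag paired with each output (--format or --target) is copied from the example commands in this post rather than from the CLI's reference documentation.

```python
import shlex

# Flag per output, copied from the example commands in the table above.
# Assumption: these are the only pairings shown in this post; consult the
# CLI's own help output before relying on any others.
FLAG_FOR_OUTPUT = {
    "langchain": "--format",
    "llama-index": "--format",
    "markdown": "--target",
    "claude": "--target",
}


def build_scrape_command(output: str, config: str) -> list[str]:
    """Return the argv list for one scrape run (hypothetical helper)."""
    if output not in FLAG_FOR_OUTPUT:
        raise ValueError(f"no example command for output {output!r}")
    return ["skill-seekers", "scrape", FLAG_FOR_OUTPUT[output], output, "--config", config]


if __name__ == "__main__":
    for name in FLAG_FOR_OUTPUT:
        # Print the commands instead of running them; pass the list to
        # subprocess.run(..., check=True) to execute one for real.
        print(shlex.join(build_scrape_command(name, "configs/react.json")))
```

Returning an argv list instead of a shell string avoids quoting problems when a config path contains spaces.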

Using the generated documents in Python

from skill_seekers.cli.adaptors import get_adaptor
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Load the LangChain adaptor and documents
adaptor = get_adaptor('langchain')
documents = adaptor.load_documents("output/react/")

# Create a vector store
vectorstore = Chroma.from_documents(documents, OpenAIEmbeddings())

# Build a RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever()
)

# Query the knowledge base
result = qa_chain.invoke({"query": "What are React Hooks?"})
print(result["result"])

The same approach works with LlamaIndex, FAISS, Qdrant, Weaviate, etc.
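
To make the retrieval step concrete without an API key, here is a toy, standard-library-only stand-in for what the vector store and retriever do: embed each text, then rank documents by cosine similarity to the query. It swaps OpenAI embeddings for plain word counts, so it illustrates the mechanism of the pipeline above, not its quality. Doc, embed, cosine, and retrieve are hypothetical names, and the sample texts are made up for the example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a document that carries text content."""
    page_content: str


def embed(text: str) -> Counter:
    """Bag-of-words 'embedding': counts of lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[Doc], k: int = 2) -> list[Doc]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d.page_content)), reverse=True)
    return ranked[:k]


docs = [
    Doc("Hooks let you use state in function components."),
    Doc("Django models map Python classes to database tables."),
    Doc("useEffect is a hook that runs side effects after render."),
]
top = retrieve("What are hooks in React components?", docs, k=1)
print(top[0].page_content)
```

Word counts miss that "hook" and "hooks" are related, which is one reason real pipelines use learned embeddings instead.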

Cloud storage upload

# AWS S3
skill-seekers cloud upload output/ --provider s3 --bucket my-bucket

# Google Cloud Storage
skill-seekers cloud upload output/ --provider gcs --bucket my-bucket

# Azure Blob Storage
skill-seekers cloud upload output/ --provider azure --container my-container
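
The three commands differ only in the provider name and in the flag that names the destination (--bucket for S3 and GCS, --container for Azure). A small wrapper can encode that difference so calling code does not have to remember it. This is a sketch under those assumptions: build_upload_command and upload are hypothetical helpers, and only the flags shown in the commands above are used.

```python
import subprocess

# Destination flag per provider, taken from the three commands above.
DEST_FLAG = {"s3": "--bucket", "gcs": "--bucket", "azure": "--container"}


def build_upload_command(path: str, provider: str, destination: str) -> list[str]:
    """Return the argv list for one upload (hypothetical helper)."""
    if provider not in DEST_FLAG:
        raise ValueError(f"unsupported provider: {provider!r}")
    return [
        "skill-seekers", "cloud", "upload", path,
        "--provider", provider,
        DEST_FLAG[provider], destination,
    ]


def upload(path: str, provider: str, destination: str, dry_run: bool = True) -> list[str]:
    """Build the command and, unless dry_run is set, execute it."""
    cmd = build_upload_command(path, provider, destination)
    if not dry_run:
        # Needs skill-seekers on PATH and provider credentials configured.
        subprocess.run(cmd, check=True)
    return cmd
```

Defaulting to dry_run=True lets a script log the exact command before anything is sent to a bucket or container.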

CI/CD integration

Add the official GitHub Action to keep your AI knowledge up‑to‑date:

# .github/workflows/update-docs.yml (step fragment; place it under a job's steps list)
- uses: skill-seekers/action@v1
  with:
    config: configs/react.json
    format: langchain

Feature summary

  • One‑command conversion to any of 16 formats (LangChain, LlamaIndex, Markdown, Claude, etc.)
  • 26 MCP tools for AI agents (scraping, packaging, cloud upload, vector‑DB export)
  • CI/CD ready with a dedicated GitHub Action
  • Cloud storage support: S3, GCS, Azure
  • Extensive test suite: 1,852 passing tests across 100 files, covering 58k lines of Python code
  • Multi‑platform: Ubuntu and macOS, with Python 3.10 through 3.13

Full workflow

  1. Install the package (see above).
  2. Scrape the target documentation: skill-seekers scrape --format langchain --config configs/react.json.
  3. Load the generated documents with the appropriate adaptor.
  4. Create a vector store (e.g., Chroma) and build your RAG pipeline.
  5. Optional: upload the output to cloud storage or publish as a Claude skill.

Going from raw docs to a working RAG pipeline takes about 15 minutes.

Resources

  • Website:
  • GitHub repository:
  • Documentation:
  • PyPI:

Skill Seekers v3.0.0 was released on February 10, 2026.
