Private Vision AI: Run Reka Edge Entirely on Your Machine

Published: March 19, 2026 at 09:17 AM EDT
4 min read
Source: Dev.to

Reka just released Reka Edge, a compact but powerful vision‑language model that runs entirely on your own machine. No API keys, no cloud, no data leaving your computer. I work at Reka, and putting together this tutorial was genuinely fun; I hope you enjoy running it as much as I did.

In three steps, you’ll go from zero to asking an AI what’s in any image or video.

What You’ll Need

  • A machine with ~16 GB of RAM, enough to run a 7B-parameter model
  • Git
  • uv, a fast Python package manager

Install it with:

curl -LsSf https://astral.sh/uv/install.sh | sh

This works on macOS, Linux, and Windows (WSL). If you’re on Windows without WSL, grab the Windows installer instead.

Step 1: Get the Model and Inference Code

Clone the Reka Edge repository from Hugging Face. This includes both the model weights and the inference code:

git clone https://huggingface.co/RekaAI/reka-edge-2603
cd reka-edge-2603

Grab a coffee while it downloads—the model weights are several GB.

Step 2: Fetch the Large Files

Hugging Face stores large files (model weights and images) using Git LFS. After cloning, the large files on disk are small pointer stubs, not the actual content.

  1. Install Git LFS (command varies by platform):

    # macOS
    brew install git-lfs
    
    # Linux / WSL (Ubuntu/Debian)
    sudo apt install git-lfs
  2. Initialize Git LFS:

    git lfs install
  3. Pull the large files (model weights and media samples):

    git lfs pull
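If you want to confirm the pull actually replaced the stubs: a Git LFS pointer file is a short text file beginning with a fixed version line. Here is a quick sanity check (a sketch; the `*.safetensors` glob is an assumption about how the weight files are named):

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file is still an un-pulled Git LFS pointer stub."""
    with path.open("rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX

# Scan the repo folder for weight files that are still pointers
for p in Path(".").glob("*.safetensors"):
    status = "POINTER (re-run `git lfs pull`)" if is_lfs_pointer(p) else "ok"
    print(f"{p}: {status}")
```

If any file reports `POINTER`, the real content never came down and the model won't load.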

Step 3: Ask the Model About an Image or Video

Image

uv run example.py \
  --image ./media/hamburger.jpg \
  --prompt "What is in this image?"

Video

uv run example.py \
  --video ./media/many_penguins.mp4 \
  --prompt "What is in this?"

The model will load, process your input, and print a description—all locally, all private.

Try different prompts to get more out of the model:

  • "Describe this scene in detail."
  • "What text is visible in this image?"
  • "Is there anything unusual or unexpected here?"

What’s Actually Happening? (optional reading)

You don’t need this to use the model, but if you’re curious about the internals of example.py, here’s a quick walkthrough.

1. Hardware selection

if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():  # Apple Silicon (Metal)
    device = torch.device("mps")
else:
    device = torch.device("cpu")

The script automatically picks the best available device (CUDA, Metal, or CPU).
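That priority order is easy to state as a plain function. This is a sketch of the logic, not the script's actual code:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Mirror the priority order above: CUDA first, then Apple's MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

A CUDA GPU wins even on a machine that also reports MPS; CPU is the universal fallback, just slower.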

2. Model loading

processor = AutoProcessor.from_pretrained(args.model, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    args.model, ...).eval()

The 7B-parameter model is read from the cloned folder. Loading takes roughly 30 seconds, depending on your hardware.

3. Input packaging

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": args.image},
        {"type": "text",   "text": args.prompt},
    ],
}]

Your image (or video) and prompt are wrapped into a chat‑style message.
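For video, the same wrapper applies with the `video` keys swapped in. A hypothetical helper covering both cases (the key names mirror the snippet above; this function is not part of example.py):

```python
def build_messages(media_kind: str, media_path: str, prompt: str) -> list[dict]:
    """Wrap one media file and a text prompt in the chat format shown above.

    media_kind is "image" or "video".
    """
    if media_kind not in ("image", "video"):
        raise ValueError(f"unsupported media kind: {media_kind}")
    return [{
        "role": "user",
        "content": [
            {"type": media_kind, media_kind: media_path},
            {"type": "text", "text": prompt},
        ],
    }]
```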

4. Tokenisation

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
)

The processor converts the image into numerical patches and the text into tokens.
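To get a feel for what "numerical patches" means: vision transformers typically split the image into a fixed grid of square patches, each of which becomes one token-like embedding. A toy calculation (the 14-pixel patch size here is illustrative, not Reka Edge's actual value):

```python
def num_patches(height: int, width: int, patch_size: int = 14) -> int:
    """How many square patches an image of the given size yields."""
    return (height // patch_size) * (width // patch_size)

print(num_patches(224, 224))  # a 224x224 image -> 16x16 = 256 patches
```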

5. Generation

output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
)

The model predicts the next token repeatedly until it reaches a stop condition.
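With `do_sample=False`, generation is greedy: at each step the single highest-probability token is appended, until an end-of-sequence token appears or the `max_new_tokens` cap is hit. A toy version of that loop (a stand-alone sketch, not the transformers internals):

```python
def greedy_generate(next_token, prompt_ids, max_new_tokens, eos_id):
    """Repeatedly ask `next_token` for the most likely continuation."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = next_token(ids)
        ids.append(tok)
        if tok == eos_id:  # stop condition: end-of-sequence token
            break
    return ids

# Fake "model" that always emits the next integer; 5 acts as EOS
fake = lambda ids: ids[-1] + 1
print(greedy_generate(fake, [1, 2], max_new_tokens=10, eos_id=5))  # [1, 2, 3, 4, 5]
```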

6. Decoding

new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]  # drop the echoed prompt
output_text = processor.tokenizer.decode(
    new_tokens,
    skip_special_tokens=True,
)
print(output_text)

Token IDs are turned back into human‑readable text and printed. No internet is involved at any point.

Here’s the Video

If you prefer watching to reading, here is the video version:

(embed or link to the video here)

That’s Pretty Cool, Right?

A single script. No API key. No cloud. You just ran a 7B-parameter vision-language model entirely on your own machine, and it works whether you're on a Mac, Linux, or Windows with WSL (which is what I used when I wrote this).

This works great as a one‑off script: drop in a file, ask a question, get an answer.

But what if you wanted to build something on top of it? A web app, a tool that watches a folder, or anything that needs to talk to the model repeatedly?

That’s exactly what the next post is about. I’ll show you how to wrap Edge as a local API, so instead of running a script, you have a service running on your machine that any app can plug into. Same model, same privacy, but now it’s a proper building block.
