Running SAM 3 on AMD Ryzen AI Max+ 395: A Complete Guide to Fixing the rocBLAS Error

Published: December 18, 2025 at 02:43 AM EST
Source: Dev.to

Introduction

“I’ve been battling with AI (Claude) for 14 hours a day. Couldn’t be happier.”
— Akio Shiki (@ar_akio) – October 20, 2025

Hi, I’m Akio, an engineer at an AI‑development startup. In my previous article I introduced SAM 3. This time I’ll share the pitfalls I encountered when running SAM 3 on AMD hardware.

We constantly test the latest AI models and hardware. Right now I have in my hands what can only be described as a monument to AMD engineering: the Ryzen AI Max+ 395.

The specs on this machine are, frankly, insane. With a large pool of fast unified memory and a powerful iGPU, it truly shines when running massive LLMs like OpenAI’s gpt‑oss‑120b locally.

But that’s not what I’m doing today.

Today, it’s Meta’s latest image‑segmentation model: SAM 3 (Segment Anything Model 3).

“Wait, SAM 3? Isn’t that lightweight? If you want inference speed, wouldn’t an NVIDIA dGPU be a better fit?”

You’re absolutely right—no argument there. Running SAM 3 on a Ryzen AI Max+ 395 is, in a sense, using a sledgehammer to crack a nut.

But I don’t care. The reason is simple:

“I just wanted to run the hottest new model on AMD’s latest hardware.”

This is a passion project, efficiency be damned. Still, the errors I hit and the fixes I found should be useful to other AMD users. Consider this a guide to the rocBLAS error that Ryzen AI Max+ 395 users on a similar Windows setup are likely to run into.

My Setup

Component   Details
OS          Windows 11
AI Stack    AMD ROCm (HIP SDK)
Framework   PyTorch (ROCm build)
Model       SAM 3
Hardware    Ryzen AI Max+ 395 (gfx1151)

The environment installed without issues. I then ran the inference script—only to be flooded with merciless error logs.

rocBLAS error: TensileLibrary.dat not found

What the error means

“I can’t find the computation library for your GPU (gfx1151), so I can’t do any calculations.”

Because the Ryzen AI Max+ 395 uses the newest architecture, the official library packaging hasn’t fully caught up with it, which is a common story with newly released hardware.

The First (Failed) Work‑around

In AMD circles, the usual fix is to set the HSA_OVERRIDE_GFX_VERSION environment variable so that ROCm treats the GPU as an older, supported model (here gfx1100).

$env:HSA_OVERRIDE_GFX_VERSION = "11.0.0"

I expected this to solve everything… but the error persisted, still looking for files under:

site-packages\_rocm_sdk_libraries_gfx1151\bin
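Before going further, it can help to confirm what is actually in the folder the error points at. The following is a minimal sketch, not part of the original troubleshooting: the helper names expected_rocblas_dir and tensile_files are hypothetical, and the only thing taken from the article is the folder layout in the error message.

```python
import site
from pathlib import Path


def expected_rocblas_dir(site_packages: Path, arch: str = "gfx1151") -> Path:
    """Build the folder named in the error message for a given GPU arch."""
    return site_packages / f"_rocm_sdk_libraries_{arch}" / "bin"


def tensile_files(directory: Path) -> list[str]:
    """Return sorted TensileLibrary*.dat names, or [] if the folder is missing."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("TensileLibrary*.dat"))


if __name__ == "__main__":
    # Assumption: PyTorch lives in one of the interpreter's site-packages folders.
    for root in site.getsitepackages():
        target = expected_rocblas_dir(Path(root))
        print(target, "->", tensile_files(target) or "nothing found")
```

An empty result for every site-packages folder would be consistent with the error above: the directory rocBLAS is asking for either does not exist or holds no Tensile library.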

Digging Deeper

After exhausting Google, GitHub Issues, and Reddit threads (with virtually no hits), I decided to inspect the local library folders.

What I found

In the PyTorch ROCm installation folder there was an unexpected directory:

.../site-packages/_rocm_sdk_libraries_custom/

Inside it:

.../_rocm_sdk_libraries_custom/bin/rocblas/library/

I discovered the missing files:

    • gfx1151‑related files
    • TensileLibrary_lazy_gfx1151.dat

Key insight:
The gfx1151 (RDNA 3.5) library files were already installed, but PyTorch was looking for them in a folder named _rocm_sdk_libraries_gfx1151, while the actual files lived under _rocm_sdk_libraries_custom. In other words, this was a folder‑structure mismatch, not a missing library.
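The manual folder search can also be scripted. This is a rough sketch under the assumption that the relevant files carry the architecture name (gfx1151) in their file names, as TensileLibrary_lazy_gfx1151.dat does; find_arch_files is a hypothetical helper, not something shipped with ROCm or PyTorch.

```python
import sys
from pathlib import Path


def find_arch_files(site_packages: Path, arch: str = "gfx1151") -> dict[Path, list[str]]:
    """Map each folder under site_packages to the files in it whose name mentions arch."""
    hits: dict[Path, list[str]] = {}
    for path in site_packages.rglob(f"*{arch}*"):
        # Folders such as _rocm_sdk_libraries_gfx1151 also match the pattern; skip them.
        if path.is_file():
            hits.setdefault(path.parent, []).append(path.name)
    return {folder: sorted(names) for folder, names in hits.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is yours to supply): python find_arch_files.py <path-to-site-packages>
    for folder, names in find_arch_files(Path(sys.argv[1])).items():
        print(f"{folder}: {len(names)} file(s), e.g. {names[0]}")
```

If the output lists a folder other than the one named in the error message, that is the same mismatch described above.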

The Fix – Step‑by‑Step

Goal: Replicate the folder hierarchy PyTorch expects and place the existing files there.

  1. Locate the source files in:

    \site-packages\_rocm_sdk_libraries_custom\bin\rocblas\library

  2. Copy all files (*.dat, *.hsaco, etc.) from that directory.

  3. Create the expected hierarchy:

    \site-packages\_rocm_sdk_libraries_gfx1151\bin

    • If the _rocm_sdk_libraries_gfx1151 folder does not exist, create it.
    • Inside it, create a bin sub‑folder.

  4. Paste the copied files into the newly created bin folder.

  5. Duplicate and rename the Tensile library (optional but recommended):

    • Duplicate TensileLibrary_lazy_gfx1151.dat.
    • Rename the copy to TensileLibrary.dat.

    This gives rocBLAS a file with the exact name from the error message (TensileLibrary.dat).
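The steps above can be sketched as a small script. This is an illustration under the assumption that your install uses the same folder names as mine; transplant_rocblas_files is a hypothetical helper name. It copies rather than moves, so the original files under _rocm_sdk_libraries_custom stay untouched.

```python
import shutil
from pathlib import Path


def transplant_rocblas_files(site_packages: Path, arch: str = "gfx1151") -> Path:
    """Copy the rocBLAS library files into the folder the error message asks for."""
    # Steps 1-2: the folder where the files were actually found.
    source = site_packages / "_rocm_sdk_libraries_custom" / "bin" / "rocblas" / "library"
    # Step 3: the folder named in the error message.
    target = site_packages / f"_rocm_sdk_libraries_{arch}" / "bin"
    if not source.is_dir():
        raise FileNotFoundError(f"Source folder not found: {source}")

    target.mkdir(parents=True, exist_ok=True)

    # Step 4: copy every file (*.dat, *.hsaco, etc.) into the new bin folder.
    for item in source.iterdir():
        if item.is_file():
            shutil.copy2(item, target / item.name)

    # Step 5: duplicate the lazy Tensile library under the plain name.
    lazy = target / f"TensileLibrary_lazy_{arch}.dat"
    if lazy.is_file():
        shutil.copy2(lazy, target / "TensileLibrary.dat")
    return target
```

Call it with the site-packages folder of the Python environment that holds the ROCm build of PyTorch, for example transplant_rocblas_files(Path(r"<your-venv>\Lib\site-packages")), where the path is a placeholder for your own install.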

Visual Summary

\site-packages

├─ _rocm_sdk_libraries_custom
│   └─ bin
│       └─ rocblas
│           └─ library
│               ├─ TensileLibrary_lazy_gfx1151.dat
│               └─ … (other .dat/.hsaco files)

└─ _rocm_sdk_libraries_gfx1151
    └─ bin
        ├─ TensileLibrary_lazy_gfx1151.dat
        ├─ TensileLibrary.dat   ← copy of the above
        └─ … (all other files pasted here)

Result

Running the script again produced a clean log:

[INFO] Device: cuda (PyTorch fallback label)
[INFO] Inference completed successfully.
VRAM usage: 7 GB
Single‑image inference time: ~8 s

Note: PyTorch’s ROCm builds expose AMD GPUs through the same torch.cuda interface used for NVIDIA hardware, so the device is reported as “cuda” even though no CUDA device is involved.
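To confirm that the “cuda” device really is the AMD GPU, you can ask PyTorch which backend it was built against. A minimal sketch: torch.version.hip, torch.cuda.is_available() and torch.cuda.get_device_name() are real PyTorch APIs, while describe_backend is a hypothetical helper that takes the torch module as an argument so the logic can be exercised without a GPU.

```python
def describe_backend(torch_module) -> str:
    """Summarize which GPU backend a PyTorch build uses and what 'cuda:0' maps to."""
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip_version = getattr(torch_module.version, "hip", None)
    backend = f"ROCm/HIP {hip_version}" if hip_version else "CUDA or CPU-only build"
    if not torch_module.cuda.is_available():
        return f"{backend}; no GPU visible to PyTorch"
    name = torch_module.cuda.get_device_name(0)
    return f"{backend}; device 'cuda:0' is {name}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print(describe_backend(torch))
```

On a working ROCm install, the summary should start with a ROCm/HIP version and end with the name of the AMD GPU rather than an NVIDIA one.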

The integrated GPU was humming along, delivering ~8 seconds per image—perfectly acceptable for a lightweight model like SAM 3. Real‑time video is still out of reach (I’d love a high‑end NVIDIA GPU), but the experiment proves that the latest image models can run on AMD hardware.

Takeaways

  1. Folder‑structure mismatches can masquerade as missing‑library errors.
  2. The AMD ROCm ecosystem may ship the required binaries under a “custom” path before official support lands.
  3. Creating the expected directory hierarchy and copying the files resolved the “rocBLAS error: TensileLibrary.dat not found” message on the Ryzen AI Max+ 395 (gfx1151).

Feel free to adapt this fix for any future AMD hardware releases where the official libraries lag behind.

Happy hacking, and may your GPUs always find the libraries they need!


Powerful hardware like the Ryzen AI Max+ 395 is in a transitional period where the software ecosystem (especially Windows ROCm) hasn’t caught up with hardware evolution. However, as this case shows, there are many situations where “the files exist, but the paths aren’t configured correctly.”

Don’t give up—dig through those directories and you might find the solution.

To all AMD users struggling with this same error: give this “folder transplant surgery” a try. Here’s to comfortable (and slightly over‑powered) local AI adventures!

If you have feedback on this article or requests for “truly heavy models” you’d like me to test on the Ryzen AI Max+ 395, drop a comment below!

Next time, I’ll be posting about combining SAM 3 with IoT cameras (ESP32‑based), so stay tuned!
