How to Install Z-Image Turbo Locally

Published: December 9, 2025 at 08:30 PM EST

Source: Dev.to

Overview

This guide explains how to set up Z-Image Turbo on your local machine. Z-Image Turbo is a 6B-parameter image generation model that produces high-quality images and is particularly strong at rendering legible text inside them.

Online Alternative

If you don’t have a GPU or prefer not to install anything locally, you can use the online version:

Z‑Image Online – Free AI generator with perfect text rendering in 20+ languages, 4K photorealistic output, no GPU required.

System Requirements

Component	Recommended
GPU	16 GB VRAM or more (e.g., RTX 3090/4090 or comparable data‑center cards). Lower‑memory GPUs can work with CPU offloading but will be slower.
Python	3.9 or newer
CUDA	Compatible with your GPU drivers (the example uses CUDA 12.4)
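Before installing anything, it can help to check the machine against the table above. The following is a minimal sketch, not part of the original guide: the helper names are hypothetical, and the VRAM check rounds to the nearest GiB because cards sold as "16 GB" usually report slightly less. The GPU probe only works once PyTorch is installed, so it is guarded.

```python
import sys

MIN_PYTHON = (3, 9)        # from the requirements table
RECOMMENDED_VRAM_GB = 16   # from the requirements table


def python_ok(version_info=sys.version_info):
    """True if the interpreter is Python 3.9 or newer."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def vram_ok(total_bytes):
    """True if the GPU has roughly 16 GB or more (rounded to the nearest GiB)."""
    return round(total_bytes / 1024**3) >= RECOMMENDED_VRAM_GB


def detect_vram_bytes():
    """Return total memory of GPU 0 in bytes, or None if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    print("Python OK:", python_ok())
    vram = detect_vram_bytes()
    if vram is None:
        print("No CUDA GPU detected (or PyTorch is not installed yet).")
    else:
        print(f"GPU 0 VRAM: {vram / 1024**3:.1f} GiB, meets recommendation: {vram_ok(vram)}")
```

If the VRAM check fails, the CPU offloading option described later in this guide is the usual fallback.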

Create a Virtual Environment

# Create the environment
python -m venv zimage-env

# Activate the environment
# Linux / macOS
source zimage-env/bin/activate

# Windows
zimage-env\Scripts\activate
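To confirm that the activation step worked, you can ask Python itself. This is a small sketch using only the standard library; the function name is just illustrative. Inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base interpreter.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("Inside a virtual environment:", in_virtualenv())
print("Interpreter:", sys.executable)
```

If this prints False, the pip commands in the next section would install into your global Python instead of zimage-env.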

Install Dependencies

# Install PyTorch for CUDA 12.4 (adjust the index URL for other CUDA versions)
pip install torch --index-url https://download.pytorch.org/whl/cu124

# Install diffusers directly from source
pip install git+https://github.com/huggingface/diffusers

# Additional libraries
pip install transformers accelerate safetensors
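After the installs finish, a quick version listing helps catch a package that silently failed to install. The sketch below uses the standard library's importlib.metadata; the package list simply mirrors the pip commands above, and the helper name is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the pip commands in this guide.
PACKAGES = ["torch", "diffusers", "transformers", "accelerate", "safetensors"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Because diffusers is installed from source here, its reported version will typically carry a development suffix rather than a plain release number.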

Create a Python Script

Save the following as generate.py (or any name you prefer).

import torch
from diffusers import ZImagePipeline

# Load the model from Hugging Face
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)

# Move pipeline to GPU
pipe.to("cuda")

Generate an Image

Add this code to the script to produce an image:

prompt = (
    "City street at night with clear bilingual store signs, warm lighting, "
    "and detailed reflections on wet pavement."
)

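image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # the Turbo model is distilled for few-step sampling
    guidance_scale=0.0,     # Turbo models are run without classifier-free guidance
    generator=torch.Generator("cuda").manual_seed(123),  # fixed seed for reproducible output
).images[0]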

image.save("z_image_turbo_city.png")
print("Image saved successfully!")
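The fixed seed in the call above makes the output reproducible, so changing only the seed is a simple way to explore variations of one prompt. The following is a sketch under assumptions: generate_variations and its make_generator parameter are hypothetical helpers, not part of diffusers, and the sampling settings are copied from the script above.

```python
from pathlib import Path


def generate_variations(pipe, prompt, seeds, out_dir="outputs", make_generator=None):
    """Run the pipeline once per seed and save each image; return the saved paths."""
    if make_generator is None:
        # Imported lazily so the helper can be defined without torch installed.
        import torch

        def make_generator(seed):
            return torch.Generator("cuda").manual_seed(seed)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    for seed in seeds:
        image = pipe(
            prompt=prompt,
            height=1024,
            width=1024,
            num_inference_steps=9,
            guidance_scale=0.0,
            generator=make_generator(seed),
        ).images[0]
        path = out / f"z_image_turbo_seed{seed}.png"
        image.save(path)
        paths.append(path)
    return paths
```

With the pipe and prompt from the script, generate_variations(pipe, prompt, seeds=[123, 124, 125]) would write three files into outputs/, one per seed.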

Optional Optimizations

Flash Attention 2

# Diffusers uses PyTorch SDPA attention by default.
# Switch the attention backend to Flash Attention 2 if your setup supports it.
pipe.transformer.set_attention_backend("flash")
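Switching backends can fail on machines where Flash Attention is not available, so a guarded version is safer in a shared script. This is a sketch that assumes the "flash" backend depends on the separately installed flash-attn package (import name flash_attn); try_flash_attention is a hypothetical helper, and only set_attention_backend comes from the guide.

```python
import importlib.util


def try_flash_attention(pipe):
    """Switch to the "flash" backend if possible; return True on success."""
    # Assumption: the backend needs the flash_attn package to be importable.
    if importlib.util.find_spec("flash_attn") is None:
        return False  # package not installed; keep the default backend
    try:
        pipe.transformer.set_attention_backend("flash")
    except Exception as exc:  # e.g. unsupported GPU or incompatible build
        print(f"Flash Attention unavailable, keeping default backend: {exc}")
        return False
    return True
```

Calling try_flash_attention(pipe) right after loading the pipeline leaves the default backend in place whenever the switch is not possible.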

Compile the Transformer (requires PyTorch 2.0+)

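# Optional: compile the transformer for faster inference.
# The first generation after compiling is slower because compilation happens then.
# Uncomment the line below to enable it.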
# pipe.transformer.compile()
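Because compilation shifts cost into the first call, timing several consecutive runs is the easiest way to see whether it pays off for your workload. The helper below is a generic sketch using only the standard library; time_runs is a hypothetical name.

```python
import time


def time_runs(fn, runs=3):
    """Call fn repeatedly and return the wall-clock duration of each call in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations
```

For example, time_runs(lambda: pipe(prompt=prompt, num_inference_steps=9, guidance_scale=0.0)) should show a long first entry followed by shorter ones once the transformer is compiled.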

CPU Offloading (Low‑VRAM Systems)

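If your GPU has less than 16 GB VRAM, enable CPU offloading so that model components stay in system RAM and are moved to the GPU only while they are needed. Call this instead of pipe.to("cuda"), not in addition to it: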

pipe.enable_model_cpu_offload()

Note: Offloading allows the model to run on smaller GPUs, but generation will be slower.
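The choice between full GPU placement and offloading can be made automatically from the detected VRAM. This is a minimal sketch: choose_placement and place_pipeline are hypothetical helpers, and the 16 GB threshold is taken from the requirements table rather than measured.

```python
def choose_placement(vram_gb, threshold_gb=16):
    """Return "cuda" when the GPU meets the threshold, else "model_cpu_offload"."""
    return "cuda" if vram_gb >= threshold_gb else "model_cpu_offload"


def place_pipeline(pipe, vram_gb):
    """Apply exactly one placement strategy to the pipeline and report which one."""
    strategy = choose_placement(vram_gb)
    if strategy == "cuda":
        pipe.to("cuda")
    else:
        # Offloading manages device placement itself, so pipe.to("cuda") is skipped.
        pipe.enable_model_cpu_offload()
    return strategy
```

In the script, place_pipeline(pipe, vram_gb) would replace the pipe.to("cuda") line, with vram_gb obtained from torch.cuda.get_device_properties(0).total_memory divided by 1024**3.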
