How to Install Z-Image Turbo Locally

Published: December 9, 2025 at 08:30 PM EST
2 min read
Source: Dev.to

Overview

This guide explains how to set up Z-Image Turbo on your local machine. The model uses a 6B‑parameter architecture to generate high‑quality images with exceptional text rendering capabilities.

Online Alternative

If you don’t have a GPU or prefer not to install anything locally, you can use the online version:

  • Z‑Image Online – Free AI generator with perfect text rendering in 20+ languages, 4K photorealistic output, no GPU required.

System Requirements

Component | Recommended
GPU       | 16 GB VRAM (e.g., RTX 3090/4090 or comparable data‑center cards). Lower‑memory GPUs can work with offloading but will be slower.
Python    | 3.9 or newer
CUDA      | Compatible with your GPU drivers (the example uses CUDA 12.4)
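
Once PyTorch is installed (covered in the steps below), you can confirm that your GPU is visible and meets the VRAM recommendation; this is a minimal sketch using standard PyTorch calls:

import torch

# Report the detected GPU and its total memory.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name} | VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA-capable GPU detected.")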

Create a Virtual Environment

# Create the environment
python -m venv zimage-env

# Activate the environment
# Linux / macOS
source zimage-env/bin/activate

# Windows
zimage-env\Scripts\activate
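
To verify that the environment is active, check which interpreter Python resolves to; with the venv active, the printed path should point inside zimage-env:

import sys

# sys.prefix points inside the venv when it is active.
print(sys.prefix)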

Install Dependencies

# Install PyTorch for CUDA 12.4 (adjust the index URL for other CUDA versions)
pip install torch --index-url https://download.pytorch.org/whl/cu124

# Install diffusers directly from source
pip install git+https://github.com/huggingface/diffusers

# Additional libraries
pip install transformers accelerate safetensors
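
A quick import smoke test confirms the installation before you write any generation code; the versions shown are whatever pip resolved:

import torch
import diffusers
import transformers

# Print versions and confirm PyTorch can see the GPU.
print("torch:", torch.__version__, "| CUDA:", torch.cuda.is_available())
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)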

Create a Python Script

Save the following as generate.py (or any name you prefer).

import torch
from diffusers import ZImagePipeline

# Load the model from Hugging Face
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)

# Move pipeline to GPU
pipe.to("cuda")
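
A note on precision: bfloat16 is natively supported on NVIDIA Ampere GPUs (RTX 30‑series) and newer. On older cards, swapping in float16 is a common workaround, though this is an assumption worth testing, since output quality can differ at reduced precision:

# Hypothetical fallback for pre-Ampere GPUs without native bfloat16 support.
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")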

Generate an Image

Add this code to the script to produce an image:

prompt = (
    "City street at night with clear bilingual store signs, warm lighting, "
    "and detailed reflections on wet pavement."
)

image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(123),
).images[0]

image.save("z_image_turbo_city.png")
print("Image saved successfully!")

Optional Optimizations

Flash Attention 2

# Switch attention backend to Flash Attention 2
pipe.transformer.set_attention_backend("flash")
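
Flash Attention 2 requires the separately installed flash-attn package and a supported GPU. If you are unsure whether it is available, a guarded switch avoids a hard failure; this is a sketch, and the exact exception raised may vary by diffusers version:

# Fall back to the default backend if flash-attn is missing or unsupported.
try:
    pipe.transformer.set_attention_backend("flash")
except Exception as err:
    print(f"Flash Attention unavailable ({err}); using the default backend.")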

Compile the Transformer (requires PyTorch 2.0+)

# Optional: compile for faster inference
# pipe.transformer.compile()
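
If you uncomment the compile call, the first generation pays a one-time compilation cost, so a throwaway warmup run keeps that cost out of your real workflow (a sketch):

# The first call after compile() is slow while the graph compiles; warm up once.
pipe.transformer.compile()
_ = pipe(
    prompt="warmup",
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
).images[0]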

CPU Offloading (Low‑VRAM Systems)

If your GPU has less than 16 GB VRAM, enable CPU offloading to move parts of the model to system RAM:

pipe.enable_model_cpu_offload()

Note: Offloading allows the model to run on smaller GPUs, but generation will be slower.
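
One caveat: enable_model_cpu_offload manages device placement on its own, so call it in place of pipe.to("cuda") rather than after it; this follows the usual diffusers pattern:

# With offloading, accelerate moves submodules to the GPU on demand;
# do not also call pipe.to("cuda").
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()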
