I Built an AI Creative Director: Automating FB Ad Gen with GPT-4o Vision & Structured Outputs

Published: January 6, 2026 at 10:40 PM EST
4 min read
Source: Dev.to

The Architecture: Eye, Brain, and Hand

This isn’t a simple “text‑to‑image” flow. It’s an “image‑to‑text‑to‑image” (multimodal) pipeline composed of three distinct phases:

| Phase | Role | Description |
| --- | --- | --- |
| Eye | Input Analysis | GPT‑4o Vision analyzes reference (inspiration) images and product images. |
| Brain | Structured Logic | A LangChain agent synthesizes these inputs into a strict JSON format. |
| Hand | Execution | A loop triggers the OpenAI Image Generation API for each concept. |

Phase 1 – The “Vision” Analysis (Reverse Engineering)

  1. Fetch images – Two parallel Google Drive nodes pull files:

    • Inspiration Folder – High‑performing ads from competitors.
    • Product Folder – Raw shots of the product we’re selling.
  2. Pass images to GPT‑4o – Images are sent as Base64 to an OpenAI Chat Model node (gpt‑4o); a standalone sketch of this call appears below.

    Full workflow overview in n8n canvas

  3. Prompt for inspiration analysis

    Describe the visual style of this image... create a template of the style for inspirations.
    Ensure you do not make this product specific, rather focusing on creating outlines for static ad styles.
  4. Prompt for product analysis

    Identify the core emotions behind it and the main product.
    We will use this later to connect the product image with some ad styles.

Result: two text chunks – one describing the “Vibe” (style) and one describing the “Subject” (product/emotions).
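
If you want to prototype this "Eye" step outside n8n, here's a minimal sketch using the openai Python SDK (v1+). The file paths and prompt wording are illustrative placeholders, not the workflow's exact values:

```python
import base64
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_image(path: str, instruction: str) -> str:
    """Send a local image to GPT-4o as Base64 and return its text analysis."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# Illustrative local paths -- in the workflow these files come from Google Drive nodes.
vibe = describe_image("inspiration.png",
                      "Describe the visual style as a reusable ad-style template.")
subject = describe_image("product.png",
                         "Identify the main product and the core emotions behind it.")
```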

Phase 2 – From Chaos to JSON (The Structured Output)

LLMs love to chat, but we need a clean array of prompts.

  1. Use the Advanced AI node – Connect a LangChain Agent to the Structured Output Parser and supply a strict JSON schema.

    Structured output parser configuration

  2. Schema example

    [
      {
        "Prompt": "Sun‑drenched poolside shot of the product on a marble ledge at golden hour, with soft shadows and warm tones. Aspect ratio 1:1."
      },
      {
        "Prompt": "Cool lavender‑tinted sunset beach backdrop behind the product, highlighting reflective metallic accents. Aspect ratio 4:5."
      }
    ]

The agent merges the Style Description (from the inspiration node) and the Product Description (from the product node) into this exact format, giving us a programmatic array of prompts—no regex or string parsing required.
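n8n wires this through a LangChain Agent plus Structured Output Parser; if you're prototyping in plain Python, the OpenAI SDK's JSON‑schema response format gives the same "no regex" guarantee. A sketch, assuming the `vibe` and `subject` strings from Phase 1 and a gpt‑4o snapshot that supports structured outputs:

```python
import json
from openai import OpenAI

client = OpenAI()
vibe = "..."     # style description from Phase 1
subject = "..."  # product description from Phase 1

# Strict mode requires a top-level object, so the array is wrapped in "prompts".
schema = {
    "type": "object",
    "properties": {
        "prompts": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"Prompt": {"type": "string"}},
                "required": ["Prompt"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["prompts"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Merge this ad style and product into 10 image prompts.\n"
                   f"STYLE: {vibe}\nPRODUCT: {subject}",
    }],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "ad_prompts", "schema": schema, "strict": True},
    },
)
prompts = json.loads(response.choices[0].message.content)["prompts"]
```

The response is guaranteed to parse against the schema, which is the same property the n8n parser node provides.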

Phase 3 – The Factory Line (Execution Loop)

  1. Split the JSON array – A Split Out node separates each prompt.

  2. Generate images via raw HTTP – Instead of a pre‑built DALL‑E node, an HTTP Request node calls the OpenAI image generation endpoint directly, giving fine‑grained control over parameters.

    {
      "model": "dall-e-3",
      "prompt": "={{ $json.Prompt }}",
      "size": "1024x1024",
      "quality": "standard",
      "n": 1
    }

    HTTP request node configuration

  3. Rate‑limit handling – A Wait node (not shown in the diagram) pauses a few seconds between requests to stay within OpenAI’s per‑minute rate limits on the image endpoint.
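
The same factory line in plain Python: loop over the prompt array, POST to the endpoint per prompt, and sleep between calls as a stand‑in for the Wait node. The five‑second pause is a guess; tune it to your account's limits.

```python
import os
import time
import requests  # pip install requests

API_URL = "https://api.openai.com/v1/images/generations"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

prompts = [{"Prompt": "..."}]  # the array produced in Phase 2

image_urls = []
for item in prompts:
    resp = requests.post(API_URL, headers=HEADERS, json={
        "model": "dall-e-3",
        "prompt": item["Prompt"],
        "size": "1024x1024",
        "quality": "standard",
        "n": 1,
    }, timeout=120)
    resp.raise_for_status()
    image_urls.append(resp.json()["data"][0]["url"])  # default response is a URL
    time.sleep(5)  # crude stand-in for the n8n Wait node
```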

The Result – Automated Creativity

The workflow now:

  • Pulls inspiration and product images from Google Drive.
  • Uses GPT‑4o Vision to extract style “vibes” and product “subjects.”
  • Converts those insights into a clean JSON array of 10‑plus image prompts.
  • Loops over the array, calling the DALL‑E 3 API to generate ad creatives.
  • Stores the resulting Base64 images back to Google Drive (or any destination you prefer).
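
If you'd rather receive Base64 than URLs (as the Drive‑upload step implies), ask the endpoint for `b64_json` and decode it. A sketch; the prompt and filename are illustrative:

```python
import base64
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/images/generations",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "dall-e-3", "prompt": "...",  # a prompt from Phase 2
          "size": "1024x1024", "response_format": "b64_json", "n": 1},
    timeout=120,
)
resp.raise_for_status()
png_bytes = base64.b64decode(resp.json()["data"][0]["b64_json"])

with open("ad_concept_01.png", "wb") as f:  # illustrative filename
    f.write(png_bytes)
```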

Bottom line: By turning a traditionally manual, guess‑heavy process into a repeatable, multimodal pipeline, you can churn out high‑quality ad concepts at scale—without the dreaded “prompt block.”


AI‑Generated Facebook Ad Images from Product Analysis

This n8n workflow takes a reference ad image and a product image, analyzes them with GPT‑4o, creates 10 fresh ad concepts, generates the corresponding images with DALL‑E 3, and saves the results to a designated “Output” folder in Google Drive.

How It Works

  1. Read the Reference Image (Input).
  2. Read the Product Image (Input).
  3. Analyze both images with GPT‑4o.
  4. Synthesize 10 new ad prompts via LangChain.
  5. Generate 10 new images via DALL‑E 3.
  6. Save the images to Google Drive.

I built this to test 50 different visual hooks for a campaign without having to brief a designer 50 times. The results are surprisingly coherent because they’re grounded in the visual analysis of ads that already work.

Get the Workflow

I’ve cleaned up the credentials and packaged the workflow into a JSON file you can import directly into your n8n instance.

👉 Download the AI Product Image Generator Workflow

Note: You will need your own OpenAI API key (with GPT‑4o access) and Google Drive credentials to run this.

Happy automating! 🤖
