I Built an AI Creative Director: Automating FB Ad Gen with GPT-4o Vision & Structured Outputs

Published: January 6, 2026 at 10:40 PM EST
4 min read
Source: Dev.to

The Architecture: Eye, Brain, and Hand

This isn’t a simple “text‑to‑image” flow. It’s an “image‑to‑text‑to‑image” (multimodal) pipeline composed of three distinct phases:

| Phase | Role | Description |
| --- | --- | --- |
| Eye | Input Analysis | GPT‑4o Vision analyzes reference (inspiration) images and product images. |
| Brain | Structured Logic | A LangChain agent synthesizes these inputs into a strict JSON format. |
| Hand | Execution | A loop triggers the OpenAI Image Generation API for each concept. |

Phase 1 – The “Vision” Analysis (Reverse Engineering)

  1. Fetch images – Two parallel Google Drive nodes pull files:

    • Inspiration Folder – High‑performing ads from competitors.
    • Product Folder – Raw shots of the product we’re selling.
  2. Pass images to GPT‑4o – Images are sent as Base64 to an OpenAI Chat Model node (gpt‑4o); a standalone sketch of this call appears below.

    Full workflow overview in n8n canvas

  3. Prompt for inspiration analysis

    Describe the visual style of this image... create a template of the style for inspirations.
    Ensure you do not make this product specific, rather focusing on creating outlines for static ad styles.
  4. Prompt for product analysis

    Identify the core emotions behind it and the main product.
    We will use this later to connect the product image with some ad styles.

Result: two text chunks – one describing the “Vibe” (style) and one describing the “Subject” (product/emotions).
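
If you want to prototype this "Eye" step outside n8n, here's a minimal sketch using the openai Python SDK (v1+). The file paths and prompt wording are illustrative placeholders, not the workflow's exact values:

```python
import base64
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_image(path: str, instruction: str) -> str:
    """Send a local image to GPT-4o as Base64 and return its text analysis."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# Illustrative local paths -- in the workflow these files come from Google Drive nodes.
vibe = describe_image("inspiration.png",
                      "Describe the visual style as a reusable ad-style template.")
subject = describe_image("product.png",
                         "Identify the main product and the core emotions behind it.")
```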

Phase 2 – From Chaos to JSON (The Structured Output)

LLMs love to chat, but we need a clean array of prompts.

  1. Use the Advanced AI node – Connect a LangChain Agent to the Structured Output Parser and supply a strict JSON schema.

    Structured output parser configuration

  2. Schema example

    [
      {
        "Prompt": "Sun‑drenched poolside shot of the product on a marble ledge at golden hour, with soft shadows and warm tones. Aspect ratio 1:1."
      },
      {
        "Prompt": "Cool lavender‑tinted sunset beach backdrop behind the product, highlighting reflective metallic accents. Aspect ratio 4:5."
      }
    ]

The agent merges the Style Description (from the inspiration node) and the Product Description (from the product node) into this exact format, giving us a programmatic array of prompts—no regex or string parsing required.
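n8n wires this through a LangChain Agent plus Structured Output Parser; if you're prototyping in plain Python, the OpenAI SDK's JSON‑schema response format gives the same "no regex" guarantee. A sketch, assuming the `vibe` and `subject` strings from Phase 1 and a gpt‑4o snapshot that supports structured outputs:

```python
import json
from openai import OpenAI

client = OpenAI()
vibe = "..."     # style description from Phase 1
subject = "..."  # product description from Phase 1

# Strict mode requires a top-level object, so the array is wrapped in "prompts".
schema = {
    "type": "object",
    "properties": {
        "prompts": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"Prompt": {"type": "string"}},
                "required": ["Prompt"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["prompts"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Merge this ad style and product into 10 image prompts.\n"
                   f"STYLE: {vibe}\nPRODUCT: {subject}",
    }],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "ad_prompts", "schema": schema, "strict": True},
    },
)
prompts = json.loads(response.choices[0].message.content)["prompts"]
```

The response is guaranteed to parse against the schema, which is the same property the n8n parser node provides.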

Phase 3 – The Factory Line (Execution Loop)

  1. Split the JSON array – A Split Out node separates each prompt.

  2. Generate images via raw HTTP – Instead of a pre‑built DALL‑E node, an HTTP Request node calls the OpenAI image generation endpoint directly, giving fine‑grained control over parameters.

    {
      "model": "dall-e-3",
      "prompt": "={{ $json.Prompt }}",
      "size": "1024x1024",
      "quality": "standard",
      "n": 1
    }

    HTTP request node configuration

  3. Rate‑limit handling – A Wait node (not shown in the diagram) pauses a few seconds between requests to stay within OpenAI’s per‑minute rate limits on the image endpoint.
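
The same factory line in plain Python: loop over the prompt array, POST to the endpoint per prompt, and sleep between calls as a stand‑in for the Wait node. The five‑second pause is a guess; tune it to your account's limits.

```python
import os
import time
import requests  # pip install requests

API_URL = "https://api.openai.com/v1/images/generations"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

prompts = [{"Prompt": "..."}]  # the array produced in Phase 2

image_urls = []
for item in prompts:
    resp = requests.post(API_URL, headers=HEADERS, json={
        "model": "dall-e-3",
        "prompt": item["Prompt"],
        "size": "1024x1024",
        "quality": "standard",
        "n": 1,
    }, timeout=120)
    resp.raise_for_status()
    image_urls.append(resp.json()["data"][0]["url"])  # default response is a URL
    time.sleep(5)  # crude stand-in for the n8n Wait node
```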

The Result – Automated Creativity

The workflow now:

  • Pulls inspiration and product images from Google Drive.
  • Uses GPT‑4o Vision to extract style “vibes” and product “subjects.”
  • Converts those insights into a clean JSON array of 10‑plus image prompts.
  • Loops over the array, calling the DALL‑E 3 API to generate ad creatives.
  • Stores the resulting Base64 images back to Google Drive (or any destination you prefer).
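
If you'd rather receive Base64 than URLs (as the Drive‑upload step implies), ask the endpoint for `b64_json` and decode it. A sketch; the prompt and filename are illustrative:

```python
import base64
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/images/generations",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "dall-e-3", "prompt": "...",  # a prompt from Phase 2
          "size": "1024x1024", "response_format": "b64_json", "n": 1},
    timeout=120,
)
resp.raise_for_status()
png_bytes = base64.b64decode(resp.json()["data"][0]["b64_json"])

with open("ad_concept_01.png", "wb") as f:  # illustrative filename
    f.write(png_bytes)
```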

Bottom line: By turning a traditionally manual, guess‑heavy process into a repeatable, multimodal pipeline, you can churn out high‑quality ad concepts at scale—without the dreaded “prompt block.”


AI‑Generated Facebook Ad Images from Product Analysis

This n8n workflow takes a reference ad image and a product image, analyzes them with GPT‑4o, creates 10 fresh ad concepts, generates the corresponding images with DALL‑E 3, and saves the results to a designated “Output” folder in Google Drive.

How It Works

  1. Read the Reference Image (Input).
  2. Read the Product Image (Input).
  3. Analyze both images with GPT‑4o.
  4. Synthesize 10 new ad prompts via LangChain.
  5. Generate 10 new images via DALL‑E 3.
  6. Save the images to Google Drive.

I built this to test 50 different visual hooks for a campaign without having to brief a designer 50 times. The results are surprisingly coherent because they’re grounded in the visual analysis of ads that already work.

Get the Workflow

I’ve cleaned up the credentials and packaged the workflow into a JSON file you can import directly into your n8n instance.

👉 Download the AI Product Image Generator Workflow

Note: You will need your own OpenAI API key (with GPT‑4o access) and Google Drive credentials to run this.

Happy automating! 🤖
