DoodleMates: Building a Multimodal Creature Generator

Published: December 3, 2025 at 02:40 AM EST
Source: Dev.to

This post is my submission for DEV Education Track: Build Apps with Google AI Studio.

I set out to build DoodleMates, an app that turns a photo and a few personality notes into a unique 3D doodle creature.
The core functionality relies on a single multimodal API call. The key prompt is written to use both the image and the text input:

“Analyze the image’s aesthetic and colors, then generate a detailed 3D doodle‑style creature sticker that reflects ‘[User’s Personality Notes]’ and matches the image’s style.”

I used Google AI Studio's multimodal capabilities and the Prompt Engineering interface to iterate quickly on the visual style and its consistency.
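
To make the "single multimodal call" idea concrete, here is a minimal sketch of how one image and the personality notes could be bundled into one request body. It is not the app's actual code: `PROMPT_TEMPLATE` and `build_request` are hypothetical names, and the `contents` / `parts` / `inline_data` layout is an assumption modeled on the Gemini REST `generateContent` request shape. The sketch only builds the payload; it makes no network call and names no model.

```python
import base64

# The prompt from this post, with the bracketed placeholder turned into a
# format field. (Hypothetical template name; the wording follows the post.)
PROMPT_TEMPLATE = (
    "Analyze the image's aesthetic and colors, then generate a detailed "
    "3D doodle-style creature sticker that reflects '{notes}' "
    "and matches the image's style."
)


def build_request(image_bytes: bytes, notes: str, mime_type: str = "image/jpeg") -> dict:
    """Bundle one image and one text prompt into a single request body.

    Assumption: the payload layout mirrors the Gemini REST generateContent
    format (a list of parts, with the image sent inline as base64).
    """
    prompt = PROMPT_TEMPLATE.format(notes=notes.strip())
    encoded_image = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded_image}},
                ]
            }
        ]
    }
```

Calling `build_request(photo_bytes, "curious, sleepy, loves rain")` returns a dictionary with exactly two parts, the text prompt and the encoded photo, which is the whole input side of the app.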

Demo

Input

The user shares a photo and simple text notes.

Input example

Output

The generated, custom DoodleMate.

Output example

My Experience

What I Learned 💡

  • True Multimodal Simplicity – The model elegantly handles fundamentally different inputs (an image and a block of text) and produces a unified, creative output (a new image) without needing separate APIs for analysis and generation.
  • Prompt as Code – Tweaking words like “3D sticker,” “whimsical,” or “charming” acted like visual parameters, allowing me to refine the aesthetic without writing traditional code.
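
The "Prompt as Code" point can be illustrated with a small sketch that treats style words as parameters and generates one prompt per combination, ready to paste into the Studio and compare. The template wording and the `prompt_variants` helper are hypothetical, chosen only for illustration; the style words are the ones mentioned above.

```python
from itertools import product

# Hypothetical template: the style words are the only "parameters" that vary.
STYLE_TEMPLATE = (
    "Generate a {adjective} {form} of a creature that matches the image's style."
)


def prompt_variants(forms, adjectives):
    """Return one prompt string for every (form, adjective) combination."""
    return [
        STYLE_TEMPLATE.format(form=form, adjective=adjective)
        for form, adjective in product(forms, adjectives)
    ]


# Two adjectives and one form give two prompts to try side by side.
for prompt in prompt_variants(["3D sticker"], ["whimsical", "charming"]):
    print(prompt)
```

Keeping the varying words in lists makes it easy to see which single change produced which visual difference.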

What Was Surprising 🤯

  • Speed of Prototyping – I went from a simple concept to a functional core engine for a highly custom, image‑to‑image application in less than an hour. Testing the prompt directly in the Studio environment made each iteration very fast, which is a game‑changer for solo developers.

If you’re looking for a quick, creative project, using Google AI Studio for multimodal tasks is the perfect way to turn pixels into personality!
