elsewhere, a text-to-3D studio

Published: March 3, 2026 at 11:30 PM EST
4 min read
Source: Dev.to

This is a submission for the Built with Google Gemini: Writing Challenge

What I Built with Google Gemini

I built a high‑performance text‑to‑3D model studio that works straight from the browser! A user describes what they want in natural language, from “cute cat” to “floating pizza with laser eyes”, and Gemini generates an interactive 3D model rendered with Three.js.

Asset generation is a two‑phase pipeline:

Planning phase

Gemini receives the user prompt plus PLANNING_SYSTEM_PROMPT_V4 as a system instruction (temperature 0.5, thinkingLevel: low, max 8192 output tokens). It returns a v3 schema JSON: an array of 3‑6 materials (color, roughness, metalness) and 4‑12 parts, each specifying:

  • geometry type (Box|Sphere|Cylinder|Cone|Torus|Lathe|Tube|Dome)
  • parent reference
  • priority (1‑3)
  • material index
  • geometry parameters
  • instance transforms (position/rotation/scale arrays)

The LLM never writes executable code; it only describes geometry in a constrained JSON vocabulary.
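
As a rough illustration of that constrained vocabulary, the sketch below models the plan as TypeScript types and checks the counts and index ranges described above. All field names (name, params, instances, and so on) and the checkPlan helper are assumptions made for this example; the project's actual v3 schema may differ.

```typescript
// Hypothetical shape of the planning output. Field names are assumptions
// inferred from the description above, not the project's actual schema.
type GeometryType =
  | "Box" | "Sphere" | "Cylinder" | "Cone"
  | "Torus" | "Lathe" | "Tube" | "Dome";

type Vec3 = [number, number, number];

interface Material {
  color: string;     // e.g. "#ff8800"
  roughness: number;
  metalness: number;
}

interface Instance {
  position: Vec3;
  rotation: Vec3;
  scale: Vec3;
}

interface Part {
  name: string;
  geometry: GeometryType;
  parent: string | null;          // null marks a root part (assumed convention)
  priority: 1 | 2 | 3;
  material: number;               // index into Plan.materials
  params: Record<string, number>; // geometry parameters
  instances: Instance[];
}

interface Plan {
  materials: Material[];
  parts: Part[];
}

// Returns a list of human-readable problems. An empty list means the plan
// respects the material/part counts and material index range stated above.
function checkPlan(plan: Plan): string[] {
  const problems: string[] = [];
  const m = plan.materials.length;
  const p = plan.parts.length;
  if (m < 3 || m > 6) problems.push(`expected 3-6 materials, got ${m}`);
  if (p < 4 || p > 12) problems.push(`expected 4-12 parts, got ${p}`);
  for (const part of plan.parts) {
    const idx = part.material;
    if (!Number.isInteger(idx) || idx < 0 || idx >= m) {
      problems.push(`${part.name}: material index ${idx} out of range`);
    }
  }
  return problems;
}
```

Because the model only fills in data of this kind, a malformed response can be rejected or repaired before any Three.js code is generated.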

Compilation phase

SchemaCompiler.compile() runs five deterministic steps with no LLM involvement:

  1. Parse – normalize JSON, expand defaults, resolve material references.
  2. Validate – check required fields, parent references, topological sort.
  3. Budget – prune parts by priority (3 → 2 → 1.5) if mesh count exceeds 24 or material count exceeds 5.
  4. Auto‑snap – detect disconnected parts and snap to parent bounding box (threshold: 2.0 units).
  5. Emit – generate Three.js code: MeshStandardMaterial array, geometry constructors, and parent‑child hierarchy via group.add().
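
The Validate and Budget steps above can be sketched roughly as follows. This is a simplified illustration, not the project's SchemaCompiler: the PartNode type and both function names are hypothetical, each part is assumed to count as one mesh, only the integer priority tiers 3 and 2 are pruned (the 1.5 tier and the material limit are not modeled), and parts orphaned by pruning are dropped along with their parent.

```typescript
// Minimal part record for this sketch; names are assumptions.
interface PartNode {
  name: string;
  parent: string | null;
  priority: number; // higher number = pruned first
}

// Validate: order parts so every parent precedes its children.
// Throws on an unknown parent reference or a cycle.
function topoSort(parts: PartNode[]): PartNode[] {
  const byName = new Map<string, PartNode>();
  for (const p of parts) byName.set(p.name, p);
  const state = new Map<string, "visiting" | "done">();
  const ordered: PartNode[] = [];

  const visit = (p: PartNode): void => {
    const s = state.get(p.name);
    if (s === "done") return;
    if (s === "visiting") throw new Error(`cycle through ${p.name}`);
    state.set(p.name, "visiting");
    if (p.parent !== null) {
      const parent = byName.get(p.parent);
      if (!parent) throw new Error(`${p.name}: unknown parent ${p.parent}`);
      visit(parent);
    }
    state.set(p.name, "done");
    ordered.push(p);
  };

  for (const p of parts) visit(p);
  return ordered;
}

// Budget: drop priority-3 parts, then priority-2 parts, one at a time
// (last-listed first) until the part count fits the mesh budget.
function pruneToBudget(parts: PartNode[], maxMeshes = 24): PartNode[] {
  let kept = parts.slice();
  for (const tier of [3, 2]) {
    while (kept.length > maxMeshes) {
      const idx = kept.map((p) => p.priority).lastIndexOf(tier);
      if (idx === -1) break; // nothing left in this tier
      kept.splice(idx, 1);
    }
  }
  // Remove parts whose parent was pruned, repeating until stable so that
  // whole orphaned subtrees disappear.
  let changed = true;
  while (changed) {
    const names = new Set(kept.map((p) => p.name));
    const next = kept.filter((p) => p.parent === null || names.has(p.parent));
    changed = next.length !== kept.length;
    kept = next;
  }
  return kept;
}
```

Running both steps before code emission keeps the output deterministic: the same plan always yields the same ordered, budget-limited part list.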

The system can also generate a full scene from a single prompt or theme. After each round, screenshots are taken from multiple angles and fed back to Gemini, which adjusts coordinates and spatial relations so that the assets fit together tidily.
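
A minimal sketch of such a refinement loop is shown below. Everything here is hypothetical: the Placement and Adjustment types, the refineScene function, the four camera angles, and the capture and critique callbacks (stand-ins for rendering a screenshot and asking Gemini for corrections). The callbacks are synchronous only to keep the sketch short; real rendering and model calls would be asynchronous.

```typescript
interface Placement {
  id: string;
  position: [number, number, number];
}

// A correction proposed by the model for one asset (assumed shape).
interface Adjustment {
  id: string;
  position: [number, number, number];
}

// Camera angles in degrees around the vertical axis; values are assumptions.
const ANGLES = [0, 90, 180, 270];

function refineScene(
  scene: Placement[],
  rounds: number,
  capture: (scene: Placement[], angleDeg: number) => string,
  critique: (screenshots: string[]) => Adjustment[],
): Placement[] {
  let current = scene;
  for (let round = 0; round < rounds; round++) {
    // Take one screenshot per angle and hand the whole set to the model.
    const shots = ANGLES.map((angle) => capture(current, angle));
    const fixes = critique(shots);
    if (fixes.length === 0) break; // nothing left to adjust

    const byId = new Map<string, Adjustment>();
    for (const fix of fixes) byId.set(fix.id, fix);

    // Apply corrections without mutating the previous round's scene.
    current = current.map((p) => {
      const fix = byId.get(p.id);
      return fix ? { ...p, position: fix.position } : p;
    });
  }
  return current;
}
```

Capping the number of rounds bounds both latency and API cost, and stopping early when no adjustments come back avoids spending calls on a scene that is already consistent.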

Demo

  • Cloud Run link (password: buildwithelsewhere) – URL not provided in source
  • YouTube demo/trailer – URL not provided in source
  • GitHub repository:
    https://github.com/bug39/elsewhere

3D world‑building studio powered by Gemini. Generate assets from text, build worlds, create animations.

elsewhere

AI‑Powered 3D World Studio
Describe what you want and AI builds it — 3D assets, entire scenes, living worlds you can explore in third person. Built for Google’s Gemini 3 Hackathon.

What You Can Do

  • Generate assets from text prompts — e.g., “a cozy cabin with smoke from the chimney”.
  • Generate entire scenes — e.g., “a medieval village marketplace”.
  • Arrange worlds on a 400 m terrain with biomes, heightmaps, and textures.
  • Script NPCs with behaviors and branching dialogue trees.
  • Play your world in third person — walk, run, jump, talk to NPCs.

Quick Start

npm install
npm run dev   # http://localhost:3000

If the hackathon results are not yet published, the demo may not be functional.
Requires a Gemini API key (free tier works).

Tech Stack

  • Preact
  • Three.js
  • Gemini 3 Flash
  • React Flow
  • IndexedDB

License

MIT

What I Learned

The main challenge was prompt engineering: getting consistent output across a wide variety of user prompts took around 30 iterations on the system prompts. At one point I built a CLI agent that set up a mock studio with a base prompt plus 15+ tweaks, evaluated each output, and gradually improved the prompt set. I also had to become familiar with Three.js to fine‑tune the generated 3D models, as textual instructions alone were insufficient.

Google Gemini Feedback

With Gemini 3 Flash Preview in the pipeline, I was still missing the final “push” needed to extract more detail from my Three.js compiler. The release of Gemini-3.1-flash-preview brought a huge improvement in spatial reasoning, which was exactly what elsewhere needed (the Cloud Run deployment still runs gemini-3-flash-preview due to cost constraints). Working with Gemini was smooth and easy. Although the project started for a Gemini hackathon, early testing showed that Flash gave better results and was faster and cheaper for generating 3D models.
