elsewhere, a text-to-3D studio

Published: 2 months ago (March 3, 2026 at 11:30 PM EST)

4 min read

Source: Dev.to

This is a submission for the Built with Google Gemini: Writing Challenge

What I Built with Google Gemini

I built a high-performance text-to-3D model studio that runs straight from the browser. A user describes what they want in natural language, anything from “cute cat” to “floating pizza with laser eyes”, and Gemini produces an interactive 3D model built with Three.js.

Asset generation is a two‑phase pipeline:

Planning phase

Gemini receives the user prompt plus PLANNING_SYSTEM_PROMPT_V4 as a system instruction (temperature 0.5, thinkingLevel: low, max 8192 output tokens). It returns JSON in the v3 schema: an array of 3-6 materials (color, roughness, metalness) and an array of 4-12 parts, each specifying:

geometry type (Box|Sphere|Cylinder|Cone|Torus|Lathe|Tube|Dome)
parent reference
priority (1‑3)
material index
geometry parameters
instance transforms (position/rotation/scale arrays)
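
As a rough illustration, a plan in this vocabulary might look like the sketch below. The field names (`name`, `params`, `instances`, and so on) and the `checkPlan` helper are assumptions made for this example, not the project's actual v3 schema; only the geometry types and the numeric limits come from the description above.

```javascript
// Hypothetical plan; field names are assumptions, not the project's real v3 schema.
const GEOMETRY_TYPES = ["Box", "Sphere", "Cylinder", "Cone", "Torus", "Lathe", "Tube", "Dome"];
const at = (x, y, z) => ({ position: [x, y, z], rotation: [0, 0, 0], scale: [1, 1, 1] });

const examplePlan = {
  materials: [
    { color: "#d9a066", roughness: 0.8, metalness: 0.0 },
    { color: "#fff4e0", roughness: 0.9, metalness: 0.0 },
    { color: "#222222", roughness: 0.4, metalness: 0.1 },
  ],
  parts: [
    { name: "body", geometry: "Sphere", parent: null, priority: 1, material: 0,
      params: { radius: 1 }, instances: [at(0, 1, 0)] },
    { name: "head", geometry: "Sphere", parent: "body", priority: 1, material: 0,
      params: { radius: 0.6 }, instances: [at(0, 1.2, 0.8)] },
    { name: "ear", geometry: "Cone", parent: "head", priority: 2, material: 1,
      params: { radius: 0.2, height: 0.4 }, instances: [at(-0.3, 0.6, 0), at(0.3, 0.6, 0)] },
    { name: "tail", geometry: "Tube", parent: "body", priority: 3, material: 2,
      params: { radius: 0.08 }, instances: [at(0, 0.2, -1)] },
  ],
};

// Returns a list of problems; an empty list means the plan fits the stated limits.
function checkPlan(plan) {
  const problems = [];
  const isVec3 = (v) => Array.isArray(v) && v.length === 3 && v.every(Number.isFinite);
  if (plan.materials.length < 3 || plan.materials.length > 6) problems.push("expected 3-6 materials");
  if (plan.parts.length < 4 || plan.parts.length > 12) problems.push("expected 4-12 parts");
  for (const part of plan.parts) {
    if (!GEOMETRY_TYPES.includes(part.geometry)) problems.push(`${part.name}: unknown geometry`);
    if (![1, 2, 3].includes(part.priority)) problems.push(`${part.name}: priority must be 1-3`);
    if (!Number.isInteger(part.material) || part.material < 0 || part.material >= plan.materials.length) {
      problems.push(`${part.name}: material index out of range`);
    }
    for (const inst of part.instances) {
      if (![inst.position, inst.rotation, inst.scale].every(isVec3)) {
        problems.push(`${part.name}: transforms must be [x, y, z] arrays`);
      }
    }
  }
  return problems;
}

console.log(checkPlan(examplePlan)); // prints []
```

Because the plan is plain data, a check like this can reject a malformed response before anything is compiled or rendered.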

The LLM never writes executable code; it only describes geometry in a constrained JSON vocabulary.

Compilation phase

SchemaCompiler.compile() runs five deterministic steps with no LLM involvement:

Parse – normalize JSON, expand defaults, resolve material references.
Validate – check required fields and parent references, then topologically sort the parts.
Budget – prune parts by priority (3 → 2 → 1.5) if mesh count exceeds 24 or material count exceeds 5.
Auto‑snap – detect disconnected parts and snap them to the parent's bounding box (threshold: 2.0 units).
Emit – generate Three.js code: MeshStandardMaterial array, geometry constructors, and parent‑child hierarchy via group.add().
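
To make three of these steps concrete, here is a minimal sketch of the topological sort, the priority-based budget, and code emission. It is not the real SchemaCompiler: the part fields, the function names, and the choice to count each instance as one mesh are all assumptions, the 1.5 tier is not modelled, and only three geometry types are covered. The emitted string expects `THREE` and a `materials` array to be in scope wherever it is run.

```javascript
// Sketch only: function names and part fields are assumptions, not SchemaCompiler's API.

// Validate: order parts so every parent precedes its children; reject cycles.
function sortByParent(parts) {
  const byName = new Map(parts.map((p) => [p.name, p]));
  const state = new Map(); // name -> "visiting" | "done"
  const ordered = [];
  function visit(part) {
    if (state.get(part.name) === "done") return;
    if (state.get(part.name) === "visiting") throw new Error(`cycle at ${part.name}`);
    state.set(part.name, "visiting");
    if (part.parent !== null) {
      const parent = byName.get(part.parent);
      if (!parent) throw new Error(`unknown parent ${part.parent}`);
      visit(parent);
    }
    state.set(part.name, "done");
    ordered.push(part);
  }
  parts.forEach((p) => visit(p));
  return ordered;
}

// Budget: drop priority 3, then priority 2, until the mesh count fits.
// Assumes one mesh per instance; parts parented to a dropped part go with it.
const meshCount = (parts) => parts.reduce((n, p) => n + p.instances.length, 0);

function applyBudget(parts, maxMeshes = 24) {
  let kept = parts;
  for (const tier of [3, 2]) {
    if (meshCount(kept) <= maxMeshes) break;
    const dropped = new Set(kept.filter((p) => p.priority === tier).map((p) => p.name));
    let grew = true;
    while (grew) { // propagate the drop to descendants
      grew = false;
      for (const p of kept) {
        if (!dropped.has(p.name) && p.parent !== null && dropped.has(p.parent)) {
          dropped.add(p.name);
          grew = true;
        }
      }
    }
    kept = kept.filter((p) => !dropped.has(p.name));
  }
  return kept;
}

// Emit: produce Three.js source text. Only three geometry types are handled here.
const GEOMETRY_SOURCE = {
  Box: (g) => `new THREE.BoxGeometry(${g.width}, ${g.height}, ${g.depth})`,
  Sphere: (g) => `new THREE.SphereGeometry(${g.radius})`,
  Cylinder: (g) => `new THREE.CylinderGeometry(${g.radiusTop}, ${g.radiusBottom}, ${g.height})`,
};

function emit(parts) {
  const lines = ["const root = new THREE.Group();", "const nodes = {};"];
  for (const part of sortByParent(parts)) {
    const build = GEOMETRY_SOURCE[part.geometry];
    if (!build) throw new Error(`geometry ${part.geometry} is not covered by this sketch`);
    const key = JSON.stringify(part.name);
    const target = part.parent === null ? "root" : `nodes[${JSON.stringify(part.parent)}]`;
    lines.push(`nodes[${key}] = new THREE.Group();`);
    for (const inst of part.instances) {
      lines.push(
        `{ const mesh = new THREE.Mesh(${build(part.params)}, materials[${part.material}]);`,
        `  mesh.position.set(${inst.position.join(", ")});`,
        `  mesh.rotation.set(${inst.rotation.join(", ")});`,
        `  mesh.scale.set(${inst.scale.join(", ")});`,
        `  nodes[${key}].add(mesh); }`
      );
    }
    lines.push(`${target}.add(nodes[${key}]);`);
  }
  return lines.join("\n");
}

// Usage: parts arrive in arbitrary order; the hat is the first to go under a tight budget.
const one = (x, y, z) => [{ position: [x, y, z], rotation: [0, 0, 0], scale: [1, 1, 1] }];
const demoParts = [
  { name: "hat", geometry: "Cylinder", parent: "head", priority: 3, material: 2,
    params: { radiusTop: 0.3, radiusBottom: 0.3, height: 0.4 }, instances: one(0, 0.7, 0) },
  { name: "head", geometry: "Sphere", parent: "body", priority: 2, material: 0,
    params: { radius: 0.6 }, instances: one(0, 1.2, 0) },
  { name: "body", geometry: "Sphere", parent: null, priority: 1, material: 0,
    params: { radius: 1 }, instances: one(0, 1, 0) },
];
console.log(emit(applyBudget(demoParts, 2)));
```

Since every step is a plain function over data, the same plan always compiles to the same output, which is what makes this half of the pipeline deterministic.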

The system can also generate a full scene from a single prompt or theme. After each round, screenshots taken from multiple angles are fed back to Gemini, which adjusts coordinates and relations so the assets fit together tidily.
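
The feedback loop can be pictured roughly as follows. This is a sketch under assumptions: `renderViews` and `requestAdjustments` are hypothetical callbacks standing in for the screenshot capture and the Gemini call, and the scene is modelled as a simple array of assets with an `id` and a `position`.

```javascript
// Sketch only: the callbacks and the scene shape are hypothetical stand-ins.

// Pure step: return a new scene with the proposed positions applied.
function applyAdjustments(scene, adjustments) {
  const moves = new Map(adjustments.map((a) => [a.id, a.position]));
  return scene.map((asset) =>
    moves.has(asset.id) ? { ...asset, position: moves.get(asset.id) } : asset
  );
}

// Loop: render, ask for adjustments, apply them; stop when none are proposed
// or the round limit is reached.
async function refineScene(scene, { renderViews, requestAdjustments, maxRounds = 3 }) {
  let current = scene;
  for (let round = 0; round < maxRounds; round++) {
    const screenshots = await renderViews(current);
    const adjustments = await requestAdjustments(current, screenshots);
    if (adjustments.length === 0) break;
    current = applyAdjustments(current, adjustments);
  }
  return current;
}
```

Keeping the adjustment step pure means each round produces a new scene value, so an earlier round can be restored if a later one makes the layout worse.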

Demo

Cloud Run link (password: buildwithelsewhere) – URL not provided in source
YouTube demo/trailer – URL not provided in source
GitHub repository:
```
https://github.com/bug39/elsewhere
```

3D world‑building studio powered by Gemini. Generate assets from text, build worlds, create animations.

elsewhere

AI‑Powered 3D World Studio
Describe what you want and AI builds it — 3D assets, entire scenes, living worlds you can explore in third person. Built for Google’s Gemini 3 Hackathon.

What You Can Do

Generate assets from text prompts — e.g., “a cozy cabin with smoke from the chimney”.
Generate entire scenes — e.g., “a medieval village marketplace”.
Arrange worlds on a 400 m terrain with biomes, heightmaps, and textures.
Script NPCs with behaviors and branching dialogue trees.
Play your world in third person — walk, run, jump, talk to NPCs.

Quick Start

npm install
npm run dev   # http://localhost:3000

If the hackathon results are not yet published, the demo may not be functional.
Requires a Gemini API key (free tier works).

Tech Stack

Preact
Three.js
Gemini 3 Flash
React Flow
IndexedDB

License

MIT

What I Learned

Getting consistent output across a wide variety of prompts was the hardest part, and it took around 30 iterations to refine the prompts. At one point I built a CLI agent that set up a mock studio with a base prompt plus 15+ tweaks, evaluated each output, and gradually improved the prompt set. I also had to become familiar with Three.js to fine-tune the generated 3D models, because textual instructions alone were not enough.

Google Gemini Feedback

With Gemini 3 Flash Preview in the pipeline, I was still missing the final “push” needed to get more detail out of my Three.js compiler. The release of Gemini-3.1-flash-preview brought a huge improvement in spatial reasoning, which was exactly what elsewhere needed (the Cloud Run link still runs gemini-3-flash-preview due to cost constraints). Working with Gemini was smooth and easy. Although the project started for a Gemini hackathon, early testing showed that Flash was better, faster, and cheaper for generating 3D models.