elsewhere, a text-to-3D studio
Source: Dev.to

This is a submission for the Built with Google Gemini: Writing Challenge
What I Built with Google Gemini
I built a high‑performance text‑to‑3D model studio that works straight from the browser! A user describes what they want in natural language—from “cute cat” to “floating pizza with laser eyes”—and Gemini generates an interactive 3D model (in THREE.js).
Asset generation is a two‑phase pipeline:
Planning phase
Gemini receives the user prompt plus PLANNING_SYSTEM_PROMPT_V4 as a system instruction (temperature 0.5, thinkingLevel: low, max 8192 output tokens). It returns a v3 schema JSON: an array of 3‑6 materials (color, roughness, metalness) and 4‑12 parts, each specifying:
- geometry type (Box|Sphere|Cylinder|Cone|Torus|Lathe|Tube|Dome)
- parent reference
- priority (1‑3)
- material index
- geometry parameters
- instance transforms (position/rotation/scale arrays)
The LLM never writes executable code; it only describes geometry in a constrained JSON vocabulary.
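To make the constrained vocabulary concrete, here is a minimal sketch of what such a plan could look like as TypeScript types. The field names (`name`, `params`, `instances`, and so on) and the `checkPlanShape` helper are assumptions for illustration; the post only lists what each part specifies, not the exact keys of the v3 schema.

```typescript
// Hypothetical shape of the planning output; key names are assumed, not taken from the repo.
type GeometryType =
  | "Box" | "Sphere" | "Cylinder" | "Cone"
  | "Torus" | "Lathe" | "Tube" | "Dome";

interface MaterialSpec {
  color: string;     // assumed to be a hex string such as "#ff8800"
  roughness: number;
  metalness: number;
}

interface InstanceTransform {
  position: [number, number, number];
  rotation: [number, number, number];
  scale: [number, number, number];
}

interface PartSpec {
  name: string;                   // assumed identifier used by parent references
  geometry: GeometryType;
  parent: string | null;          // null marks a root part
  priority: 1 | 2 | 3;
  material: number;               // index into the materials array
  params: Record<string, number>; // geometry parameters, e.g. { radius: 0.5 }
  instances: InstanceTransform[];
}

interface AssetPlan {
  materials: MaterialSpec[]; // 3-6 entries
  parts: PartSpec[];         // 4-12 entries
}

// Checks only the counts and material index ranges described in the text.
function checkPlanShape(plan: AssetPlan): string[] {
  const errors: string[] = [];
  if (plan.materials.length < 3 || plan.materials.length > 6) errors.push("materials: expected 3-6");
  if (plan.parts.length < 4 || plan.parts.length > 12) errors.push("parts: expected 4-12");
  for (const part of plan.parts) {
    const ok = Number.isInteger(part.material) && part.material >= 0 && part.material < plan.materials.length;
    if (!ok) errors.push(`${part.name}: material index out of range`);
  }
  return errors;
}

// Helper for a single unrotated, unscaled instance at a given position.
const at = (x: number, y: number, z: number): InstanceTransform => ({
  position: [x, y, z], rotation: [0, 0, 0], scale: [1, 1, 1],
});

// An invented "cute cat" plan, purely illustrative.
const examplePlan: AssetPlan = {
  materials: [
    { color: "#d9a066", roughness: 0.9, metalness: 0.0 },
    { color: "#222222", roughness: 0.4, metalness: 0.0 },
    { color: "#ff9eb5", roughness: 0.6, metalness: 0.0 },
  ],
  parts: [
    { name: "body", geometry: "Sphere", parent: null, priority: 1, material: 0, params: { radius: 1.0 }, instances: [at(0, 1, 0)] },
    { name: "head", geometry: "Sphere", parent: "body", priority: 1, material: 0, params: { radius: 0.6 }, instances: [at(0, 1.3, 0.6)] },
    { name: "ear", geometry: "Cone", parent: "head", priority: 2, material: 2, params: { radius: 0.2, height: 0.4 }, instances: [at(-0.3, 0.6, 0), at(0.3, 0.6, 0)] },
    { name: "tail", geometry: "Cylinder", parent: "body", priority: 3, material: 1, params: { radius: 0.1, height: 1.2 }, instances: [at(0, 0.2, -1)] },
  ],
};
```

Because the model's output is plain data like this, everything downstream can be checked and transformed deterministically before any Three.js object is created.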
Compilation phase
SchemaCompiler.compile() runs five deterministic steps with no LLM involvement:
- Parse – normalize JSON, expand defaults, resolve material references.
- Validate – check required fields, parent references, topological sort.
- Budget – prune parts by priority (3 → 2 → 1.5) if mesh count exceeds 24 or material count exceeds 5.
- Auto‑snap – detect disconnected parts and snap to parent bounding box (threshold: 2.0 units).
- Emit – generate Three.js code: a MeshStandardMaterial array, geometry constructors, and a parent‑child hierarchy via group.add().
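As an illustration of the Validate and Budget steps, the sketch below orders parts so that every parent precedes its children, then drops low-priority parts until the mesh budget is met. It is not the actual SchemaCompiler code: the `PartNode` type, the function names, and the choice to count one mesh per instance are assumptions, and the 1.5 tier mentioned above is not modelled (only priorities 3 and 2 are pruned here).

```typescript
interface PartNode {
  name: string;
  parent: string | null;
  priority: number;  // 3 is pruned first, as in the text
  instances: number; // assumed: each instance counts as one mesh
}

// Validate: depth-first topological sort; throws on unknown parents or cycles.
function sortByParent(parts: PartNode[]): PartNode[] {
  const byName = new Map<string, PartNode>();
  for (const p of parts) byName.set(p.name, p);
  const state = new Map<string, "visiting" | "done">();
  const ordered: PartNode[] = [];

  const visit = (part: PartNode): void => {
    const s = state.get(part.name);
    if (s === "done") return;
    if (s === "visiting") throw new Error(`cycle detected at ${part.name}`);
    state.set(part.name, "visiting");
    if (part.parent !== null) {
      const parent = byName.get(part.parent);
      if (!parent) throw new Error(`unknown parent ${part.parent}`);
      visit(parent);
    }
    state.set(part.name, "done");
    ordered.push(part);
  };

  for (const p of parts) visit(p);
  return ordered;
}

// Budget: expects parent-first input (the output of sortByParent).
function pruneToBudget(parts: PartNode[], maxMeshes = 24): PartNode[] {
  const removed = new Set<string>();
  const meshCount = (): number =>
    parts.filter((p) => !removed.has(p.name)).reduce((n, p) => n + p.instances, 0);

  for (const tier of [3, 2]) {
    // Walk from the end so leaf-like parts are dropped before their ancestors.
    for (const part of [...parts].reverse()) {
      if (meshCount() <= maxMeshes) break;
      if (part.priority === tier) removed.add(part.name);
    }
  }

  // Also drop any part whose parent was removed, so the hierarchy stays valid.
  const kept: PartNode[] = [];
  for (const part of parts) {
    if (removed.has(part.name)) continue;
    if (part.parent !== null && removed.has(part.parent)) {
      removed.add(part.name);
      continue;
    }
    kept.push(part);
  }
  return kept;
}

// Usage: 27 meshes in total, so the priority-3 whiskers are dropped.
const sortedParts = sortByParent([
  { name: "whisker", parent: "head", priority: 3, instances: 20 },
  { name: "head", parent: "body", priority: 1, instances: 1 },
  { name: "body", parent: null, priority: 1, instances: 1 },
  { name: "tail", parent: "body", priority: 2, instances: 5 },
]);
const prunedParts = pruneToBudget(sortedParts);
```

Sorting first means the emit step can create each group only after its parent exists, and the pruning pass can remove orphans in a single forward sweep.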
The system can also handle full scene generation from a single prompt or theme. After each round, screenshots are taken from multiple angles and fed back to Gemini, which tweaks coordinates and relations so the assets fit together tidily.
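The screenshot feedback loop can be sketched roughly as follows. All names here (`Placement`, `Renderer`, `Critic`, `refineScene`) are hypothetical, and the stopping rules (a fixed round limit, or stopping once nothing changes) are assumptions rather than details taken from the project. The renderer and the Gemini call are passed in as functions, so the loop itself stays independent of both.

```typescript
interface Placement {
  id: string;
  position: [number, number, number];
}

// Assumed interfaces: a renderer returns screenshots (e.g. encoded images) from
// several angles; a critic returns adjusted placements given the screenshots.
type Renderer = (scene: Placement[]) => Promise<string[]>;
type Critic = (scene: Placement[], screenshots: string[]) => Promise<Placement[]>;

async function refineScene(
  scene: Placement[],
  render: Renderer,
  critique: Critic,
  maxRounds = 3,
): Promise<Placement[]> {
  let current = scene;
  for (let round = 0; round < maxRounds; round++) {
    const screenshots = await render(current);
    const next = await critique(current, screenshots);
    // Stop early once the critic no longer changes anything.
    if (JSON.stringify(next) === JSON.stringify(current)) break;
    current = next;
  }
  return current;
}

// Stand-ins for the real renderer and model call, for illustration only.
const fakeRender: Renderer = async (scene) => scene.map((p) => `shot-of-${p.id}`);
const fakeCritic: Critic = async (scene) =>
  scene.map((p): Placement =>
    p.position[1] < 0 ? { id: p.id, position: [p.position[0], 0, p.position[2]] } : p,
  );

// The fake critic lifts anything below ground level back to y = 0.
const refined = refineScene([{ id: "cabin", position: [0, -2, 0] }], fakeRender, fakeCritic);
```

Capping the number of rounds keeps cost predictable, since each round involves at least one model call.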
Demo
- Cloud Run link (password: buildwithelsewhere) – URL not provided in source
- YouTube demo/trailer – URL not provided in source
- GitHub repository: https://github.com/bug39/elsewhere
3D world‑building studio powered by Gemini. Generate assets from text, build worlds, create animations.
elsewhere
AI‑Powered 3D World Studio
Describe what you want and AI builds it — 3D assets, entire scenes, living worlds you can explore in third person. Built for Google’s Gemini 3 Hackathon.
What You Can Do
- Generate assets from text prompts — e.g., “a cozy cabin with smoke from the chimney”.
- Generate entire scenes — e.g., “a medieval village marketplace”.
- Arrange worlds on a 400 m terrain with biomes, heightmaps, and textures.
- Script NPCs with behaviors and branching dialogue trees.
- Play your world in third person — walk, run, jump, talk to NPCs.
Quick Start
npm install
npm run dev # http://localhost:3000
If the hackathon results are not yet published, the demo may not be functional.
Requires a Gemini API key (free tier works).
Tech Stack
- Preact
- Three.js
- Gemini 3 Flash
- React Flow
- IndexedDB
License
MIT
What I Learned
I faced many challenges with prompt engineering to get consistent outputs across a wide variety of prompts. It took around 30 iterations to refine the prompts. At one point I built a CLI agent that set up a mock studio with a base prompt plus 15+ tweaks, evaluated each output, and gradually improved the prompt set. I also had to become familiar with THREE.js to fine‑tune the generated 3D models, as textual instructions alone were insufficient.
Google Gemini Feedback
With Gemini 3 Flash Preview in the pipeline, I was still missing the final “push” needed to get more detail out of my THREE.js compiler. The release of Gemini-3.1-flash-preview brought a huge improvement in spatial reasoning, which was exactly what elsewhere needed (the Cloud Run link still runs gemini-3-flash-preview due to cost constraints). Working with Gemini was smooth and easy. Although the project started for a Gemini hackathon, early testing showed that Flash was better, faster, and cheaper at generating 3D models.