[Paper] The Impact of Generative AI on Architectural Conceptual Design: Performance, Creative Self-Efficacy and Cognitive Load
Source: arXiv - 2601.10696v1
Overview
This paper investigates how generative AI (GenAI) tools, such as the text‑to‑image models DALL‑E and Midjourney, affect design quality, creative confidence, and mental effort during the early, conceptual phase of architectural design. In a controlled experiment with 36 university students, the authors show that GenAI can boost performance for novices, but it may also erode designers' sense of creative competence.
Key Contributions
- Empirical evaluation of GenAI in a realistic architectural design workflow (an independent baseline phase followed by a GenAI‑assisted vs. control phase).
- Performance analysis showing a significant benefit for novice designers, while gains across the full sample are not statistically significant.
- Psychological insights: use of GenAI reduces general creative self‑efficacy, even though task‑specific confidence remains stable.
- Cognitive load findings: no overall difference, but specific prompting strategies (iterative idea generation & visual feedback) correlate with lower perceived load.
- Design of a difference‑in‑differences framework that isolates the effect of GenAI from learning‑over‑time effects.
Methodology
- Participants – 36 students (from architectural engineering and other disciplines) split into two groups.
- Task – A two‑phase architectural conceptual design assignment:
- Phase 1: design independently (baseline).
- Phase 2: either (a) use a GenAI tool (e.g., DALL‑E, Midjourney) to generate concepts, or (b) browse an online repository of existing projects (control).
- Measurements –
- Design performance: rated by expert architects on creativity, feasibility, and coherence.
- Creative self‑efficacy: self‑reported confidence in one’s creative ability before and after each phase.
- Cognitive load: NASA‑TLX style questionnaire after each phase.
- Analysis – Difference‑in‑differences (DiD) to compare changes across conditions, plus a subgroup analysis based on prior design experience (a minimal analysis sketch follows below).
The approach is deliberately simple: participants work as they normally would; only the source of inspiration (an AI tool vs. a human‑curated archive) changes. This makes the findings easy to map onto everyday design tools.
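To make the DiD logic concrete, here is a minimal sketch in Python using pandas and statsmodels. The long‑format data layout and the column names (`score`, `genai`, `post`, `participant`, `novice`) are assumptions for illustration; the paper does not publish its analysis code.

```python
# Minimal difference-in-differences (DiD) sketch on a long-format table with
# one row per participant and phase. Column names are illustrative only.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("design_scores.csv")  # hypothetical file
# score:  expert-rated design performance
# genai:  1 if the participant was in the GenAI condition, 0 for control
# post:   0 for Phase 1 (independent baseline), 1 for Phase 2
# novice: 1 if the participant had <= 2 prior design projects

# The coefficient on genai:post is the DiD estimate: the Phase-1-to-Phase-2
# change for the GenAI group beyond the change seen in the control group.
model = smf.ols("score ~ genai * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["participant"]}
)
print(model.params["genai:post"], model.pvalues["genai:post"])

# Subgroup analysis mirroring the novice/experienced split.
for label, sub in df.groupby("novice"):
    est = smf.ols("score ~ genai * post", data=sub).fit()
    print("novice" if label else "experienced", est.params["genai:post"])
```

The cluster‑robust errors account for each participant contributing two observations; the same template would apply to the self‑efficacy and cognitive‑load outcomes.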
Results & Findings
| Metric | Overall Effect | Notable Sub‑effects |
|---|---|---|
| Design performance | No statistically significant advantage for GenAI across the whole sample. | Novice designers (≤2 prior projects) showed a ~15 % lift in expert scores when using GenAI. |
| Creative self‑efficacy | Declined ~8 % on average for the GenAI group (p < 0.05). | Experienced designers maintained their self‑efficacy; the drop was driven by novices. |
| Cognitive load | No significant difference between GenAI and control conditions. | Prompt patterns that involved iterative refinement and visual feedback (e.g., “show me variations of this façade”) were linked to lower perceived load. |
| Prompt usage | Most participants used 3–5 prompts per design iteration. | Users who combined textual description with visual reference prompts achieved the biggest performance gains. |
In short, GenAI is a force multiplier for beginners when used with the right prompting strategy, but it can also make users feel less creatively capable.
Practical Implications
- Tool designers: embed prompt‑suggestion UI that encourages iterative refinement and visual feedback loops; this can reduce cognitive friction and improve outcomes.
- Design firms & studios: consider rolling out GenAI as a training aid for junior staff rather than a blanket productivity booster for senior architects.
- Developers of AI‑assisted CAD/BIM platforms: expose the model’s “confidence” or “novelty” scores so users can gauge when the AI is merely remixing existing ideas versus generating truly fresh concepts.
- Education: curricula should teach prompt engineering as a core skill, emphasizing how to ask for variations and visual cues rather than a single “final answer.”
- Project management: track prompt‑usage metrics (number of iterations, type of prompt) as a proxy for cognitive load and intervene if teams are over‑relying on a single prompt style (see the sketch after this list).
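As a hedged illustration of that last point, the sketch below logs each prompt with its iteration and type and flags sessions dominated by a single prompt style. The event schema and the 0.8 threshold are assumptions for illustration, not something the paper prescribes.

```python
# Hypothetical prompt-usage log: one record per prompt sent to the GenAI tool.
from collections import Counter
from dataclasses import dataclass

@dataclass
class PromptEvent:
    session_id: str
    iteration: int     # which design iteration the prompt belongs to
    prompt_type: str   # e.g., "text", "visual_reference", "refinement"

def usage_metrics(events: list[PromptEvent]) -> dict:
    """Summarize one session's prompt usage as a rough cognitive-load proxy."""
    if not events:
        return {"prompts_per_iteration": 0.0,
                "type_distribution": {},
                "single_style_flag": False}
    iterations = {e.iteration for e in events}
    types = Counter(e.prompt_type for e in events)
    # Share of the most common prompt type; values near 1.0 suggest
    # over-reliance on a single prompt style.
    dominance = max(types.values()) / len(events)
    return {
        "prompts_per_iteration": len(events) / len(iterations),
        "type_distribution": dict(types),
        "single_style_flag": dominance > 0.8,
    }
```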
Overall, the research suggests that context‑aware integration—pairing AI with human expertise and good prompting habits—delivers the most value.
Limitations & Future Work
- Sample size & diversity: only 36 students from a single university; results may not generalize to professional architects or cross‑cultural teams.
- Tool specificity: the study used a single, off‑the‑shelf GenAI model; performance could differ with domain‑fine‑tuned or multimodal systems.
- Short‑term assessment: cognitive load and self‑efficacy were measured immediately after the task; longer‑term effects (e.g., skill acquisition, dependence on AI) remain unknown.
- Future directions: larger field studies with practicing architects, longitudinal tracking of skill development, and experiments that vary the type of AI output (sketches vs. 3D models) to see how modality influences creative confidence.
Authors
- Han Jiang
- Yao Xiao
- Rachel Hurley
- Shichao Liu
Paper Information
- arXiv ID: 2601.10696v1
- Categories: cs.AI
- Published: January 15, 2026