[Paper] Prompt Less, Smile More: MTP with Semantic Engineering in Lieu of Prompt Engineering

Published: November 24, 2025 at 01:58 PM EST
3 min read

Source: arXiv - 2511.19427v1

Overview

The paper “Prompt Less, Smile More: MTP with Semantic Engineering in Lieu of Prompt Engineering” tackles a growing pain point for developers building AI‑augmented software: the need to hand‑craft prompts for large language models (LLMs). By extending the Meaning Typed Programming (MTP) framework with a lightweight “Semantic Engineering” layer, the authors let developers embed natural‑language context directly in code, dramatically cutting the manual effort usually required for prompt engineering while matching the performance of hand‑tuned prompts and, in some domain‑specific tasks, improving robustness.

Key Contributions

  • Semantic Context Annotations (SemTexts): A language‑level syntax that lets developers attach free‑form natural‑language notes to variables, functions, and data structures.
  • Integration with MTP: Extends the existing automatic prompt generation pipeline to consume SemTexts, turning enriched code semantics into high‑quality LLM prompts.
  • Jac language prototype: Implements SemTexts in the experimental Jac language, demonstrating feasibility without altering the underlying compiler or runtime.
  • Real‑world benchmark suite: Curated tasks that mimic typical AI‑integrated development scenarios (e.g., data cleaning pipelines, conversational agents, code‑assist tools).
  • Empirical validation: Shows that Semantic Engineering matches the accuracy of hand‑crafted prompt engineering across the benchmark while slashing developer time by ~70%.

Methodology

  1. Semantic Enrichment: Developers annotate code constructs with @semtext comments (e.g., @semtext "this function extracts user intent from chat messages"). These annotations are parsed alongside the abstract syntax tree.
  2. Prompt Synthesis: The MTP engine combines static type information (e.g., function signatures, variable types) with the extracted SemTexts to generate a structured prompt that conveys both formal and informal intent to the LLM (a sketch of steps 1–2 follows this list).
  3. Evaluation Pipeline:
    • Benchmarks: 12 tasks covering data transformation, UI generation, and autonomous decision‑making.
    • Baselines: (a) Pure MTP (no annotations), (b) Traditional manual prompt engineering, (c) Zero‑shot LLM usage.
    • Metrics: Task success rate, BLEU/ROUGE for generated text, and a developer effort survey (time spent writing prompts).
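To make steps 1 and 2 concrete, here is a minimal Python sketch of the idea. The paper's implementation lives in the Jac pipeline, so everything below is a stand‑in: the SOURCE snippet, the comment syntax, and the helper names extract_semtexts and synthesize_prompt are hypothetical, and the prompt template is illustrative rather than the one MTP actually emits.

```python
import ast
import re

# Hypothetical source: a function carrying a SemText-style comment.
SOURCE = '''
# @semtext "this function extracts user intent from chat messages"
def extract_intent(message: str) -> str: ...
'''

SEMTEXT_RE = re.compile(r'#\s*@semtext\s+"([^"]+)"')

def extract_semtexts(source: str) -> dict[int, str]:
    """Map each line number to the SemText found on that line."""
    return {i + 1: m.group(1)
            for i, line in enumerate(source.splitlines())
            if (m := SEMTEXT_RE.search(line))}

def synthesize_prompt(source: str) -> str:
    """Combine each function's signature (formal intent) with the SemText
    on the preceding line (informal intent) into one structured prompt."""
    semtexts = extract_semtexts(source)
    parts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # First line of the unparsed node is the signature.
            signature = ast.unparse(node).splitlines()[0].rstrip(":")
            note = semtexts.get(node.lineno - 1, "")
            parts.append(f"Function signature: {signature}\n"
                         f"Developer note: {note}\n"
                         f"Task: implement behavior consistent with both.")
    return "\n\n".join(parts)

print(synthesize_prompt(SOURCE))
```

The design point this illustrates is that the formal channel (the signature) and the informal channel (the SemText) are gathered from the same location in the source, so neither can silently drift out of sync with the other.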

Results & Findings

| Approach | Avg. Success Rate | Prompt Quality (BLEU) | Avg. Dev. Time (min) |
| --- | --- | --- | --- |
| Zero‑shot LLM | 48% | 0.31 | 2 |
| Pure MTP | 62% | 0.44 | 3 |
| Manual Prompt Engineering | 78% | 0.68 | 12 |
| MTP + Semantic Engineering | 77% | 0.66 | 4 |

  • Performance parity: The enriched MTP pipeline lands within one percentage point of the manual prompt baseline on success rate (77% vs. 78%) and within 0.02 BLEU (0.66 vs. 0.68).
  • Efficiency gain: Developers spend roughly a third of the time they would need to write full prompts (4 min vs. 12 min on average), thanks to concise natural‑language annotations.
  • Robustness: In tasks requiring domain‑specific reasoning (e.g., medical triage simulation), the semantic annotations helped the LLM avoid common misinterpretations that pure MTP missed.

Practical Implications

  • Faster prototyping: Teams can spin up AI‑driven features (chatbots, code assistants, data pipelines) without a dedicated prompt‑engineering sprint.
  • Maintainability: Since annotations live alongside code, future developers can see the intended LLM behavior directly in the source, reducing knowledge loss.
  • Tooling integration: IDE plugins could surface autocomplete for @semtext blocks, turning prompt design into a first‑class developer activity.
  • Cross‑language potential: While demonstrated in Jac, the concept maps cleanly to any language that supports comments or attributes, opening the door for gradual adoption in mainstream ecosystems (Python decorators, Java annotations, TypeScript JSDoc); a decorator‑based sketch follows below.
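As a sketch of that cross‑language idea, a SemText could ride along as an ordinary Python decorator. The semtext decorator and the _semtext attribute below are hypothetical illustrations, not an API from the paper:

```python
def semtext(note: str):
    """Hypothetical decorator: attach a natural-language note to a
    function so prompt-generation tooling can recover it later."""
    def wrap(fn):
        fn._semtext = note  # stash the informal intent on the function object
        return fn
    return wrap

@semtext("this function extracts user intent from chat messages")
def extract_intent(message: str) -> str: ...

# An IDE plugin or prompt synthesizer could read both channels:
print(extract_intent.__annotations__)  # formal intent: the type hints
print(extract_intent._semtext)         # informal intent: the developer's note
```

Because the note travels with the function object, the same introspection path that already surfaces type hints could feed prompt synthesis without any compiler changes.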

Limitations & Future Work

  • Language support: The current prototype is limited to the experimental Jac language; broader adoption will require language‑agnostic annotation standards.
  • Annotation quality: The approach assumes developers can articulate intent concisely; noisy or ambiguous SemTexts can degrade prompt fidelity.
  • Scalability of benchmarks: The benchmark suite, though realistic, covers a modest number of domains; larger, community‑driven datasets would strengthen external validity.
  • Future directions: The authors plan to (1) develop a language‑neutral annotation schema, (2) explore automated suggestion of SemTexts via LLMs themselves, and (3) evaluate the approach in large‑scale production codebases.

Authors

  • Jayanaka L. Dantanarayana
  • Savini Kashmira
  • Thakee Nathees
  • Zichen Zhang
  • Krisztian Flautner
  • Lingjia Tang
  • Jason Mars

Paper Information

  • arXiv ID: 2511.19427v1
  • Categories: cs.SE, cs.AI
  • Published: November 24, 2025
  • PDF: https://arxiv.org/pdf/2511.19427v1