[Paper] Human-in-the-Loop and AI: Crowdsourcing Metadata Vocabulary for Materials Science

Published: December 10, 2025 at 01:22 PM EST
4 min read
Source: arXiv - 2512.09895v1

Overview

The paper presents MatSci‑YAMZ, a prototype platform that blends artificial intelligence with a human‑in‑the‑loop (HITL) workflow—including crowdsourced contributions—to accelerate the creation of metadata vocabularies for materials‑science research. By demonstrating a successful pilot with six domain experts, the authors show how AI‑augmented crowdsourcing can make FAIR (Findable, Accessible, Interoperable, Reusable) data practices more scalable and less labor‑intensive.

Key Contributions

  • AI‑driven definition generation: A language model produces draft definitions for new metadata terms, which are then refined through human feedback.
  • Human‑in‑the‑loop workflow: Structured crowdsourcing cycles let participants edit, approve, or reject AI‑generated outputs, creating a transparent audit trail.
  • Proof‑of‑concept validation: Six NSF‑funded researchers generated 19 vetted term definitions over several weeks, confirming the feasibility of the approach.
  • Open‑science alignment: The platform’s design explicitly supports FAIR principles, promoting open, reproducible metadata creation.
  • Scalable protocol: The authors outline a repeatable research protocol that can be adapted to other scientific domains beyond materials science.

Methodology

  1. Term solicitation: Participants submit candidate metadata terms and example usage contexts via the MatSci‑YAMZ web interface.
  2. AI generation: A fine‑tuned large language model (LLM) drafts a concise definition for each term.
  3. Human review loop: Contributors evaluate the AI output and edit, accept, or reject it; this feedback is returned to the model to improve subsequent drafts.
  4. Iterative refinement: The cycle repeats until a consensus definition is reached, at which point the term is added to the shared vocabulary repository.
  5. Documentation: All interactions are logged, creating provenance metadata that satisfies FAIR audit requirements.

The workflow is deliberately lightweight: participants spend minutes per term, and the AI handles the bulk of the linguistic heavy lifting.
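
To make the loop concrete, here is a minimal sketch of how steps 2–4 and the provenance logging in step 5 could fit together. The class, the llm_draft and human_review callables, and the status values are illustrative assumptions, not the paper’s actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TermRecord:
    """Candidate vocabulary term with its draft definition and provenance log."""
    term: str
    context: str                      # example usage supplied by the contributor
    definition: str = ""
    status: str = "draft"             # draft -> under_review -> accepted
    provenance: list = field(default_factory=list)

def log_event(record: TermRecord, actor: str, action: str, detail: str = "") -> None:
    """Append a timestamped provenance entry (supports FAIR-style auditability)."""
    record.provenance.append({
        "actor": actor,
        "action": action,
        "detail": detail,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

def review_loop(record: TermRecord, llm_draft, human_review, max_rounds: int = 5) -> TermRecord:
    """Iterate AI drafting and human feedback until consensus or the round limit."""
    for round_no in range(1, max_rounds + 1):
        # Hypothetical LLM call: drafts (or redrafts) a definition from term, context, prior text.
        record.definition = llm_draft(record.term, record.context, record.definition)
        log_event(record, "llm", "draft", f"round {round_no}")
        verdict, edited = human_review(record)     # 'accept', 'edit', or 'reject'
        log_event(record, "reviewer", verdict, edited or "")
        if verdict == "accept":
            record.status = "accepted"
            break
        if verdict == "edit":
            record.definition = edited             # human edit seeds the next draft
    return record
```

Keeping every interaction in the provenance list means the consensus definition and its full edit history travel together when the term is published to the shared repository.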

Results & Findings

  • 19 definitions completed: The pilot produced 19 high‑quality term definitions, each vetted by at least two experts.
  • Rapid convergence: Most terms required only 2–3 feedback iterations before consensus, cutting the typical weeks‑long manual drafting process down to days.
  • Positive user experience: Participants reported that the AI suggestions served as useful “first drafts,” reducing cognitive load and sparking discussion.
  • FAIR compliance: The resulting vocabularies were published with persistent identifiers, machine‑readable schemas, and clear provenance, meeting core FAIR criteria (a sketch of such a record follows this list).
  • Scalability signals: The workflow’s modular design suggests it could handle larger crowds (hundreds of contributors) and more complex ontologies with modest additional engineering.
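
As an illustration of what persistent identifiers, machine‑readable schemas, and provenance can look like for a single accepted term, the sketch below serializes a term record as SKOS‑flavored JSON‑LD. The ARK identifier, field names, and example definition are placeholders, not values from the paper’s published vocabulary.

```python
import json

# Hypothetical machine-readable record for one accepted term (field names and the
# ARK identifier are illustrative; the actual MatSci-YAMZ schema may differ).
term_record = {
    "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
    "@id": "ark:/99152/h0000",                      # persistent identifier (placeholder)
    "skos:prefLabel": "bandgap",
    "skos:definition": "The energy difference between the valence and conduction "
                       "bands of a material, typically reported in electronvolts.",
    "status": "accepted",
    "provenance": [
        {"actor": "llm", "action": "draft", "timestamp": "2025-01-15T14:02:11Z"},
        {"actor": "reviewer", "action": "accept", "timestamp": "2025-01-16T09:30:45Z"},
    ],
}

print(json.dumps(term_record, indent=2))
```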

Practical Implications

  • Faster data onboarding: Labs can generate domain‑specific metadata vocabularies on‑the‑fly, enabling quicker integration of new datasets into shared repositories.
  • Reduced staffing costs: By offloading the initial drafting to an LLM, organizations need fewer dedicated curators, and the curators they do have can focus on higher‑level semantic design.
  • Cross‑disciplinary interoperability: A standardized, AI‑augmented process helps align vocabularies across subfields (e.g., computational chemistry, nanofabrication), easing data exchange between teams.
  • Tool integration: The platform’s API can be hooked into existing ELN (Electronic Lab Notebook) systems, CI pipelines for data publishing, or community portals like Materials Project, providing “one‑click” metadata generation (see the sketch after this list).
  • Community building: Crowdsourced refinement encourages broader stakeholder participation, fostering consensus and trust in the resulting standards.
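
The snippet below sketches what such an integration hook might look like: an ELN export step posts an unrecognized term to a MatSci‑YAMZ‑style REST endpoint and gets back a draft record for review. The base URL, route, and payload fields are assumptions for illustration; the paper does not document a specific API.

```python
import json
import urllib.request

def submit_term(term: str, context: str,
                base_url: str = "https://matsci-yamz.example.org/api") -> dict:
    """POST a candidate term for AI drafting and community review (hypothetical endpoint)."""
    payload = json.dumps({"term": term, "context": context}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/terms",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Expected shape (illustrative): {"id": "...", "status": "draft", "definition": "..."}
        return json.load(response)

# Example: an ELN export hook could call this when a dataset uses an unrecognized term.
# result = submit_term("thermoelectric figure of merit",
#                      "zT reported at 300 K for Bi2Te3 thin films")
```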

Limitations & Future Work

  • Domain expertise bottleneck: The pilot relied on a small, highly specialized group; scaling to larger, more heterogeneous communities may surface coordination challenges.
  • LLM biases: The language model can inherit terminology biases from its training data, requiring vigilant human oversight to avoid propagating outdated or incorrect definitions.
  • Evaluation scope: The study measured feasibility and user satisfaction but did not quantify downstream impacts on data reuse metrics.
  • Future directions: The authors plan to (1) test the workflow with larger, public crowds; (2) integrate active learning to let the model prioritize ambiguous terms; and (3) benchmark the resulting vocabularies against existing ontologies to assess semantic coverage.

MatSci‑YAMZ illustrates how a smart blend of AI and human collaboration can turn the traditionally slow, manual task of metadata vocabulary creation into a rapid, community‑driven process—an advance that could accelerate FAIR data adoption across many scientific and engineering domains.

Authors

  • Jane Greenberg
  • Scott McClellan
  • Addy Ireland
  • Robert Sammarco
  • Colton Gerber
  • Christopher B. Rauch
  • Mat Kelly
  • John Kunze
  • Yuan An
  • Eric Toberer

Paper Information

  • arXiv ID: 2512.09895v1
  • Categories: cs.AI, cs.DL
  • Published: December 10, 2025