[Paper] Parser agreement and disagreement in L2 Korean UD: Implications for human-in-the-loop annotation

Published: May 7, 2026 at 01:39 PM EDT

Source: arXiv - 2605.06625v1

Overview

The paper introduces a lightweight human‑in‑the‑loop pipeline for annotating Korean as a second language (L2) with Universal Dependencies (UD). Two parsers, each fine‑tuned on L2 Korean, parse every sentence. The authors show that when the two parsers agree, the parse is usually correct, so human review can be limited to the sentences on which they disagree. This substantially reduces the manual effort needed to build high‑quality L2‑Korean treebanks.

Key Contributions

Agreement‑based quality proxy: Demonstrates that when two domain‑adapted parsers produce the same parse, human annotators judge that parse correct in the large majority of cases.
Simplified annotation workflow: Proposes a practical semi‑automatic pipeline that only requires human review of disagreement cases.
Error‑type analysis: Shows that most parser disagreements fall into predictable linguistic categories (e.g., grammatical‑relation ambiguities, clause‑boundary decisions).
Iterative refinement roadmap: Identifies which disagreement patterns can be resolved by further model training versus those that expose deeper representational limits.

Methodology

Data & Models – The authors start from an existing L2‑Korean corpus and fine‑tune two independent dependency parsers on a small, manually annotated seed set.
Agreement Check – For each new sentence, both parsers produce a full UD parse. If the parses are identical (tokenization, POS tags, and dependency arcs), the sentence is automatically accepted.
Human Validation – Sentences with mismatching parses are sent to linguists for verification. To assess how well agreement predicts correctness, human judgments are also compared against the parses on which the two parsers agreed.
Error Categorization – Disagreement cases are manually grouped into linguistic phenomena (e.g., ambiguous case particles, ellipsis, clause‑boundary splits) to understand systematic weaknesses.

The workflow is deliberately simple: there is no confidence scoring, active‑learning loop, or crowdsourcing, only a binary “agree/disagree” gate that decides whether a human needs to look at a sentence.
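The acceptance rule from the Agreement Check step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes both parsers emit CoNLL‑U and that "POS tags" means the UPOS column, and the helper names (`parse_conllu`, `parses_agree`, `route`, `conllu`) and the toy sentence are hypothetical. A sentence is auto‑accepted only if the two parses match exactly on token forms, UPOS tags, heads and relation labels; everything else goes to the review queue.

```python
from dataclasses import dataclass

# Hypothetical sketch: assumes both parsers write CoNLL-U and that "POS tags"
# means the UPOS column. Not the authors' implementation.

@dataclass(frozen=True)
class Token:
    form: str    # surface token, so differing tokenization shows up here
    upos: str
    head: int
    deprel: str

def parse_conllu(block: str) -> list[Token]:
    """Read one CoNLL-U sentence, keeping only the fields the gate compares."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "# text = ..."
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        tokens.append(Token(cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens

def parses_agree(a: list[Token], b: list[Token]) -> bool:
    """Exact match on tokenization, UPOS, heads and relation labels."""
    return a == b

def route(sentences):
    """Split (sent_id, output_a, output_b) triples into auto-accepted and to-review IDs."""
    accepted, review = [], []
    for sent_id, out_a, out_b in sentences:
        agree = parses_agree(parse_conllu(out_a), parse_conllu(out_b))
        (accepted if agree else review).append(sent_id)
    return accepted, review

def conllu(rows):
    """Build a toy CoNLL-U block from (id, form, upos, head, deprel) rows."""
    return "\n".join(
        "\t".join([str(i), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])
        for i, form, upos, head, deprel in rows
    )

# Toy example: parser B labels the topic-marked first token differently.
out_a = conllu([(1, "나는", "PRON", 3, "nsubj"), (2, "학교에", "NOUN", 3, "obl"), (3, "갔다", "VERB", 0, "root")])
out_b = conllu([(1, "나는", "PRON", 3, "dislocated"), (2, "학교에", "NOUN", 3, "obl"), (3, "갔다", "VERB", 0, "root")])
accepted, review = route([("s1", out_a, out_a), ("s2", out_a, out_b)])
print(accepted, review)  # ['s1'] ['s2']
```

Comparing whole token lists keeps the gate strictly binary; a looser criterion (for example, matching arcs only) would change which sentences reach reviewers.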

Results & Findings

High correspondence: In more than 90% of cases where the two parsers agreed, human annotators also judged the parse correct.
Disagreement concentration: Over 70% of disagreements clustered around a handful of linguistic issues, such as distinguishing subject from topic relations or handling the omitted subjects common in learner Korean.
Iterative gains: Retraining the parsers on a modest set of sentences they had previously disagreed on reduced the overall disagreement rate by roughly 15% after one iteration.
Hard cases: Some disagreements persisted even after multiple refinements, pointing to ambiguities that may require changes to the underlying UD schema rather than just better models.
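The headline figures reduce to simple ratios. The sketch below uses made‑up counts that are not from the paper and illustrative function names; it computes the share of agreed parses that humans judged correct (the quantity behind the 90% figure), the share of sentences the gate sends to review, and the disagreement‑rate reduction. Because the summary does not say whether the roughly 15% reduction is relative or in percentage points, the sketch reports both readings.

```python
# Illustrative only: the counts below are invented, not taken from the paper.

def agreement_precision(agreed_correct: int, agreed_total: int) -> float:
    """Share of parser-agreed sentences that human annotators judged correct."""
    return agreed_correct / agreed_total

def review_share(disagreed: int, total: int) -> float:
    """Share of sentences the agree/disagree gate routes to human review."""
    return disagreed / total

def disagreement_reduction(before: int, after: int, total: int) -> tuple[float, float]:
    """Compare disagreement counts on the same `total` sentences before/after retraining.

    Returns (relative reduction, absolute drop in percentage points).
    """
    relative = (before - after) / before
    points = 100 * (before - after) / total
    return relative, points

# Hypothetical counts: 1,000 sentences; 400 disagreements before retraining
# (so 600 agreed, 550 of which a human judged correct) and 340 after.
print(f"agreement precision: {agreement_precision(550, 600):.3f}")  # 0.917
print(f"sent to review: {review_share(400, 1000):.0%}")             # 40%
rel, pts = disagreement_reduction(400, 340, 1000)
print(f"relative: {rel:.0%}, absolute: {pts:.1f} points")           # relative: 15%, absolute: 6.0 points
```

With these toy numbers, a 15% relative drop from a 40% disagreement rate is only 6 percentage points, so the two readings imply quite different review workloads.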

Practical Implications

Faster treebank creation: Development teams can bootstrap L2‑Korean UD resources with far fewer annotation hours, which in turn speeds up work on downstream applications such as grammar checking and learner feedback systems.
Cost‑effective quality control: The agreement gate acts as an automatic sanity check, allowing project managers to allocate human reviewers only where they add the most value.
Transferable recipe: The same “dual‑parser agreement” strategy can be applied to other low‑resource or learner languages, offering a template for semi‑automatic corpus building in multilingual settings.
Better learner‑focused tools: High‑quality L2‑Korean parses enable more accurate error detection, automated writing assistance, and adaptive language‑learning platforms.

Limitations & Future Work

Seed‑data dependence: The approach relies on having two reasonably strong parsers, and building those initial models still requires a manually annotated seed set.
Schema constraints: Some persistent disagreements stem from UD’s representation limits for learner language, suggesting that schema extensions or alternative annotation layers may be needed.
Scalability of error analysis: While the paper categorizes disagreement types, automating this categorization for large corpora remains an open challenge.
Future directions: The authors propose exploring confidence‑weighted voting, active learning to select the most informative disagreement cases, and extending the workflow to other morphologically rich L2 languages.
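The paper groups disagreements into linguistic phenomena by hand, and automating that step is listed as open. As a hedged illustration of a coarse first pass (not the authors' method), the sketch below tallies which UD layer two parses differ on: tokenization, UPOS, head attachment, or relation label with the same head. It cannot tell a subject/topic confusion from an ellipsis error; it only narrows where a linguist should look. The tuple format, function names and toy parses are assumptions.

```python
from collections import Counter

# Hypothetical format: a parse is a list of (form, upos, head, deprel) tuples.
Parse = list[tuple[str, str, int, str]]

def disagreement_layers(a: Parse, b: Parse) -> Counter:
    """Count, for one sentence, which UD layer the two parses disagree on."""
    layers = Counter()
    if [tok[0] for tok in a] != [tok[0] for tok in b]:
        layers["tokenization"] += 1  # tokens cannot be aligned one-to-one
        return layers
    for (_, upos_a, head_a, rel_a), (_, upos_b, head_b, rel_b) in zip(a, b):
        if upos_a != upos_b:
            layers["upos"] += 1
        if head_a != head_b:
            layers["head"] += 1      # attachment disagreement
        elif rel_a != rel_b:
            layers["deprel"] += 1    # same head, different relation label
    return layers

def tally(pairs) -> Counter:
    """Sum layer counts over many (parse_a, parse_b) sentence pairs."""
    total = Counter()
    for a, b in pairs:
        total += disagreement_layers(a, b)
    return total

# Toy parses of one sentence: B relabels token 1, C splits off the particle.
a = [("나는", "PRON", 3, "nsubj"), ("학교에", "NOUN", 3, "obl"), ("갔다", "VERB", 0, "root")]
b = [("나는", "PRON", 3, "dislocated"), ("학교에", "NOUN", 3, "obl"), ("갔다", "VERB", 0, "root")]
c = [("나", "PRON", 4, "nsubj"), ("는", "ADP", 1, "case"), ("학교에", "NOUN", 4, "obl"), ("갔다", "VERB", 0, "root")]
print(tally([(a, b), (a, c)]))
```

Mapping these layer counts onto the paper's phenomenon categories would still need human judgment or additional rules.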

Authors

Hakyung Sung
Gyu-Ho Shin

Paper Information

arXiv ID: 2605.06625v1
Categories: cs.CL
Published: May 7, 2026