[Paper] Heard or Halted? Gender, Interruptions, and Emotional Tone in U.S. Supreme Court Oral Arguments
Source: arXiv - 2512.05832v1
Overview
This paper investigates how interruptions during U.S. Supreme Court oral arguments affect what advocates say and how they sound, with a particular focus on gender. Applying sentence embeddings and lexicon‑based sentiment analysis to a decade of transcripts (2010‑2019), the author shows that while interruptions rarely change the meaning of an argument, they carry a noticeably more negative emotional tone when aimed at female lawyers.
Key Contributions
- Large‑scale empirical study of 12,663 speech chunks from Supreme Court oral arguments (2010‑2019).
- Semantic impact analysis using GloVe‑based sentence embeddings to measure meaning drift before and after an interruption.
- Sentiment analysis (lexicon‑based) that reveals a gendered bias: interruptions toward women contain higher negative sentiment.
- Demonstration of computational discourse analysis as a tool for probing power dynamics in elite, high‑stakes settings.
- Open‑source pipeline built on the ConvoKit Supreme Court Corpus, reusable for other courtroom or debate datasets (a minimal loading sketch follows this list).
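As an illustration of that reusability, here is a minimal sketch of loading the corpus through ConvoKit. `supreme-corpus` is ConvoKit's published identifier for its Supreme Court dataset; the per‑utterance fields inspected below depend on the corpus schema, so treat the loop as illustrative.

```python
# Minimal loading sketch for the ConvoKit Supreme Court Corpus.
# "supreme-corpus" is ConvoKit's published dataset identifier; note the
# download is large, and per-utterance fields depend on the corpus schema.
from convokit import Corpus, download

corpus = Corpus(filename=download("supreme-corpus"))
corpus.print_summary_stats()  # speakers, utterances, conversations

# Peek at a few utterances to see who is speaking and what they said
for utt in list(corpus.iter_utterances())[:3]:
    print(utt.speaker.id, utt.text[:80])
```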
Methodology
- Data collection – The study leverages the ConvoKit Supreme Court Corpus, which contains fully time‑stamped transcripts of oral arguments. Each “speech chunk” is a continuous segment spoken by an advocate until a justice interjects.
- Identifying interruptions – A chunk is split into pre‑interruption and post‑interruption parts based on the timestamp of the justice’s interjection.
- Semantic similarity – Both parts are transformed into 300‑dimensional GloVe sentence embeddings (averaged word vectors). Cosine similarity between the two vectors quantifies how much the argument's meaning shifts after the interruption (see the embedding sketch after this list).
- Sentiment measurement – A lexicon‑based approach (VADER/NRC) scores each chunk for positive, negative, and neutral affect, with the analysis focusing on the negative component (see the sentiment sketch below).
- Statistical testing – Paired t‑tests compare pre‑ vs. post‑interruption similarity, and regression models assess whether gender (female vs. male advocate) predicts higher negative sentiment in interruptions, controlling for case type, justice, and argument length (see the testing sketch below).
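A minimal sketch of the semantic‑drift step, assuming pre‑trained 300‑dimensional GloVe vectors fetched via gensim's downloader; the model name `glove-wiki-gigaword-300` is a standard gensim identifier, not necessarily the paper's exact vector set, and the sample sentences are invented:

```python
# Sketch: average GloVe word vectors into a sentence embedding, then compare
# pre- vs. post-interruption text by cosine similarity. The gensim model name
# is a standard identifier, not necessarily the paper's exact setup.
import numpy as np
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-300")  # 300-dimensional vectors

def sentence_embedding(text: str) -> np.ndarray:
    """Average the GloVe vectors of in-vocabulary tokens."""
    tokens = [t for t in text.lower().split() if t in glove]
    if not tokens:
        return np.zeros(glove.vector_size)
    return np.mean([glove[t] for t in tokens], axis=0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Invented pre/post snippets, purely for demonstration
pre = sentence_embedding("the statute's plain text controls this question")
post = sentence_embedding("as I was saying, the plain text of the statute controls")
print(f"semantic similarity: {cosine_similarity(pre, post):.2f}")
```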
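For the sentiment step, a comparable sketch using NLTK's VADER implementation; the example interjection is invented, and the paper may combine VADER with NRC lexicon scores rather than relying on VADER alone:

```python
# Sketch: score the negative-affect component of an interruption with VADER.
# Requires: pip install nltk, plus the one-time lexicon download below.
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
interjection = "Counsel, that simply cannot be right."  # invented example
scores = sia.polarity_scores(interjection)
print(scores)          # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
print(scores["neg"])   # the negative component the study focuses on
```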
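Finally, a sketch of the statistical tests using SciPy and statsmodels, assuming a per‑interruption table; every column name here (`pre_score`, `neg_sentiment`, `female_advocate`, and so on) is a hypothetical placeholder rather than the paper's actual schema:

```python
# Sketch: paired t-test and OLS regression mirroring the paper's tests.
# Column names are hypothetical placeholders, not the paper's actual schema.
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

df = pd.read_csv("interruptions.csv")  # hypothetical per-interruption table

# Paired t-test on a per-chunk measure before vs. after the interruption
t, p = stats.ttest_rel(df["pre_score"], df["post_score"])
print(f"paired t = {t:.2f}, p = {p:.3f}")

# Regression: does advocate gender predict negative sentiment, controlling
# for case type, the interrupting justice, and argument length?
model = smf.ols(
    "neg_sentiment ~ female_advocate + C(case_type) + C(justice) + arg_length",
    data=df,
).fit()
print(model.summary())
```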
Results & Findings
- Semantic stability: Average cosine similarity between pre‑ and post‑interruption embeddings is 0.87 (where 1.0 would indicate identical meaning), indicating that the core argumentative content remains largely unchanged despite being cut off.
- Gendered sentiment: Interruptions directed at female advocates have a mean negative sentiment score 0.12 points higher than those aimed at male advocates (p < 0.01). This gap persists after accounting for case complexity and the individual justice’s speaking style.
- No significant effect of interruption length on semantic similarity, suggesting that even longer interjections don’t substantially rewrite the argument’s meaning.
Practical Implications
- Bias detection tools: The pipeline can be adapted into real‑time monitoring systems for courts, legislative hearings, or corporate meetings to flag potentially gender‑biased interruptions.
- Training for legal professionals: Law schools and clerkship programs could use these findings to raise awareness about subtle power dynamics and improve advocacy strategies.
- Design of conversational AI: Voice assistants or transcription services used in legal settings can incorporate bias‑aware post‑processing (e.g., highlighting negative interjections toward underrepresented speakers).
- Policy & reform: Empirical evidence of gendered negativity may inform judicial conduct guidelines or diversity initiatives within the judiciary.
Limitations & Future Work
- Lexicon‑based sentiment may miss nuanced sarcasm or context‑specific negativity; incorporating transformer‑based sentiment models could improve accuracy.
- The study focuses on U.S. Supreme Court oral arguments; results may not generalize to lower courts, other legal systems, or non‑legal debate venues.
- Speaker intent is not captured—some interruptions are procedural (e.g., asking for clarification) rather than adversarial. Future work could classify interruption types and examine their distinct impacts.
- Extending the analysis to intersectional identities (e.g., race + gender) and to longitudinal trends could reveal whether bias is decreasing over time.
Bottom line: By marrying large‑scale transcript data with straightforward NLP techniques, this research uncovers a subtle yet measurable gender bias in the emotional tone of Supreme Court interruptions—insights that are directly actionable for developers building bias‑aware tools, legal educators, and policymakers alike.
Authors
- Yifei Tong
Paper Information
- arXiv ID: 2512.05832v1
- Categories: cs.CL, cs.CY
- Published: December 5, 2025