[Paper] Qualitative Coding Analysis through Open-Source Large Language Models: A User Study and Design Recommendations
Source: arXiv - 2602.18352v1
Overview
The paper presents ChatQDA, an on‑device framework that leverages open‑source large language models (LLMs) to assist researchers with qualitative coding while keeping raw data local. By sidestepping commercial APIs, the system aims to eliminate the privacy concerns that often block the use of powerful LLMs in sensitive, human‑centred research.
Key Contributions
- Privacy‑first architecture: A fully local pipeline that runs open‑source LLMs on the user’s machine, avoiding any network traffic of raw interview or survey text.
- Chat‑style coding interface: An interactive UI that lets analysts pose natural‑language prompts (e.g., “extract themes about user frustration”) and receive suggested codes in real time.
- Mixed‑methods user study: 30 participants from social‑science and HCI backgrounds evaluated the tool, providing quantitative usability scores and qualitative feedback.
- “Conditional trust” insight: Users trusted the system for surface‑level extraction but remained skeptical about deeper interpretive judgments and consistency across runs.
- Design recommendations: Six actionable guidelines for building local‑first, LLM‑augmented analysis tools that balance verifiable privacy with methodological rigor.
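The chat-style interface described above rests on a layer that translates an analyst's natural-language request into a structured model query. The paper does not publish this layer's code, so the sketch below is purely illustrative: the template strings, task names, and `build_prompt` function are assumptions, not ChatQDA's actual implementation.

```python
# Hypothetical sketch of a prompt-engineering layer for local qualitative
# coding. Task names and templates are illustrative, not from the paper.

TASK_TEMPLATES = {
    "open_coding": (
        "You are assisting with qualitative open coding.\n"
        "Suggest 3-5 short codes for the excerpt below.\n"
        "Excerpt:\n{excerpt}"
    ),
    "theme_generation": (
        "Group the following codes into candidate themes.\n"
        "Codes:\n{excerpt}"
    ),
}

def build_prompt(task: str, excerpt: str) -> str:
    """Translate an analysis task plus raw text into a single model query
    that can be sent to a locally hosted open-source LLM."""
    return TASK_TEMPLATES[task].format(excerpt=excerpt)

prompt = build_prompt("open_coding", "I kept clicking but nothing happened.")
```

Because the prompt is assembled and answered entirely on the analyst's machine, no raw excerpt ever leaves the local environment, which is the property the privacy-first architecture depends on.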
Methodology
- System Construction – The authors bundled a lightweight, open‑source transformer (e.g., LLaMA‑7B) with a custom prompt‑engineering layer that translates typical qualitative‑analysis tasks (open coding, memoing, theme generation) into model queries. All components run inside a Docker container on the analyst’s workstation.
- User Study Design – A mixed‑methods approach combined:
- Quantitative: SUS (System Usability Scale) and NASA‑TLX workload questionnaires after a 45‑minute coding session.
- Qualitative: Semi‑structured interviews probing participants’ trust, perceived accuracy, and privacy concerns.
- Data Collection – Participants coded a publicly available interview dataset (≈2,000 words) using both ChatQDA and a baseline manual spreadsheet workflow.
- Analysis – The authors performed statistical comparisons of SUS scores and coded the interview transcripts from the study itself, applying thematic analysis to surface emergent user attitudes.
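The SUS questionnaire used in the study has a standard scoring procedure (not specific to this paper): ten items rated 1-5, where odd-numbered items contribute their response minus 1, even-numbered items contribute 5 minus their response, and the sum is scaled by 2.5 to yield a 0-100 score. A minimal implementation:

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 Likert responses.

    Odd-numbered items (positively worded) contribute (r - 1);
    even-numbered items (negatively worded) contribute (5 - r);
    the total is multiplied by 2.5 to map onto a 0-100 scale.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = sum(
        (r - 1) if i % 2 == 1 else (5 - r)
        for i, r in enumerate(responses, start=1)
    )
    return total * 2.5

# Maximally positive responses (5 on odd items, 1 on even items) score 100.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```

Averaging per-participant scores computed this way yields figures like the 82.4 reported in the Results section.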
Results & Findings
- Usability: ChatQDA achieved an average SUS score of 82.4, indicating “excellent” usability, and participants reported a 30% reduction in perceived workload versus the manual baseline.
- Trust Profile: Users expressed conditional trust—they were comfortable letting the model suggest surface codes (e.g., keyword tags) but doubted its ability to capture nuanced, context‑dependent meanings. Consistency checks (re‑running the same prompt) sometimes yielded divergent code sets, reinforcing this skepticism.
- Privacy Perception: Even though the system never transmitted data, 70% of participants voiced lingering “epistemic uncertainty” about whether their data could be inadvertently exposed, highlighting a gap between technical guarantees and user confidence.
- Efficiency Gains: On average, participants completed the coding task 22 minutes faster with ChatQDA, attributing the speedup to instant suggestion generation and reduced manual scrolling.
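The divergent code sets observed when re-running the same prompt can be quantified with a simple set-overlap measure. The paper does not specify its consistency metric; Jaccard similarity is one common choice, shown here as an assumed example:

```python
def jaccard(codes_a, codes_b):
    """Jaccard similarity between two sets of suggested codes.

    1.0 means the runs produced identical code sets; values near 0
    indicate the divergence that fed participants' skepticism.
    """
    a, b = set(codes_a), set(codes_b)
    if not a and not b:
        return 1.0  # two empty runs are trivially identical
    return len(a & b) / len(a | b)

run_1 = {"frustration", "latency", "trust"}
run_2 = {"frustration", "latency", "onboarding"}
print(jaccard(run_1, run_2))  # 0.5
```

Reporting such a score alongside suggestions would let analysts see at a glance how stable the model's output is for a given prompt.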
Practical Implications
- For Developers of Research Tools – The study demonstrates that local‑first LLM integration is technically feasible and can dramatically improve workflow efficiency without sacrificing data sovereignty.
- Enterprise & Compliance – Industries bound by GDPR, HIPAA, or internal data‑handling policies can adopt similar on‑device LLM pipelines to automate text‑analysis tasks (e.g., customer feedback mining) while staying within strict privacy envelopes.
- Product Design – The “conditional trust” finding suggests that UI/UX should surface confidence scores, version histories, and easy ways to override or edit model‑generated codes, thereby giving analysts a safety net.
- Open‑Source Ecosystem – By relying on community‑maintained models, organizations avoid vendor lock‑in and can audit the model weights, fostering greater transparency for auditors and ethics boards.
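One way to surface the confidence scores recommended above is to derive a per-code score from the model's token log-probabilities, which local LLM runtimes typically expose. This heuristic is an assumption for illustration, not a method from the paper:

```python
import math

def code_confidence(token_logprobs):
    """Heuristic confidence for one model-suggested code: the geometric
    mean of its token probabilities (closer to 1.0 = more certain).

    `token_logprobs` are the log-probabilities of the tokens that make up
    the suggested code, as reported by a local inference runtime.
    """
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Hypothetical log-probabilities for a three-token code suggestion.
confidence = code_confidence([-0.1, -0.2, -0.05])
```

Displayed next to each suggested code, a score like this gives analysts a concrete cue for when to accept, edit, or discard a suggestion.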
Limitations & Future Work
- Model Scale – The study used a 7‑billion‑parameter model; larger models could improve nuance but would strain typical workstation resources.
- Dataset Scope – Only a single, publicly available interview corpus was tested; results may differ with longer, multilingual, or highly domain‑specific texts.
- Trust Calibration – The authors note the need for systematic methods (e.g., calibrated confidence metrics, explainability overlays) to bridge the gap between technical privacy guarantees and user‑perceived security.
- Future Directions – Planned extensions include (1) integrating differential privacy noise to further reassure users, (2) evaluating cross‑run reproducibility mechanisms, and (3) expanding the user study to professional qualitative analysts in health and legal sectors.
Authors
- Tung T. Ngo
- Dai Nguyen Van
- Anh-Minh Nguyen
- Phuong-Anh Do
- Anh Nguyen-Quoc
Paper Information
- arXiv ID: 2602.18352v1
- Categories: cs.HC, cs.CR, cs.SE
- Published: February 20, 2026