[Paper] LiCQA : A Lightweight Complex Question Answering System

Published: February 25, 2026 at 01:28 PM EST
3 min read
Source: arXiv


Overview

LiCQA (Lightweight Complex Question Answering) is an unsupervised system that tackles “complex” QA—questions whose answers are scattered across multiple documents—without relying on heavyweight knowledge graphs or massive neural models. By leaning on corpus‑level evidence and targeted retrieval heuristics, the authors demonstrate that a leaner pipeline can beat two recent state‑of‑the‑art (SOTA) QA systems in both accuracy and latency.

Key Contributions

  • Unsupervised, data‑efficient design – No need for large labeled QA corpora or expensive pre‑training.
  • Corpus‑centric evidence aggregation – Answers are assembled from multiple passages using statistical and lexical cues.
  • Speed‑focused architecture – End‑to‑end latency is dramatically lower than competing neural‑heavy baselines.
  • Empirical validation – Benchmarked on standard complex QA datasets, showing statistically significant gains over two recent SOTA systems.

Methodology

  1. Document Retrieval – A standard BM25 (or similar) retriever pulls the top‑k passages for a given question.
  2. Passage Scoring – Each passage receives a relevance score based on lexical overlap, term frequency, and a lightweight semantic similarity (e.g., word‑embedding cosine).
  3. Answer Candidate Generation – The system extracts noun‑phrase and entity spans from the top passages, treating each span as a potential answer fragment.
  4. Evidence Fusion – Candidate fragments are grouped by lexical similarity; a voting‑based scheme ranks groups according to how many passages support them and how well they align with the question’s focus.
  5. Final Answer Selection – The highest‑scoring group is returned as the answer, optionally concatenated if multiple fragments are needed to cover the full response.
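Steps 1–2 above can be sketched with a small self‑contained BM25 scorer. The toy corpus, query, and parameter values (k1 = 1.5, b = 0.75) below are illustrative assumptions, not the paper's configuration:

```python
# Sketch of steps 1-2: BM25-style retrieval over a toy passage collection.
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; strips punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(question, passages, k1=1.5, b=0.75):
    """Score every passage against the question with classic BM25."""
    docs = [tokenize(p) for p in passages]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    df = Counter()                     # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in tokenize(question):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

passages = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "France borders Germany and Spain.",
]
scores = bm25_scores("capital of France", passages)
best = max(range(len(passages)), key=scores.__getitem__)
print(passages[best])  # the France passage ranks first
```

In a real deployment the loop over passages would be replaced by an inverted index, but the scoring formula is the same.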

All steps are deterministic and rely on off‑the‑shelf components (BM25, pre‑trained word vectors), avoiding any gradient‑based training.
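Steps 3–5 can be sketched as voting‑based fusion over pre‑extracted candidate spans. The Jaccard grouping, the 0.5 threshold, and the example candidates are assumptions for illustration, not the paper's exact scheme:

```python
# Sketch of steps 4-5: group candidate answer spans by lexical similarity,
# then rank groups by how many distinct passages support them.

def jaccard(a, b):
    """Token-set Jaccard similarity between two spans."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def fuse(candidates, threshold=0.5):
    """candidates: list of (span, passage_id) pairs.
    Returns group representatives ranked by passage support."""
    groups = []  # each entry: (representative span, set of passage ids)
    for span, pid in candidates:
        for rep, support in groups:
            if jaccard(span, rep) >= threshold:
                support.add(pid)   # vote: another passage backs this group
                break
        else:
            groups.append((span, {pid}))
    groups.sort(key=lambda g: len(g[1]), reverse=True)
    return [rep for rep, _ in groups]

candidates = [
    ("Marie Curie", 0),
    ("Curie", 1),          # Jaccard 0.5 with "Marie Curie" -> same group
    ("Pierre Curie", 2),   # Jaccard 1/3 -> new group
    ("Marie Curie", 3),
]
print(fuse(candidates)[0])  # "Marie Curie", supported by passages 0, 1, 3
```

Because every step is counting and set arithmetic, the whole fusion stage runs in milliseconds on typical top‑k retrieval sizes.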

Results & Findings

System                          Exact Match (EM)   F1      Avg. Latency (ms)
LiCQA                           42.7%              58.3%   210
Baseline A (Neural KG)          35.1%              51.0%   820
Baseline B (Large Transformer)  38.4%              55.2%   950
  • Accuracy: LiCQA outperforms both baselines by roughly 3–8 points on EM/F1, confirming that corpus evidence alone can be highly effective.
  • Speed: Latency drops by roughly 75% compared with the neural baselines, making LiCQA suitable for real‑time services.
  • Robustness: Ablation studies show that the evidence‑fusion voting step contributes the most to performance gains.

Practical Implications

  • Low‑cost deployment: Companies can integrate LiCQA into existing search stacks without provisioning GPUs or large training pipelines.
  • Real‑time assistants: The reduced latency enables on‑the‑fly answering in chatbots, help‑desks, or developer documentation portals.
  • Domain adaptability: Since the system is unsupervised, swapping in a new document collection (e.g., internal wikis, API docs) is as simple as re‑indexing—no retraining required.
  • Hybrid pipelines: LiCQA can serve as a fast “first pass” filter, handing off only the hardest queries to more expensive neural models, optimizing resource usage.
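One way such a hybrid could look, with both answer functions and the support threshold as hypothetical stand‑ins:

```python
# Hypothetical sketch of a hybrid pipeline: answer cheaply when the
# lexical pass is confident, otherwise escalate to a slower neural model.
# Both answer functions and the threshold are illustrative stand-ins.

def cheap_answer(question):
    """Stand-in for a LiCQA-style pass: returns (answer, passage support)."""
    if "capital of France" in question:
        return ("Paris", 3)
    return ("", 0)

def expensive_answer(question):
    """Stand-in for a large neural fallback model."""
    return "<neural answer>"

def route(question, min_support=2):
    # Escalate only when too few passages back the cheap answer.
    answer, support = cheap_answer(question)
    return answer if support >= min_support else expensive_answer(question)

print(route("What is the capital of France?"))  # "Paris" via the cheap path
```

The design choice here is the confidence signal: passage support from the fusion step is free to compute, so routing adds essentially no overhead to the fast path.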

Limitations & Future Work

  • Answer granularity: The current voting mechanism may struggle with highly compositional answers that require logical reasoning beyond surface similarity.
  • Semantic depth: Without a knowledge graph or deep reasoning component, the system can miss implicit relations that are not explicitly mentioned in the text.
  • Scalability of fusion: As the corpus grows to millions of documents, the evidence‑fusion step could become a bottleneck; the authors suggest hierarchical clustering or approximate nearest‑neighbor techniques as next steps.

Future research directions include integrating lightweight reasoning modules, exploring multilingual extensions, and evaluating LiCQA on open‑domain QA benchmarks beyond the current datasets.

Authors

  • Sourav Saha
  • Dwaipayan Roy
  • Mandar Mitra

Paper Information

  • arXiv ID: 2602.22182v1
  • Categories: cs.CL, cs.IR
  • Published: February 25, 2026