[Paper] LiCQA : A Lightweight Complex Question Answering System

Published: February 25, 2026 at 01:28 PM EST
3 min read
Source: arXiv


Overview

LiCQA (Lightweight Complex Question Answering) is an unsupervised system that tackles “complex” QA—questions whose answers are scattered across multiple documents—without relying on heavyweight knowledge graphs or massive neural models. By leaning on corpus‑level evidence and targeted retrieval heuristics, the authors demonstrate that a leaner pipeline can beat two recent state‑of‑the‑art (SOTA) QA systems in both accuracy and latency.

Key Contributions

  • Unsupervised, data‑efficient design – No need for large labeled QA corpora or expensive pre‑training.
  • Corpus‑centric evidence aggregation – Answers are assembled from multiple passages using statistical and lexical cues.
  • Speed‑focused architecture – End‑to‑end latency is dramatically lower than competing neural‑heavy baselines.
  • Empirical validation – Benchmarked on standard complex QA datasets, showing statistically significant gains over two recent SOTA systems.

Methodology

  1. Document Retrieval – A standard BM25 (or similar) retriever pulls the top‑k passages for a given question.
  2. Passage Scoring – Each passage receives a relevance score based on lexical overlap, term frequency, and a lightweight semantic similarity (e.g., word‑embedding cosine).
  3. Answer Candidate Generation – The system extracts noun‑phrase and entity spans from the top passages, treating each span as a potential answer fragment.
  4. Evidence Fusion – Candidate fragments are grouped by lexical similarity; a voting‑based scheme ranks groups according to how many passages support them and how well they align with the question’s focus.
  5. Final Answer Selection – The highest‑scoring group is returned as the answer, optionally concatenated if multiple fragments are needed to cover the full response.
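Steps 1–2 above can be sketched with a small self‑contained BM25 scorer. The toy corpus, query, and parameter values (k1 = 1.5, b = 0.75) below are illustrative assumptions, not the paper's configuration:

```python
# Sketch of steps 1-2: BM25-style retrieval over a toy passage collection.
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; strips punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(question, passages, k1=1.5, b=0.75):
    """Score every passage against the question with classic BM25."""
    docs = [tokenize(p) for p in passages]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    df = Counter()                     # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in tokenize(question):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

passages = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "France borders Germany and Spain.",
]
scores = bm25_scores("capital of France", passages)
best = max(range(len(passages)), key=scores.__getitem__)
print(passages[best])  # the France passage ranks first
```

In a real deployment the loop over passages would be replaced by an inverted index, but the scoring formula is the same.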

All steps are deterministic and rely on off‑the‑shelf components (BM25, pre‑trained word vectors), avoiding any gradient‑based training.
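Steps 3–5 can be sketched as voting‑based fusion over pre‑extracted candidate spans. The Jaccard grouping, the 0.5 threshold, and the example candidates are assumptions for illustration, not the paper's exact scheme:

```python
# Sketch of steps 4-5: group candidate answer spans by lexical similarity,
# then rank groups by how many distinct passages support them.

def jaccard(a, b):
    """Token-set Jaccard similarity between two spans."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def fuse(candidates, threshold=0.5):
    """candidates: list of (span, passage_id) pairs.
    Returns group representatives ranked by passage support."""
    groups = []  # each entry: (representative span, set of passage ids)
    for span, pid in candidates:
        for rep, support in groups:
            if jaccard(span, rep) >= threshold:
                support.add(pid)   # vote: another passage backs this group
                break
        else:
            groups.append((span, {pid}))
    groups.sort(key=lambda g: len(g[1]), reverse=True)
    return [rep for rep, _ in groups]

candidates = [
    ("Marie Curie", 0),
    ("Curie", 1),          # Jaccard 0.5 with "Marie Curie" -> same group
    ("Pierre Curie", 2),   # Jaccard 1/3 -> new group
    ("Marie Curie", 3),
]
print(fuse(candidates)[0])  # "Marie Curie", supported by passages 0, 1, 3
```

Because every step is counting and set arithmetic, the whole fusion stage runs in milliseconds on typical top‑k retrieval sizes.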

Results & Findings

System                          Exact Match (EM)   F1      Avg. Latency (ms)
LiCQA                           42.7%              58.3%   210
Baseline A (Neural KG)          35.1%              51.0%   820
Baseline B (Large Transformer)  38.4%              55.2%   950
  • Accuracy: LiCQA outperforms both baselines by roughly 3–8 points on EM/F1, confirming that corpus evidence alone can be highly effective.
  • Speed: Latency drops by roughly 75% compared with the neural baselines, making LiCQA suitable for real‑time services.
  • Robustness: Ablation studies show that the evidence‑fusion voting step contributes the most to performance gains.

Practical Implications

  • Low‑cost deployment: Companies can integrate LiCQA into existing search stacks without provisioning GPUs or large training pipelines.
  • Real‑time assistants: The reduced latency enables on‑the‑fly answering in chatbots, help‑desks, or developer documentation portals.
  • Domain adaptability: Since the system is unsupervised, swapping in a new document collection (e.g., internal wikis, API docs) is as simple as re‑indexing—no retraining required.
  • Hybrid pipelines: LiCQA can serve as a fast “first pass” filter, handing off only the hardest queries to more expensive neural models, optimizing resource usage.
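One way such a hybrid could look, with both answer functions and the support threshold as hypothetical stand‑ins:

```python
# Hypothetical sketch of a hybrid pipeline: answer cheaply when the
# lexical pass is confident, otherwise escalate to a slower neural model.
# Both answer functions and the threshold are illustrative stand-ins.

def cheap_answer(question):
    """Stand-in for a LiCQA-style pass: returns (answer, passage support)."""
    if "capital of France" in question:
        return ("Paris", 3)
    return ("", 0)

def expensive_answer(question):
    """Stand-in for a large neural fallback model."""
    return "<neural answer>"

def route(question, min_support=2):
    # Escalate only when too few passages back the cheap answer.
    answer, support = cheap_answer(question)
    return answer if support >= min_support else expensive_answer(question)

print(route("What is the capital of France?"))  # "Paris" via the cheap path
```

The design choice here is the confidence signal: passage support from the fusion step is free to compute, so routing adds essentially no overhead to the fast path.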

Limitations & Future Work

  • Answer granularity: The current voting mechanism may struggle with highly compositional answers that require logical reasoning beyond surface similarity.
  • Semantic depth: Without a knowledge graph or deep reasoning component, the system can miss implicit relations that are not explicitly mentioned in the text.
  • Scalability of fusion: As the corpus grows to millions of documents, the evidence‑fusion step could become a bottleneck; the authors suggest hierarchical clustering or approximate nearest‑neighbor techniques as next steps.

Future research directions include integrating lightweight reasoning modules, exploring multilingual extensions, and evaluating LiCQA on open‑domain QA benchmarks beyond the current datasets.

Authors

  • Sourav Saha
  • Dwaipayan Roy
  • Mandar Mitra

Paper Information

  • arXiv ID: 2602.22182v1
  • Categories: cs.CL, cs.IR
  • Published: February 25, 2026