[Paper] DocDancer: Towards Agentic Document-Grounded Information Seeking
Document Question Answering (DocQA) focuses on answering questions grounded in given documents, yet existing DocQA agents lack effective tool utilization and la...
Visual question answering for crop disease analysis requires accurate visual understanding and reliable language generation. This work presents a lightweight vi...
Recent advances in language models (LMs) have driven significant progress in various software engineering tasks. However, existing LMs still struggle with compl...
We introduce RFC Bench, a benchmark for evaluating large language models on financial misinformation under realistic news. RFC Bench operates at the paragraph l...
Language models have become effective at a wide range of tasks, from math problem solving to open-domain question answering. However, they still make mistakes, ...
We present LLMberjack, a platform for creating multi-party conversations starting from existing debates, originally structured as reply trees. The system offers...
Large Language Models (LLMs) encode vast amounts of parametric knowledge during pre-training. As world knowledge evolves, effective deployment increasingly depe...
GUI agents that interact with graphical interfaces on behalf of users represent a promising direction for practical AI assistants. However, training such agents...
Language models often show a preference for using information from specific positions in the input regardless of semantic relevance. While positional bias has b...
Recently, people have suffered from, and become increasingly aware of, the unreliability gap in LLMs for open and knowledge-intensive tasks, and thus turn to search-au...
To mitigate hallucinations in large language models (LLMs), we propose a framework that focuses on errors induced by prompts. Our method extends a chain-style k...
Large Multimodal Models (LMMs) have demonstrated impressive capabilities in video reasoning via Chain-of-Thought (CoT). However, the robustness of their reasoni...