[Paper] Legal RAG Bench: an end-to-end benchmark for legal RAG
We introduce Legal RAG Bench, a benchmark and evaluation methodology for assessing the end-to-end performance of legal RAG systems. As a benchmark, Legal RAG Be...
Large language models (LLMs) have become an essential tool for natural language processing and artificial intelligence in general. Current open-source models ar...
While dense biomedical embeddings achieve strong performance, their black-box nature limits their utility in clinical decision-making. Recent question-based int...
We develop a discrete gauge-theoretic framework for superposition in large language models (LLMs) that replaces the single-global-dictionary premise with a shea...
The fast-growing demands in using Large Language Models (LLMs) to tackle complex multi-step data science tasks create an emergent need for accurate benchmarking...
Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit thi...
Modern optimizers like Adam and Muon are central to training large language models, but their reliance on first- and second-order momenta introduces significant...
AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the u...
Despite their capabilities, Multimodal Large Language Models (MLLMs) may produce plausible but erroneous outputs, hindering reliable deployment. Accurate uncert...
We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective communi...
Small language models (SLMs) have emerged as efficient alternatives to large language models for task-specific applications. However, they are often employed in...
Argumentative LLMs (ArgLLMs) are an existing approach leveraging Large Language Models (LLMs) and computational argumentation for decision-making, with the aim ...