[Paper] Decision Quality Evaluation Framework at Pinterest
Online platforms require robust systems to enforce content safety policies at scale. A critical component of these systems is the ability to evaluate the qualit...
Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and developers...
The Computing Continuum (CC) integrates different layers of processing infrastructure, from Edge to Cloud, to optimize service quality through ubiquitous and re...
Accurate representation of building semantics, encompassing both generic object types and specific subtypes, is essential for effective AI model training in the...
A growing literature uses large language models (LLMs) as synthetic participants to generate cost-effective and nearly instantaneous responses in social science...
Whole-slide images (WSIs) from cancer patients contain rich information that can be used for medical diagnosis or to follow treatment progress. To automate thei...
Due to the rise in the use of renewable energies as an alternative to traditional ones, and especially solar energy, there is increasing interest in studying ho...
The success of Large Language Models (LLMs) has established that scaling compute, through joint increases in model capacity and dataset size, is the primary dri...
Evaluating the quality of automatically generated text often relies on LLM-as-a-judge (LLM-judge) methods. While effective, these approaches are computationally...
In the realm of multi-agent systems, the challenge of partial observability is a critical barrier to effective coordination and decision-making. Existing approa...
Endoscopy is essential in medical imaging, used for diagnosis, prognosis and treatment. Developing a robust dynamic 3D reconstruction pipeline for endoscopic vi...
Current research in multimodal models faces a key challenge where enhancing generative capabilities often comes at the expense of understanding, and vice versa....
Multimodal Large Language Models (mLLMs) are often used to answer questions in structured data such as tables in Markdown, JSON, and images. While these models ...
We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reaso...
With the rapid adoption of large language models (LLMs) in automated code refactoring, assessing and ensuring functional equivalence between LLM-generated refac...
While Multimodal Large Language Models (MLLMs) perform strongly on single-turn chart generation, their ability to support real-world exploratory data analysis r...
Online sexism appears in various forms, which makes its detection challenging. Although automated tools can enhance the identification of sexist content, they a...
This paper introduces RaCo, a lightweight neural network designed to learn robust and versatile keypoints suitable for a variety of 3D computer vision tasks. Th...
Low-resource languages pose persistent challenges for Natural Language Processing tasks such as lemmatization and part-of-speech (POS) tagging. This paper inves...
Existing 3D open-vocabulary scene understanding methods mostly emphasize distilling language features from 2D foundation models into 3D feature fields, but larg...
Understanding the causal effects of text on downstream outcomes is a central task in many applications. Estimating such effects requires researchers to run cont...
Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations diff...
Large language models achieve strong performance on many complex reasoning tasks, yet their accuracy degrades sharply on benchmarks that require compositional r...
Business plan (BP) writing plays a key role in entrepreneurship education by helping learners construct, evaluate, and iteratively refine their ideas. However, ...