research — Page 127

3 months ago · ai · - · -

[Paper] CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models

Multi-view diffusion models have recently emerged as a powerful paradigm for novel view synthesis, yet the underlying mechanism that enables their view-consiste...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] OneThinker: All-in-one Reasoning Model for Image and Video

Reinforcement learning (RL) has recently achieved remarkable success in eliciting visual reasoning within Multimodal Large Language Models (MLLMs). However, exi...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] PPTArena: A Benchmark for Agentic PowerPoint Editing

We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast t...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

Current video generation techniques excel at single-shot clips but struggle to produce narrative multi-shot videos, which require flexible shot arrangement, coh...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation

We investigate whether video generative models can exhibit visuospatial intelligence, a capability central to human cognition, using only visual data. To this e...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

Despite progress in video-to-audio generation, the field focuses predominantly on mono output, lacking spatial immersion. Existing binaural approaches remain co...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Learning Physically Consistent Lagrangian Control Models Without Acceleration Measurements

This article investigates the modeling and control of Lagrangian systems involving non-conservative forces using a hybrid method that does not require accelerat...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation

We propose MAViD, a novel Multimodal framework for Audio-Visual Dialogue understanding and generation. Existing approaches primarily focus on non-interactive sy...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control

Data-driven motion priors that can guide agents toward producing naturalistic behaviors play a pivotal role in creating life-like virtual characters. Adversaria...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] The Moral Consistency Pipeline: Continuous Ethical Evaluation for Large Language Models

The rapid advancement and adaptability of Large Language Models (LLMs) highlight the need for moral consistency, the capacity to maintain ethically coherent rea...

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] LORE: A Large Generative Model for Search Relevance

Achievement. We introduce LORE, a systematic framework for Large Generative Model-based relevance in e-commerce search. Deployed and iterated over three years, ...

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] TokenPowerBench: Benchmarking the Power Consumption of LLM Inference

Large language model (LLM) services now answer billions of queries per day, and industry reports show that inference, not training, accounts for more than 90% o...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] Unrolled Networks are Conditional Probability Flows in MRI Reconstruction

Magnetic Resonance Imaging (MRI) offers excellent soft-tissue contrast without ionizing radiation, but its long acquisition time limits clinical utility. Recent...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge

Thinking Large Language Models (LLMs) used as judges for pairwise preferences remain noisy at the single-sample level, and common aggregation rules (majority vo...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] In-Context Sync-LoRA for Portrait Video Editing

Editing portrait videos is a challenging task that requires flexible yet precise control over a wide range of modifications, such as appearance changes, express...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars?

The rapid advancement of large language models (LLMs) has opened new possibilities for AI for good applications. As LLMs increasingly mediate online communicati...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] Fine-Tuned Large Language Models for Logical Translation: Reducing Hallucinations with Lang2Logic

Recent advances in natural language processing (NLP), particularly large language models (LLMs), have motivated the automatic translation of natural language st...

#research #paper #ai #machine-learning #nlp
3 months ago · ai · - · -

[Paper] ProteinPNet: Prototypical Part Networks for Concept Learning in Spatial Proteomics

Understanding the spatial architecture of the tumor microenvironment (TME) is critical to advance precision oncology. We present ProteinPNet, a novel framework ...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Modeling dynamic 3D environments from LiDAR sequences is central to building reliable 4D worlds for autonomous driving and embodied AI. Existing generative fram...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration

Hallucination remains a critical challenge in large language models (LLMs), hindering the development of reliable multimodal LLMs (MLLMs). Existing solutions of...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Rethinking Generalized BCIs: Benchmarking 340,000+ Unique Algorithmic Configurations for EEG Mental Command Decoding

Robust decoding and classification of brain patterns measured with electroencephalography (EEG) remains a major challenge for real-world (i.e. outside scientifi...

#research #paper #ai #machine-learning
3 months ago · ai · - · -

[Paper] Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

While Multimodal Large Language Models (MLLMs) show remarkable capabilities, their safety alignments are susceptible to jailbreak attacks. Existing attack metho...

#research #paper #ai #nlp #computer-vision
3 months ago · ai · - · -

[Paper] BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection

Integrating LiDAR and camera information in the bird's eye view (BEV) representation has demonstrated its effectiveness in 3D object detection. However, because...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Flexible Gravitational-Wave Parameter Estimation with Transformers

Gravitational-wave data analysis relies on accurate and efficient methods to extract physical information from noisy detector signals, yet the increasing rate a...

#research #paper #ai #machine-learning
