EUNO.NEWS
  • All (21181) +146
  • AI (3169) +10
  • DevOps (940) +5
  • Software (11185) +102
  • IT (5838) +28
  • Education (48)
  • Notice
  • 1 month ago · ai

    [Paper] PPTArena: A Benchmark for Agentic PowerPoint Editing

    We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast t...

    #research #paper #ai #machine-learning #computer-vision
  • 1 month ago · ai

    [Paper] MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

    Current video generation techniques excel at single-shot clips but struggle to produce narrative multi-shot videos, which require flexible shot arrangement, coh...

    #research #paper #ai #computer-vision
  • 1 month ago · ai

    [Paper] Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation

    We investigate whether video generative models can exhibit visuospatial intelligence, a capability central to human cognition, using only visual data. To this e...

    #research #paper #ai #machine-learning #computer-vision
  • 1 month ago · ai

    [Paper] ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

    Despite progress in video-to-audio generation, the field focuses predominantly on mono output, lacking spatial immersion. Existing binaural approaches remain co...

    #research #paper #ai #machine-learning #computer-vision
  • 1 month ago · ai

    [Paper] MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation

    We propose MAViD, a novel Multimodal framework for Audio-Visual Dialogue understanding and generation. Existing approaches primarily focus on non-interactive sy...

    #research #paper #ai #computer-vision
  • 1 month ago · ai

    [Paper] SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control

    Data-driven motion priors that can guide agents toward producing naturalistic behaviors play a pivotal role in creating life-like virtual characters. Adversaria...

    #research #paper #ai #machine-learning #computer-vision
  • 1 month ago · ai

    [Paper] Unrolled Networks are Conditional Probability Flows in MRI Reconstruction

    Magnetic Resonance Imaging (MRI) offers excellent soft-tissue contrast without ionizing radiation, but its long acquisition time limits clinical utility. Recent...

    #research #paper #ai #computer-vision
  • 1 month ago · ai

    [Paper] In-Context Sync-LoRA for Portrait Video Editing

    Editing portrait videos is a challenging task that requires flexible yet precise control over a wide range of modifications, such as appearance changes, express...

    #research #paper #ai #machine-learning #computer-vision
  • 1 month ago · ai

    [Paper] U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

    Modeling dynamic 3D environments from LiDAR sequences is central to building reliable 4D worlds for autonomous driving and embodied AI. Existing generative fram...

    #research #paper #ai #computer-vision
  • 1 month ago · ai

    [Paper] InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration

    Hallucination remains a critical challenge in large language models (LLMs), hindering the development of reliable multimodal LLMs (MLLMs). Existing solutions of...

    #research #paper #ai #computer-vision
  • 1 month ago · ai

    [Paper] Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

    While Multimodal Large Language Models (MLLMs) show remarkable capabilities, their safety alignments are susceptible to jailbreak attacks. Existing attack metho...

    #research #paper #ai #nlp #computer-vision
  • 1 month ago · ai

    [Paper] BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection

    Integrating LiDAR and camera information in the bird's eye view (BEV) representation has demonstrated its effectiveness in 3D object detection. However, because...

    #research #paper #ai #computer-vision

RSS GitHub © 2026