EUNO.NEWS
  • All (21181) +146
  • AI (3169) +10
  • DevOps (940) +5
  • Software (11185) +102
  • IT (5838) +28
  • Education (48)
  • Notice
  • 3 weeks ago · ai

    [Paper] Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

    Transparent objects remain notoriously hard for perception systems: refraction, reflection and transmission break the assumptions behind stereo, ToF and purely ...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] Web World Models

    Language agents increasingly require persistent worlds in which they can act, remember, and learn. Existing approaches sit at two extremes: conventional web fra...

    #research #paper #ai #machine-learning #nlp #computer-vision
  • 3 weeks ago · ai

    [Paper] IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition

    Intrinsic image decomposition is fundamental for visual understanding, as RGB images entangle material properties, illumination, and view-dependent effects. Rec...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion

    Humans learn locomotion through visual observation, interpreting visual content first before imitating actions. However, state-of-the-art humanoid locomotion sy...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

    Omnimodal large language models have made significant strides in unifying audio and visual modalities; however, they often lack the fine-grained cross-modal und...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception

    Spatio-temporal alignment is crucial for temporal modeling of end-to-end (E2E) perception in autonomous driving (AD), providing valuable structural and textural...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] Memorization in 3D Shape Generation: An Empirical Study

    Generative models are increasingly used in 3D vision to synthesize novel shapes, yet it remains unclear whether their generation relies on memorizing training s...

    #research #paper #ai #machine-learning #computer-vision
  • 3 weeks ago · ai

    [Paper] Scalable Residual Feature Aggregation Framework with Hybrid Metaheuristic Optimization for Robust Early Pancreatic Neoplasm Detection in Multimodal CT Imaging

    The early detection of pancreatic neoplasm is a major clinical dilemma, and it is predominantly so because tumors are likely to occur with minimal contrast marg...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] Detection Fire in Camera RGB-NIR

    Improving the accuracy of fire detection using infrared night vision cameras remains a challenging task. Previous studies have reported strong performance with ...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] RxnBench: A Multimodal Benchmark for Evaluating Large Language Models on Chemical Reaction Understanding from Scientific Literature

    The integration of Multimodal Large Language Models (MLLMs) into chemistry promises to revolutionize scientific discovery, yet their ability to comprehend the d...

    #research #paper #ai #machine-learning #computer-vision
  • 3 weeks ago · ai

    [Paper] CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning Under Partial Observations

    Large Language Model (LLM) agents, while proficient in the digital realm, face a significant gap in physical-world deployment due to the challenge of forming an...

    #research #paper #ai #machine-learning #nlp #computer-vision
  • 3 weeks ago · ai

    [Paper] MedGemma vs GPT-4: Open-Source and Proprietary Zero-shot Medical Disease Classification from Images

    Multimodal Large Language Models (LLMs) introduce an emerging paradigm for medical imaging by interpreting scans through the lens of extensive clinical knowledg...

    #research #paper #ai #machine-learning #computer-vision

RSS GitHub © 2026