EUNO.NEWS
  • All (21181) +146
  • AI (3169) +10
  • DevOps (940) +5
  • Software (11185) +102
  • IT (5838) +28
  • Education (48)
  • Notice
  • 3 weeks ago · ai

    [Paper] GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation

    Modern deep learning methods typically treat image sequences as large tensors of sequentially stacked frames. However, is this straightforward representation id...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] Improving the Convergence Rate of Ray Search Optimization for Query-Efficient Hard-Label Attacks

    In hard-label black-box adversarial attacks, where only the top-1 predicted label is accessible, the prohibitive query complexity poses a major obstacle to prac...

    #research #paper #ai #machine-learning #computer-vision
  • 3 weeks ago · ai

    [Paper] SemanticGen: Video Generation in Semantic Space

    State-of-the-art video generative models typically learn the distribution of video latents in the VAE space and map them to pixels using a VAE decoder. While th...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] LongVideoAgent: Multi-Agent Reasoning with Long Videos

    Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods...

    #research #paper #ai #machine-learning #computer-vision
  • 3 weeks ago · ai

    [Paper] SpatialTree: How Spatial Abilities Branch Out in MLLMs

    Cognitive science suggests that spatial ability develops progressively, from perception to reasoning and interaction. Yet in multimodal LLMs (MLLMs), this hierar...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] Active Intelligence in Video Avatars via Closed-loop World Modeling

    Current video avatar generation methods excel at identity preservation and motion alignment but lack genuine agency: they cannot autonomously pursue long-term g...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] FedPOD: the deployable units of training for federated learning

    This paper proposes FedPOD (Proportionally Orchestrated Derivative) for optimizing learning efficiency and communication cost in federated learning among multip...

    #research #paper #ai #machine-learning #computer-vision
  • 3 weeks ago · ai

    [Paper] Repurposing Video Diffusion Transformers for Robust Point Tracking

    Point tracking aims to localize corresponding points across video frames, serving as a fundamental task for 4D reconstruction, robotics, and video editing. Exis...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs

    We introduce Cube Bench, a Rubik's-cube benchmark for evaluating spatial and sequential reasoning in multimodal large language models (MLLMs). The benchmark dec...

    #research #paper #ai #machine-learning #nlp #computer-vision
  • 3 weeks ago · ai

    [Paper] LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving

    Simulators can generate virtually unlimited driving data, yet imitation learning policies in simulation still struggle to achieve robust closed-loop performance...

    #research #paper #ai #machine-learning #computer-vision
  • 3 weeks ago · ai

    [Paper] FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

    Large vision-language models (VLMs) typically process hundreds or thousands of visual tokens per image or video frame, incurring quadratic attention cost and su...

    #research #paper #ai #computer-vision
  • 3 weeks ago · ai

    [Paper] Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

    Vision-language models (VLMs) excel at general understanding yet remain weak at dynamic spatial reasoning (DSR), i.e., reasoning about the evolution of object g...

    #research #paper #ai #computer-vision

RSS GitHub © 2026