paper — Page 69 | EUNO.NEWS

3 weeks ago · ai

[Paper] LongVideoAgent: Multi-Agent Reasoning with Long Videos

Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods...

#research #paper #ai #machine-learning #computer-vision

3 weeks ago · ai

[Paper] SpatialTree: How Spatial Abilities Branch Out in MLLMs

Cognitive science suggests that spatial ability develops progressively, from perception to reasoning and interaction. Yet in multimodal LLMs (MLLMs), this hierar...

#research #paper #ai #computer-vision

3 weeks ago · ai

[Paper] Active Intelligence in Video Avatars via Closed-loop World Modeling

Current video avatar generation methods excel at identity preservation and motion alignment but lack genuine agency: they cannot autonomously pursue long-term g...

#research #paper #ai #computer-vision

3 weeks ago · ai

[Paper] Making Large Language Models Efficient Dense Retrievers

Recent work has shown that directly fine-tuning large language models (LLMs) for dense retrieval yields strong performance, but their substantial parameter coun...

#research #paper #ai #nlp

3 weeks ago · ai

[Paper] FedPOD: the deployable units of training for federated learning

This paper proposes FedPOD (Proportionally Orchestrated Derivative) for optimizing learning efficiency and communication cost in federated learning among multip...

#research #paper #ai #machine-learning #computer-vision

3 weeks ago · ai

[Paper] Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures

Neural networks trained with gradient descent often learn solutions of increasing complexity over time, a phenomenon known as simplicity bias. Despite being wid...

#research #paper #ai #machine-learning

3 weeks ago · ai

[Paper] Repurposing Video Diffusion Transformers for Robust Point Tracking

Point tracking aims to localize corresponding points across video frames, serving as a fundamental task for 4D reconstruction, robotics, and video editing. Exis...

#research #paper #ai #computer-vision

3 weeks ago · ai

[Paper] Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many...

#research #paper #ai #machine-learning

3 weeks ago · ai

[Paper] MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts

We present MoE-DiffuSeq, a mixture-of-experts-based framework for enhancing diffusion models in long-document generation. Existing diffusion-based text generati...

#research #paper #ai #nlp

3 weeks ago · ai

[Paper] Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs

We introduce Cube Bench, a Rubik's-cube benchmark for evaluating spatial and sequential reasoning in multimodal large language models (MLLMs). The benchmark dec...

#research #paper #ai #machine-learning #nlp #computer-vision

3 weeks ago · ai

[Paper] Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information

As systems engineering (SE) objectives evolve from design and operation of monolithic systems to complex System of Systems (SoS), the discipline of Mission Engi...

#research #paper #ai #machine-learning

3 weeks ago · ai

[Paper] Automated stereotactic radiosurgery planning using a human-in-the-loop reasoning large language model agent

Stereotactic radiosurgery (SRS) demands precise dose shaping around critical structures, yet black-box AI systems have limited clinical adoption due to opacity ...

#research #paper #ai #machine-learning #nlp
