paper — Page 84 | EUNO.NEWS

1 month ago · ai

[Paper] DVGT: Driving Visual Geometry Transformer

Perceiving and reconstructing 3D scene geometry from visual inputs is crucial for autonomous driving. However, the field still lacks a driving-targeted dense geomet...

#research #paper #ai #machine-learning #computer-vision

1 month ago · ai

[Paper] EasyV2V: A High-quality Instruction-based Video Editing Framework

While image editing has advanced rapidly, video editing remains less explored, facing challenges in consistency, control, and generalization. We study the desig...

#research #paper #ai #machine-learning #computer-vision

1 month ago · ai

[Paper] AdaTooler-V: Adaptive Tool-Use for Images and Videos

Recent advances have shown that multimodal large language models (MLLMs) benefit from multimodal interleaved chain-of-thought (CoT) with vision tool interaction...

#research #paper #ai #computer-vision

1 month ago · ai

[Paper] Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

Large language models (LLMs) with explicit reasoning capabilities excel at mathematical reasoning yet still commit process errors, such as incorrect calculation...

#research #paper #ai #machine-learning #nlp

1 month ago · ai

[Paper] StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

The rapid growth of stereoscopic displays, including VR headsets and 3D cinemas, has led to increasing demand for high-quality stereo video content. However, pr...

#research #paper #ai #computer-vision

1 month ago · ai

[Paper] Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates

Prior studies investigating the internal workings of LLMs have uncovered sparse subnetworks, often referred to as circuits, that are responsible for performing ...

#research #paper #ai #nlp

1 month ago · ai

[Paper] Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

In this work, we present a panoramic metric depth foundation model that generalizes across diverse scene distances. We explore a data-in-the-loop paradigm from ...

#research #paper #ai #computer-vision

1 month ago · ai

[Paper] Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

This paper examines the exploration-exploitation trade-off in reinforcement learning with verifiable rewards (RLVR), a framework for improving the reasoning of ...

#research #paper #ai #machine-learning #nlp

1 month ago · ai

[Paper] Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning

Standard practice across domains from robotics to language is to first pretrain a policy on a large-scale demonstration dataset, and then finetune this policy, ...

#research #paper #ai #machine-learning

1 month ago · ai

[Paper] SFTok: Bridging the Performance Gap in Discrete Tokenizers

Recent advances in multimodal models highlight the pivotal role of image tokenization in high-resolution image generation. By compressing images into compact la...

#research #paper #ai #machine-learning #computer-vision

1 month ago · ai

[Paper] Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos

Prior works on 3D hand trajectory prediction are constrained by datasets that decouple motion from semantic supervision and by models that weakly link reasoning...

#research #paper #ai #machine-learning #computer-vision

1 month ago · ai

[Paper] How Good is Post-Hoc Watermarking With Language Model Rephrasing?

Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content. We explore *post-hoc watermarking* where an LLM...

#research #paper #ai #nlp
