computer-vision — Page 9

1 week ago · ai · - · -

[Paper] CoLoGen: Progressive Learning of Concept-Localization Duality for Unified Image Generation

Unified conditional image generation remains difficult because different tasks depend on fundamentally different internal representations. Some require conceptu...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors

Object hallucination is a critical issue in Large Vision-Language Models (LVLMs), where outputs include objects that do not appear in the input image. A natural...

#research #paper #ai #machine-learning #nlp #computer-vision
1 week ago · ai · - · -

[Paper] MedTri: A Platform for Structured Medical Report Normalization to Enhance Vision-Language Pretraining

Medical vision-language pretraining increasingly relies on medical reports as large-scale supervisory signals; however, raw reports often exhibit substantial st...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] WeaveTime: Stream from Earlier Frames into Emergent Memory in VideoLLMs

Recent advances in Multimodal Large Language Models have greatly improved visual understanding and reasoning, yet their quadratic attention and offline training...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

Visual imitation learning: Guidde trains AI agents on human 'expert video' instead of documentation

#visual imitation learning #agentic AI #screen recording training #enterprise automation #computer vision #AI agents #imitation learning
1 week ago · ai · - · -

[Paper] Test-Time Training with KV Binding Is Secretly Linear Attention

Test-time training (TTT) with KV binding as sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai · - · -

[Paper] Squint: Fast Visual Reinforcement Learning for Sim-to-Real Robotics

Visual reinforcement learning is appealing for robotics but expensive -- off-policy methods are sample-efficient yet slow; on-policy methods parallelize well bu...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai · - · -

[Paper] Multi-Vector Index Compression in Any Modality

We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in...

#research #paper #ai #nlp #computer-vision
1 week ago · ai · - · -

[Paper] Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent ...

#research #paper #ai #machine-learning #nlp #computer-vision
1 week ago · ai · - · -

[Paper] Region of Interest Segmentation and Morphological Analysis for Membranes in Cryo-Electron Tomography

Cryo-electron tomography (cryo-ET) enables high resolution, three-dimensional reconstruction of biological structures, including membranes and membrane proteins...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] Human Video Generation from a Single Image with 3D Pose and View Control

Recent diffusion methods have made significant progress in generating videos from single images due to their powerful visual generation capabilities. However, c...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning

While Vision-Language Models (VLMs) exhibit exceptional 2D visual understanding, their ability to comprehend and reason about 3D space--a cornerstone of spatial...

#research #paper #ai #computer-vision
