computer-vision — Page 33

1 month ago · ai

[Paper] Multi-view Pyramid Transformer: Look Coarser to See Broader

We propose Multi-view Pyramid Transformer (MVP), a scalable multi-view transformer architecture that directly reconstructs large 3D scenes from tens to hundreds...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Storytelling in real-world videos often unfolds through multiple shots -- discontinuous yet semantically connected clips that together convey a coherent narrati...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] Distribution Matching Variational AutoEncoder

Most visual generative models compress images into a latent space before applying diffusion or autoregressive modelling. Yet, existing approaches such as VAEs a...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] HalluShift++: Bridging Language and Vision through Internal Representation Shifts for Hierarchical Hallucinations in MLLMs

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in vision-language understanding tasks. While these models often produce ling...

#research #paper #ai #nlp #computer-vision
1 month ago · ai

[Paper] KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models

DreamerV3 is a state-of-the-art online model-based reinforcement learning (MBRL) algorithm known for remarkable sample efficiency. Concurrently, Kolmogorov-Arno...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] Winning the Lottery by Preserving Network Training Dynamics with Concrete Ticket Search

The Lottery Ticket Hypothesis asserts the existence of highly sparse, trainable subnetworks ('winning tickets') within dense, randomly initialized neural networ...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] Arc Gradient Descent: A Mathematically Derived Reformulation of Gradient Descent with Phase-Aware, User-Controlled Step Dynamics

The paper presents the formulation, implementation, and evaluation of the ArcGD optimiser. The evaluation is conducted initially on a non-convex benchmark funct...

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · ai

[Paper] EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Instruction-based image editing has emerged as a prominent research area, which, benefiting from image generation foundation models, has achieved high aestheti...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] AQUA-Net: Adaptive Frequency Fusion and Illumination Aware Network for Underwater Image Enhancement

Underwater images often suffer from severe color distortion, low contrast, and a hazy appearance due to wavelength-dependent light absorption and scattering. Si...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG

Vision-language models (VLMs) have achieved strong performance in visual question answering (VQA), yet they remain constrained by static training data. Retrieva...

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · ai

[Paper] SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models

Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynam...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding

Grounding is a fundamental capability for building graphical user interface (GUI) agents. Although existing approaches rely on large-scale bounding box supervis...

#research #paper #ai #machine-learning #nlp #computer-vision
