computer-vision — Page 31

3 months ago · ai · - · -

[Paper] OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Storytelling in real-world videos often unfolds through multiple shots -- discontinuous yet semantically connected clips that together convey a coherent narrati...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Distribution Matching Variational AutoEncoder

Most visual generative models compress images into a latent space before applying diffusion or autoregressive modelling. Yet, existing approaches such as VAEs a...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] HalluShift++: Bridging Language and Vision through Internal Representation Shifts for Hierarchical Hallucinations in MLLMs

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in vision-language understanding tasks. While these models often produce ling...

#research #paper #ai #nlp #computer-vision
3 months ago · ai · - · -

[Paper] KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models

DreamerV3 is a state-of-the-art online model-based reinforcement learning (MBRL) algorithm known for remarkable sample efficiency. Concurrently, Kolmogorov-Arno...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Winning the Lottery by Preserving Network Training Dynamics with Concrete Ticket Search

The Lottery Ticket Hypothesis asserts the existence of highly sparse, trainable subnetworks ('winning tickets') within dense, randomly initialized neural networ...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Arc Gradient Descent: A Mathematically Derived Reformulation of Gradient Descent with Phase-Aware, User-Controlled Step Dynamics

The paper presents the formulation, implementation, and evaluation of the ArcGD optimiser. The evaluation is conducted initially on a non-convex benchmark funct...

#research #paper #ai #machine-learning #nlp #computer-vision
3 months ago · ai · - · -

[Paper] EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Instruction-based image editing has emerged as a prominent research area, which, benefiting from image generation foundation models, has achieved high aestheti...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] AQUA-Net: Adaptive Frequency Fusion and Illumination Aware Network for Underwater Image Enhancement

Underwater images often suffer from severe color distortion, low contrast, and a hazy appearance due to wavelength-dependent light absorption and scattering. Si...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG

Vision-language models (VLMs) have achieved strong performance in visual question answering (VQA), yet they remain constrained by static training data. Retrieva...

#research #paper #ai #machine-learning #nlp #computer-vision
3 months ago · ai · - · -

[Paper] SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models

Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynam...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding

Grounding is a fundamental capability for building graphical user interface (GUI) agents. Although existing approaches rely on large-scale bounding box supervis...

#research #paper #ai #machine-learning #nlp #computer-vision
3 months ago · ai · - · -

[Paper] Measuring the Effect of Background on Classification and Feature Importance in Deep Learning for AV Perception

Common approaches to explainable AI (XAI) for deep learning focus on analyzing the importance of input features on the classification task in a given model: sal...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Synset Signset Germany: a Synthetic Dataset for German Traffic Sign Recognition

In this paper, we present a synthesis pipeline and dataset for training / testing data in the task of traffic sign recognition that combines the advantages of d...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Physically-Based Simulation of Automotive LiDAR

We present an analytic model for simulating automotive time-of-flight (ToF) LiDAR that includes blooming, echo pulse width, and ambient light, along with steps ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] A Comparative Study on Synthetic Facial Data Generation Techniques for Face Recognition

Facial recognition has become a widely used method for authentication and identification, with applications for secure access and locating missing persons. Its ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty

Recent advances in generative video models have led to significant breakthroughs in high-fidelity video synthesis, specifically in controllable video generation...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] NICE: Neural Implicit Craniofacial Model for Orthognathic Surgery Prediction

Orthognathic surgery is a crucial intervention for correcting dentofacial skeletal deformities to enhance occlusal functionality and facial aesthetics. Accurate...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding

Long video understanding (LVU) is challenging because answering real-world queries often depends on sparse, temporally dispersed cues buried in hours of mostly ...

#research #paper #ai #machine-learning #nlp #computer-vision
3 months ago · ai · - · -

YOLOv1 Paper Walkthrough: The Day YOLO First Saw the World

A detailed walkthrough of the YOLOv1 architecture and its PyTorch implementation from scratch.

#YOLOv1 #object detection #computer vision #deep learning #PyTorch #model walkthrough #neural networks
3 months ago · ai · - · -

[Paper] The Universal Weight Subspace Hypothesis

We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Light-X: Generative 4D Video Rendering with Camera and Illumination Control

Recent advances in illumination control extend image-based methods to video, yet still face a trade-off between lighting fidelity and temporal consistency. Mo...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Value Gradient Guidance for Flow Matching Alignment

While methods exist for aligning flow matching models--a popular and effective class of generative models--with human preferences, existing approaches fail to a...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Deep infant brain segmentation from multi-contrast MRI

Segmentation of magnetic resonance images (MRI) facilitates analysis of human brain development by delineating anatomical structures. However, in infants and yo...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

Recent unified multimodal large language models (MLLMs) have shown impressive capabilities, incorporating chain-of-thought (CoT) reasoning for enhanced text-to-...

#research #paper #ai #machine-learning #nlp #computer-vision
