computer-vision — Page 32

3 months ago · ai · - · -

[Paper] Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting

Synthesizing high-fidelity frozen 3D scenes from monocular Mannequin-Challenge (MC) videos is a unique problem distinct from standard dynamic scene reconstructi...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Reward models are critical for aligning vision-language systems with human preferences, yet current approaches suffer from hallucination, weak visual grounding,...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] ShadowDraw: From Any Object to Shadow-Drawing Compositional Art

We introduce ShadowDraw, a framework that transforms ordinary 3D objects into shadow-drawing compositional art. Given a 3D object, our system predicts scene par...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or ...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency Modulation

All-in-One Image Restoration (AiOIR) tasks often involve diverse degradations that require robust and versatile strategies. However, most existing approaches typ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] TV2TV: A Unified Framework for Interleaved Language and Video Generation

Video generation models are rapidly advancing, but can still struggle with complex video outputs that require significant semantic branching or repeated high-le...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards

In recent years, Image Quality Assessment (IQA) for AI-generated images (AIGI) has advanced rapidly; however, existing methods primarily target portraits and ar...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

See Through Walls: AI's New Eye on Occluded Motion by Arvind Sundararajan

Ever struggle to get accurate motion capture when hands are intertwined, hidden behind objects, or even just slightly out of view? Standard computer vision syst...

#computer vision #motion capture #occlusion handling #deformable state space model #visual feature extraction #AI research
3 months ago · ai · - · -

[Paper] SimFlow: Simplified and End-to-End Training of Latent Normalizing Flows

Normalizing Flows (NFs) learn invertible mappings between the data and a Gaussian distribution. Prior works usually suffer from two limitations. First, they add...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Unique Lives, Shared World: Learning from Single-Life Videos

We introduce the 'single-life' learning paradigm, where we train a distinct vision model exclusively on egocentric videos captured by one individual. We leverag...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

Graphic design forms the cornerstone of modern visual communication, serving as a vital medium for promoting cultural and commercial events. Recent advances hav...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Radiance Meshes for Volumetric Reconstruction

We introduce radiance meshes, a technique for representing radiance fields with constant density tetrahedral cells produced with a Delaunay tetrahedralization. ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

Vision Language Models (VLMs) demonstrate strong qualitative visual understanding, but struggle with metrically precise spatial reasoning required for embodied ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Stable Signer: Hierarchical Sign Language Generative Model

Sign Language Production (SLP) is the process of converting the complex input text into a real video. Most previous works focused on the Text2Gloss, Gloss2Pose,...

#research #paper #ai #nlp #computer-vision
3 months ago · ai · - · -

[Paper] RELIC: Interactive Video World Model with Long-Horizon Memory

A truly interactive world model requires three key ingredients: real-time long-horizon streaming, consistent spatial memory, and precise user control. However, ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Fast & Efficient Normalizing Flows and Applications of Image Generative Models

This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying gener...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Jina-VLM: Small Multilingual Vision Language Model

We present Jina-VLM, a 2.4B parameter vision-language model that achieves state-of-the-art multilingual visual question answering among open 2B-scale VLMs. The ...

#research #paper #ai #machine-learning #nlp #computer-vision
3 months ago · ai · - · -

Measuring What Matters: Objective Metrics for Image Generation Assessment

Generating high‑quality visuals with state‑of‑the‑art models is becoming increasingly accessible. Open‑source models run on laptops, and cloud services turn tex...

#image generation #evaluation metrics #generative AI #computer vision #quality assessment #Pruna #P-image #AI model benchmarking
3 months ago · ai · - · -

[Paper] PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

Attention mechanisms are the core of foundation models, but their quadratic complexity remains a critical bottleneck for scaling. This challenge has driven the ...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] On the Temporality for Sketch Representation Learning

Sketches are simple human hand-drawn abstractions of complex scenes and real-world objects. Although the field of sketch representation learning has advanced si...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues

We propose MagicQuill V2, a novel system that introduces a layered composition paradigm to generative image editing, bridging the gap between the sema...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models

Multi-view diffusion models have recently emerged as a powerful paradigm for novel view synthesis, yet the underlying mechanism that enables their view-consiste...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] OneThinker: All-in-one Reasoning Model for Image and Video

Reinforcement learning (RL) has recently achieved remarkable success in eliciting visual reasoning within Multimodal Large Language Models (MLLMs). However, exi...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] PPTArena: A Benchmark for Agentic PowerPoint Editing

We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast t...

#research #paper #ai #machine-learning #computer-vision
