computer-vision — Page 2

5 days ago · ai · - · -

[Paper] Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Data tells stories that shape society; the data journalist's job is to turn raw information into stories non-experts can trust. A high-quality news feature take...

#research #paper #ai #nlp #computer-vision
5 days ago · ai · - · -

[Paper] Mean Flow Distillation: Robust and Stable Distillation for Flow Matching Models

Flow Matching models have demonstrated strong performance across a wide range of generative tasks. However, their reliance on ODE-based iterative sampling incur...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] P3D-Bench: Benchmarking MLLMs for Parametric 3D Generation and Structural Reasoning

Multimodal large language models can write code to produce complex programs as well as use programs to do 3D modeling, which opens up a new avenue for 3D genera...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] MOFA-VTON: More Fashion Possibilities with Fine-Grained Adaptations in Virtual Try-On

Virtual try-on aims to fit an in-shop clothing image onto a specific human body. An optimal virtual try-on method should provide diverse and flexible dressing o...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors

Most existing deep learning-based PET image denoising methods assume a fixed and known dose reduction factor (DRF) for low-dose PET images. However, these metho...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] WorldOlympiad: Can Your World Model Survive a Triathlon?

We introduce WorldOlympiad, a benchmark for diagnosing video-based world models across physical faithfulness, geometric consistency, and interaction fidelity. W...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] Monte Carlo Pass Search: Using Trajectory Generation for 3D Counterfactual Pass Evaluation in Football

We recast pass evaluation in football (soccer) as a Monte Carlo Tree Search (MCTS)-like evaluation problem whose components mostly exist in the literature under...

#research #paper #ai #machine-learning #computer-vision
5 days ago · ai · - · -

[Paper] Multimodal Brain Tumour Classification Using Feature Fusion

Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into ...

#research #paper #ai #machine-learning #computer-vision
5 days ago · ai · - · -

[Paper] FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model

A global shortage of trained sonographers limits prenatal ultrasound screening in low- and middle-income countries, where over half of pregnant women receive no...

#research #paper #ai #machine-learning #computer-vision
5 days ago · ai · - · -

[Paper] IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder

Built on pretrained vision foundation models (VFMs), representation autoencoders (RAEs) have recently emerged as a promising approach for constructing semantica...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] A History-Aware Visually Grounded Critic for Computer Use Agents

Various test-time interventions for Computer Use Agents (CUAs), including critic models, have been developed to improve performance through pre-execution action...

#research #paper #ai #machine-learning #nlp #computer-vision
5 days ago · ai · - · -

[Paper] U-TTT: Towards Generalizable PET Image Denoising via Test-Time Training

Existing deep learning models for Positron Emission Tomography (PET) image denoising often suffer from severe performance degradation under distribution shifts,...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] An Uncertainty Estimation Framework for Dose Accumulation in Adaptive Radiotherapy: Application to CBCT-Guided Radiotherapy for Cervical Cancer

Background and purpose: oART enables daily plan adaptation to interfraction anatomical variations, but cumulative dose estimation remains limited by DIR, segmen...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] IPSM-Bench: A New Intermediate Phase Segmentation Benchmark in Microstructure Images of Zinc-Based Absorbable Biomaterials

Zinc-based alloys are indispensable emerging absorbable metallic biomaterials, and their macroscopic performance is governed by microstructural characteristics....

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] AnimaSpark: A Feed-Forward Method for Animating Arbitrary 3D Objects

While recent advancements in generative AI have substantially accelerated static 3D model creation workflows, the synthesis of category-agnostic 3D animations r...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] Quo Vadis, Visual In-Context Learning? A Unified Benchmark Across Domains and Tasks

Visual in-context learning has been proposed as a pathway towards dynamic models that can generate predictions based on a provided context and thereby can adapt...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] Latent Spatial Memory for Video World Models

Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This des...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models

Temporal modeling is essential for robotic manipulation, as effective control requires both memory of past interactions and imagination of future states. Howeve...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single firs...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai · - · -

[Paper] PTL-Diffusion: Manifold-Aware Diffusion with Periodic Terminal Laws

Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analy...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai · - · -

[Paper] iMaC: Translating Actions into Motion and Contact Images for Embodied World Models

Embodied world models have emerged as a pivotal paradigm for visual robotic decision-making and interactive environment simulation. However, conventional embodi...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors in...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai · - · -

[Paper] Echo-Memory: A Controlled Study of Memory in Action World Models

We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first fram...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai · - · -

[Paper] Beyond Spherical Harmonics: Rethinking Appearance Models for Radiance Reconstruction

View-dependent appearance modeling remains a challenging problem in novel-view synthesis and reconstruction. Accurately representing complex angular effects oft...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] End-to-End Optimization of Incoherent Imaging for Classification Under Detector-Limited Readout

End-to-end co-optimization of optical front-ends (e.g. metasurfaces) and neural network back-ends has been widely applied to imaging tasks, yet a formalism char...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] POTATR: A Lightweight Image-to-Graph Model for Page-Level Table Extraction

Large-scale document processing requires contextually aware table extraction (TE) that is both accurate and efficient. Yet current approaches require billions o...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] SemDINO: A DINOv3-Driven Network for Cross-Temporal Semantic Alignment in Change Detection

Semantic change detection (SCD) aims to simultaneously locate land-cover changes and identify semantic categories before and after transition. However, existing...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] Hybrid Robustness Verification for Spatio-Temporal Neural Networks

With AI increasingly deployed in safety-critical systems, providing formal robustness guarantees for the underlying models is essential. Existing verification m...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai · - · -

[Paper] HDSL: A Hierarchical Domain-Specific Language for Structured 3D Indoor Scene Generation and Localized Editing with LLM Agents

Text-driven indoor scene generation and editing require an intermediate representation that language models can both produce and revise. Existing LLM-based syst...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] Evaluating the Representation Space of Diffusion Models via Self-Supervised Principles

Diffusion models have demonstrated remarkable generative capabilities and have also emerged as powerful self-supervised representation learners, yet the connect...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai · - · -

[Paper] Cranio-Diff: Diffusion-based Cross-domain Craniofacial Reconstruction with 2D X-ray Skull Guidance and Structural Identity Constraints

The state-of-the-art generative models, such as CycleGAN, Pix2Pix, and diffusion models have demonstrated remarkable performance in the face generation task. Ho...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] GenEyePose: Patient-Free, Knowledge-Based Saccadic Eye Movement Modeling for Digital Neurophysiologic Biomarker Development

Eye movements, including saccades, are widely regarded as highly sensitive and objective biomarkers of neurophysiologic states. Detecting saccadic signatures in...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] SoccerNet 2026 Player-Centric Ball-Action Spotting: Retraining and Post-Processing Extensions to the FOOTPASS Baselines

We describe our system for the SoccerNet 2026 Player-Centric Ball-Action Spotting Challenge, which requires predicting who performs which action and when, acros...

#research #paper #ai #computer-vision
6 days ago · ai · - · -

[Paper] Visual Prompting Meets Feature Reconstruction-Based Anomaly Detection with Dual-Teacher Supervision

Recent Anomaly Detection methods achieve perfect detection and segmentation scores on well-established datasets, such as MVTec. However, many of these methods f...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai · - · -

[Paper] Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis

We study whether pretrained video foundation models encode intuitive-physics information in their frozen representations, and how this information varies across...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai · - · -

[Paper] Where Does the Answer Come From? Benchmarking View-Level Visual Evidence Identification in Multi-View MLLMs for Autonomous Driving

Multimodal large language models (MLLMs) achieve strong results on visual reasoning benchmarks, but answer accuracy alone does not indicate whether a model reli...

#research #paper #ai #nlp #computer-vision
1 week ago · ai · - · -

[Paper] When Video Misreads: Closed-Loop Distillation of Reading Heuristics for Exploratory Manipulation Trace QA

Exploratory manipulation often turns an apparent failed attempt into the key evidence for what to do next. For example, a robot pulls a locked cabinet drawer, f...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai · - · -

[Paper] UniSHARP: Universal Sharp Monocular View Synthesis

In this work, we focus on extending SHARP, the popular photorealistic view synthesis method, for universal monocular rendering across a continuum of camera syst...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism

Current Vision-Language Models struggle with hours-long videos because processing full-length visual sequences induces prohibitive token explosion and attention...

#research #paper #ai #machine-learning #nlp #computer-vision
1 week ago · ai · - · -

[Paper] Streaming Video Generation with Streaming Force Control

We introduce StreamForce, a streaming video generation framework that enables physically grounded control through continuous force inputs. Unlike prior video mo...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] Differences in Detection: Explainability Where it Matters

We propose Differences in Detection (DnD), an intuitive method to compare two object detection models. Based on the same matching algorithm, it complements the ...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] Implicit Data Synthesis for Contrastive Unsupervised Data Augmentation

Scientific observations generate large quantities of unlabeled data which is laborious to hand-label, making unsupervised learning techniques valuable for proce...

#research #paper #ai #computer-vision
1 week ago · ai · - · -

[Paper] Planning-aligned Token Compression for Long-Context Autonomous Driving

Monolithic vision-action models represent an emerging paradigm in autonomous driving. However, this architecture produces token sequences that quickly exceed re...

#research #paper #ai #machine-learning #computer-vision