computer-vision

13 hours ago · ai · - · -

[Paper] Seeing Fast and Slow: Learning the Flow of Time in Videos

How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern com...

#research #paper #ai #machine-learning #computer-vision
13 hours ago · ai · - · -

[Paper] Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safe...

#research #paper #ai #computer-vision
13 hours ago · ai · - · -

[Paper] Context Unrolling in Omni Models

We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We ...

#research #paper #ai #computer-vision
14 hours ago · ai · - · -

[Paper] Vista4D: Video Reshooting with 4D Point Clouds

We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud. Specifically, given an...

#research #paper #ai #computer-vision
14 hours ago · ai · - · -

[Paper] When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are n...

#research #paper #ai #machine-learning #nlp #computer-vision
14 hours ago · ai · - · -

[Paper] Directional Confusions Reveal Divergent Inductive Biases Through Rate-Distortion Geometry in Human and Machine Vision

Humans and modern vision models can reach similar classification accuracy while making systematically different kinds of mistakes - differing not in how often t...

#research #paper #ai #computer-vision
14 hours ago · ai · - · -

[Paper] UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection

In recent years, significant progress has been made in both image generation and generated image detection. Despite their rapid, yet largely independent, develo...

#research #paper #ai #computer-vision
14 hours ago · ai · - · -

[Paper] Addressing Image Authenticity When Cameras Use Generative AI

The ability of generative AI (GenAI) methods to photorealistically alter camera images has raised awareness about the authenticity of images shared online. Inte...

#research #paper #ai #machine-learning #computer-vision
1 day ago · ai · - · -

[Paper] Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning

Self-supervised learning (SSL) is a standard approach for representation learning in aerial imagery. Existing methods enforce invariance between augmented views...

#research #paper #ai #machine-learning #computer-vision
1 day ago · ai · - · -

Flow Map Learning via Nongradient Vector Flow [pdf]

#flow map learning #vector flow #machine learning #computer vision #non‑gradient methods #research paper
1 day ago · ai · - · -

[Paper] DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

Recent advances in video generative models enable the synthesis of realistic human-object interaction videos across a wide range of scenarios and object categor...

#research #paper #ai #computer-vision
1 day ago · ai · - · -

[Paper] FedSIR: Spectral Client Identification and Relabeling for Federated Learning with Noisy Labels

Federated learning (FL) enables collaborative model training without sharing raw data; however, the presence of noisy labels across distributed clients can seve...

#research #paper #ai #machine-learning #computer-vision
1 day ago · ai · - · -

[Paper] Global Offshore Wind Infrastructure: Deployment and Operational Dynamics from Dense Sentinel-1 Time Series

The offshore wind energy sector is expanding rapidly, increasing the need for independent, high-temporal-resolution monitoring of infrastructure deployment and ...

#research #paper #ai #machine-learning #computer-vision
1 day ago · ai · - · -

[Paper] ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control

Reinforcement Learning (RL) post-training has become the standard for aligning generative models with human preferences, yet most methods rely on a single scala...

#research #paper #ai #machine-learning #computer-vision
1 day ago · ai · - · -

[Paper] Adapting TrOCR for Printed Tigrinya Text Recognition: Word-Aware Loss Weighting for Cross-Script Transfer Learning

Transformer-based OCR models have shown strong performance on Latin and CJK scripts, but their application to African syllabic writing systems remains limited. ...

#research #paper #ai #computer-vision
1 day ago · ai · - · -

[Paper] OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model

Large vision-language models (LVLMs) have made substantial advances in reasoning tasks at the Olympiad level. Nevertheless, current Olympiad-level multimodal re...

#research #paper #ai #machine-learning #nlp #computer-vision
1 day ago · ai · - · -

[Paper] LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

Reconstructing 3D Human-Object Interaction from an RGB image is essential for perceptive systems. Yet, this remains challenging as it requires capturing the sub...

#research #paper #ai #machine-learning #computer-vision
1 day ago · ai · - · -

[Paper] LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integr...

#research #paper #ai #computer-vision
1 day ago · ai · - · -

[Paper] GeoRect4D: Geometry-Compatible Generative Rectification for Dynamic Sparse-View 3D Reconstruction

Reconstructing dynamic 3D scenes from sparse multi-view videos is highly ill-posed, often leading to geometric collapse, trajectory drift, and floating artifact...

#research #paper #ai #computer-vision
1 day ago · ai · - · -

[Paper] Exploring High-Order Self-Similarity for Video Understanding

Space-time self-similarity (STSS), which captures visual correspondences across frames, provides an effective way to represent temporal dynamics for video under...

#research #paper #ai #computer-vision
2 days ago · ai · - · -

[Paper] Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-...

#research #paper #ai #computer-vision
2 days ago · ai · - · -

[Paper] AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remains challenging for non-generative reconstruction. Existing diffusio...

#research #paper #ai #computer-vision
2 days ago · ai · - · -

[Paper] CityRAG: Stepping Into a City via Spatially-Grounded Video Generation

We address the problem of generating a 3D-consistent, navigable environment that is spatially grounded: a simulation of a real location. Existing video generati...

#research #paper #ai #computer-vision
2 days ago · ai · - · -

[Paper] Generalization at the Edge of Stability

Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the optimization dynamics exhibit oscillatory an...

#research #paper #ai #machine-learning #computer-vision
2 days ago · ai · - · -

[Paper] Generative Drifting for Conditional Medical Image Generation

Conditional medical image generation plays an important role in many clinically relevant imaging tasks. However, existing methods still face a fundamental chall...

#research #paper #ai #computer-vision
2 days ago · ai · - · -

[Paper] VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize in the ac...

#research #paper #ai #machine-learning #computer-vision
2 days ago · ai · - · -

[Paper] ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

Human video generation remains challenging due to the difficulty of jointly modeling human appearance, motion, and camera viewpoint under limited multi-view dat...

#research #paper #ai #computer-vision
2 days ago · ai · - · -

[Paper] A Network-Aware Evaluation of Distributed Energy Resource Control in Smart Distribution Systems

Distribution networks with high penetration of Distributed Energy Resources (DERs) increasingly rely on communication networks to coordinate grid-interactive co...

#research #paper #ai #computer-vision
2 days ago · ai · - · -

[Paper] SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model

Vision-Language-Action (VLA) models offer a promising autonomous driving paradigm for leveraging world knowledge and reasoning capabilities, especially in long-...

#research #paper #ai #computer-vision
2 days ago · ai · - · -

[Paper] Face Anything: 4D Face Reconstruction from Any Image Sequence

Accurate reconstruction and tracking of dynamic human faces from image sequences is challenging because non-rigid deformations, expression changes, and viewpoin...

#research #paper #ai #computer-vision
2 days ago · ai · - · -

[Paper] Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllabil...

#research #paper #ai #machine-learning #nlp #computer-vision
3 days ago · ai · - · -

[Paper] MUA: Mobile Ultra-detailed Animatable Avatars

Building photorealistic, animatable full-body digital humans remains a longstanding challenge in computer graphics and vision. Recent advances in animatable ava...

#research #paper #ai #computer-vision
3 days ago · ai · - · -

[Paper] ReCap: Lightweight Referential Grounding for Coherent Story Visualization

Story Visualization aims to generate a sequence of images that faithfully depicts a textual narrative and preserves character identity, spatial configuration, a...

#research #paper #ai #computer-vision
3 days ago · ai · - · -

[Paper] T-REN: Learning Text-Aligned Region Tokens Improves Dense Vision-Language Alignment and Scalability

Despite recent progress, vision-language encoders struggle with two core limitations: (1) weak alignment between language and dense vision features, which hurts...

#research #paper #ai #computer-vision
3 days ago · ai · - · -

[Paper] Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale

The Platonic Representation Hypothesis suggests that neural networks trained on different modalities (e.g., text and images) align and eventually converge towar...

#research #paper #ai #machine-learning #computer-vision
3 days ago · ai · - · -

[Paper] MultiWorld: Scalable Multi-Agent Multi-View Video World Models

Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-c...

#research #paper #ai #computer-vision
3 days ago · ai · - · -

[Paper] AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation

Reasoning segmentation requires models to ground complex, implicit textual queries into precise pixel-level masks. Existing approaches rely on a single segmenta...

#research #paper #ai #computer-vision
3 days ago · ai · - · -

[Paper] SynAgent: Generalizable Cooperative Humanoid Manipulation via Solo-to-Cooperative Agent Synergy

Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, due to severe data scarcity, complexities in ...

#research #paper #ai #computer-vision
3 days ago · ai · - · -

[Paper] Advancing Vision Transformer with Enhanced Spatial Priors

In recent years, the Vision Transformer (ViT) has garnered significant attention within the computer vision community. However, the core component of ViT, Self-...

#research #paper #ai #computer-vision
3 days ago · ai · - · -

[Paper] MetaCloak-JPEG: JPEG-Robust Adversarial Perturbation for Preventing Unauthorized DreamBooth-Based Deepfake Generation

The rapid progress of subject-driven text-to-image synthesis, and in particular DreamBooth, has enabled a consent-free deepfake pipeline: an adversary needs onl...

#research #paper #ai #computer-vision
3 days ago · ai · - · -

[Paper] UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcemen...

#research #paper #ai #machine-learning #computer-vision
5 days ago · ai · - · -

[Paper] RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation

A robust Multimodal Large Language Model (MLLM) for Earth Observation should maintain consistent interpretation and reasoning under realistic input variations. ...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] Enhancing Zero-shot Personalized Image Aesthetics Assessment with Profile-aware Multimodal LLM

Personalized image aesthetics assessment (PIAA) aims to predict an individual user's subjective rating of an image, which requires modeling user-specific aesthe...

#research #paper #ai #machine-learning #computer-vision
5 days ago · ai · - · -

[Paper] Fringe Projection Based Vision Pipeline for Autonomous Hard Drive Disassembly

Unrecovered e-waste represents a significant economic loss. Hard disk drives (HDDs) comprise a valuable e-waste stream necessitating robotic disassembly. Automa...

#research #paper #ai #computer-vision
5 days ago · ai · - · -

[Paper] Region-Affinity Attention for Whole-Slide Breast Cancer Classification in Deep Ultraviolet Imaging

Breast cancer diagnosis demands rapid and precise tools, yet traditional histopathological methods often fall short in intra-operative settings. Deep Ultraviole...

#research #paper #ai #machine-learning #computer-vision
5 days ago · ai · - · -

[Paper] Cross-Modal Attention Analysis and Optimization in Vision-Language Models: A Study on Visual Reliability

Vision-Language Models (VLMs) achieve strong cross-modal performance, yet recent evidence suggests they over-rely on textual descriptions while under-utilizing ...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai · - · -

From Pixels to Predictions: Data Pipelines and Training the Sequence Model (Part 2)

In Part 1 of this series, we introduced the architecture of the ASL-to-voice translation system, a five-stage pipeline that turns real-time webcam v...

#sign language #computer vision #data pipelines #sequence model #neural networks #video processing #machine learning #deep learning #datasets #asl-to-voice
6 days ago · ai · - · -

[Paper] Repurposing 3D Generative Model for Autoregressive Layout Generation

We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual ...

#research #paper #ai #computer-vision
