computer-vision — Page 17

3 months ago · ai

[Paper] Motion Attribution for Video Generation

Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood. We present Motive (MOTIon attribution for Vi...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

[Paper] Reasoning Matters for 3D Visual Grounding

The recent development of Large Language Models (LLMs) with strong reasoning ability has driven research in various domains such as mathematics, coding, and sci...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

[Paper] S3-CLIP: Video Super Resolution for Person-ReID

Tracklet quality is often treated as an afterthought in most person re-identification (ReID) methods, with the majority of research presenting architectural mod...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

[Paper] Near-perfect photo-ID of the Hula painted frog with zero-shot deep local-feature matching

Accurate individual identification is essential for monitoring rare amphibians, yet invasive marking is often unsuitable for critically endangered species. We e...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] DentalX: Context-Aware Dental Disease Detection with Radiographs

Diagnosing dental diseases from radiographs is time-consuming and challenging due to the subtle nature of diagnostic evidence. Existing methods, which rely on o...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Aggregating Diverse Cue Experts for AI-Generated Image Detection

The rapid emergence of image synthesis models poses challenges to the generalization of AI-generated image detectors. However, existing methods often rely on mo...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Translating Light-Sheet Microscopy Images to Virtual H&E Using CycleGAN

Histopathology analysis relies on Hematoxylin and Eosin (H&E) staining, but fluorescence microscopy offers complementary information. Converting fluorescenc...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

[Paper] M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding

Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models by encouraging step-by-step intermediate reasoning, and recent advances...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Improving Zero-shot ADL Recognition with Large Language Models through Event-based Context and Confidence

Unobtrusive sensor-based recognition of Activities of Daily Living (ADLs) in smart homes by processing data collected from IoT sensing devices supports applicat...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations

Large Language Models have emerged as transformative tools for Security Operations Centers, enabling automated log analysis, phishing triage, and malware explan...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Tuning-free Visual Effect Transfer across Videos

We present RefVFX, a new framework that transfers complex temporal effects from a reference video onto a target video or image in a feed-forward manner. While e...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

While the Transformer architecture dominates many fields, its quadratic self-attention complexity hinders its use in large-scale applications. Linear attention ...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

[Paper] More Images, More Problems? A Controlled Analysis of VLM Failure Modes

Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities, yet their proficiency in understanding and reasoning over multiple images remain...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Exchange Is All You Need for Remote Sensing Change Detection

Remote sensing change detection fundamentally relies on the effective fusion and discrimination of bi-temporal features. Prevailing paradigms typically utilize ...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Vision-Language Model for Accurate Crater Detection

The European Space Agency (ESA), driven by its ambitions on planned lunar missions with the Argonaut lander, has a profound interest in reliable crater detectio...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

While Vision-Language Models (VLMs) have significantly advanced Computer-Using Agents (CUAs), current frameworks struggle with robustness in long-horizon workfl...

#research #paper #ai #machine-learning #nlp #computer-vision
3 months ago · ai

[Paper] Beyond External Guidance: Unleashing the Semantic Richness Inside Diffusion Transformers for Improved Training

Recent works such as REPA have shown that guiding diffusion models with external semantic features (e.g., DINO) can significantly accelerate the training of dif...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding

Large Vision-Language Models (LVLMs) face a fundamental dilemma in video reasoning: they are caught between the prohibitive computational costs of verbose reaso...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] On the application of the Wasserstein metric to 2D curves classification

In this work, we analyse a number of variants of the Wasserstein distance which allow one to focus the classification on the prescribed parts (fragments) of classifi...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Evaluating the encoding competence of visual language models using uncommon actions

We propose the UAIT (Uncommon-sense Action Image-Text) dataset, a new evaluation benchmark designed to test the semantic understanding ability of visual language mo...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

[Paper] SC-MII: Infrastructure LiDAR-based 3D Object Detection on Edge Devices for Split Computing with Multiple Intermediate Outputs Integration

3D object detection using LiDAR-based point cloud data and deep neural networks is essential in autonomous driving technology. However, deploying state-of-the-a...

#research #paper #ai #computer-vision
3 months ago · ai

Building a Production-Ready Traffic Violation Detection System with Computer Vision

Traffic monitoring and violation detection is a classic computer vision problem that looks deceptively simple but becomes complex very quickly in real‑world con...

#computer vision #traffic monitoring #violation detection #object detection #video analytics #deep learning #object tracking #production deployment
3 months ago · ai

The Brain of the Future Agent: Why VL-JEPA Matters for Real-World AI

The “Generative” Trap. If you have been following AI recently, you know the drill: Input → Generate. You give ChatGPT, Gemini, or Claude a prompt → it generat...

#VL-JEPA #vision-language models #generative AI #multimodal learning #efficiency in AI #LLM #computer vision
3 months ago · ai

[Paper] Deepfake detectors are DUMB: A benchmark to assess adversarial training robustness under transferability constraints

Deepfake detection systems deployed in real-world environments are subject to adversaries capable of crafting imperceptible perturbations that degrade model per...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Adaptive Conditional Contrast-Agnostic Deformable Image Registration with Uncertainty Estimation

Deformable multi-contrast image registration is a challenging yet crucial task due to the complex, non-linear intensity relationships across different imaging c...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

Recent advances in video generation have been dominated by diffusion and flow-matching models, which produce high-quality results but remain computationally int...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

[Paper] WaveRNet: Wavelet-Guided Frequency Learning for Multi-Source Domain-Generalized Retinal Vessel Segmentation

Domain-generalized retinal vessel segmentation is critical for automated ophthalmic diagnosis, yet faces significant challenges from domain shift induced by non...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Context-Aware Decoding for Faithful Vision-Language Generation

Hallucinations, generating responses inconsistent with the visual input, remain a critical limitation of large vision-language models (LVLMs), especially in ope...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Performance of a Deep Learning-Based Segmentation Model for Pancreatic Tumors on Public Endoscopic Ultrasound Datasets

Background: Pancreatic cancer is one of the most aggressive cancers, with poor survival rates. Endoscopic ultrasound (EUS) is a key diagnostic modality, but its...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

[Paper] Adapting Vision Transformers to Ultra-High Resolution Semantic Segmentation with Relay Tokens

Current approaches for segmenting ultra high resolution images either slide a window, thereby discarding global context, or downsample and lose fine detail. We ...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Phase4DFD: Multi-Domain Phase-Aware Attention for Deepfake Detection

Recent deepfake detection methods have increasingly explored frequency domain representations to reveal manipulation artifacts that are difficult to detect in t...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Bidirectional Channel-selective Semantic Interaction for Semi-Supervised Medical Segmentation

Semi-supervised medical image segmentation is an effective method for addressing scenarios with limited labeled data. Existing methods mainly rely on frameworks...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] LayerGS: Decomposition and Inpainting of Layered 3D Human Avatars via 2D Gaussian Splatting

We propose a novel framework for decomposing arbitrarily posed humans into animatable multi-layered 3D human avatars, separating the body and garments. Conventi...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

Silent Plumbing Assistant – A Non-Conversational Retail Intelligence Agent

This is a submission for the Algolia Agent Studio Challenge (https://dev.to/challenges/algolia): Consumer-Facing Non-Conversational Experiences. What I Built: Silent...

#AI agent #visual search #retail intelligence #product recommendation #non‑conversational AI #computer vision
3 months ago · ai

[Paper] Mesh4D: 4D Mesh Reconstruction and Tracking from Monocular Video

We propose Mesh4D, a feed-forward model for monocular 4D mesh reconstruction. Given a monocular video of a dynamic object, our model reconstructs the object's c...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] QNeRF: Neural Radiance Fields on a Simulated Gate-Based Quantum Computer

Recently, Quantum Visual Fields (QVFs) have shown promising improvements in model compactness and convergence speed for learning the provided 2D or 3D signals. ...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

Nighttime color constancy remains a challenging problem in computational photography due to low-light noise and complex illumination conditions. We present RL-A...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Pixel-Perfect Visual Geometry Estimation

Recovering clean and accurate geometry from images is essential for robotics and augmented reality. However, existing geometry foundation models still suffer se...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Generate, Transfer, Adapt: Learning Functional Dexterous Grasping from a Single Human Demonstration

Functional grasping with dexterous robotic hands is a key capability for enabling tool use and complex manipulation, yet progress has been constrained by two pe...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] GREx: Generalized Referring Expression Segmentation, Comprehension, and Generation

Referring Expression Segmentation (RES) and Comprehension (REC) respectively segment and detect the object described by an expression, while Referring Expressio...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

The diversity, quantity, and quality of manipulation data are critical for training effective robot policies. However, due to hardware and physical setup constr...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

[Paper] Plenoptic Video Generation

Camera-controlled generative video re-rendering methods, such as ReCamMaster, have achieved remarkable progress. However, despite their success in single-view s...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] ObjectForesight: Predicting Future 3D Object Trajectories from Human Videos

Humans can effortlessly anticipate how objects might move or change through interaction: imagining a cup being lifted, a knife slicing, or a lid being closed. W...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Learning Latent Action World Models In The Wild

Agents capable of reasoning and planning in the real world require the ability to predict the consequences of their actions. While world models possess this ...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai

[Paper] FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching

Brain Magnetic Resonance Imaging (MRI) plays a central role in studying neurological development, aging, and diseases. One key application is Brain Age Predicti...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] MoE3D: A Mixture-of-Experts Module for 3D Reconstruction

MoE3D is a mixture-of-experts module designed to sharpen depth boundaries and mitigate flying-point artifacts (highlighted in red) of existing feed-forward 3D r...

#research #paper #ai #computer-vision
3 months ago · ai

[Paper] Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

Large vision-language models (VLMs) are highly capable, yet often hallucinate by favoring textual prompts over visual evidence. We study this failure mode in a ...

#research #paper #ai #machine-learning #nlp #computer-vision
3 months ago · ai

[Paper] Cutting AI Research Costs: How Task-Aware Compression Makes Large Language Model Agents Affordable

When researchers deploy large language models for autonomous tasks like reviewing literature or generating hypotheses, the computational bills add up quickly. A...

#research #paper #ai #machine-learning #computer-vision
