computer-vision — Page 35

3 months ago · ai · - · -

[Paper] Uncertainty Quantification for Visual Object Pose Estimation

Quantifying the uncertainty of an object's pose estimate is essential for robust control and planning. Although pose estimation is a well-studied robotics probl...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following

Large multimodal models (LMMs) are increasingly adopted as judges in multimodal evaluation systems due to their strong instruction following and consistency wit...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] CaFlow: Enhancing Long-Term Action Quality Assessment with Causal Counterfactual Flow

Action Quality Assessment (AQA) predicts fine-grained execution scores from action videos and is widely applied in sports, rehabilitation, and skill evaluation....

#action-quality-assessment #causal-inference #video-analysis #computer-vision #long-term-temporal-modeling
3 months ago · ai · - · -

[Paper] Mechanisms of Non-Monotonic Scaling in Vision Transformers

Deeper Vision Transformers often perform worse than shallower ones, which challenges common scaling assumptions. Through a systematic empirical analysis of ViT-...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Qwen3-VL Technical Report

We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benc...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Active Learning for GCN-based Action Recognition

Despite the notable success of graph convolutional networks (GCNs) in skeleton-based action recognition, their performance often depends on large volumes of lab...

#active learning #graph convolutional networks #action recognition #skeleton-based vision #computer vision
3 months ago · ai · - · -

[Paper] ReSAM: Refine, Requery, and Reinforce: Self-Prompting Point-Supervised Segmentation for Remote Sensing Images

Interactive segmentation models such as the Segment Anything Model (SAM) have demonstrated remarkable generalization on natural images, but perform suboptimally...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training

Video diffusion models achieve strong frame-level fidelity but still struggle with motion coherence, dynamics and realism, often producing jitter, ghosting, or ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Multimodal Robust Prompt Distillation for 3D Point Cloud Models

Adversarial attacks pose a significant threat to learning-based 3D point cloud models, critically undermining their reliability in security-sensitive applicatio...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] UAVLight: A Benchmark for Illumination-Robust 3D Reconstruction in Unmanned Aerial Vehicle (UAV) Scenes

Illumination inconsistency is a fundamental challenge in multi-view 3D reconstruction. Variations in sunlight direction, cloud cover, and shadows break the cons...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Video Generation Models Are Good Latent Reward Models

Reward feedback learning (ReFL) has proven effective for aligning image generation with human preferences. However, its extension to video generation faces sign...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Bangla Sign Language Translation: Dataset Creation Challenges, Benchmarking and Prospects

Bangla Sign Language Translation (BdSLT) has been severely constrained so far as the language itself is very low resource. Standard sentence level dataset creat...

#sign-language #dataset #translation #computer-vision #benchmark
3 months ago · ai · - · -

[Paper] The Age-specific Alzheimer's Disease Prediction with Characteristic Constraints in Nonuniform Time Span

Alzheimer's disease is a debilitating disorder marked by a decline in cognitive function. Timely identification of the disease is essential for the development ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] EoS-FM: Can an Ensemble of Specialist Models act as a Generalist Feature Extractor?

Recent advances in foundation models have shown great promise in domains such as natural language processing and computer vision, and similar efforts are now em...

#ensemble learning #remote sensing #foundation models #computer vision #sustainability
3 months ago · ai · - · -

[Paper] Self-Paced Learning for Images of Antinuclear Antibodies

Antinuclear antibody (ANA) testing is a crucial method for diagnosing autoimmune disorders, including lupus, Sjögren's syndrome, and scleroderma. Despite its im...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Generalized Design Choices for Deepfake Detectors

The effectiveness of deepfake detection methods often depends less on their core design and more on implementation details such as data preprocessing, augmentat...

#deepfake detection #computer vision #benchmarking #model optimization
3 months ago · ai · - · -

[Paper] CanKD: Cross-Attention-based Non-local operation for Feature-based Knowledge Distillation

We propose Cross-Attention-based Non-local Knowledge Distillation (CanKD), a novel feature-based knowledge distillation framework that leverages cross-attention...

#knowledge distillation #cross-attention #computer vision #model compression #deep learning
3 months ago · ai · - · -

[Paper] Merge and Bound: Direct Manipulations on Weights for Class Incremental Learning

We present a novel training approach, named Merge-and-Bound (M&B) for Class Incremental Learning (CIL), which directly manipulates model weights in the para...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Frequency-Aware Token Reduction for Efficient Vision Transformer

Vision Transformers have demonstrated exceptional performance across various computer vision tasks, yet their quadratic computational complexity concerning toke...

#vision transformers #token reduction #frequency-aware pruning #computer vision #model efficiency
3 months ago · ai · - · -

[Paper] MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices

Recently, video generation has witnessed rapid advancements, drawing increasing attention to image-to-video (I2V) synthesis on mobile devices. However, the subs...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] EvRainDrop: HyperGraph-guided Completion for Effective Frame and Event Stream Aggregation

Event cameras produce asynchronous event streams that are spatially sparse yet temporally dense. Mainstream event representation learning algorithms typically u...

#event cameras #hypergraph neural network #multimodal fusion #computer vision #deep learning
3 months ago · ai · - · -

[Paper] E-M3RF: An Equivariant Multimodal 3D Re-assembly Framework

3D reassembly is a fundamental geometric problem, and in recent years it has increasingly been challenged by deep learning methods rather than classical optimiz...

#equivariant neural networks #multimodal 3D reconstruction #point cloud processing #computer vision
3 months ago · ai · - · -

[Paper] SAM Guided Semantic and Motion Changed Region Mining for Remote Sensing Change Captioning

Remote sensing change captioning is an emerging and popular research task that aims to describe, in natural language, the content of interest that has changed b...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Monet: Reasoning in Latent Visual Space Beyond Images and Language

'Thinking with images' has emerged as an effective paradigm for advancing visual reasoning, extending beyond text-only chains of thought by injecting visual evi...

#research #paper #ai #machine-learning #computer-vision
