computer-vision — Page 43

1 month ago · ai

[Paper] Monet: Reasoning in Latent Visual Space Beyond Images and Language

'Thinking with images' has emerged as an effective paradigm for advancing visual reasoning, extending beyond text-only chains of thought by injecting visual evi...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning

Spatio-temporal video grounding (STVG) requires localizing a target object in untrimmed videos both temporally and spatially from natural language descriptions....

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] Endo-G$^{2}$T: Geometry-Guided & Temporally Aware Time-Embedded 4DGS For Endoscopic Scenes

Endoscopic (endo) video exhibits strong view-dependent effects such as specularities, wet reflections, and occlusions. Pure photometric supervision misaligns wi...

#4D Gaussian Splatting #endoscopic reconstruction #computer vision #depth estimation #real-time rendering
1 month ago · ai

[Paper] PFF-Net: Patch Feature Fitting for Point Cloud Normal Estimation

Estimating the normal of a point requires constructing a local patch to provide center-surrounding context, but determining the appropriate neighborhood size is...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] SurgMLLMBench: A Multimodal Large Language Model Benchmark Dataset for Surgical Scene Understanding

Recent advances in multimodal large language models (LLMs) have highlighted their potential for medical and surgical applications. However, existing surgical da...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] Hybrid SIFT-SNN for Efficient Anomaly Detection of Traffic Flow-Control Infrastructure

This paper presents the SIFT-SNN framework, a low-latency neuromorphic signal-processing pipeline for real-time detection of structural anomalies in transport i...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment

Learning joint representations across multiple modalities remains a central challenge in multimodal machine learning. Prevailing approaches predominantly operat...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs

Traffic cameras are essential in urban areas, playing a crucial role in intelligent transportation systems. Multiple cameras at intersections enhance law enforc...

#research #paper #ai #nlp #computer-vision
