computer-vision — Page 36

3 months ago · ai

[Paper] Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning

Spatio-temporal video grounding (STVG) requires localizing a target object in untrimmed videos both temporally and spatially from natural language descriptions....

#research #paper #ai #computer-vision

3 months ago · ai

[Paper] Endo-G$^{2}$T: Geometry-Guided & Temporally Aware Time-Embedded 4DGS For Endoscopic Scenes

Endoscopic (endo) video exhibits strong view-dependent effects such as specularities, wet reflections, and occlusions. Pure photometric supervision misaligns wi...

#4D Gaussian Splatting #endoscopic reconstruction #computer vision #depth estimation #real-time rendering

3 months ago · ai

[Paper] PFF-Net: Patch Feature Fitting for Point Cloud Normal Estimation

Estimating the normal of a point requires constructing a local patch to provide center-surrounding context, but determining the appropriate neighborhood size is...

#research #paper #ai #computer-vision

3 months ago · ai

[Paper] SurgMLLMBench: A Multimodal Large Language Model Benchmark Dataset for Surgical Scene Understanding

Recent advances in multimodal large language models (LLMs) have highlighted their potential for medical and surgical applications. However, existing surgical da...

#research #paper #ai #machine-learning #computer-vision

3 months ago · ai

[Paper] Hybrid SIFT-SNN for Efficient Anomaly Detection of Traffic Flow-Control Infrastructure

This paper presents the SIFT-SNN framework, a low-latency neuromorphic signal-processing pipeline for real-time detection of structural anomalies in transport i...

#research #paper #ai #machine-learning #computer-vision

3 months ago · ai

[Paper] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment

Learning joint representations across multiple modalities remains a central challenge in multimodal machine learning. Prevailing approaches predominantly operat...

#research #paper #ai #machine-learning #computer-vision

3 months ago · ai

[Paper] TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs

Traffic cameras are essential in urban areas, playing a crucial role in intelligent transportation systems. Multiple cameras at intersections enhance law enforc...

#research #paper #ai #nlp #computer-vision
