computer-vision

2 months ago · ai · - · -

[Paper] ObjectForesight: Predicting Future 3D Object Trajectories from Human Videos

Humans can effortlessly anticipate how objects might move or change through interaction--imagining a cup being lifted, a knife slicing, or a lid being closed. W...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Learning Latent Action World Models In The Wild

Agents capable of reasoning and planning in the real world require the ability to predict the consequences of their actions. While world models possess this ...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching

Brain Magnetic Resonance Imaging (MRI) plays a central role in studying neurological development, aging, and diseases. One key application is Brain Age Predicti...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] MoE3D: A Mixture-of-Experts Module for 3D Reconstruction

MoE3D is a mixture-of-experts module designed to sharpen depth boundaries and mitigate flying-point artifacts (highlighted in red) of existing feed-forward 3D r...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

Large vision-language models (VLMs) are highly capable, yet often hallucinate by favoring textual prompts over visual evidence. We study this failure mode in a ...

#research #paper #ai #machine-learning #nlp #computer-vision
2 months ago · ai · - · -

[Paper] Cutting AI Research Costs: How Task-Aware Compression Makes Large Language Model Agents Affordable

When researchers deploy large language models for autonomous tasks like reviewing literature or generating hypotheses, the computational bills add up quickly. A...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Chain-of-thought (CoT) reasoning has emerged as a powerful tool for multimodal large language models on video understanding tasks. However, its necessity and ad...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] CoV: Chain-of-View Prompting for Spatial Reasoning

Embodied question answering (EQA) in 3D environments often requires collecting context that is distributed across multiple viewpoints and partially occluded. Ho...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] A Lightweight and Explainable Vision-Language Framework for Crop Disease Visual Question Answering

Visual question answering for crop disease analysis requires accurate visual understanding and reliable language generation. This work presents a lightweight vi...

#research #paper #ai #nlp #computer-vision
2 months ago · ai · - · -

How to Improve the Performance of Visual Anomaly Detection Models

Apply the best methods from academia to get the most out of practical applications.

#visual anomaly detection #computer vision #model performance #deep learning #anomaly detection
2 months ago · ai · - · -

[Paper] Training a Custom CNN on Five Heterogeneous Image Datasets

Deep learning has transformed visual data analysis, with Convolutional Neural Networks (CNNs) becoming highly effective in learning meaningful feature represent...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

From Pixels to Calories: Building a Multimodal Meal Analysis Engine with GPT-4o

🍝 From Pixels to Calories – Multimodal AI & Automated Calorie Tracking We’ve all been there: staring at a delicious plate of pasta, trying to figure out if it...

#multimodal AI #GPT-4o #computer vision #nutrition analysis #Streamlit
2 months ago · ai · - · -

[Paper] Choreographing a World of Dynamic Objects

Dynamic objects in our physical 4D (3D + time) world are constantly evolving, deforming, and interacting with other objects, leading to diverse 4D scene dynamic...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] ImLoc: Revisiting Visual Localization with Image-based Representation

Existing visual localization methods are typically either 2D image-based, which are easy to build and maintain but limited in effective geometric reasoning, or ...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Scanner-Induced Domain Shifts Undermine the Robustness of Pathology Foundation Models

Pathology foundation models (PFMs) have become central to computational pathology, aiming to offer general encoders for feature extraction from whole-slide imag...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] ToTMNet: FFT-Accelerated Toeplitz Temporal Mixing Network for Lightweight Remote Photoplethysmography

Remote photoplethysmography (rPPG) estimates a blood volume pulse (BVP) waveform from facial videos captured by commodity cameras. Although recent deep models i...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Diffusion-DRF: Differentiable Reward Flow for Video Diffusion Fine-Tuning

Direct Preference Optimization (DPO) has recently improved Text-to-Video (T2V) generation by enhancing visual fidelity and text alignment. However, current meth...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Klear: Unified Multi-Task Audio-Video Joint Generation

Audio-video joint generation has progressed rapidly, yet substantial challenges remain. Non-commercial approaches still suffer from audio-visual asynchrony, po...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test

As world models gain momentum in Embodied AI, an increasing number of works explore using video foundation models as predictive world models for downstream embo...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Pixel-Wise Multimodal Contrastive Learning for Remote Sensing Images

Satellites continuously generate massive volumes of data, particularly for Earth observation, including satellite image time series (SITS). However, most deep l...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training

GUI agents that interact with graphical interfaces on behalf of users represent a promising direction for practical AI assistants. However, training such agents...

#research #paper #ai #machine-learning #nlp #computer-vision
2 months ago · ai · - · -

[Paper] MORPHFED: Federated Learning for Cross-institutional Blood Morphology Analysis

Automated blood morphology analysis can support hematological diagnostics in low- and middle-income countries (LMICs) but remains sensitive to dataset shifts fr...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts

Large Multimodal Models (LMMs) have demonstrated impressive capabilities in video reasoning via Chain-of-Thought (CoT). However, the robustness of their reasoni...

#research #paper #ai #machine-learning #nlp #computer-vision
2 months ago · ai · - · -

[Paper] Better, But Not Sufficient: Testing Video ANNs Against Macaque IT Dynamics

Feedforward artificial neural networks (ANNs) trained on static images remain the dominant models of the primate ventral visual stream, yet they are intrins...

#research #paper #ai #computer-vision
