computer-vision — Page 3

4 days ago · ai

[Paper] FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning

We introduce FaceCam, a system that generates video under customizable camera trajectories for monocular human portrait video input. Recent camera control appro...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] Accelerating Text-to-Video Generation with Calibrated Sparse Attention

Recent diffusion models enable high-quality video generation, but suffer from slow runtimes. The large transformer-based backbones used in these models are bott...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

While datasets for video understanding have scaled to hour-long durations, they typically consist of densely concatenated clips that differ from natural, unscri...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] Towards 3D Scene Understanding of Gas Plumes in LWIR Hyperspectral Images Using Neural Radiance Fields

Hyperspectral images (HSI) have many applications, ranging from environmental monitoring to national security, and can be used for material detection and identi...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] HALP: Detecting Hallucinations in Vision-Language Models without Generating a Single Token

Hallucinations remain a persistent challenge for vision-language models (VLMs), which often describe nonexistent objects or fabricate facts. Existing detection ...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] EdgeDAM: Real-time Object Tracking for Mobile Devices

Single-object tracking (SOT) on edge devices is a critical computer vision task, requiring accurate and continuous target localization across video frames under...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes

Diffusion Language Models (DLMs) promise highly parallel text generation, yet their practical inference speed is often bottlenecked by suboptimal decoding sched...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] RealWonder: Real-Time Physical Action-Conditioned Video Generation

Current video generation models cannot simulate physical consequences of 3D actions like forces and robotic manipulations, as they lack structural understanding...

#research #paper #ai #machine-learning #computer-vision
4 days ago · ai

[Paper] NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries

We focus on the task of retrieving nail design images based on dense intent descriptions, which represent multi-layered user intent for nail designs. This is ch...

#research #paper #ai #computer-vision
5 days ago · ai

Microsoft built Phi-4-reasoning-vision-15B to know when to think — and when thinking is a waste of time

Microsoft announced on Tuesday the launch of Phi‑4‑reasoning‑vision‑15B, a compact open‑weight multimodal AI mode...

#Microsoft #Phi-4-reasoning-vision-15B #multimodal AI #open-weight model #large language models #computer vision #AI reasoning #open-source AI #HuggingFace #GitHub
5 days ago · ai

[Paper] SimpliHuMoN: Simplifying Human Motion Prediction

Human motion prediction combines the tasks of trajectory forecasting and human pose prediction. For each of the two tasks, specialized models have been develope...

#research #paper #ai #machine-learning #computer-vision
5 days ago · ai

[Paper] ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training

Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and π^3 have a computational cost that scales...

#research #paper #ai #machine-learning #computer-vision
