computer vision — Page 10

1 month ago · ai · - · -

[Paper] NEGATE: Constrained Semantic Guidance for Linguistic Negation in Text-to-Video Diffusion

Negation is a fundamental linguistic operator, yet it remains inadequately modeled in diffusion-based generative systems. In this work, we present a formal trea...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education

Orofacial clefts are among the most common congenital craniofacial abnormalities, yet accurate prenatal detection remains challenging due to the scarcity of exp...

#medical imaging #ultrasound #deep learning #computer vision #radiology education
1 month ago · ai · - · -

City Detect, which uses AI to help cities stay safe and clean, raises $13M Series A

City Detect, a vision‑AI startup that helps local governments monitor the health of buildings and neighborhoods, announced a $13 million Series A...

#computer vision #urban tech #startup funding #AI for government
1 month ago · ai · - · -

[Paper] Transformer-Based Inpainting for Real-Time 3D Streaming in Sparse Multi-Camera Setups

High-quality 3D streaming from multiple cameras is crucial for immersive experiences in many AR/VR applications. The limited number of views - often due to real...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning

We introduce FaceCam, a system that generates video under customizable camera trajectories for monocular human portrait video input. Recent camera control appro...

#video generation #diffusion models #computer vision #scale-aware conditioning
1 month ago · ai · - · -

[Paper] Accelerating Text-to-Video Generation with Calibrated Sparse Attention

Recent diffusion models enable high-quality video generation, but suffer from slow runtimes. The large transformer-based backbones used in these models are bott...

#text-to-video generation #sparse attention #diffusion models #inference acceleration #computer vision
1 month ago · ai · - · -

[Paper] Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

While datasets for video understanding have scaled to hour-long durations, they typically consist of densely concatenated clips that differ from natural, unscri...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Towards 3D Scene Understanding of Gas Plumes in LWIR Hyperspectral Images Using Neural Radiance Fields

Hyperspectral images (HSI) have many applications, ranging from environmental monitoring to national security, and can be used for material detection and identi...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] RealWonder: Real-Time Physical Action-Conditioned Video Generation

Current video generation models cannot simulate physical consequences of 3D actions like forces and robotic manipulations, as they lack structural understanding...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries

We focus on the task of retrieving nail design images based on dense intent descriptions, which represent multi-layered user intent for nail designs. This is ch...

#multimodal retrieval #computer vision #fashion AI #color palette
1 month ago · ai · - · -

Microsoft built Phi-4-reasoning-vision-15B to know when to think — and when thinking is a waste of time

Microsoft announced on Tuesday the launch of Phi‑4‑reasoning‑vision‑15B, a compact open‑weight multimodal AI mode...

#Microsoft #Phi-4-reasoning-vision-15B #multimodal AI #large language models #computer vision
1 month ago · ai · - · -

[Paper] SimpliHuMoN: Simplifying Human Motion Prediction

Human motion prediction combines the tasks of trajectory forecasting and human pose prediction. For each of the two tasks, specialized models have been develope...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training

Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and π^3 have a computational cost that scales...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar species...

#research #paper #ai #nlp #computer-vision
1 month ago · ai · - · -

[Paper] Helios: Real Real-Time Long Video Generation Model

We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching ...

#video generation #diffusion models #real-time AI #computer vision #Helios
1 month ago · ai · - · -

[Paper] RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation

Pathology report generation remains a relatively under-explored downstream task, primarily due to the gigapixel scale and complex morphological heterogeneity of...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Underrepresented in Foundation Model Pretraining Data? A One-Shot Probe

Large-scale Vision-Language Foundation Models (VLFMs), such as CLIP, now underpin a wide range of computer vision research and applications. VLFMs are often ada...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Enhancing Authorship Attribution with Synthetic Paintings

Attributing authorship to paintings is a historically complex task, and one of its main challenges is the limited availability of real artworks for training com...

#authorship attribution #synthetic data #diffusion models #computer vision #stable diffusion
1 month ago · ai · - · -

[Paper] Hold-One-Shot-Out (HOSO) for Validation-Free Few-Shot CLIP Adapters

In many CLIP adaptation methods, a blending ratio hyperparameter controls the trade-off between general pretrained CLIP knowledge and the limited, dataset-speci...

#few-shot learning #CLIP adapters #validation-free training #computer vision #machine learning
1 month ago · ai · - · -

[Paper] Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection

Constructing computer-aided design (CAD) models is labor-intensive but essential for engineering and manufacturing. Recent advances in Large Language Models (LL...

#research #paper #ai #nlp #computer-vision
1 month ago · ai · - · -

Automate Content Moderation with an NSFW Detection API

#nsfw detection #content moderation #api integration #computer vision #machine learning
1 month ago · ai · - · -

[Paper] Utonia: Toward One Encoder for All Point Clouds

We dream of a future where point clouds from all domains can come together to shape a single model that benefits them all. Toward this goal, we present Utonia, ...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] MIBURI: Towards Expressive Interactive Gesture Synthesis

Embodied Conversational Agents (ECAs) aim to emulate human face-to-face interaction through speech, gestures, and facial expressions. Current large language mod...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference

Many essential manipulation tasks - such as food preparation, surgery, and craftsmanship - remain intractable for autonomous robots. These tasks are characteriz...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation

Achieving autonomous and versatile whole-body loco-manipulation remains a central barrier to making humanoids practically useful. Yet existing approaches are fu...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping

The ability to conduct and learn from interaction and experience is a central challenge in robotics, offering a scalable alternative to labor-intensive human de...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

Feedforward geometric foundation models achieve strong short-window reconstruction, yet scaling them to minutes-long videos is bottlenecked by quadratic attenti...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] DuoMo: Dual Motion Diffusion for World-Space Human Reconstruction

We present DuoMo, a generative method that recovers human motion in world-space coordinates from unconstrained videos with noisy or incomplete observations. Rec...

#human motion capture #diffusion models #3D reconstruction #computer vision
1 month ago · ai · - · -

[Paper] UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Unified multimodal models have recently demonstrated strong generative capabilities, yet whether and when generation improves understanding remains unclear. Exi...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization

Omni-modal large language models (omni LLMs) have recently achieved strong performance across audiovisual understanding tasks, yet they remain highly susceptibl...

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · ai · - · -

[Paper] HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

Human-product images, which showcase the integration of humans and products, play a vital role in advertising, e-commerce, and digital marketing. The essential ...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Adaptive Confidence Regularization for Multimodal Failure Detection

The deployment of multimodal models in high-stakes domains, such as self-driving vehicles and medical diagnostics, demands not only strong predictive performanc...

#multimodal learning #confidence regularization #failure detection #machine learning #computer vision
1 month ago · ai · - · -

[Paper] Sketch2Colab: Sketch-Conditioned Multi-Human Animation via Controllable Flow Distillation

We present Sketch2Colab, which turns storyboard-style 2D sketches into coherent, object-aware 3D multi-human motion with fine-grained control over agents, joint...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Leveraging Model Soups to Classify Intangible Cultural Heritage Images from the Mekong Delta

The classification of Intangible Cultural Heritage (ICH) images in the Mekong Delta poses unique challenges due to limited annotated data, high visual similarit...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Instruction-based video editing has witnessed rapid progress, yet current methods often struggle with precise visual control, as natural language is inherently ...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] GeoDiT: Point-Conditioned Diffusion Transformer for Satellite Image Synthesis

We introduce GeoDiT, a diffusion transformer designed for text-to-satellite image generation with point-based control. Existing controlled satellite image gener...

#diffusion models #satellite imagery #computer vision #point-conditioned generation #transformer
1 month ago · ai · - · -

[Paper] 3D Field of Junctions: A Noise-Robust, Training-Free Structural Prior for Volumetric Inverse Problems

Volume denoising is a foundational problem in computational imaging, as many 3D imaging inverse problems face high levels of measurement noise. Inspired by the ...

#volumetric inverse problems #training-free prior #3D denoising #CT reconstruction #computer vision
1 month ago · ai · - · -

[Paper] Is Bigger Always Better? Efficiency Analysis in Resource-Constrained Small Object Detection

Scaling laws assume larger models trained on more data consistently outperform smaller ones -- an assumption that drives model selection in computer vision but ...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] OmniRet: Efficient and High-Fidelity Omni Modality Retrieval

Multimodal retrieval is the task of aggregating information from queries across heterogeneous modalities to retrieve desired targets. State-of-the-art multimoda...

#research #paper #ai #nlp #computer-vision
1 month ago · ai · - · -

Launch HN: OctaPulse (YC W26) – Robotics and computer vision for fish farming

Hi HN! My name is Rohan and, together with Paul, I’m the co‑founder of OctaPulse (https://www.tryoctapulse.com/). We’re building a robotics layer fo...

#computer vision #aquaculture robotics #fish farming automation #deep learning #OAK camera
1 month ago · ai · - · -

[Paper] UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images

Dense 4D reconstruction from unposed images remains a critical challenge, with current methods relying on slow test-time optimization or fragmented, task-specif...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Mode Seeking meets Mean Seeking for Fast Long Video Generation

Scaling video generation from seconds to minutes faces a critical bottleneck: while short-video data is abundant and high-fidelity, coherent long-form data is s...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Hierarchical Action Learning for Weakly-Supervised Action Segmentation

Humans perceive actions through key transitions that structure actions across multiple abstraction levels, whereas machines, relying on visual features, tend to...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Histopathology Image Normalization via Latent Manifold Compaction

Batch effects arising from technical variations in histopathology staining protocols, scanners, and acquisition pipelines pose a persistent challenge for comput...

#histopathology #image-normalization #latent-manifold-compaction #domain-adaptation #computer-vision
1 month ago · ai · - · -

[Paper] Joint Geometric and Trajectory Consistency Learning for One-Step Real-World Super-Resolution

Diffusion-based Real-World Image Super-Resolution (Real-ISR) achieves impressive perceptual quality but suffers from high computational costs due to iterative s...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] MuViT: Multi-Resolution Vision Transformers for Learning Across Scales in Microscopy

Modern microscopy routinely produces gigapixel images that contain structures across multiple spatial scales, from fine cellular morphology to broader tissue or...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive due to the large number of sequential denoising steps....

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume

Despite their capabilities, Multimodal Large Language Models (MLLMs) may produce plausible but erroneous outputs, hindering reliable deployment. Accurate uncert...

#research #paper #ai #machine-learning #nlp #computer-vision
