computer vision — Page 15

2 months ago · ai · - · -

[Paper] Deep-learning-based pan-phenomic data reveals the explosive evolution of avian visual disparity

The evolution of biological morphology is critical for understanding the diversity of the natural world, yet traditional analyses often involve subjective biase...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

Building Intelligent Retail Signage with BrightSign's NPU: A Deep Dive into Real-Time Gaze Detection

Overview: Every Series 5 BrightSign player except the XC line ships with an onboard Neural Processing Unit (NPU). Most people don’t know this, and even fewer have...

#NPU #computer-vision #gaze-detection #real-time-inference #BrightSign
2 months ago · ai · - · -

[Paper] Fast-Slow Efficient Training for Multimodal Large Language Models via Visual Token Pruning

Multimodal Large Language Models (MLLMs) suffer from severe training inefficiency, which is associated with their massive model sizes and visual token num...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Progressive Checkerboards for Autoregressive Multiscale Image Generation

A key challenge in autoregressive image generation is to efficiently sample independent locations in parallel, while still modeling mutual dependencies with ser...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

Assisting non-expert users to develop complex interactive websites has become a popular task for LLM-powered code agents. However, existing code agents tend to ...

#research #paper #ai #nlp #computer-vision
2 months ago · ai · - · -

[Paper] 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Existing methods for human motion control in video generation typically rely on either 2D poses or explicit 3D parametric models (e.g., SMPL) as control signals...

#video generation #motion representation #computer vision #3D modeling #implicit neural representations
2 months ago · ai · - · -

[Paper] FOVI: A biologically-inspired foveated interface for deep vision models

Human vision is foveated, with variable resolution peaking at the center of a large field of view; this reflects an efficient trade-off for active sensing, allo...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

Stereo Matching Algorithms in MATLAB and Python

Stereo matching is a core problem in computer vision, and performance matters, especially when working with large images or real‑time systems. This post shares...

#stereo matching #computer vision #MATLAB #Python #block matching #semi-global matching #belief propagation #algorithm implementation
2 months ago · ai · - · -

[Paper] PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Pixel diffusion generates images directly in pixel space in an end-to-end manner, avoiding the artifacts and bottlenecks introduced by VAEs in two-stage latent ...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Multi-head automated segmentation by incorporating detection head into the contextual layer neural network

Deep learning based auto segmentation is increasingly used in radiotherapy, but conventional models often produce anatomically implausible false positives, or h...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] MentisOculi: Revealing the Limits of Reasoning with Mental Imagery

Frontier models are transitioning from multimodal large language models (MLLMs) that merely ingest visual information to unified multimodal models (UMMs) capabl...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval

Reranking is a critical component of modern retrieval systems, which typically pair an efficient first-stage retriever with a more expressive model to refine re...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

Unified multimodal models often struggle with complex synthesis tasks that demand deep reasoning, and typically treat text-to-image generation and image editing...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] SelvaMask: Segmenting Trees in Tropical Forests and Beyond

Tropical forests harbor most of the planet's tree biodiversity and are critical to global ecological balance. Canopy trees in particular play a disproportionate...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Catalyst: Out-of-Distribution Detection via Elastic Scaling

Out-of-distribution (OOD) detection is critical for the safe deployment of deep neural networks. State-of-the-art post-hoc methods typically derive OOD scores f...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] ReasonEdit: Editing Vision-Language Models using Human Reasoning

Model editing aims to correct errors in large, pretrained models without altering unrelated behaviors. While some recent works have edited vision-language model...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation

Simulating deformable objects under rich interactions remains a fundamental challenge for real-to-sim robot manipulation, with dynamics jointly driven by enviro...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Superman: Unifying Skeleton and Vision for Human Motion Perception and Generation

Human motion analysis tasks, such as temporal 3D pose estimation, motion prediction, and motion in-betweening, play an essential role in computer vision. Howeve...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

Carbon Robotics built an AI model that detects and identifies plants

Carbon Robotics' Large Plant Model will allow farmers to kill new types of weeds without having to retrain the machines.

#AI #computer vision #plant identification #weed control #agricultural robotics #Carbon Robotics #large plant model
2 months ago · ai · - · -

[Paper] Multi-View Stenosis Classification Leveraging Transformer-Based Multiple-Instance Learning Using Real-World Clinical Data

Coronary artery stenosis is a leading cause of cardiovascular disease, diagnosed by analyzing the coronary arteries from multiple angiography views. Although nu...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] One Size, Many Fits: Aligning Diverse Group-Wise Click Preferences in Large-Scale Advertising Image Generation

Advertising image generation has increasingly focused on online metrics like Click-Through Rate (CTR), yet existing approaches adopt a "one-size-fits-all" stra...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Rethinking Genomic Modeling Through Optical Character Recognition

Recent genomic foundation models largely adopt large language model architectures that treat DNA as a one-dimensional token sequence. However, exhaustive sequen...

#research #paper #ai #machine-learning #nlp #computer-vision
2 months ago · ai · - · -

[Paper] UniDriveDreamer: A Single-Stage Multimodal World Model for Autonomous Driving

World models have demonstrated significant promise for data synthesis in autonomous driving. However, existing methods predominantly concentrate on single-modal...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors

Reconstructing 3D scenes from sparse images remains a challenging task due to the difficulty of recovering accurate geometry and texture without optimization. R...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

How to Turn Raw Product Photos Into Studio-Quality Images with AI

Struggling with dull, poorly lit raw product photos that don’t sell? In 2026, AI product‑photography enhancement lets you turn raw shots into studio‑quality ima...

#AI image enhancement #product photography #e‑commerce visuals #Olio AI #computer vision #studio‑quality images
2 months ago · ai · - · -

[Paper] VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulti...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments

Open-set object detection (OSOD) localizes objects while identifying and rejecting unknown classes at inference. While recent OSOD models perform well on benchm...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Denoising the Deep Sky: Physics-Based CCD Noise Formation for Astronomical Imaging

Astronomical imaging remains noise-limited under practical observing constraints, while standard calibration pipelines mainly remove structured artifacts and le...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] PaperBanana: Automating Academic Illustration for AI Scientists

Despite rapid advances in autonomous AI scientists powered by language models, generating publication-ready illustrations remains a labor-intensive bottleneck i...

#research #paper #ai #nlp #computer-vision
2 months ago · ai · - · -

[Paper] Training-Free Test-Time Adaptation with Brownian Distance Covariance in Vision-Language Models

Vision-language models suffer performance degradation under domain shift, limiting real-world applicability. Existing test-time adaptation methods are computati...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Structured Over Scale: Learning Spatial Reasoning from Educational Video

Vision-language models (VLMs) demonstrate impressive performance on standard video understanding benchmarks yet fail systematically on simple reasoning tasks th...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

In recent years, large language models (LLMs) have made rapid progress in information retrieval, yet existing research has mainly focused on text or static mult...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning

Existing multimodal large language models for long-video understanding predominantly rely on uniform sampling and single-turn inference, limiting their ability ...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Region-Normalized DPO for Medical Image Segmentation under Noisy Judges

While dense pixel-wise annotations remain the gold standard for medical image segmentation, they are costly to obtain and limit scalability. In contrast, many d...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Med-Scout: Curing MLLMs' Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training

Despite recent Multimodal Large Language Models (MLLMs)' linguistic prowess in medical diagnosis, we find even state-of-the-art MLLMs suffer from a critical per...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] SQUAD: Scalable Quorum Adaptive Decisions via ensemble of early exit neural networks

Early-exit neural networks have become popular for reducing inference latency by allowing intermediate predictions when sufficient confidence is achieved. Howev...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] One-step Latent-free Image Generation with Pixel Mean Flows

Modern diffusion/flow-based models for image generation typically exhibit two core characteristics: (i) using multi-step sampling, and (ii) operating in a laten...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] UEval: A Benchmark for Unified Multimodal Generation

We introduce UEval, a benchmark to evaluate unified models, i.e., models capable of generating both images and text. UEval comprises 1,000 expert-curated questi...

#research #paper #ai #nlp #computer-vision
2 months ago · ai · - · -

[Paper] DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

Manipulating dynamic objects remains an open challenge for Vision-Language-Action (VLA) models, which, despite strong generalization in static manipulation, str...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Do VLMs Perceive or Recall? Probing Visual Perception vs. Memory with Classic Visual Illusions

Large Vision-Language Models (VLMs) often answer classic visual illusions 'correctly' on original images, yet persist with the same responses when illusion fact...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

Audio-Visual Foundation Models, which are pretrained to jointly generate sound and visual content, have recently shown an unprecedented ability to model multi-m...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Routing the Lottery: Adaptive Subnetworks for Heterogeneous Data

In pruning, the Lottery Ticket Hypothesis posits that large networks contain sparse subnetworks, or winning tickets, that can be trained in isolation to match t...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] PI-Light: Physics-Inspired Diffusion for Full-Image Relighting

Full-image relighting remains a challenging problem due to the difficulty of collecting large-scale structured paired data, the difficulty of maintaining physic...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Early and Prediagnostic Detection of Pancreatic Cancer from Computed Tomography

Pancreatic ductal adenocarcinoma (PDAC), one of the deadliest solid malignancies, is often detected at a late and inoperable stage. Retrospective reviews of pre...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers

Current generative video models excel at producing novel content from text and image prompts, but leave a critical gap in editing existing pre-recorded videos, ...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Creative Image Generation with Diffusion Model

Creative image generation has emerged as a compelling area of research, driven by the need to produce novel and high-quality images that expand the boundaries o...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

Beyond Just a Photo: Building a Pixel-Perfect Calorie Estimator with SAM and GPT-4o

We've all been there: staring at a delicious plate of pasta, trying to manually log every gram into a fitness app. It’s tedious, prone to 'optimistic' human err...

#segment-anything #gpt-4o #computer-vision #multimodal #fastapi
2 months ago · ai · - · -

Gemini 3 Flash’s new ‘Agentic Vision’ improves image responses

Agentic Vision is a new capability for the Gemini 3 Flash model to make image-related tasks more accurate by “grounding answers in visual evidence.”

#Gemini 3 Flash #Agentic Vision #multimodal AI #computer vision #Google AI
