computer-vision — Page 23

4 months ago · ai · - · -

[Paper] Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision

The success of foundation models in language and vision motivated research in fully end-to-end robot navigation foundation models (NFMs). NFMs directly map mono...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model

We propose a decoupled 3D scene generation framework called SceneMaker in this work. Due to the lack of sufficient open-set de-occlusion and pose estimation pri...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Bidirectional Normalizing Flow: From Data to Noise and Back

Normalizing Flows (NFs) have been established as a principled framework for generative modeling. Standard NFs consist of a forward process and a reverse process...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration

In this work, we explore an untapped signal in diffusion model inference. While all previous methods generate images independently at inference, we instead ask ...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training

Self-supervised pre-training has revolutionized foundation models for languages, individual 2D images and videos, but remains largely unexplored for learning 3D...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Reinforcement learning (RL), earlier proven to be effective in large language and multi-modal models, has been successfully extended to enhance 2D image generat...

#research #paper #ai #machine-learning #nlp #computer-vision
4 months ago · ai · - · -

[Paper] ClusIR: Towards Cluster-Guided All-in-One Image Restoration

All-in-One Image Restoration (AiOIR) aims to recover high-quality images from diverse degradations within a unified framework. However, existing methods often f...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation

Recent advances in subject-driven video generation with large diffusion models have enabled personalized content synthesis conditioned on user-provided subjects...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Mull-Tokens: Modality-Agnostic Latent Thinking

Reasoning goes beyond language; the real world requires reasoning about space, time, affordances, and much more that words alone cannot convey. Existing multimo...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis

Prior approaches injecting camera control into diffusion models have focused on specific subsets of 4D consistency tasks: novel view synthesis, text-to-video wi...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Stronger Normalization-Free Transformers

Although normalization layers have long been viewed as indispensable components of deep learning architectures, the recent introduction of Dynamic Tanh (DyT) ha...

#research #paper #ai #machine-learning #nlp #computer-vision
4 months ago · ai · - · -

[Paper] Any4D: Unified Feed-Forward Metric 4D Reconstruction

We present Any4D, a scalable multi-view transformer for metric-scale, dense feed-forward 4D reconstruction. Any4D directly generates per-pixel motion and geomet...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

Interest in Spoor’s bird-monitoring AI software is soaring

Spoor's computer vision software can help wind farms, and other industries, track bird populations and migration patterns....

#computer vision #bird monitoring #wildlife conservation #environmental AI #wind farms #Spoor #migration tracking
4 months ago · ai · - · -

[Paper] GAINS: Gaussian-based Inverse Rendering from Sparse Multi-View Captures

Recent advances in Gaussian Splatting-based inverse rendering extend Gaussian primitives with shading parameters and physically grounded light transport, enabli...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] ReViSE: Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning

Video unified models exhibit strong capabilities in understanding and generation, yet they struggle with reason-informed visual editing even when equipped with ...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Splatent: Splatting Diffusion Latents for Novel View Synthesis

Radiance field representations have recently been explored in the latent space of VAEs that are commonly used by diffusion models. This direction offers efficie...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] LISN: Language-Instructed Social Navigation with VLM-based Controller Modulating

Towards human-robot coexistence, socially aware navigation is significant for mobile robots. Yet existing studies on this area focus mainly on path efficiency a...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] NordFKB: a fine-grained benchmark dataset for geospatial AI in Norway

We present NordFKB, a fine-grained benchmark dataset for geospatial AI in Norway, derived from the authoritative, highly accurate, national Felles KartdataBase ...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] VisualActBench: Can VLMs See and Act like a Human?

Vision-Language Models (VLMs) have achieved impressive progress in perceiving and describing visual environments. However, their ability to proactively reason a...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] YOPO-Nav: Visual Navigation using 3DGS Graphs from One-Pass Videos

Visual navigation has emerged as a practical alternative to traditional robotic navigation pipelines that rely on detailed mapping and path planning. However, c...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Visual Heading Prediction for Autonomous Aerial Vehicles

The integration of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is increasingly central to the development of intelligent autonomous syst...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs

Correctly parsing mathematical formulas from PDFs is critical for training large language models and building scientific knowledge bases from academic literatur...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Diffusion Posterior Sampler for Hyperspectral Unmixing with Spectral Variability Modeling

Linear spectral mixture models (LMM) provide a concise form to disentangle the constituent materials (endmembers) and their corresponding proportions (abundance...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI

Pretrained Multimodal Large Language Models (MLLMs) are increasingly deployed in medical AI systems for clinical reasoning, diagnosis support, and report genera...

#research #paper #ai #machine-learning #nlp #computer-vision
4 months ago · software · - · -

Introducing GoCVKit: Zero-Boilerplate Computer Vision in Go

Hey there, fellow Gophers! If you’ve worked with computer vision in Go, you know GoCV is fantastic for accessing OpenCV’s power. But the reality? Boilerplate ev...

#Go #GoCV #computer-vision #OpenCV #framework #real-time #zero-boilerplate #hot-reload #double-buffered #pipeline
4 months ago · ai · - · -

[Paper] ChronusOmni: Improving Time Awareness of Omni Large Language Models

Time awareness is a fundamental ability of omni large language models, especially for understanding long videos and answering complex questions. Previous approa...

#research #paper #ai #nlp #computer-vision
4 months ago · ai · - · -

RoboCrop: Teaching robots how to pick tomatoes

Article URL: https://phys.org/news/2025-12-robocrop-robots-tomatoes.html Comments URL: https://news.ycombinator.com/item?id=46218782 Points: 3 Comments: 0...

#robotics #agricultural automation #computer vision #machine learning #tomato harvesting #AI in farming
4 months ago · ai · - · -

[Paper] SynthPix: A lightspeed PIV images generator

We describe SynthPix, a synthetic image generator for Particle Image Velocimetry (PIV) with a focus on performance and parallelism on accelerators, implemented ...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Neuromorphic Eye Tracking for Low-Latency Pupil Detection

Eye tracking for wearable systems demands low latency and milliwatt-level power, but conventional frame-based pipelines struggle with motion blur, high compute ...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] A Distributed Framework for Privacy-Enhanced Vision Transformers on the Edge

Nowadays, visual intelligence tools have become ubiquitous, offering all kinds of convenience and possibilities. However, these tools have high computational re...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Astra: General Interactive World Model with Autoregressive Denoising

Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world model...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment

Novel View Synthesis (NVS) has traditionally relied on models with explicit 3D inductive biases combined with known camera parameters from Structure-from-Motion...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Efficiently Reconstructing Dynamic Scenes One D4RT at a Time

Understanding and reconstructing the complex geometry and motion of dynamic scenes from video remains a formidable challenge in computer vision. This paper intr...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Unified Diffusion Transformer for High-fidelity Text-Aware Image Restoration

Text-Aware Image Restoration (TAIR) aims to recover high-quality images from low-quality inputs containing degraded textual content. While diffusion models pro...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] LiDAS: Lighting-driven Dynamic Active Sensing for Nighttime Perception

Nighttime environments pose significant challenges for camera-based perception, as existing methods passively rely on the scene lighting. We introduce Lighting-...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Self-Evolving 3D Scene Generation from a Single Image

Generating high-quality, textured 3D scenes from a single image remains a fundamental challenge in vision and graphics. Recent image-to-3D generators recover re...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation

Content-aware layout generation is a critical task in graphic design automation, focused on creating visually appealing arrangements of elements that seamlessly...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers

Visual reasoning is challenging, requiring both precise object grounding and understanding complex spatial relationships. Existing methods fall into two camps: ...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Accelerated Rotation-Invariant Convolution for UAV Image Segmentation

Rotation invariance is essential for precise, object-level segmentation in UAV aerial imagery, where targets can have arbitrary orientations and exhibit fine-sc...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] SATGround: A Spatially-Aware Approach for Visual Grounding in Remote Sensing

Vision-language models (VLMs) are emerging as powerful generalist tools for remote sensing, capable of integrating information across diverse tasks and enabling...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning

Image captioning is essential in many fields including assisting visually impaired individuals, improving content management systems, and enhancing human-comput...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] MatteViT: High-Frequency-Aware Document Shadow Removal with Shadow Matte Guidance

Document shadow removal is essential for enhancing the clarity of digitized documents. Preserving high-frequency details (e.g., text edges and lines) is critica...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Skewness-Guided Pruning of Multimodal Swin Transformers for Federated Skin Lesion Classification on Edge Devices

In recent years, high-performance computer vision models have achieved remarkable success in medical imaging, with some skin lesion classification systems even ...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Pose-Based Sign Language Spotting via an End-to-End Encoder Architecture

Automatic Sign Language Recognition (ASLR) has emerged as a vital field for bridging the gap between deaf and hearing communities. However, the problem of sign-...

#research #paper #ai #nlp #computer-vision
4 months ago · ai · - · -

[Paper] Conditional Morphogenesis: Emergent Generation of Structural Digits via Neural Cellular Automata

Biological systems exhibit remarkable morphogenetic plasticity, where a single genome can encode various specialized cellular structures triggered by local chem...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Voxify3D: Pixel Art Meets Volumetric Rendering

Voxel art is a distinctive stylization widely used in games and digital media, yet automated generation from 3D meshes remains challenging due to conflicting re...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Relational Visual Similarity

Humans do not just see attribute similarity -- we also see relational similarity. An apple is like a peach because both are reddish fruit, but the Earth is also...

#research #paper #ai #machine-learning #computer-vision
