computer vision — Page 8

1 month ago · ai · - · -

[Paper] SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation

Video Super-Resolution (VSR) aims to restore high-quality video frames from low-resolution (LR) estimates, yet most existing VSR approaches behave like black bo...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] SOMA: Unifying Parametric Human Body Models

Parametric human body models are foundational to human reconstruction, animation, and simulation, yet they remain mutually incompatible: SMPL, SMPL-X, MHR, Anny...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] M^3: Dense Matching Meets Multi-View Foundation Models for Monocular Gaussian Splatting SLAM

Streaming reconstruction from uncalibrated monocular video remains challenging, as it requires both high-precision pose estimation and computationally efficient...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] What DINO saw: ALiBi positional encoding reduces positional bias in Vision Transformers

Vision transformers (ViTs), especially feature foundation models like DINOv2, learn rich representations useful for many downstream tasks. However, architectu...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] An assessment of data-centric methods for label noise identification in remote sensing data sets

Label noise in the sense of incorrect labels is present in many real-world data sets and is known to severely limit the generalizability of deep learning models...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Deep Reinforcement Learning-driven Edge Offloading for Latency-constrained XR pipelines

Immersive extended reality (XR) applications introduce latency-critical workloads that must satisfy stringent real-time responsiveness while operating on energy...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] EvoIQA - Explaining Image Distortions with Evolved White-Box Logic

Traditional Image Quality Assessment (IQA) metrics typically fall into one of two extremes: rigid, hand-crafted mathematical models or 'black-box' deep learning...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Towards Generalizable Robotic Manipulation in Dynamic Environments

Vision-Language-Action (VLA) models excel in static manipulation but struggle in dynamic environments with moving targets. This performance gap primarily stems ...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for robotic manipulation, in which reliable action prediction critically depen...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering

Generating accurate glyphs for visual text rendering is essential yet challenging. Existing methods typically enhance text rendering by training on a large amou...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Tri-Prompting: Video Diffusion with Unified Control over Scene, Subject, and Motion

Recent video diffusion models have made remarkable strides in visual quality, yet precise, fine-grained control remains a key bottleneck that limits practical c...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

We present HSImul3R, a unified framework for simulation-ready 3D reconstruction of human-scene interactions (HSI) from casual captures, including sparse-view im...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Fast SAM 3D Body: Accelerating SAM 3D Body for Real-Time Full-Body Human Mesh Recovery

SAM 3D Body (3DB) achieves state-of-the-art accuracy in monocular 3D human mesh recovery, yet its inference latency of several seconds per image precludes real-...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation

Accurate process supervision remains a critical challenge for long-horizon robotic manipulation. A primary bottleneck is that current video MLLMs, trained prima...

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · ai · - · -

[Paper] AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer

Existing video-to-audio (V2A) generation methods predominantly rely on text prompts alongside visual information to synthesize audio. However, two critical bott...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Grounding World Simulation Models in a Real-World Metropolis

What if a world simulation model could render not an imagined environment but a city that actually exists? Prior generative world models synthesize visually pla...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Benchmarking Machine Learning Approaches for Polarization Mapping in Ferroelectrics Using 4D-STEM

Four-dimensional scanning transmission electron microscopy (4D-STEM) provides rich, atomic-scale insights into materials structures. However, extracting specifi...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

This Wednesday: March 19 - Vibe Coding Computer Vision Pipelines

[Image: Workshop Banner]

#computer vision #pipeline #Vibe Coding #Voxel51 #machine learning #image processing
1 month ago · ai · - · -

'Pokémon Go' players unknowingly trained delivery robots with 30B images

[Image: A woman holds up her cell phone as she plays the Pokémon Go game in Lafayette Park in front of the White House in Washington, DC on July 12, 2016.]

#Pokémon Go #augmented reality #computer vision #robotics #image dataset #delivery robots #AI training #machine learning
1 month ago · ai · - · -

Building Iris: A Real-Time Spatial Awareness Agent with the Gemini Live API

Overview: Iris is a real-time spatial awareness agent that sees through your camera and talks to you. Point your device at anything: a room, a street, a workspac...

#Gemini Live API #real-time AI #spatial awareness #computer vision #voice interaction #accessibility #AI agent
1 month ago · ai · - · -

Learning athletic humanoid tennis skills from imperfect human motion data

Abstract: Human athletes demonstrate versatile and highly-dynamic tennis skills to successfully conduct competitive rallies with a high-speed tennis ball. Howev...

#humanoid robotics #motion capture #imitation learning #reinforcement learning #computer vision #tennis simulation #human motion data
1 month ago · ai · - · -

[Paper] DualSwinFusionSeg: Multimodal Martian Landslide Segmentation via Dual Swin Transformer with Multi-Scale Fusion and UNet++

Automated segmentation of Martian landslides, particularly in tectonically active regions such as Valles Marineris, is important for planetary geology, hazard as...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Diffusion Reinforcement Learning via Centered Reward Distillation

Diffusion and flow models achieve State-Of-The-Art (SOTA) generative performance, yet many practically important behaviors such as fine-grained prompt fidelity,...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Implementation and discussion of the Pith Estimation on Rough Log End Images using Local Fourier Spectrum Analysis method

In this article, we analyze and propose a Python implementation of the method 'Pith Estimation on Rough Log End images using Local Fourier Spectrum Analysis', b...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Low-Field Magnetic Resonance Image Enhancement using Undersampled k-Space

Low-field magnetic resonance imaging (MRI) offers a cost-effective alternative for medical imaging in resource-limited settings. However, its widespread adoptio...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Low-Field Magnetic Resonance Image Quality Enhancement using Undersampled k-Space and Out-of-Distribution Generalisation

Low-field magnetic resonance imaging (MRI) offers affordable access to diagnostic imaging but faces challenges such as prolonged acquisition times and reduced i...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Improving Visual Reasoning with Iterative Evidence Refinement

Vision language models (VLMs) are increasingly capable of reasoning over images, but robust visual reasoning often requires re-grounding intermediate steps in t...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Revisiting the Perception-Distortion Trade-off with Spatial-Semantic Guided Super-Resolution

Image super-resolution (SR) aims to reconstruct high resolution images with both high perceptual quality and low distortion, but is fundamentally limited by the...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization

Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Building on th...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Representation Learning for Spatiotemporal Physical Systems

Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate emulator f...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Visual-ERM: Reward Modeling for Visual Equivalence

Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured representations wit...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Out of Sight, Out of Mind? Evaluating State Evolution in Video World Models

Evolutions in the world, such as water pouring or ice melting, happen regardless of being observed. Video world models generate 'worlds' via 2D frame observatio...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Towards Spatio-Temporal World Scene Graph Generation from Monocular Videos

Spatio-temporal scene graphs provide a principled representation for modeling evolving object interactions, yet existing methods remain fundamentally frame-cent...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Diffusion-Based Feature Denoising and Using NNMF for Robust Brain Tumor Classification

Brain tumor classification from magnetic resonance imaging (MRI) plays a sensitive role in computer-assisted diagnosis systems. In recent...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Perceive What Matters: Relevance-Driven Scheduling for Multimodal Streaming Perception

In modern human-robot collaboration (HRC) applications, multiple perception modules jointly extract visual, auditory, and contextual cues to achieve comprehensi...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Towards Faithful Multimodal Concept Bottleneck Models

Concept Bottleneck Models (CBMs) are interpretable models that route predictions through a layer of human-interpretable concepts. While widely studied in vision...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression

Diffusion-based image compression has recently shown outstanding perceptual fidelity, yet its practicality is hindered by prohibitive sampling overhead and high...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] FDeID-Toolbox: Face De-Identification Toolbox

Face de-identification (FDeID) aims to remove personally identifiable information from facial images while preserving task-relevant utility attributes such as a...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Influence Malleability in Linearized Attention: Dual Implications of Non-Convergent NTK Dynamics

Understanding the theoretical foundations of attention mechanisms remains challenging due to their complex, non-linear dynamics. This work reveals a fundamental...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Fractals made Practical: Denoising Diffusion as Partitioned Iterated Function Systems

What is a diffusion model actually doing when it turns noise into a photograph? We show that the deterministic DDIM reverse chain operates as a Partitioned Iter...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models

Reinforcement learning (RL) has become a standard technique for post-training diffusion-based image synthesis models, as it enables learning from reward signals...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

I Built an AI Tutor That Actually Sees Your Homework — Here's How

A few weeks ago I was watching my younger cousin struggle through a physics worksheet. She kept typing questions into ChatGPT, getting a wall of text back, and...

#multimodal AI #computer vision #speech synthesis #AI tutoring #education technology #Gemini Live Agent Challenge #homework assistance
1 month ago · ai · - · -

[Paper] Alternating Gradient Flow Utility: A Unified Metric for Structural Pruning and Dynamic Routing in Deep Networks

Efficient deep learning traditionally relies on static heuristics like weight magnitude or activation awareness (e.g., Wanda, RIA). While successful in unstruct...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation

Autoregressive (AR) video generative models rely on video tokenizers that compress pixels into discrete token sequences. The length of these token sequences is ...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

Multimodal Large Language Models (MLLMs) are increasingly used to carry out visual workflows such as navigating GUIs, where the next step depends on verified vi...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams

Modern visual agents require representations that are general, causal, and physically structured to operate in real-time streaming environments. However, curren...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing

Unified multimodal models target joint understanding, reasoning, and generation, but current image editing benchmarks are largely confined to natural images and...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

Online Video Large Language Models (VideoLLMs) play a critical role in supporting responsive, real-time interaction. Existing methods focus on streaming percept...

#research #paper #ai #computer-vision
