[Paper] Web-Scale Multimodal Summarization using CLIP-Based Semantic Alignment
We introduce Web-Scale Multimodal Summarization, a lightweight framework for generating summaries by combining retrieved text and image data from web sources. G...
The human visual system tracks objects by integrating current observations with previously observed information, adapting to target and scene changes, and reaso...
The Platonic Representation Hypothesis suggests that representations from neural networks are converging to a common statistical model of reality. We show that ...
Introduction A San Francisco‑based startup claims to be the first to create a biological computing platform built from living neurons (https://www.tomshardware.c...
The 15‑Year‑Old Code That Still Runs in Production Haar Cascades are everywhere. If you've ever used OpenCV's face detector, you've used a method published in...
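The durability of Haar cascades comes largely from the integral image (summed-area table), which lets any rectangular Haar feature be evaluated in constant time. A minimal sketch of that trick in plain Python, under the assumption of a 2D list of grayscale values (the function names `integral_image`, `rect_sum`, and `haar_two_rect` are illustrative, not OpenCV's API):

```python
def integral_image(img):
    """Build a summed-area table with a one-pixel zero border.

    ii[y][x] holds the sum of all pixels above and to the left of
    (x, y), so any rectangle sum later costs four lookups.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of img[y:y+h, x:x+w] via four corner lookups on the table."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """A two-rectangle Haar-like feature: left half minus right half."""
    return rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)

# Toy 4x4 image: bright left half, dark right half -> strong response.
img = [[9, 9, 0, 0] for _ in range(4)]
ii = integral_image(img)
print(haar_two_rect(ii, 0, 0, 4, 4))
```

In OpenCV itself the whole trained detector is wrapped behind `cv2.CascadeClassifier`, loaded from a cascade XML file and applied with `detectMultiScale` on a grayscale image; the sketch above only shows the feature arithmetic that makes that cascade cheap enough to slide over every window.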
Overview This spring, a Southern California beach town will become the first city in the country where municipal parking‑enforcement vehicles use an AI system...
The ability to learn manipulation skills by watching videos of humans has the potential to unlock a new source of highly scalable data for robot learning. Here,...
Conversational image segmentation grounds abstract, intent-driven concepts into pixel-accurate masks. Prior work on referring image grounding focuses on categor...
Video Language Models (VideoLMs) empower AI systems to understand temporal dynamics in videos. To fit within the maximum context window constraint, current methods ...
Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue...
To validate a clinically accessible approach for quantifying the Upper Extremity Reachable Workspace (UERW) using a single (monocular) camera and Artificial Int...
Long-sequence streaming 3D reconstruction remains a significant open challenge. Existing autoregressive models often fail when processing long sequences. They t...
With the advancement of face recognition (FR) systems, privacy-preserving face recognition (PPFR) systems have gained popularity for their accurate recognition,...
Detecting anomalies in images and video is an essential task for multiple real-world problems, including industrial inspection, computer-assisted diagnosis, and...
This paper presents a novel approach, Spectral-Interpretable and -Enhanced Transformer (SIEFormer), which leverages spectral analysis to reinterpret the attenti...
Event stream-based Visual Place Recognition (VPR) is an emerging research direction that offers a compelling solution to the instability of conventional visible...
As self-driving technology advances toward widespread adoption, determining safe operational thresholds across varying environmental conditions becomes critical...
Visual illusions traditionally rely on spatial manipulations such as multi-view consistency. In this work, we introduce Progressive Semantic Illusions, a novel ...
Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iterati...
Real-time video generation with Diffusion Transformers is bottlenecked by the quadratic cost of 3D self-attention, especially in real-time regimes that are both...
Supervised fine-tuning (SFT) is computationally efficient but often yields inferior generalization compared to reinforcement learning (RL). This gap is primaril...
Current unified multimodal models for image generation and editing typically rely on massive parameter scales (e.g., >10B), entailing prohibitive training co...
High-quality 3D texture generation remains a fundamental challenge due to the view-inconsistency inherent in current mainstream multi-view diffusion pipelines. ...
Waymo will begin fully autonomous operations with its 6th‑generation Driver — an important step in bringing our technology to more riders in more cities. This l...
Interfacial dynamics in two-phase flows govern momentum, heat, and mass transfer, yet remain difficult to measure experimentally. Classical techniques face intr...
Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. V...
Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet, existing benchmarks predominantly assess Crystallized Intelligence, w...
With the rapid development of large multimodal models, reliable judge and critic models have become essential for open-ended evaluation and preference alignment...
We present HairWeaver, a diffusion-based pipeline that animates a single human image with realistic and expressive hair dynamics. While existing methods success...
Flow-matching models deliver state-of-the-art fidelity in image and video generation, but the inherent sequential denoising process renders them slower. Existin...
Biometric footstep recognition, based on a person's unique pressure patterns under their feet during walking, is an emerging field with growing applications in ...
We propose PuriLight, a lightweight and efficient framework for self-supervised monocular depth estimation, to address the dual challenges of computational effi...
Real-world data collection for embodied agents remains costly and unsafe, calling for scalable, realistic, and simulator-ready 3D environments. However, existin...
Multiple rotation averaging (MRA) is a fundamental optimization problem in 3D vision and robotics that aims to recover globally consistent absolute rotations fr...
Image-to-Video generation (I2V) animates a static image into a temporally coherent video sequence following textual instructions, yet preserving fine-grained ob...
Scaling action-controllable world models is limited by the scarcity of action labels. While latent action learning promises to extract control interfaces from u...
Learning transferable knowledge from unlabeled video data and applying it in new environments is a fundamental capability of intelligent agents. This work prese...
Leveraging representation encoders for generative modeling offers a path for efficient, high-fidelity synthesis. However, standard diffusion transformers fail t...
Pretraining Vision-Language-Action (VLA) policies on internet-scale video is appealing, yet current latent-action objectives often learn the wrong thing: they r...
Causality -- referring to temporal, uni-directional cause-effect relationships between components -- underlies many complex generative processes, including vide...
We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geo...
We introduce Forensim, an attention-based state-space framework for image forgery detection that jointly localizes both manipulated (target) and source regions....
Out-of-distribution (OOD) detection is critical for the safe deployment of machine learning systems. Existing post-hoc detectors typically rely on model confide...
Industrial Scale Deepfake Fraud Deepfake fraud has gone “industrial,” according to an analysis published by AI experts. Tools to create tailored, even personal...
Olympic figure skating looks effortless. Athletes sail across the ice, then soar into the air, spinning like a top, before landing on a single blade just 4‑5 mm...
Why We Need CNNs In this article, we will explore image classification using convolutional neural networks. For this, we will use a simple example: an X or an O....
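The core idea behind the X-vs-O example is that a small filter slid over the image responds strongly wherever its pattern appears. A minimal sketch in plain Python, assuming 5x5 binary grids and a hand-made diagonal filter (not the article's actual code or filter values):

```python
def convolve(img, kern):
    """Valid cross-correlation of a 2D grid with a small kernel."""
    kh, kw = len(kern), len(kern[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(img[i + di][j + dj] * kern[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def score(img, kern):
    """Convolve, then global max pooling: the best match anywhere."""
    return max(max(row) for row in convolve(img, kern))

X = [[1, 0, 0, 0, 1],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [1, 0, 0, 0, 1]]

O = [[0, 1, 1, 1, 0],
     [1, 0, 0, 0, 1],
     [1, 0, 0, 0, 1],
     [1, 0, 0, 0, 1],
     [0, 1, 1, 1, 0]]

# A 3x3 main-diagonal filter: fires on the \-shaped strokes of an X.
diag = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]]

print(score(X, diag), score(O, diag))  # X matches the filter better
```

A real CNN learns many such filters from data instead of hand-crafting them, but the convolve-then-pool pipeline is the same mechanism.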
This paper challenges the dominance of continuous pipelines in visual generation. We systematically investigate the performance gap between discrete and continu...
This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for the long-horizon, interactive video-based world models, enablin...