paper — Page 105

1 month ago · ai

[Paper] Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision

The success of foundation models in language and vision motivated research in fully end-to-end robot navigation foundation models (NFMs). NFMs directly map mono...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model

We propose a decoupled 3D scene generation framework called SceneMaker in this work. Due to the lack of sufficient open-set de-occlusion and pose estimation pri...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] Bidirectional Normalizing Flow: From Data to Noise and Back

Normalizing Flows (NFs) have been established as a principled framework for generative modeling. Standard NFs consist of a forward process and a reverse process...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration

In this work, we explore an untapped signal in diffusion model inference. While all previous methods generate images independently at inference, we instead ask ...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] Hierarchical Dataset Selection for High-Quality Data Sharing

The success of modern machine learning hinges on access to high-quality training data. In many real-world scenarios, such as acquiring data from public reposito...

#research #paper #ai #machine-learning
1 month ago · ai

[Paper] E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training

Self-supervised pre-training has revolutionized foundation models for languages, individual 2D images and videos, but remains largely unexplored for learning 3D...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Reinforcement learning (RL), earlier proven to be effective in large language and multi-modal models, has been successfully extended to enhance 2D image generat...

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · ai

[Paper] ClusIR: Towards Cluster-Guided All-in-One Image Restoration

All-in-One Image Restoration (AiOIR) aims to recover high-quality images from diverse degradations within a unified framework. However, existing methods often f...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning

Human-level contact-rich manipulation relies on the distinct roles of two key modalities: vision provides spatially rich but temporally slow global context, whi...

#research #paper #ai #machine-learning
1 month ago · ai

[Paper] AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation

Recent advances in subject-driven video generation with large diffusion models have enabled personalized content synthesis conditioned on user-provided subjects...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] Mull-Tokens: Modality-Agnostic Latent Thinking

Reasoning goes beyond language; the real world requires reasoning about space, time, affordances, and much more that words alone cannot convey. Existing multimo...

#research #paper #ai #machine-learning #computer-vision
