computer-vision — Page 30

3 months ago · ai · - · -

[Paper] A Distributed Framework for Privacy-Enhanced Vision Transformers on the Edge

Nowadays, visual intelligence tools have become ubiquitous, offering all kinds of convenience and possibilities. However, these tools have high computational re...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Astra: General Interactive World Model with Autoregressive Denoising

Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world model...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment

Novel View Synthesis (NVS) has traditionally relied on models with explicit 3D inductive biases combined with known camera parameters from Structure-from-Motion...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Efficiently Reconstructing Dynamic Scenes One D4RT at a Time

Understanding and reconstructing the complex geometry and motion of dynamic scenes from video remains a formidable challenge in computer vision. This paper intr...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Unified Diffusion Transformer for High-fidelity Text-Aware Image Restoration

Text-Aware Image Restoration (TAIR) aims to recover high-quality images from low-quality inputs containing degraded textual content. While diffusion models pro...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] LiDAS: Lighting-driven Dynamic Active Sensing for Nighttime Perception

Nighttime environments pose significant challenges for camera-based perception, as existing methods passively rely on the scene lighting. We introduce Lighting-...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Self-Evolving 3D Scene Generation from a Single Image

Generating high-quality, textured 3D scenes from a single image remains a fundamental challenge in vision and graphics. Recent image-to-3D generators recover re...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation

Content-aware layout generation is a critical task in graphic design automation, focused on creating visually appealing arrangements of elements that seamlessly...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers

Visual reasoning is challenging, requiring both precise object grounding and understanding complex spatial relationships. Existing methods fall into two camps: ...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Accelerated Rotation-Invariant Convolution for UAV Image Segmentation

Rotation invariance is essential for precise, object-level segmentation in UAV aerial imagery, where targets can have arbitrary orientations and exhibit fine-sc...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] SATGround: A Spatially-Aware Approach for Visual Grounding in Remote Sensing

Vision-language models (VLMs) are emerging as powerful generalist tools for remote sensing, capable of integrating information across diverse tasks and enabling...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning

Image captioning is essential in many fields including assisting visually impaired individuals, improving content management systems, and enhancing human-comput...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] MatteViT: High-Frequency-Aware Document Shadow Removal with Shadow Matte Guidance

Document shadow removal is essential for enhancing the clarity of digitized documents. Preserving high-frequency details (e.g., text edges and lines) is critica...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Skewness-Guided Pruning of Multimodal Swin Transformers for Federated Skin Lesion Classification on Edge Devices

In recent years, high-performance computer vision models have achieved remarkable success in medical imaging, with some skin lesion classification systems even ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Pose-Based Sign Language Spotting via an End-to-End Encoder Architecture

Automatic Sign Language Recognition (ASLR) has emerged as a vital field for bridging the gap between deaf and hearing communities. However, the problem of sign-...

#research #paper #ai #nlp #computer-vision
3 months ago · ai · - · -

[Paper] Conditional Morphogenesis: Emergent Generation of Structural Digits via Neural Cellular Automata

Biological systems exhibit remarkable morphogenetic plasticity, where a single genome can encode various specialized cellular structures triggered by local chem...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Voxify3D: Pixel Art Meets Volumetric Rendering

Voxel art is a distinctive stylization widely used in games and digital media, yet automated generation from 3D meshes remains challenging due to conflicting re...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Relational Visual Similarity

Humans do not just see attribute similarity -- we also see relational similarity. An apple is like a peach because both are reddish fruit, but the Earth is also...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Recent video generation models demonstrate impressive synthesis capabilities but remain limited by single-modality conditioning, constraining their holistic wor...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation

Visual generative models (e.g., diffusion models) typically operate in compressed latent spaces to balance training efficiency and sample quality. In parallel, ...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing

The quality and diversity of instruction-based image editing datasets are continuously increasing, yet large-scale, high-quality datasets for instruction-based ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling

Recent video generators achieve striking photorealism, yet remain fundamentally inconsistent in 3D. We present WorldReel, a 4D video generator that is natively ...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Lang3D-XL: Language Embedded 3D Gaussians for Large-scale Scenes

Embedding a language field in a 3D representation enables richer semantic understanding of spatial environments by linking geometry with descriptive meaning. Th...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Multi-view Pyramid Transformer: Look Coarser to See Broader

We propose Multi-view Pyramid Transformer (MVP), a scalable multi-view transformer architecture that directly reconstructs large 3D scenes from tens to hundreds...

#research #paper #ai #computer-vision
