computer-vision — Page 40

1 month ago · ai

[Paper] Object-Centric Data Synthesis for Category-level Object Detection

Deep learning approaches to object detection have achieved reliable detection of specific object classes in images. However, extending a model's detection capab...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] Physics-Informed Neural Networks for Thermophysical Property Retrieval

Inverse heat problems refer to the estimation of material thermophysical properties given observed or known heat diffusion behaviour. Inverse heat problems have...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model

Recent advances in generative world models have enabled remarkable progress in creating open-ended game environments, evolving from static scene synthesis towar...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] DisMo: Disentangled Motion Representations for Open-World Motion Transfer

Recent advances in text-to-video (T2V) and image-to-video (I2V) models have enabled the creation of visually compelling and dynamic videos from simple textual ...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] MANTA: Physics-Informed Generalized Underwater Object Tracking

Underwater object tracking is challenging due to wavelength-dependent attenuation and scattering, which severely distort appearance across depths and water cond...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction

Unifying multimodal understanding, generation and reconstruction representations in a single tokenizer remains a key challenge in building unified models. Previo...

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] Optimizing Multimodal Language Models through Attention-based Interpretability

Modern large language models have become multimodal, analyzing various data formats like text and images. While fine-tuning is effective for adapting these multimoda...

#research #paper #ai #nlp #computer-vision
1 month ago · ai

[Paper] Toward Automatic Safe Driving Instruction: A Large-Scale Vision Language Model Approach

Large-scale Vision Language Models (LVLMs) exhibit advanced capabilities in tasks that require visual information, including object detection. These capabilitie...

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · ai

[Paper] Canvas-to-Image: Compositional Image Generation with Multimodal Controls

While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, ...

#image-generation #diffusion-models #multimodal-control #computer-vision #research
1 month ago · ai

[Paper] TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos

Learning new robot tasks on new platforms and in new scenes from only a handful of demonstrations remains challenging. While videos of other embodiments - human...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] G²VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning

Vision-Language Models (VLMs) still lack robustness in spatial intelligence, demonstrating poor performance on spatial understanding and reasoning tasks. We att...

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · ai

[Paper] Seeing without Pixels: Perception from Camera Trajectories

Can one perceive a video's content without seeing its pixels, just from the camera trajectory (the path it carves through space)? This paper is the first to syste...

#research #paper #ai #computer-vision