computer-vision — Page 22

4 months ago · software · - · -

Rendering the Camera with Metal on iOS (AVFoundation + MetalKit)

Rendering camera video with Metal without AVCaptureVideoPreviewLayer. In this tutorial we will render the camera video directly on screen using...

#iOS #Metal #AVFoundation #MetalKit #camera #video rendering #Swift #shaders #AR #computer vision #machine learning
4 months ago · ai · - · -

[Paper] MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

The core challenge for streaming video generation is maintaining the content consistency in long context, which poses high requirement for the memory design. Mo...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

This paper does not introduce a novel method but instead establishes a straightforward, incremental, yet essential baseline for video temporal grounding (VTG), ...

#research #paper #ai #machine-learning #nlp #computer-vision
4 months ago · ai · - · -

[Paper] Spherical Leech Quantization for Visual Tokenization and Generation

Non-parametric quantization has received much attention due to its efficiency on parameters and scalability to a large codebook. In this paper, we present a uni...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives

We introduce CRISP, a method that recovers simulatable human motion and scene geometry from monocular video. Prior work on joint human-scene reconstruction reli...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Native and Compact Structured Latents for 3D Generation

Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, w...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] MMGR: Multi-Modal Generative Reasoning

Video foundation models generate visually realistic and temporally coherent content, but their reliability as world simulators depends on whether they capture p...

#research #paper #ai #nlp #computer-vision
4 months ago · ai · - · -

[Paper] VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image

We propose VASA-3D, an audio-driven, single-shot 3D head avatar generator. This research tackles two major challenges: capturing the subtle expression details p...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] ART: Articulated Reconstruction Transformer

We introduce ART, Articulated Reconstruction Transformer -- a category-agnostic, feed-forward model that reconstructs complete 3D articulated objects from only ...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

Achieving truly adaptive embodied intelligence requires agents that learn not just by imitating static demonstrations, but by continuously improving through env...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Enhancing Visual Sentiment Analysis via Semiotic Isotopy-Guided Dataset Construction

Visual Sentiment Analysis (VSA) is a challenging task due to the vast diversity of emotionally salient images and the inherent difficulty of acquiring sufficien...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] A Multicenter Benchmark of Multiple Instance Learning Models for Lymphoma Subtyping from HE-stained Whole Slide Images

Timely and accurate lymphoma diagnosis is essential for guiding cancer treatment. Standard diagnostic practice combines hematoxylin and eosin (HE)-stained whole...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] JMMMU-Pro: Image-based Japanese Multi-discipline Multimodal Understanding Benchmark via Vibe Benchmark Construction

This paper introduces JMMMU-Pro, an image-based Japanese Multi-discipline Multimodal Understanding Benchmark, and Vibe Benchmark Construction, a scalable constr...

#research #paper #ai #machine-learning #nlp #computer-vision
4 months ago · software · - · -

alpr.watch

Article URL: https://alpr.watch/ Comments URL: https://news.ycombinator.com/item?id=46290916 Points: 224 Comments: 114...

#license-plate-recognition #computer-vision #open-source #ALPR #surveillance-tool
4 months ago · ai · - · -

Ai2’s Molmo 2 shows open-source models can rival proprietary giants in video understanding

Fresh off releasing the latest version of its Olmo foundation model, the Allen Institute for AI (Ai2) launched its open-source video model, Molmo 2, on Tuesday, a...

#Molmo 2 #video understanding #open-source AI #Allen Institute for AI #foundation models #computer vision
4 months ago · ai · - · -

AlphaFlow: Understanding and Improving MeanFlow Models

AlphaFlow provides a smoother training schedule for MeanFlow image models, reducing the conflict between its two objectives and accelerating learning. Overview...

#MeanFlow #AlphaFlow #image generation #training optimization #deep learning #computer vision
4 months ago · ai · - · -

[Paper] DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders

Video diffusion models have revolutionized generative video synthesis, but they are imprecise, slow, and can be opaque during generation -- keeping users in the...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] LitePT: Lighter Yet Stronger Point Transformer

Modern neural architectures for 3D point cloud processing contain both convolutional layers and attention blocks, but the best way to assemble them remains uncl...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Towards Scalable Pre-training of Visual Tokenizers for Generation

The quality of the latent space in visual tokenizers (e.g., VAEs) is crucial for modern generative models. However, the standard reconstruction-based training p...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Recurrent Video Masked Autoencoders

We present Recurrent Video Masked-Autoencoders (RVM): a novel video representation learning approach that uses a transformer-based recurrent neural network to a...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners

Generalization remains the central challenge for interactive 3D scene generation. Existing learning-based approaches ground spatial understanding in limited sce...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction

Recent feed-forward reconstruction models like VGGT and π^3 achieve impressive reconstruction quality but cannot process streaming videos due to quadratic memor...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Feedforward 3D Editing via Text-Steerable Image-to-3D

Recent progress in image-to-3D has opened up immense possibilities for design, AR/VR, and robotics. However, to use AI-generated 3D assets in real applications,...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] JoVA: Unified Multimodal Learning for Joint Video-Audio Generation

In this paper, we present JoVA, a unified framework for joint video-audio generation. Despite recent encouraging advances, existing methods face two critical li...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Towards Interactive Intelligence for Digital Humans

We introduce Interactive Intelligence, a novel paradigm of digital human that is capable of personality-aligned expression, adaptive interaction, and self-evolu...

#research #paper #ai #nlp #computer-vision
4 months ago · ai · - · -

[Paper] Directional Textual Inversion for Personalized Text-to-Image Generation

Textual Inversion (TI) is an efficient approach to text-to-image personalization but often fails on complex prompts. We trace these failures to embedding norm i...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] World Models Can Leverage Human Videos for Dexterous Manipulation

Dexterous manipulation is challenging because it requires understanding how subtle hand motion influences the environment through contact with objects. We intro...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] From Code to Field: Evaluating the Robustness of Convolutional Neural Networks for Disease Diagnosis in Mango Leaves

The validation and verification of artificial intelligence (AI) models through robustness assessment are essential to guarantee the reliable performance of inte...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Do-Undo: Generating and Reversing Physical Actions in Vision-Language Models

We introduce the Do-Undo task and benchmark to address a critical gap in vision-language models: understanding and generating physically plausible scene transfo...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] DA-SSL: self-supervised domain adaptor to leverage foundational models in turbt histopathology slides

Recent deep learning frameworks in histopathology, particularly multiple instance learning (MIL) combined with pathology foundational models (PFMs), have shown ...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

AI image generators are getting better by getting worse

Real ones will know that Mount Rainier looks too big in this image, but the re-creation of a Washington State ferry in this AI image is uncanny. This is The Ste...

#AI image generation #diffusion models #generative AI #computer vision #deep learning #stable diffusion #AI art
4 months ago · ai · - · -

The Evolution of AI Surveillance

AI Surveillance on British Roads: On a grey morning along the A38 near Plymouth, a white van equipped with twin cameras captures thousands of images per hour, i...

#AI surveillance #computer vision #privacy #road safety #emotion recognition
4 months ago · ai · - · -

AI Background Remover: Image Quality and Edge Accuracy

Introduction: An AI background remover can feel almost magical when it works well—and frustrating when it doesn’t. The difference usually comes down to two thin...

#background removal #image quality #edge accuracy #computer vision #AI models #image segmentation #deep learning
4 months ago · ai · - · -

[Paper] Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance

The recent success of 3D Gaussian Splatting (3DGS) has reshaped novel view synthesis by enabling fast optimization and real-time rendering of high-quality radia...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

Large-scale video generation models have shown remarkable potential in modeling photorealistic appearance and lighting interactions in real-world scenes. Howeve...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Particulate: Feed-Forward 3D Object Articulation

We present Particulate, a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying arti...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis

The collection of large-scale and diverse robot demonstrations remains a major bottleneck for imitation learning, as real-world data acquisition is costly and s...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

Reality is a dance between rigid constraints and deformable structures. For video models, that means generating motion that preserves fidelity as well as struct...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Uncertainty-Aware Domain Adaptation for Vitiligo Segmentation in Clinical Photographs

Accurately quantifying vitiligo extent in routine clinical photographs is crucial for longitudinal monitoring of treatment response. We propose a trustworthy, f...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] MatAnyone 2: Scaling Video Matting via a Learned Quality Evaluator

Video matting remains limited by the scale and realism of existing datasets. While leveraging segmentation data can enhance semantic stability, the lack of effe...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] Smudged Fingerprints: A Systematic Evaluation of the Robustness of AI Image Fingerprints

Model fingerprint detection techniques have emerged as a promising approach for attributing AI-generated images to their source models, but their robustness und...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] Reducing Domain Gap with Diffusion-Based Domain Adaptation for Cell Counting

Generating realistic synthetic microscopy images is critical for training deep learning models in label-scarce environments, such as cell counting with many cel...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Visual generation grounded in Visual Foundation Model (VFM) representations offers a highly promising unified pathway for integrating visual understanding, perc...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry

Reliable interpretation of multimodal data in dentistry is essential for automated oral healthcare, yet current multimodal large language models (MLLMs) struggl...

#research #paper #ai #machine-learning #nlp #computer-vision
4 months ago · ai · - · -

[Paper] HFS: Holistic Query-Aware Frame Selection for Efficient Video Reasoning

Key frame selection in video understanding presents significant challenges. Traditional top-K selection methods, which score frames independently, often fail to...

#research #paper #ai #nlp #computer-vision
4 months ago · ai · - · -

[Paper] Parallax: Runtime Parallelization for Operator Fallbacks in Heterogeneous Edge Systems

The growing demand for real-time DNN applications on edge devices necessitates faster inference of increasingly complex models. Although many devices include sp...

#research #paper #ai #machine-learning #computer-vision
4 months ago · ai · - · -

[Paper] StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without expl...

#research #paper #ai #computer-vision
4 months ago · ai · - · -

[Paper] WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Generative world models are reshaping embodied AI, enabling agents to synthesize realistic 4D driving environments that look convincing but often fail physicall...

#research #paper #ai #computer-vision
