computer vision — Page 11


1 month ago · ai · - · -

[Paper] MediX-R1: Open Ended Medical Reinforcement Learning

We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] VGG-T³: Offline Feed-Forward 3D Reconstruction at Scale

We present a scalable 3D reconstruction model that addresses a critical limitation in offline feed-forward methods: their computational and memory requirements ...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation

We identify occlusion reasoning as a fundamental yet overlooked aspect for 3D layout-conditioned generation. It is essential for synthesizing partially occluded...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] A Dataset is Worth 1 MB

A dataset server must often distribute the same large payload to many clients, incurring massive communication costs. Since clients frequently operate on divers...

#dataset compression #pseudo-labels #data distillation #computer vision #few-shot learning
1 month ago · ai · - · -

[Paper] Sensor Generalization for Adaptive Sensing in Event-based Object Detection via Joint Distribution Training

Bio-inspired event cameras have recently attracted significant research attention due to their asynchronous and low-latency capabilities. These features provide a high dy...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning

The lack of reasoning capabilities in Vision-Language Models (VLMs) has remained at the forefront of research discourse. We posit that this behavior stems from ...

#research #paper #ai #nlp #computer-vision
1 month ago · ai · - · -

[Paper] Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?

Open-vocabulary segmentation (OVS) extends the zero-shot recognition capabilities of vision-language models (VLMs) to pixel-level prediction, enabling segmentat...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding

Omni-modal reasoning is essential for intelligent systems to understand and draw inferences from diverse data sources. While existing omni-modal large language ...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] PRIMA: Pre-training with Risk-integrated Image-Metadata Alignment for Medical Diagnosis via LLM

Medical diagnosis requires the effective synthesis of visual manifestations and clinical metadata. However, existing methods often treat metadata as isolated ta...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] ManifoldGD: Training-Free Hierarchical Manifold Guidance for Diffusion-Based Dataset Distillation

In recent times, large datasets hinder efficient model training while also containing redundant concepts. Dataset distillation aims to synthesize compact datase...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

Bild AI (YC W25) Is Hiring Interns to Make Housing Affordable

AI/SWE Intern (Now and Summer 2026) – Bild AI (W25). Location: San Francisco, CA, US. Compensation: $3K–$10K per month. Type: Internship. US citizen/visa only. Abou...

#computer-vision #machine-learning #AI-startup #internship #construction-tech
1 month ago · ai · - · -

Bumble adds AI-powered photo feedback and profile guidance tools

Bumble announced that it’s adding a series of AI-driven features intended to help turn matches into lasting connections, including tools that offer feedback and...

#AI-powered features #Bumble #dating app AI #profile optimization #computer vision
1 month ago · ai · - · -

[Paper] A Novel Evolutionary Method for Automated Skull-Face Overlay in Computer-Aided Craniofacial Superimposition

Craniofacial Superimposition is a forensic technique for identifying skeletal remains by comparing a post-mortem skull with ante-mortem facial photographs. A cr...

#computer-vision #evolutionary-algorithm #forensic-technology #3d-reconstruction #differential-evolution
1 month ago · ai · - · -

[Paper] Neu-PiG: Neural Preconditioned Grids for Fast Dynamic Surface Reconstruction on Long Sequences

Temporally consistent surface reconstruction of dynamic 3D objects from unstructured point cloud data remains challenging, especially for very long sequences. E...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] WHOLE: World-Grounded Hand-Object Lifted from Egocentric Videos

Egocentric manipulation videos are highly challenging due to severe occlusions during interactions and frequent object entries and exits from the camera view as...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Solaris: Building a Multiplayer Video World Model in Minecraft

Existing action-conditioned video generation models (video world models) are limited to single-agent perspectives, failing to capture the multi-agent interactio...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Off-The-Shelf Image-to-Image Models Are All You Need To Defeat Image Protection Schemes

Advances in Generative AI (GenAI) have led to the development of various protection strategies to prevent the unauthorized use of images. These methods rely on ...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Mixed Magnification Aggregation for Generalizable Region-Level Representations in Computational Pathology

In recent years, a standard computational pathology workflow has emerged where whole slide images are cropped into tiles, these tiles are processed using a foun...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] CASR: A Robust Cyclic Framework for Arbitrary Large-Scale Super-Resolution with Distribution Alignment and Self-Similarity Awareness

Arbitrary-Scale Image Super-Resolution (ASISR) remains fundamentally limited by cross-scale distribution shift: once the inference scale leaves the training range, noise, blur, and...

#super-resolution #cyclic upscaling #distribution alignment #self-similarity #computer vision
1 month ago · ai · - · -

[Paper] CoLoGen: Progressive Learning of Concept-Localization Duality for Unified Image Generation

Unified conditional image generation remains difficult because different tasks depend on fundamentally different internal representations. Some require conceptu...

#diffusion models #image generation #concept-localization duality #computer vision #machine learning
1 month ago · ai · - · -

[Paper] NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors

Object hallucination is a critical issue in Large Vision-Language Models (LVLMs), where outputs include objects that do not appear in the input image. A natural...

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · ai · - · -

[Paper] MedTri: A Platform for Structured Medical Report Normalization to Enhance Vision-Language Pretraining

Medical vision-language pretraining increasingly relies on medical reports as large-scale supervisory signals; however, raw reports often exhibit substantial st...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] WeaveTime: Stream from Earlier Frames into Emergent Memory in VideoLLMs

Recent advances in Multimodal Large Language Models have greatly improved visual understanding and reasoning, yet their quadratic attention and offline training...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

Visual imitation learning: Guidde trains AI agents on human 'expert video' instead of documentation


#visual imitation learning #agentic AI #screen recording training #enterprise automation #computer vision #AI agents #imitation learning
1 month ago · ai · - · -

[Paper] Test-Time Training with KV Binding Is Secretly Linear Attention

Test-time training (TTT) with KV binding as a sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Squint: Fast Visual Reinforcement Learning for Sim-to-Real Robotics

Visual reinforcement learning is appealing for robotics but expensive: off-policy methods are sample-efficient yet slow; on-policy methods parallelize well bu...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Multi-Vector Index Compression in Any Modality

We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in...

#research #paper #ai #nlp #computer-vision
1 month ago · ai · - · -

[Paper] Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent ...

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · ai · - · -

[Paper] Region of Interest Segmentation and Morphological Analysis for Membranes in Cryo-Electron Tomography

Cryo-electron tomography (cryo-ET) enables high-resolution, three-dimensional reconstruction of biological structures, including membranes and membrane proteins...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Human Video Generation from a Single Image with 3D Pose and View Control

Recent diffusion methods have made significant progress in generating videos from single images due to their powerful visual generation capabilities. However, c...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning

While Vision-Language Models (VLMs) exhibit exceptional 2D visual understanding, their ability to comprehend and reason about 3D space, a cornerstone of spatial...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] Mask-HybridGNet: Graph-based segmentation with emergent anatomical correspondence from pixel-level supervision

Graph-based medical image segmentation represents anatomical structures using boundary graphs, providing fixed-topology landmarks and inherent population-level ...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] XMorph: Explainable Brain Tumor Analysis Via LLM-Assisted Hybrid Deep Intelligence

Deep learning has significantly advanced automated brain tumor diagnosis, yet clinical adoption remains limited by interpretability and computational constraint...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] Seeing Through Words: Controlling Visual Retrieval Quality with Language Models

Text-to-image retrieval is a fundamental task in vision-language learning, yet in real-world scenarios it is often challenged by short and underspecified user q...

#research #paper #ai #computer-vision
1 month ago · ai · - · -

[Paper] NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

[Paper] MIP Candy: A Modular PyTorch Framework for Medical Image Processing

Medical image processing demands specialized software that handles high-dimensional volumetric data, heterogeneous file formats, and domain-specific training pr...

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai · - · -

‘MokoFlex’, an AI Solution for Streamlining Small-Business Store Operations, Launches AI Hair Styling Service ‘StyleSync’


#AI solution #small business #retail efficiency #hair styling #computer vision #visualization #MokoFlex #StyleSync
2 months ago · ai · - · -

[Paper] Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Unified multimodal models can both understand and generate visual content within a single architecture. Existing models, however, remain data-hungry and too hea...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

We propose tttLRM, a novel large 3D reconstruction model that leverages a Test-Time Training (TTT) layer to enable long-context, autoregressive 3D reconstructio...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] A Very Big Video Reasoning Suite

Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence i...

#video reasoning #large-scale dataset #computer vision #benchmark #AI research
2 months ago · ai · - · -

[Paper] Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning

Current feed-forward 3D/4D reconstruction systems rely on dense geometry and pose supervision, which is expensive to obtain at scale and particularly scarce for dynami...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Simulation-Ready Cluttered Scene Estimation via Physics-aware Joint Shape and Pose Optimization

Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Unfortunately, existing methods stru...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Do Large Language Models Understand Data Visualization Rules?

Data visualization rules, derived from decades of research in design and perception, ensure trustworthy chart communication. While prior work has shown that large...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] NovaPlan: Zero-Shot Long-Horizon Manipulation via Closed-Loop Video Language Planning

Solving long-horizon tasks requires robots to integrate high-level semantic reasoning with low-level physical interaction. While vision-language models (VLMs) a...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Benchmarking Unlearning for Vision Transformers

Research in machine unlearning (MU) has gained strong momentum: MU is now widely regarded as a critical capability for building safe and fair AI. In parallel, r...

#machine-unlearning #vision-transformers #benchmark #computer-vision #research
2 months ago · ai · - · -

[Paper] Transcending the Annotation Bottleneck: AI-Powered Discovery in Biology and Medicine

The dependence on expert annotation has long constituted the primary rate-limiting step in the application of artificial intelligence to biomedicine. While supe...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues

Edge-based representations are fundamental cues for visual understanding, a principle rooted in early vision research and still central today. We extend this pr...

#research #paper #ai #machine-learning #computer-vision
2 months ago · devops · - · -

[Paper] Linear Reservoir: A Diagonalization-Based Optimization

We introduce a diagonalization-based optimization for Linear Echo State Networks (ESNs) that reduces the per-step computational complexity of reservoir state up...

#research #paper #devops #computer-vision
