computer vision — Page 12

2 months ago · ai · - · -

Apple's AI Wearables Expected to Lean Heavily on Visual Intelligence

Overview: Apple’s Visual Intelligence is expected to play a central role in the company’s upcoming AI‑focused wearables, which may include smart glasses, a pend...

#Apple #visual intelligence #AI wearables #smart glasses #computer vision
2 months ago · ai · - · -

From Metrics to Action: Turning Embedding Analysis into Sprint Tickets

[Image: Embedding analysis overview]

#embeddings #model-evaluation #computer-vision #metrics #agile-development
2 months ago · ai · - · -

[Paper] CORVET: A CORDIC-Powered, Resource-Frugal Mixed-Precision Vector Processing Engine for High-Throughput AIoT applications

This brief presents a runtime-adaptive, performance-enhanced vector engine featuring a low-resource, iterative CORDIC-based MAC unit for edge AI acceleration. T...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] SARAH: Spatially Aware Real-time Agentic Humans

As embodied agents become central to VR, telepresence, and digital human applications, their motion must go beyond speech-aligned gestures: agents should turn t...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning

Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-inva...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Spatio-Spectroscopic Representation Learning using Unsupervised Convolutional Long-Short Term Memory Networks

Integral Field Spectroscopy (IFS) surveys offer a unique new landscape in which to learn in both spatial and spectroscopic dimensions and could help uncover pre...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Latent Equivariant Operators for Robust Object Recognition: Promise and Challenges

Despite the successes of deep learning in computer vision, difficulties persist in recognizing objects that have undergone group-symmetric transformations rarel...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis

Missing data problems, such as missing modalities in multi-modal brain MRI and missing slices in cardiac MRI, pose significant challenges in clinical practice. ...

#diffusion models #medical imaging #MRI synthesis #latent diffusion #computer vision
2 months ago · ai · - · -

[Paper] Self-Aware Object Detection via Degradation Manifolds

Object detectors achieve strong performance under nominal imaging conditions but can fail silently when exposed to blur, noise, compression, adverse weather, or...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Quantum-enhanced satellite image classification

We demonstrate the application of a quantum feature extraction method to enhance multi-class image classification for space applications. By harnessing the dyna...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

The Preprocessing Step You're Probably Skipping (And Why Your Model Is Paying for It)

Low‑Contrast Images and Why Models Struggle: You spend days collecting data. You pick the right architecture. You tune your learning rate. You train the model,...

#data preprocessing #image augmentation #computer vision #model performance #lighting variation #training data quality
2 months ago · ai · - · -

[Paper] OpenEarthAgent: A Unified Framework for Tool-Augmented Geospatial Agents

Recent progress in multimodal reasoning has enabled agents that can interpret imagery, connect it with language, and perform structured analytical tasks. Extend...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs

Vision-Language-Action models (VLAs) promise to ground language instructions in robot control, yet in practice often fail to faithfully follow language. When pr...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Human-level 3D shape perception emerges from multi-view learning

Humans can infer the three-dimensional structure of objects from two-dimensional visual inputs. Modeling this ability has been a longstanding goal for the scien...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior st...

#research #paper #ai #machine-learning #nlp #computer-vision
2 months ago · ai · - · -

[Paper] A.R.I.S.: Automated Recycling Identification System for E-Waste Classification Using Deep Learning

Traditional electronic recycling processes suffer from significant resource loss due to inadequate material separation and identification capabilities, limiting...

#e-waste #computer-vision #edge-ai #YOLOx #deep-learning
2 months ago · ai · - · -

[Paper] IntRec: Intent-based Retrieval with Contrastive Refinement

Retrieving user-specified objects from complex scenes remains a challenging task, especially when queries are ambiguous or involve multiple similar objects. Exi...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] CORAL: Correspondence Alignment for Improved Virtual Try-On

Existing methods for Virtual Try-On (VTON) often struggle to preserve fine garment details, especially in unpaired settings where accurate person-garment corres...

#virtual try-on #diffusion transformer #correspondence alignment #computer vision #garment synthesis
2 months ago · ai · - · -

[Paper] FR-GESTURE: An RGBD Dataset For Gesture-based Human-Robot Interaction In First Responder Operations

The ever-increasing intensity and number of disasters make the work of First Responders (FRs) even more difficult. Artificial intelligence and robotics solution...

#gesture recognition #RGB-D dataset #human‑robot interaction #computer vision #first responder robotics
2 months ago · ai · - · -

[Paper] RetouchIQ: MLLM Agents for Instruction-Based Image Retouching with Generalist Reward

Recent advances in multimodal large language models (MLLMs) have shown great potential for extending vision-language reasoning to professional tool-based image ...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] TeCoNeRV: Leveraging Temporal Coherence for Compressible Neural Representations for Videos

Implicit Neural Representations (INRs) have recently demonstrated impressive performance for video compression. However, since a separate INR must be overfit fo...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation

Visual loco-manipulation of arbitrary objects in the wild with humanoid robots requires accurate end-effector (EE) control and a generalizable understanding of ...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Saliency-Aware Multi-Route Thinking: Revisiting Vision-Language Reasoning

Vision-language models (VLMs) aim to reason by jointly leveraging visual and textual modalities. While allocating additional inference-time computation has prov...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Learning Situated Awareness in the Real World

A core aspect of human perception is situated awareness, the ability to relate ourselves to the surrounding physical environment and reason over possible action...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] VETime: Vision Enhanced Zero-Shot Time Series Anomaly Detection

Time-series anomaly detection (TSAD) requires identifying both immediate Point Anomalies and long-range Context Anomalies. However, existing foundation models f...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] PredMapNet: Future and Historical Reasoning for Consistent Online HD Vectorized Map Construction

High-definition (HD) maps are crucial to autonomous driving, providing structured representations of road elements to support navigation and planning. However, ...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Unpaired Image-to-Image Translation via a Self-Supervised Semantic Bridge

Adversarial diffusion and diffusion-inversion methods have advanced unpaired image-to-image translation, but each faces key limitations. Adversarial approaches ...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Style-Aware Gloss Control for Generative Non-Photorealistic Rendering

Humans can infer material characteristics of objects from their visual appearance, and this ability extends to artistic depictions, where similar perceptual str...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

Generating SEM Images from Segmentation Masks

Acknowledgements: We would like to thank our mentors, Asaf Nisani and Yoav Lebendiker, for their guidance throughout the project. Dataset Preparation: Our projec...

#image generation #pix2pix #cyclegan #computer vision #semantic segmentation
2 months ago · ai · - · -

Structured AI (YC F25) Is Hiring

Overview: Structured AI is building the AI workforce for construction design engineering. The Problem: Today, billions of dollars and months of human effort are...

#AI agents #computer vision #construction design #QA/QC automation #design engineering #startup #AI co‑design
2 months ago · ai · - · -

[Paper] B-DENSE: Branching For Dense Ensemble Network Learning

Inspired by non-equilibrium thermodynamics, diffusion models have achieved state-of-the-art performance in generative modeling. However, their iterative samplin...

#diffusion models #generative AI #model distillation #computer vision #deep learning
2 months ago · ai · - · -

[Paper] VideoSketcher: Video Models Prior Enable Versatile Sequential Sketch Generation

Sketching is inherently a sequential process, in which strokes are drawn in a meaningful order to explore and refine ideas. However, most generative models trea...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Task-Agnostic Continual Learning for Chest Radiograph Classification

Clinical deployment of chest radiograph classifiers requires models that can be updated as new datasets become available without retraining on previously obse...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Context-aware Skin Cancer Epithelial Cell Classification with Scalable Graph Transformers

Whole-slide images (WSIs) from cancer patients contain rich information that can be used for medical diagnosis or to follow treatment progress. To automate thei...

#graph neural networks #computer vision #medical imaging #skin cancer detection #transformers
2 months ago · ai · - · -

[Paper] Meteorological data and Sky Images meets Neural Models for Photovoltaic Power Forecasting

Due to the rise in the use of renewable energies as an alternative to traditional ones, and especially solar energy, there is increasing interest in studying ho...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] NeRFscopy: Neural Radiance Fields for in-vivo Time-Varying Tissues from Endoscopy

Endoscopy is essential in medical imaging, used for diagnosis, prognosis and treatment. Developing a robust dynamic 3D reconstruction pipeline for endoscopic vi...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models

Current research in multimodal models faces a key challenge where enhancing generative capabilities often comes at the expense of understanding, and vice versa....

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] RaCo: Ranking and Covariance for Practical Learned Keypoints

This paper introduces RaCo, a lightweight neural network designed to learn robust and versatile keypoints suitable for a variety of 3D computer vision tasks. Th...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Language and Geometry Grounded Sparse Voxel Representations for Holistic Scene Understanding

Existing 3D open-vocabulary scene understanding methods mostly emphasize distilling language features from 2D foundation models into 3D feature fields, but larg...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Spanning the Visual Analogy Space with a Weight Basis of LoRAs

Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations diff...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

Show HN: Wildex – Pokémon Go for real wildlife

Identify Animals, Plants, Bugs. Only for iPhone – Free · Designed for iPhone. Wildex – Discover Wildlife Around You: Turn every walk into a treasure hunt. Wildex...

#computer-vision #mobile-app #wildlife-identification #augmented-reality #gamification
2 months ago · ai · - · -

[Paper] Image Generation with a Sphere Encoder

We introduce the Sphere Encoder, an efficient generative framework capable of producing images in a single forward pass and competing with many-step diffusion m...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Neurosim: A Fast Simulator for Neuromorphic Robot Perception

Neurosim is a fast, real-time, high-performance library for simulating sensors such as dynamic vision sensors, RGB cameras, depth sensors, and inertial sensors....

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] ThermEval: A Structured Benchmark for Evaluation of Vision-Language Models on Thermal Imagery

Vision language models (VLMs) achieve strong performance on RGB imagery, but they do not generalize to thermal images. Thermal sensing plays a critical role in ...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] PAct: Part-Decomposed Single-View Articulated Object Generation

Articulated objects are central to interactive 3D applications, including embodied AI, robotics, and VR/AR, where functional part decomposition and kinematic mo...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

Maintaining spatial world consistency over long horizons remains a central challenge for camera-controllable video generation. Existing memory-based approaches ...

#video generation #spatial memory #computer vision #deep learning #transformer
2 months ago · ai · - · -

[Paper] Wrivinder: Towards Spatial Intelligence for Geo-locating Ground Images onto Satellite Imagery

Aligning ground-level imagery with geo-registered satellite maps is crucial for mapping, navigation, and situational awareness, yet remains challenging under la...

#research #paper #ai #computer-vision
2 months ago · ai · - · -

[Paper] Picking the Right Specialist: Attentive Neural Process-based Selection of Task-Specialized Models as Tools for Agentic Healthcare Systems

Task-specialized models form the backbone of agentic healthcare systems, enabling the agents to answer clinical queries across tasks such as disease diagnosis, ...

#research #paper #ai #machine-learning #computer-vision
