computer-vision — Page 19

3 months ago · ai · - · -

[Paper] AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction

Reconstructing dynamic 3D scenes from monocular videos requires simultaneously capturing high-frequency appearance details and temporally continuous motion. Exi...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Two Deep Learning Approaches for Automated Segmentation of Left Ventricle in Cine Cardiac MRI

Left ventricle (LV) segmentation is critical for clinical quantification and diagnosis of cardiac images. In this work, we propose two novel deep learning archi...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Fusion-SSAT: Unleashing the Potential of Self-supervised Auxiliary Task by Feature Fusion for Generalized Deepfake Detection

In this work, we attempted to unleash the potential of self-supervised learning as an auxiliary task that can optimise the primary task of generalised deepfake ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] FedHypeVAE: Federated Learning with Hypernetwork Generated Conditional VAEs for Differentially Private Embedding Sharing

Federated data sharing promises utility without centralizing raw data, yet existing embedding-level generators struggle under non-IID client heterogeneity and p...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Investigating the Viability of Employing Multi-modal Large Language Models in the Context of Audio Deepfake Detection

While Vision-Language Models (VLMs) and Multimodal Large Language Models (MLLMs) have shown strong generalisation in detecting image and video deepfakes, their ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Unified Primitive Proxies for Structured Shape Completion

Structured shape completion recovers missing geometry as primitives rather than as unstructured points, which enables primitive-based surface reconstruction. In...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Grading Handwritten Engineering Exams with Multimodal Large Language Models

Handwritten STEM exams capture open-ended reasoning and diagrams, but manual grading is slow and difficult to scale. We present an end-to-end workflow for gradi...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Multi-Level Feature Fusion for Continual Learning in Visual Quality Inspection

Deep neural networks show great potential for automating various visual quality inspection tasks in manufacturing. However, their applicability is limited in mo...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Detecting Performance Degradation under Data Shift in Pathology Vision-Language Model

Vision-Language Models have demonstrated strong potential in medical image analysis and disease diagnosis. However, after deployment, their performance may dete...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Efficient Deep Demosaicing with Spatially Downsampled Isotropic Networks

In digital imaging, image demosaicing is a crucial first step which recovers the RGB information from a color filter array (CFA). Oftentimes, deep learning is u...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

PyTorch vs. TensorFlow: Choosing Your AI Framework for 2026

What is TensorFlow? Developed by Google Brain, TensorFlow is a robust and versatile framework known for its extensive collection of tools, libraries, and resou...

#PyTorch #TensorFlow #machine learning #deep learning #AI frameworks #computer vision #natural language processing #model training #scalability
3 months ago · ai · - · -

Computer Vision Services: Building Intelligent Visual Systems with Oodles

Images and videos contain massive amounts of data, but extracting meaningful insights from them requires advanced AI systems. Computer Vision Services https://www...

#computer vision #deep learning #AI #image analytics #object detection #OCR #neural networks #visual data
3 months ago · ai · - · -

[Paper] SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

We present SpaceTimePilot, a video diffusion model that disentangles space and time for controllable generative rendering. Given a monocular video, SpaceTimePil...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

Recent advances in 3D reconstruction have achieved remarkable progress in high-quality scene capture from dense multi-view imagery, yet struggle when input view...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Edit3r: Instant 3D Scene Editing from Sparse Unposed Images

We present Edit3r, a feed-forward framework that reconstructs and edits 3D scenes in a single pass from unposed, view-inconsistent, instruction-edited images. U...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] FineTec: Fine-Grained Action Recognition Under Temporal Corruption via Skeleton Decomposition and Sequence Completion

Recognizing fine-grained actions from temporally corrupted skeleton sequences remains a significant challenge, particularly in real-world scenarios where online...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing

Audio-driven visual dubbing aims to synchronize a video's lip movements with new speech, but is fundamentally challenged by the lack of ideal training data: pai...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Generative Classifiers Avoid Shortcut Solutions

Discriminative approaches to classification often learn shortcuts that hold in-distribution but fail even under minor distribution shift. This failure mode stem...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM

We present FoundationSLAM, a learning-based monocular dense SLAM system that addresses the absence of geometric consistency in previous flow-based approaches fo...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Bi-C2R: Bidirectional Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification

Lifelong person Re-IDentification (L-ReID) exploits sequentially collected data to continuously train and update a ReID model, focusing on the overall performan...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] PhysTalk: Language-driven Real-time Physics in 3D Gaussian Scenes

Realistic visual simulations are omnipresent, yet their creation requires computing time, rendering, and expert animation knowledge. Open-vocabulary visual effe...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments

Vision Language Models (VLMs) are increasingly adopted as central reasoning modules for embodied agents. Existing benchmarks evaluate their capabilities under i...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] CPJ: Explainable Agricultural Pest Diagnosis via Caption-Prompt-Judge with LLM-Judged Refinement

Accurate and interpretable crop disease diagnosis is essential for agricultural decision-making, yet existing methods often rely on costly supervised fine-tunin...

#research #paper #ai #nlp #computer-vision
3 months ago · ai · - · -

[Paper] Projection-based Adversarial Attack using Physics-in-the-Loop Optimization for Monocular Depth Estimation

Deep neural networks (DNNs) remain vulnerable to adversarial attacks that cause misclassification when specific perturbations are added to input images. This vu...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

Deep Maze Solver

Introduction: A few days ago I saw an X post (https://twitter.com/ArnaudPannatier/status/1762864347397628396) explaining that diffusion models could be used to solv...

#convolutional neural network #PyTorch #maze solving #supervised learning #diffusion models #computer vision
3 months ago · ai · - · -

[Paper] RedunCut: Measurement-Driven Sampling and Accuracy Performance Modeling for Low-Cost Live Video Analytics

Live video analytics (LVA) runs continuously across massive camera fleets, but inference cost with modern vision models remains high. To address this, dynamic m...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

Remove CapCut Watermark with AI — How We Built a Flicker-Free Video Inpainting System

#CapCut #watermark removal #video inpainting #AI restoration #deep learning #computer vision #flicker‑free video #video editing
3 months ago · ai · - · -

AI-Powered Heat Maps for Industrial Worksites

#computer vision #heatmaps #industrial safety #video analytics #workforce monitoring #CCTV #AI-powered analytics
3 months ago · ai · - · -

[Paper] Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Diffusion-based video super-resolution (VSR) methods achieve strong perceptual quality but remain impractical for latency-sensitive settings due to reliance on ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Transparent objects remain notoriously hard for perception systems: refraction, reflection and transmission break the assumptions behind stereo, ToF and purely ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Web World Models

Language agents increasingly require persistent worlds in which they can act, remember, and learn. Existing approaches sit at two extremes: conventional web fra...

#research #paper #ai #machine-learning #nlp #computer-vision
3 months ago · ai · - · -

[Paper] IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition

Intrinsic image decomposition is fundamental for visual understanding, as RGB images entangle material properties, illumination, and view-dependent effects. Rec...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion

Humans learn locomotion through visual observation, interpreting visual content first before imitating actions. However, state-of-the-art humanoid locomotion sy...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

Omnimodal large language models have made significant strides in unifying audio and visual modalities; however, they often lack the fine-grained cross-modal und...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception

Spatio-temporal alignment is crucial for temporal modeling of end-to-end (E2E) perception in autonomous driving (AD), providing valuable structural and textural...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Memorization in 3D Shape Generation: An Empirical Study

Generative models are increasingly used in 3D vision to synthesize novel shapes, yet it remains unclear whether their generation relies on memorizing training s...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] Scalable Residual Feature Aggregation Framework with Hybrid Metaheuristic Optimization for Robust Early Pancreatic Neoplasm Detection in Multimodal CT Imaging

The early detection of pancreatic neoplasm is a major clinical dilemma, and it is predominantly so because tumors are likely to occur with minimal contrast marg...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Detection Fire in Camera RGB-NIR

Improving the accuracy of fire detection using infrared night vision cameras remains a challenging task. Previous studies have reported strong performance with ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] RxnBench: A Multimodal Benchmark for Evaluating Large Language Models on Chemical Reaction Understanding from Scientific Literature

The integration of Multimodal Large Language Models (MLLMs) into chemistry promises to revolutionize scientific discovery, yet their ability to comprehend the d...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

[Paper] CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning Under Partial Observations

Large Language Model (LLM) agents, while proficient in the digital realm, face a significant gap in physical-world deployment due to the challenge of forming an...

#research #paper #ai #machine-learning #nlp #computer-vision
3 months ago · ai · - · -

[Paper] MedGemma vs GPT-4: Open-Source and Proprietary Zero-shot Medical Disease Classification from Images

Multimodal Large Language Models (LLMs) introduce an emerging paradigm for medical imaging by interpreting scans through the lens of extensive clinical knowledg...

#research #paper #ai #machine-learning #computer-vision
3 months ago · ai · - · -

Detecting Adversarial Samples from Artifacts

Overview: Many AI systems can be fooled by tiny, almost invisible edits to images that cause them to give incorrect answers. Researchers have discovered a simpl...

#adversarial attacks #uncertainty estimation #model robustness #computer vision #AI safety
3 months ago · ai · - · -

Apple releases open-source model that instantly turns 2D photos into 3D views

Article URL: https://github.com/apple/ml-sharp Comments URL: https://news.ycombinator.com/item?id=46401539 Points: 71 Comments: 23...

#apple #open-source #3d-reconstruction #computer-vision #machine-learning
3 months ago · ai · - · -

[Paper] See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

Large vision-language models (VLMs) often benefit from intermediate visual cues, either injected via external tools or generated as latent visual tokens during ...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] ProEdit: Inversion-based Editing From Prompts Done Right

Inversion-based visual editing provides an effective and training-free way to edit an image or a video based on user instructions. Existing methods typically in...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Learning Association via Track-Detection Matching for Multi-Object Tracking

Multi-object tracking aims to maintain object identities over time by associating detections across video frames. Two dominant paradigms exist in literature: tr...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] Yume-1.5: A Text-Controlled Interactive World Generation Model

Recent approaches have demonstrated the promise of using diffusion models to generate interactive and explorable worlds. However, most of these methods face cri...

#research #paper #ai #computer-vision
3 months ago · ai · - · -

[Paper] StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars

Real-time, streaming interactive avatars represent a critical yet challenging goal in digital human research. Although diffusion-based human avatar generation m...

#research #paper #ai #machine-learning #computer-vision
