computer-vision — Page 2

1 week ago · ai

[Paper] Mind the Gap: Resolving Performance Bottlenecks in Video Instance Segmentation

In Video Instance Segmentation (VIS), classification, segmentation, and tracking objectives are jointly evaluated, but their individual contributions to perform...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] The Impact of Synthetic Lesion MR Images on Automated Focal Cortical Dysplasia Detection Under Data Scarcity

Background and Purpose: Automated detection of focal cortical dysplasia (FCD) requires large volumes of voxelwise lesion-delineated MRI data, which are difficul...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] Beyond Backscatter: InSAR Coherence from Detected SAR Images

In this work, we propose a deep learning framework for coherence regression directly from detected SAR images, without the need for accurate coregistration. A R...

#research #paper #ai #computer-vision
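1 week ago · ai

[Paper] PAR3D: A Unified 3D-MLLM with Part-Aware Representations for Scene Understanding

Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering.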

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Complexity-Balanced Diffusion Splitting

Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricat...

#research #paper #ai #computer-vision
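1 week ago · ai

[Paper] Thinking with Imagination: Agentic Visual-Spatial Reasoning with World Simulators

Vision-Language Models (VLMs) have demonstrated strong visual reasoning abilities, but their spatial reasoning remains largely confined to observation.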

#research #paper #ai #computer-vision
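1 week ago · ai

[Paper] In-Context Multiple Instance Learning

Multiple Instance Learning (MIL) addresses problems in which supervision is provided at the level of bags of instances, and it has been applied successfully across many domains.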

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] A Vision-Language Framework for Comparative Reasoning in Radiology

Medical imaging artificial intelligence has achieved strong performance in isolated image interpretation, but remains poorly aligned with radiological practice,...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] HomeWorld: A Unified Floor Plan and Furniture Framework for Generating Controllable, Densely Interactive Whole-House Scenes

Indoor scene generation is crucial for robot simulation and modern interior design. However, complex layouts together with scarce 3D scene data make learning-ba...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] EasyLens: A Training-Free, Plug-and-Play Enhancer of Subtle Lesion Representations for Medical Vision-Language Models

Medical vision-language models (VLMs) have shown increasing potential for clinical image interpretation, including lesion detection and report generation. Howev...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] Scene Graph Generation via Visual Commonsense-Based Knowledge Refinement

Learning-based Scene Graph Generation (SGG) models excel at frequent relation types, but their performance degrades sharply under annotation sparsity, so that reliable ...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] GMBFormer: An NDVI-Based Global Memory Bank Transformer for Urban Green-Space Extraction from Ultra-High-Resolution Imagery

Urban green-space extraction from ultra-high-resolution (UHR) imagery is commonly performed patch by patch, which limits semantic reuse among spatially separate...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Physics in 2-Steps: Locking In Motion Priors Before Visual Refinement Erases Them

Image-to-Video diffusion models generate visually stunning content from an input image, but they often produce motion that violates the laws of physics. We …

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] A Comparison of Deep Learning Frameworks for Mapping Rice Disease with UAV Multispectral Imagery

In this study, UAV multispectral imagery is used to segment the severity of bacterial leaf blight (BLB) in rice using convolutional neural networks (CNNs) and t...

#research #paper #ai #computer-vision
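1 week ago · ai

[Paper] StoryVideoQA: Scaling Deep Video Understanding with a Large-Scale, Multi-Genre, Automatically Generated Dataset

Video question answering (VideoQA) aims to answer questions about a given video. Existing approaches excel at factual VideoQA, but in-depth video…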

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Efficient Mean Curvature Computation for High-Dimensional Data Manifolds

Estimating local mean curvature at each point of a high-dimensional dataset is a key ingredient of geometry-aware machine learning algorithms, such as the Mean ...

#research #paper #ai #machine-learning #computer-vision
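1 week ago · ai

[Paper] RhymeFlow: Asynchronous Denoising Flow Scheduling for Training-Free Video Generation Acceleration

Video generation models based on Diffusion Transformers (DiTs) have achieved remarkable performance in video synthesis, but they suffer from high inference latency.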

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Towards One-to-Many Temporal Grounding

Temporal Grounding (TG) aims to localize the video segments that correspond to a text query. Prior work has focused mainly on single-segment retrieval. In practice…

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] Synthetic Data Generation and Vision-Based Wrinkle and Keypoint Detection for Bimanual Cloth Manipulation

Robotic manipulation of textiles remains challenging because continuous deformation and self-occlusions hinder the robust visual perception needed for estimation.

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Geodesic Flow Matching on Riemannian Degradation Manifolds for Blind Image Restoration

Blind image restoration requires recovering clean images from observations corrupted by unknown and potentially mixed degradations. While recent deterministic f...

#research #paper #ai #computer-vision
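1 week ago · ai

[Paper] RadiusFPS: Efficient Farthest Point Sampling via Spherical Voxel Pruning on CPU and GPU

Point clouds are a fundamental sensory representation for robot perception, underpinning LiDAR-based autonomous driving, simultaneous localization and mapping (SLAM), and more.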

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] GRAMformer: Volumetric Multimodal Cross-Attention for Arbitrary-Order Modality Interactions

Transformer-based multimodal models rely on attention mechanisms to integrate information across heterogeneous modalities. Despite their success, existing multi...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] Benchmarking Open-Source Layout Detection Models for Data Snapshot Extraction from Institutional Documents

Institutional documents contain substantial amounts of operational and analytical information embedded within figures and tables. Current approaches for extract...

#research #paper #ai #machine-learning #nlp #computer-vision
1 week ago · ai

[Paper] Synthetic Benchmarks Overestimate Forward-Forward Scaling: Real Data Reveals the Limits of Layer-Local Learning

Forward-Forward (FF) learning [Hinton, 2022] replaces backpropagation with strictly layer-local goodness updates. Recent FF-CNN work has narrowed the gap to BP ...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] Controllable Dynamic 3D Shape Generation from 3D Trajectories and Text

We introduce T2Mo, a feed-forward framework for controllable dynamic 3D shape generation conditioned on 3D trajectories and text. Due to the inherent ambiguity ...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification Using Vision Transformers

Vehicle body type is a significant determinant of cyclist injury severity in overtaking crashes, yet automated tools for classifying vehicles into injury-risk-r...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] GeM-NR: Geometry-Aware Multi-View Editing for Non-Rigid Scene Changes

Recent developments in multi-view image editing with generative models have brought us a step closer toward general 3D content generation and customization. Mos...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] Geometry Gaussians: Decoupling Appearance and Geometry in Gaussian Splatting

After the success of 3D Gaussian Splatting (3DGS) for novel view synthesis, many works have explored how to also use it for geometric surface representation. Ho...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] Continual Vision-Language Learning from a Child's Egocentric Input

Children learn the meanings of words from a continuous, temporally structured stream of egocentric experience. Recent work shows that neural networks can also l...

#research #paper #ai #machine-learning #nlp #computer-vision
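1 week ago · ai

[Paper] Do You Need Labels? Adapting Vision Foundation Models with the Metadata You Already Have

We propose a label-free approach for adapting powerful but generic vision foundation models to specialized scientific domains. Standard supervised fine-tuning …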

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] Identifying Gems with Roman RAPIDly

The Nancy Grace Roman Space Telescope (Roman), set for launch as early as September 2026, will conduct wide-field infrared imaging surveys with unprecedented sp...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] ZipSplat: Fewer Gaussians, Better Splats

Feed-forward 3D Gaussian Splatting methods reconstruct a scene in a single forward pass from either posed or unposed images, but in current approaches, one Gauss...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] InstantRetouch: Efficient, High-Quality Instruction-Based Image Retouching in Bilateral Space

Language-guided photo retouching aims to adjust color and tone while preserving geometry and texture. Recently, diffusion-based retouching shows a superior visu...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] MaCo-GAN: Manifold Contrastive Adversarial Learning for Single Image Super-Resolution

Conventional Generative Adversarial Networks (GANs) for Single Image Super-Resolution (SISR) often struggle with hallucinated artifacts, largely because standar...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Exploring Simple Enhancements for Lidar Semantic Scene Completion

This paper investigates 'free lunch' strategies to boost the performance of lidar semantic scene completion (SSC) without requiring complex architectural redesi...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] SimuScene: Composing and Reconstructing Simulation-Ready 3D Scenes from a Single Image

Reconstructing interactive, simulation-ready 3D scenes from a single image is a critical bottleneck for robotic manipulation. While recent single-image lifters ...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Neuron Populations Show Differences in Selectivity with Scale

We investigate whether neuron populations within neural networks evolve predictably with scale, extending scaling laws beyond macroscopic observables such as lo...

#research #paper #ai #machine-learning #nlp #computer-vision
1 week ago · ai

[Paper] PixVOD: Pixel-Distributed Direct Visual Odometry and Depth Estimation

Images composed of 2D pixel arrays are the standard input to computer vision algorithms, yet many underlying computations can be distributed across pixels. Tran...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] NewtPhys: Do Foundation Models Understand Newtonian Physics?

Previous work has evaluated physics reasoning in foundation models using synthetic or semi-synthetic scenes and visual question-answering tasks. However, these ...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Thinking in Blender: Step-by-Step Executable Inverse Graphics with Vision-Language Models

Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and ...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Mitigating Cognitive Judgment Bias in Multimodal LLM Judges via Cognitive Perturbation and Reward Modeling

Recent multimodal large language models have demonstrated strong reasoning ability, yet their reliability as automated evaluators remains limited by a critical ...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] RoboDream: A Compositional World Model for Scalable Robot Data Synthesis

Scaling robot learning requires large-scale, diverse demonstrations, yet real-world data collection via teleoperation remains prohibitively expensive and time-c...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] ProtoAda: Multimodal Continual Instruction Tuning via Prototype-Based Adaptive Adapter Expansion and Geometric Integration

Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to continually acquire n...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] From Zero to Hero: Training-Free Custom Concept Generation in World Models

Autoregressive world models have emerged as a powerful paradigm for interactive video generation, allowing users to navigate dynamically generated environments ...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] AdaCodec: Predictive Visual Codes for Video MLLMs

Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video ML...

#research #paper #ai #machine-learning #nlp #computer-vision
1 week ago · ai

[Paper] Modeling Depth Ambiguity: A Mixture Density Representation for Flying-Point-Free Depth Estimation

Despite advances in depth estimation, flying points remain a persistent failure mode: near object boundaries, depth estimators often predict spurious 3D points ...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] Why Not Hyperparameter-Friendly Optimization? A Monotonic Adaptive Norm Rescaling Approach for Long-Tailed Recognition

Long-tailed recognition poses a significant challenge for deep learning. The two-stage decoupling paradigm, which separates representation learning from classif...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] What to Test Next: Discovering Interpretable Coverage Gaps in Driving VLMs

Driving vision-language models (VLMs) must accurately understand scenes across diverse conditions defined by Operational Design Domains (ODDs), yet verification...

#research #paper #ai #computer-vision
