computer-vision

4 days ago · ai

[Paper] Natural-Language Temporal Grounding in Hour-Long Videos Is a Retrieval Problem: A Benchmark and Empirical Decomposition

Temporal grounding--returning the interval [t_s, t_e] for a natural-language query over a video--is the language interface to long-form video, yet has been stud...

#research #paper #ai #machine-learning #computer-vision
4 days ago · ai

[Paper] Bridging the Modality Gap in Forensic Image Retrieval

Automated image retrieval plays an increasingly critical role in modern forensic analysis, supporting investigative workflows that rely on efficient comparison ...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] CellNet – Localizing Cells from Sparse, Noisy Point Annotations

Counting living cells is an important step in many biological research workflows. Our collaborators at the Wellcome Sanger Institute study vital genes in humans...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] Finding Sparse Subnetworks in a Single Training Cycle with Gradual Magnitude Pruning

Neural network pruning reduces model size by removing less important parameters while aiming to preserve predictive performance. Although the Lottery Ticket Hyp...

#research #paper #ai #machine-learning #computer-vision
4 days ago · ai

[Paper] VOID: Preventing Unauthorized Mimicry by Latent Diffusion Models

While Latent Diffusion Models (LDMs) have revolutionized visual synthesis, they are increasingly exploited for unauthorized mimicry of individuals. Existing def...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] Bridging Day and Night: Unsupervised Cross-Domain Re-Identification with Synergistic Prompts and Prototype Learning

Cross-domain day-night re-identification (ReID) is fundamentally challenged by the substantial visual appearance discrepancies between daytime and nighttime sce...

#research #paper #ai #computer-vision
4 days ago · ai

[Paper] Scalable PyTorch Abstractions for Multi-GPU Gaussian Splatting

Gaussian splatting methods have become increasingly popular for neural reconstruction of the real world. However, they are often limited in scale and resolution...

#research #paper #ai #machine-learning #computer-vision
5 days ago · ai

[Paper] FADA: Accessible Fetal Ultrasound Interpretation and Annotation via a Selectively Distilled Unified Vision-Language Model

A global shortage of trained sonographers limits prenatal ultrasound screening in low- and middle-income countries, where over half of pregnant women receive no...

#research #paper #ai #machine-learning #computer-vision
5 days ago · ai

[Paper] IDEAL: Depth Alignment Makes Discrete Representation Autoencoders

Built on pretrained vision foundation models (VFMs), representation autoencoders (RAEs) have recently emerged as a promising approach for constructing semantica...

#research #paper #ai #computer-vision
5 days ago · ai

[Paper] U-TTT: Generalizable PET Image Denoising via Test-Time Training

Existing deep learning models for Positron Emission Tomography (PET) image denoising often suffer from severe performance degradation under distribution shifts,...

#research #paper #ai #computer-vision
5 days ago · ai

[Paper] An Uncertainty Estimation Framework for Dose Accumulation in Adaptive Radiotherapy: Application to CBCT-Guided Radiotherapy for Cervical Cancer

Background and purpose: oART enables daily plan adaptation to interfraction anatomical variations, but cumulative dose estimation remains limited by DIR, segmen...

#research #paper #ai #computer-vision
5 days ago · ai

[Paper] IPSM-Bench: A New Intermediate-Phase Segmentation Benchmark for Microstructure Images of Zinc-Based Absorbable Biomaterials

Zinc-based alloys are indispensable emerging absorbable metallic biomaterials, and their macroscopic performance is governed by microstructural characteristics....

#research #paper #ai #computer-vision
5 days ago · ai

[Paper] AnimaSpark: A Feed-Forward Approach to Animating Arbitrary 3D Objects

While recent advancements in generative AI have substantially accelerated static 3D model creation workflows, the synthesis of category-agnostic 3D animations r...

#research #paper #ai #computer-vision
5 days ago · ai

[Paper] Where Is Visual In-Context Learning Headed? A Unified Benchmark Across Domains and Tasks

Visual in-context learning has been proposed as a pathway towards dynamic models that can generate predictions based on a provided context and thereby can adapt...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] Latent-Space Memory for Video World Models

Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This des...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] MemoryVLA++: Temporal Modeling with Memory and Imagination in Vision-Language-Action Models

Temporal modeling is essential for robotic manipulation, as effective control requires both memory of past interactions and imagination of future states. Howeve...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single firs...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai

[Paper] PTL-Diffusion: Manifold-Aware Diffusion with Periodic Terminal Laws

Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analy...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai

[Paper] iMaC: Building Embodied World Models by Converting Actions into Motion and Contact Images

Embodied world models have emerged as a pivotal paradigm for visual robotic decision-making and interactive environment simulation. However, conventional embodi...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Based Context Routing

World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors in...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai

[Paper] Echo-Memory: A Controlled Study of Memory in Action-Conditioned World Models

We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first fram...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai

[Paper] Beyond Spherical Harmonics: Rethinking Appearance Models for Radiance Reconstruction

View-dependent appearance modeling remains a challenging problem in novel-view synthesis and reconstruction. Accurately representing complex angular effects oft...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] End-to-End Optimization of Uncorrelated Imaging for Classification under Detector-Limited Readout

End-to-end co-optimization of optical front-ends (e.g. metasurfaces) and neural network back-ends has been widely applied to imaging tasks, yet a formalism char...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] POTATR: A Lightweight Image-to-Graph Model for Page-Level Table Extraction

Large-scale document processing requires contextually aware table extraction (TE) that is both accurate and efficient. Yet current approaches require billions o...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] SemDINO: A DINOv3-Based Network for Cross-Temporal Semantic Alignment in Change Detection

Semantic change detection (SCD) aims to simultaneously locate land-cover changes and identify semantic categories before and after transition. However, existing...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] Hybrid Robustness Verification for Spatiotemporal Neural Networks

With AI increasingly deployed in safety-critical systems, providing formal robustness guarantees for the underlying models is essential. Existing verification m...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai

[Paper] HDSL: A Hierarchical Domain-Specific Language for Structured 3D Indoor Scene Generation and Local Editing with LLM Agents

Text-driven indoor scene generation and editing require an intermediate representation that language models can both produce and revise. Existing LLM-based syst...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] Evaluating Diffusion Model Representation Spaces through Self-Supervised Principles

Diffusion models have demonstrated remarkable generative capabilities and have also emerged as powerful self-supervised representation learners, yet the connect...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai

[Paper] Cranio-Diff: Diffusion-Based Cross-Domain Craniofacial Reconstruction with 2D X-ray Skull Guidance and Structural Identity Constraints

The state-of-the-art generative models, such as CycleGAN, Pix2Pix, and diffusion models have demonstrated remarkable performance in the face generation task. Ho...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] GenEyePose: Developing Digital Neurophysiological Biomarkers through Patient-Free, Knowledge-Based Modeling of Rapid Eye Movements

Eye movements, including saccades, are widely regarded as highly sensitive and objective biomarkers of neurophysiologic states. Detecting saccadic signatures in...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] SoccerNet 2026 Player-Centric Ball-Action Spotting: Retraining the FOOTPASS Baseline and Extending Post-Processing

We describe our system for the SoccerNet 2026 Player-Centric Ball-Action Spotting Challenge, which requires predicting who performs which action and when, acros...

#research #paper #ai #computer-vision
6 days ago · ai

[Paper] Feature-Reconstruction-Based Anomaly Detection with Visual Prompts and Dual-Teacher Supervision

Recent Anomaly Detection methods achieve perfect detection and segmentation scores on well-established datasets, such as MVTec. However, many of these methods f...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai

[Paper] Can Video Foundation Models Understand Intuitive Physics? A Layer-Wise Probing Analysis

We study whether pretrained video foundation models encode intuitive-physics information in their frozen representations, and how this information varies across...

#research #paper #ai #machine-learning #computer-vision
6 days ago · ai

[Paper] Where Does the Answer Come From? A Benchmark for Identifying Per-View Visual Evidence in Multi-View MLLMs for Autonomous Driving

Multimodal large language models (MLLMs) achieve strong results on visual reasoning benchmarks, but answer accuracy alone does not indicate whether a model reli...

#research #paper #ai #nlp #computer-vision
1 week ago · ai

[Paper] Misreading Videos: Closed-Loop Distillation of Reading Heuristics for Exploratory Manipulation Trace QA

Exploratory manipulation often turns an apparent failed attempt into the key evidence for what to do next. For example, a robot pulls a locked cabinet drawer, f...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] UniSHARP: Universal Sharp Monocular View Synthesis

In this work, we focus on extending SHARP, the popular photorealistic view synthesis method, for universal monocular rendering across a continuum of camera syst...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] MemDreamer: Decoupling Perception and Reasoning in Long Video Understanding with Hierarchical Graph Memory and Agentic Search

Current Vision-Language Models struggle with hours-long videos because processing full-length visual sequences induces prohibitive token explosion and attention...

#research #paper #ai #machine-learning #nlp #computer-vision
1 week ago · ai

[Paper] Video Generation with Streaming Force Control

We introduce StreamForce, a streaming video generation framework that enables physically grounded control through continuous force inputs. Unlike prior video mo...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Differences in Detection: Explainability Where It Matters

We propose Differences in Detection (DnD), an intuitive method to compare two object detection models. Based on the same matching algorithm, it complements the ...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Implicit Data Synthesis for Contrastive Unsupervised Data Augmentation

Scientific observations generate large quantities of unlabeled data which is laborious to hand-label, making unsupervised learning techniques valuable for proce...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] Planning-Aligned Token Compression for Long-Context Autonomous Driving

Monolithic vision-action models represent an emerging paradigm in autonomous driving. However, this architecture produces token sequences that quickly exceed re...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] TEVI: Improving Vision-Language Alignment through Text-Conditioned Editing of Visual Representations with Sparse Autoencoders

Vision-language models such as CLIP are highly useful for diverse tasks due to their shared image-text embedding space. Despite this, the image and text embeddi...

#research #paper #ai #machine-learning #nlp #computer-vision
1 week ago · ai

[Paper] Skill-3D: Scene-Aware Skill Evolution for Agentic 3D Spatial Reasoning

This paper explores agentic 3D spatial understanding, i.e., MLLM agents performing 3D reasoning through tool use. Existing methods often misuse tools and exhibi...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] The Lipreading Gap: Do VSR Models Perceive Visual Speech Like Humans?

Visual speech recognition (VSR) models now surpass human lipreaders on benchmarks, but do such gains establish human-like visual speech perception? To explore t...

#research #paper #ai #nlp #computer-vision
1 week ago · ai

[Paper] Watch, Remember, Reason: Human Visual Video Understanding and MLLMs

Video understanding is being rapidly transformed by multimodal large language models (MLLMs), as research moves from short clips to long, multimodal, and knowle...

#research #paper #ai #machine-learning #computer-vision
1 week ago · ai

[Paper] OpenGlass: Open-Source Smart Glasses for On-Device Event-Based Gesture Recognition

Smart eyewear enables unobtrusive, context-aware interaction through multimodal sensors and on-device intelligence, but is severely limited by power, memory, an...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] DisPOSE: Self-Supervised Multi-View 3D Human Pose Estimation with Projected Multi-Probabilistic Diffusion

Recovering 3D human poses for multiple individuals from different camera views is a fundamental bottleneck for analyzing interacting behaviors. Existing self-su...

#research #paper #ai #computer-vision
1 week ago · ai

[Paper] RealDocBench: A Benchmark for Field-Level QA and Layout Understanding on Real-World Regulatory Documents

Document parsing systems are increasingly deployed in high-stakes, regulated workflows such as mortgage underwriting, financial reporting, supply-chain logistic...

#research #paper #ai #computer-vision