computer-vision — Page 3

2 weeks ago · ai

[Paper] Lumos‑Nexus: Efficient Frequency Connection Based on a Homogeneous Latent Space for Video Unified Models

Connector-based video unified models have demonstrated strong capability in instruction-grounded video synthesis, but integrating a large high-fidelity generato...

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] KLIP: Detecting Local Distribution Shifts in Inverse Problems via Diffusion Priors and KL Divergence

Diffusion models have shown promising performance as data-driven priors for computational imaging, as well as some capacity to detect out-of-distribution (OOD) ...

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] TunerDiT: Training-Free Progressive Control of Diffusion Transformers for Multi-Event Video Generation

Text-to-video (T2V) generation faces challenging questions when generating videos with long horizons containing multiple events. Inspired by the intrinsics of t...

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] Vision-Language Models Suppress Female Representation on Ambiguous Inputs

Alignment teaches vision-language models (VLMs) to avoid expressing demographic biases, and when gender is clearly visible they largely succeed. Far less is kno...

#research #paper #ai #machine-learning #nlp #computer-vision
2 weeks ago · ai

[Paper] Automated Prediction of Postoperative Pancreatic Fistula from Preoperative CT

Postoperative pancreatic fistula (POPF) is a serious complication after pancreatic resection, increasing morbidity, hospital stay, and healthcare costs. We pres...

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on real...

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

Apple Presents Computer Vision Research at Annual June Conference

Apple has shared details of its participation...

#Apple #CVPR2025 #computer vision #machine learning research #AI conference #AMUSE #AToken #audio-visual benchmark #pattern recognition
2 weeks ago · ai

[Paper] GMOS: A Foundation for Moving Object Segmentation in 3D Space and Time

Moving Object Segmentation (MOS) aims to discover, segment, and track objects that move independently of the camera. Current MOS methods, however, exhibit two f...

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] VideoMLA: Minute-Scale Autoregressive Video Diffusion with a Low-Rank Latent KV Cache

Long-rollout causal video diffusion has converged on a fixed-size sliding-window KV cache, and recent progress has come from changing something within this layout.

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] AdaState: Self-Evolving Anchors for Streaming Video Generation

Autoregressive video diffusion models generate streaming video by producing frames sequentially, conditioning each chunk on previously generated content. These ...

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] NeuROK: Generative 4D Neural Object Kinematics

Data-driven approaches have transformed 3D vision, enabling transformers to effectively reconstruct and generate static 3D objects. However, simul...

#research #paper #ai #computer-vision
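2 weeks ago · ai

[Paper] YoCausal: How Far Is Video Generation from a World Model? A Causal Perspective

As video diffusion models (VDMs) evolve into world models, a key question arises: do these models truly understand causal relationships, or are they merely overfitting to statistical temporal patterns?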

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] Uncertainty-Driven 3D Gaussian Splatting Active Mapping via Anisotropic Visibility Field

We present the Gaussian Splatting Anisotropic Visibility Field (GAVIS), a new framework for uncertainty quantification and active mapping in 3DGS. Our key insight is…

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] GPIC: A Large-Scale Permissive Image Corpus for Visual Generation

Studying scalable methods for visual generative modeling requires large, accessible, and stable datasets. We introduce GPIC, the Giant Permissive Image Corpus.

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] A Single-Factor, Physics-Based Benchmark for Video-to-Audio Generation

Generative video-to-audio (V2A) models produce highly plausible soundtracks, but it remains unclear whether they capture the underlying physical processes. Exis...

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] REST3D: Physically Stable 3D Scene Reconstruction from a Single Image

Reconstructing physically stable 3D scenes from a single RGB image can turn everyday images into simulation-ready digital assets, for applications …

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] Colored-Noise Diffusion Sampling

Diffusion models achieve state-of-the-art image synthesis, with their generative trajectories fundamentally exhibiting a spectral bias, resolving low-frequency ...

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] Ciphera: A Decentralised Biometric Identity Framework

Centralised biometric identity systems expose users to single points of failure, opaque verification processes, and irreversible biometric compromise. Decentral...

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] From Pixels to Words: Toward Large-Scale Native One-Vision Models

Current vision-language models (VLMs) typically stitch together separate image encoders and language decoders via multi-stage alignment, a modular framework tha...

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

World models for interactive video generation have focused mainly on single-agent settings, where future observations are generated from a single control signal.

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] HarmoVid: Relightful Video Portrait Harmonization

We present a method that matches the lighting of a foreground video to a target background scene by adjusting shadows, color tone, and illumination intensity.

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning

Class-Incremental Learning (CIL) is important in building real-world learning systems. In CLIP-based CIL, the model performs classification by comparing similar...

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] Personal Visual Memory from Explicit and Implicit Evidence

Long-term memory is becoming increasingly important for personalized AI agents, yet existing benchmarks and methods remain largely text‑centric. Even when images are included...

#research #paper #ai #nlp #computer-vision
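2 weeks ago · ai

[Paper] OmniVerifier-M1: A Multimodal Meta-Verifier with Explicit Structural Recalibration

Visual outcomes play an increasingly central role in multimodal large language models, and reliable, fine-grained verification is essential for scaling general-purpose foundation models.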

#research #paper #ai #machine-learning #nlp #computer-vision
2 weeks ago · ai

[Paper] Ω‑QVLA: Robust Quantization of Vision-Language-Action Models via Composite Rotation and Stage-Wise Scaling

Vision-Language-Action (VLA) models unify perception, reasoning, and control within a single policy, yet their multi-billion-parameter backbones and diffusion-b...

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] Bias Leaves a Gradient Trail: Label-Free Bias Identification via Gradient Probes on Concept Decompositions

Vision classifiers exploit spurious correlations, achieving high in-distribution accuracy but failing under distribution shift. Existing approaches to bias …

#research #paper #ai #machine-learning #computer-vision
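2 weeks ago · ai

[Paper] The Abstraction Gap in Vision-Language Causal Reasoning

Vision-language models (VLMs) generate fluent causal explanations, but current evaluation methods cannot distinguish linguistic plausibility from faithful causal reasoning.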

#research #paper #ai #nlp #computer-vision
2 weeks ago · ai

[Paper] Enabling Visual Search in LVLMs with Self-Prophetic Decoding

Large vision-language models (LVLMs) are rapidly evolving toward genuine multimodal reasoning, and visual search is a concrete embodiment of it.

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] GUI Agents for Continual Game Generation

Generating a game is not the same as making one that can be played. Despite advances in code generation, existing approaches treat game generation as one-shot t...

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] G3T Up! Gravity-Aligned Coordinate Frames Simplify Pointmap Processing

Modern feed-forward 3D reconstruction methods like VGGT predict pixel-aligned pointmaps in camera-centric coordinate frames. However, this choice of coordinate ...

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] SpatialBench: Is Your Spatial Foundation Model an All-Rounder?

Spatial foundation models have shown impressive performance on standard datasets, but an important question remains: are they truly all-rounders…

#research #paper #ai #computer-vision
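2 weeks ago · ai

[Paper] LocateAnything: Fast, High-Quality Vision-Language Grounding and Parallel Box Decoding

Vision-language models (VLMs) typically formulate visual grounding and detection as a coordinate-token generation problem, turning each 2D box into multiple …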

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] Feedforward 3D Editing Learns from Semantic-Part Transformation

3D editing is a fundamental capability for scalable 3D content creation. Image editing has rapidly evolved into a large-scale feed-forward generative paradigm…

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] When Eyes Betray AI: Social Gaze Consistency as a Semantic Cue for AI-Generated Image Detection

Recent generative models have largely closed the gap on low-level artifacts (pixel fingerprints, frequency anomalies, upsampling traces), especially ...

#research #paper #ai #machine-learning #computer-vision
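2 weeks ago · ai

[Paper] Controllable Image Generation via Representation-Conditioned Diffusion Models

Diffusion models have emerged as powerful tools for high-quality image generation and editing, but steering them toward specific outputs remains a challenge.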

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] PARE: Efficient Video Generation via Pruning and Adaptive Routing

Video Diffusion Transformers (DiTs) generate high-quality videos but demand substantial computation because of wide blocks, deep architectures, and iterative sampling.

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] EdgeFlow: Edge-Map-Enhanced, VLM-Based Flowchart Processing for Industrial Requirements Engineering

Flowcharts are widely used in industrial requirements, but usually remain embedded as static images. Vision Language Models (VLMs) show promise in the conversio...

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] Q-GeoMem: Question-Guided Geometric Memory for Video Spatial Reasoning

Video spatial reasoning requires accumulating viewpoint-dependent evidence over time while retaining question-relevant information. Existing sp...

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] ChartGrapher: Counterfactual Chart Generation for Evaluating Vision-Language Models

Chart question-answering (QA) benchmarks aim to pose questions that require visual reasoning to correctly answer, but models can often reach solutions through s...

#research #paper #ai #nlp #computer-vision
2 weeks ago · ai

[Paper] TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction

Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. However, most existing…

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] AnyScene: Highly Controllable Driving Scene Generation, Anywhere and Beyond

Generating high-fidelity and controllable synthetic data is critical for advancing end-to-end autonomous driving, particularly for addressing the long tail of r...

#research #paper #ai #computer-vision
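2 weeks ago · ai

[Paper] Eliciting Capacity from Multimodal Large Language Models for Subject-Driven Generation

Subject-driven image generation aims to synthesize new images that follow text instructions while preserving the identity of a given subject. Existing app...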

#research #paper #ai #machine-learning #computer-vision
2 weeks ago · ai

[Paper] Prism: A Plug-in, Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning

Multimodal Large Language Models (MLLMs) recast diverse tasks into a unified instruction‑following framework and achieve versatility through instruction tuning.

#research #paper #ai #machine-learning #nlp #computer-vision
2 weeks ago · ai

[Paper] Helix4D: Complex 4D Mesh Generation

Current video-to-4D methods struggle with complex topology changes, transparent materials, thin structures, and inner surfaces. We present Helix4D, a dynamic me...

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] Enhancing Few-Step Generators via Reward-Tilted Distribution Matching

Recent advances in few-step diffusion distillation have enabled efficient image generation, but aligning these models with human preferences remains difficult.

#research #paper #ai #computer-vision
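2 weeks ago · ai

[Paper] Autoregressive Video Generation via On-Policy Adversarial Flow Distillation

Autoregressive video generators are attractive for streaming, long-horizon, and interactive applications, but distilling a strong black-box teacher into a causal s...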

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] EVIDENT: Cross-Domain Video Temporal Grounding via Adaptive MLLM Routing with Entity-Based Visual Evidence

Fine-tuning MLLMs for Video Temporal Grounding (VTG) often improves in-domain performance but degrades sharply under domain shift. In this work, we find that th...

#research #paper #ai #computer-vision
2 weeks ago · ai

[Paper] Global Structure-from-Motion Meets Feed-Forward Reconstruction

Structure-from-Motion -- the process of simultaneously estimating camera poses and 3D scene structure from a collection of images -- remains a central challenge...

#research #paper #ai #computer-vision
