computer-vision — Page 25

1 month ago · ai

[Paper] VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image

We propose VASA-3D, an audio-driven, single-shot 3D head avatar generator. This work tackles two major challenges, one of which is capturing subtle expression details.

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] ART: Articulated Reconstruction Transformer

We introduce ART, the Articulated Reconstruction Transformer, a category-agnostic, feed-forward model that reconstructs complete 3D articulated objects from only …

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

Achieving truly adaptive embodied intelligence requires agents that improve continually through the environment, rather than merely imitating static demonstrations.

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] Enhancing Visual Sentiment Analysis via Semiotic Isotopy-Based Dataset Construction

Visual Sentiment Analysis (VSA) is a challenging task because of the vast diversity of emotionally salient images and the inherent difficulty of acquiring sufficient data.

#research #paper #ai #computer-vision
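1 month ago · ai

[Paper] A Multicenter Benchmark of Multiple Instance Learning Models for Lymphoma Subtyping from HE-Stained Whole Slide Images

Timely and accurate lymphoma diagnosis is essential for guiding cancer treatment. Standard diagnostic practice involves hematoxylin and eosin (HE)-stained whole...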

#research #paper #ai #machine-learning #computer-vision
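1 month ago · ai

[Paper] JMMMU-Pro: Image-Based Japanese Multi-Discipline Multimodal Understanding Benchmark via Vibe Benchmark Construction

This paper introduces JMMMU-Pro, an image-based Japanese multi-discipline multimodal understanding benchmark, and the scalable Vibe Benchmark Construction, ...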

#research #paper #ai #machine-learning #nlp #computer-vision
1 month ago · software

alpr.watch

#license-plate-recognition #computer-vision #open-source #ALPR #surveillance-tool
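1 month ago · ai

Ai2's Molmo 2 Shows Open-Source Models Can Compete with Proprietary Giants in Video Understanding

Shortly after releasing the latest version of its Olmo foundation model, the Allen Institute for AI (Ai2) released Molmo 2, an open-source video model, on Tuesday, …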

#Molmo 2 #video understanding #open-source AI #Allen Institute for AI #foundation models #computer vision
1 month ago · ai

AlphaFlow: Understanding and Improving MeanFlow Models

AlphaFlow gives MeanFlow image models a smoother training schedule, reducing the conflict between two objectives and accelerating training. Overview...

#MeanFlow #AlphaFlow #image generation #training optimization #deep learning #computer vision
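1 month ago · ai

[Paper] DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders

Video diffusion models have revolutionized generative video synthesis, but they can be imprecise, slow, and opaque during generation, leaving users …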

#research #paper #ai #machine-learning #computer-vision
1 month ago · ai

[Paper] LitePT: Lighter Yet Stronger Point Transformer

Modern neural architectures for 3D point cloud processing contain both convolutional layers and attention blocks, but the best way to combine them remains unclear.

#research #paper #ai #computer-vision
1 month ago · ai

[Paper] Towards Scalable Pre-training of Visual Tokenizers

The quality of the latent space in visual tokenizers (e.g., VAEs) is crucial for modern generative models. However, standard reconstruction-based training …

#research #paper #ai #computer-vision
