EUNO.NEWS
  • All (21181) +146
  • AI (3169) +10
  • DevOps (940) +5
  • Software (11185) +102
  • IT (5838) +28
  • Education (48)
  • Notice
  • 5 days ago · ai

    New Apple model combines vision understanding and image generation with impressive results

    Apple researchers have published a study about Manzano, a multimodal model that combines visual understanding and text-to-image generation, while significantly...

    #Apple #multimodal AI #vision-language model #text-to-image generation #Manzano #computer vision #generative AI #AI research
  • 5 days ago · ai

    [Paper] Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

    Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. While recent studies on ...

    #research #paper #ai #machine-learning #computer-vision
  • 5 days ago · ai

    [Paper] SAM3-DMS: Decoupled Memory Selection for Multi-target Video Segmentation of SAM3

    Segment Anything 3 (SAM3) has established a powerful foundation that robustly detects, segments, and tracks specified targets in videos. However, in its origina...

    #research #paper #ai #computer-vision
  • 5 days ago · ai

    [Paper] COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation

    3D pose estimation from sparse multi-views is a critical task for numerous applications, including action recognition, sports analysis, and human-robot interact...

    #research #paper #ai #computer-vision
  • 5 days ago · ai

    [Paper] Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering

    Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of ...

    #research #paper #ai #computer-vision
  • 5 days ago · ai

    [Paper] LLMs can Compress LLMs: Adaptive Pruning by Agents

    As Large Language Models (LLMs) continue to scale, post-training pruning has emerged as a promising approach to reduce computational costs while preserving perf...

    #research #paper #ai #machine-learning #nlp #computer-vision
  • 5 days ago · ai

    [Paper] STEP3-VL-10B Technical Report

    We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal ...

    #research #paper #ai #computer-vision
  • 5 days ago · ai

    [Paper] SCE-SLAM: Scale-Consistent Monocular SLAM via Scene Coordinate Embeddings

    Monocular visual SLAM enables 3D reconstruction from internet video and autonomous navigation on resource-constrained platforms, yet suffers from scale drift, i...

    #research #paper #ai #computer-vision
  • 5 days ago · ai

    [Paper] Self-Supervised Animal Identification for Long Videos

    Identifying individual animals in long-duration videos is essential for behavioral ecology, wildlife monitoring, and livestock management. Traditional methods r...

    #research #paper #ai #computer-vision
  • 5 days ago · ai

    [Paper] LiteEmbed: Adapting CLIP to Rare Classes

    Large-scale vision-language models such as CLIP achieve strong zero-shot recognition but struggle with classes that are rarely seen during pretraining, includin...

    #research #paper #ai #computer-vision
  • 5 days ago · ai

    [Paper] Image2Garment: Simulation-ready Garment Generation from a Single Image

    Estimating physically accurate, simulation-ready garments from a single image is challenging due to the absence of image-to-physics datasets and the ill-posed n...

    #research #paper #ai #computer-vision
  • 5 days ago · ai

    [Paper] Identifying Models Behind Text-to-Image Leaderboards

    Text-to-image (T2I) models are increasingly popular, producing a large share of AI-generated images online. To compare model quality, voting-based leaderboards ...

    #research #paper #ai #machine-learning #computer-vision

© 2026