EUNO.NEWS
  • All (2650) +327
  • AI (584) +25
  • DevOps (153) +5
  • Software (1126) +191
  • IT (781) +105
  • Education (6) +1
  • Notice
  • 4 days ago · ai

    [Paper] Revisiting Direct Encoding: Learnable Temporal Dynamics for Static Image Spiking Neural Networks

    Handling static images that lack inherent temporal dynamics remains a fundamental challenge for spiking neural networks (SNNs). In directly trained SNNs, static...

    #research #paper #ai #computer-vision
  • 6 days ago · ai

    [Paper] Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models

    Reasoning over dynamic visual content remains a central challenge for multimodal large language models. Recent thinking models generate explicit reasoning trace...

    #research #paper #ai #computer-vision
  • 6 days ago · ai

    [Paper] Video-CoM: Interactive Video Reasoning via Chain of Manipulations

    Recent multimodal large language models (MLLMs) have advanced video understanding, yet most still 'think about videos', i.e., once a video is encoded, reasoning unf...

    #research #paper #ai #computer-vision
  • 6 days ago · ai

    [Paper] AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement

    Recently, multi-person video generation has started to gain prominence. While a few preliminary works have explored audio-driven multi-person talking video gene...

    #research #paper #ai #computer-vision
  • 6 days ago · ai

    [Paper] Visual Generation Tuning

    Large Vision Language Models (VLMs) effectively bridge the modality gap through extensive pretraining, acquiring sophisticated visual representations aligned wi...

    #research #paper #ai #computer-vision
  • 6 days ago · ai

    [Paper] Object-Centric Data Synthesis for Category-level Object Detection

    Deep learning approaches to object detection have achieved reliable detection of specific object classes in images. However, extending a model's detection capab...

    #research #paper #ai #computer-vision
  • 6 days ago · ai

    [Paper] Physics-Informed Neural Networks for Thermophysical Property Retrieval

    Inverse heat problems refer to the estimation of material thermophysical properties given observed or known heat diffusion behaviour. Inverse heat problems have...

    #research #paper #ai #machine-learning #computer-vision
  • 1 week ago · ai

    [Paper] Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model

    Recent advances in generative world models have enabled remarkable progress in creating open-ended game environments, evolving from static scene synthesis towar...

    #research #paper #ai #computer-vision
  • 1 week ago · ai

    [Paper] DisMo: Disentangled Motion Representations for Open-World Motion Transfer

    Recent advances in text-to-video (T2V) and image-to-video (I2V) models have enabled the creation of visually compelling and dynamic videos from simple textual ...

    #research #paper #ai #computer-vision
  • 1 week ago · ai

    [Paper] MANTA: Physics-Informed Generalized Underwater Object Tracking

    Underwater object tracking is challenging due to wavelength-dependent attenuation and scattering, which severely distort appearance across depths and water cond...

    #research #paper #ai #computer-vision
  • 1 week ago · ai

    [Paper] VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction

    Unifying multimodal understanding, generation and reconstruction representation in a single tokenizer remains a key challenge in building unified models. Previo...

    #research #paper #ai #computer-vision
  • 1 week ago · ai

    [Paper] Optimizing Multimodal Language Models through Attention-based Interpretability

    Modern large language models are becoming multimodal, analyzing various data formats such as text and images. While fine-tuning is effective for adapting these multimoda...

    #research #paper #ai #nlp #computer-vision

RSS GitHub © 2025