[Paper] MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices
Recently, video generation has witnessed rapid advancements, drawing increasing attention to image-to-video (I2V) synthesis on mobile devices. However, the subs...
Recently, video generation has witnessed rapid advancements, drawing increasing attention to image-to-video (I2V) synthesis on mobile devices. However, the subs...
Readability assessment aims to evaluate the reading difficulty of a text. In recent years, while deep learning technology has been gradually applied to readabil...
Spatial cognition is fundamental to real-world multimodal intelligence, allowing models to effectively interact with the physical environment. While multimodal ...
We study two-layer neural networks and train these with a particle-based method called consensus-based optimization (CBO). We compare the performance of CBO aga...
Ensuring the safety of embodied AI agents during task planning is critical for real-world deployment, especially in household environments where dangerous instr...
The training of large-scale Mixture of Experts (MoE) models faces a critical memory bottleneck due to severe load imbalance caused by dynamic token routing. Thi...
Remote sensing change captioning is an emerging and popular research task that aims to describe, in natural language, the content of interest that has changed b...
Text-attributed graphs require models to effectively combine strong textual understanding with structurally informed reasoning. Existing approaches either rely ...
Deep neural networks (DNNs) and Kolmogorov-Arnold networks (KANs) are popular methods for function approximation due to their flexibility and expressivity. Howe...
The rigid, uniform allocation of computation in standard Transformer (TF) architectures can limit their efficiency and scalability, particularly for large-scale...
Recent divide-and-conquer reasoning approaches, particularly those based on Chain-of-Thought (CoT), have substantially improved the Text-to-SQL capabilities of ...
Lindsey (2025) investigates introspective awareness in language models through four experiments, finding that models can sometimes detect and identify injected ...
Web automation employs intelligent agents to execute high-level tasks by mimicking human interactions with web interfaces. Despite the capabilities of recent La...
'Thinking with images' has emerged as an effective paradigm for advancing visual reasoning, extending beyond text-only chains of thought by injecting visual evi...
Unit testing is an essential yet laborious technique for verifying software and mitigating regression risks. Although classic automated methods effectively expl...
Automating the adaptation of software engineering (SE) research artifacts across datasets is essential for scalability and reproducibility, yet it remains large...
Spatio-temporal video grounding (STVG) requires localizing a target object in untrimmed videos both temporally and spatially from natural language descriptions....
Estimating the normal of a point requires constructing a local patch to provide center-surrounding context, but determining the appropriate neighborhood size is...
The utility of an explanation method critically depends on its fidelity to the underlying machine learning model. Especially in high-stakes medical settings, cl...
Adversarial Inverse Reinforcement Learning (AIRL) has shown promise in addressing the sparse reward problem in reinforcement learning (RL) by inferring dense re...
Recent advances in multimodal large language models (LLMs) have highlighted their potential for medical and surgical applications. However, existing surgical da...
This paper presents the SIFT-SNN framework, a low-latency neuromorphic signal-processing pipeline for real-time detection of structural anomalies in transport i...
Learning joint representations across multiple modalities remains a central challenge in multimodal machine learning. Prevailing approaches predominantly operat...
Millions of users across the globe turn to AI chatbots for their creative needs, inviting widespread interest in understanding how such chatbots represent diver...