[Paper] CoPE-VideoLM: Codec Primitives For Efficient Video Language Models
Video Language Models (VideoLMs) empower AI systems to understand temporal dynamics in videos. To fit within the maximum context window constraint, current methods ...
Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue...
Effective water resource management depends on accurate projections of flows in water channels. For projected climate data, use of different General Circulation...
OMD and its variants give a flexible framework for OCO where the performance depends crucially on the choice of the mirror map. While the geometries underlying ...
To validate a clinically accessible approach for quantifying the Upper Extremity Reachable Workspace (UERW) using a single (monocular) camera and Artificial Int...
Partial differential equations often contain unknown functions that are difficult or impossible to measure directly, hampering our ability to derive predictions...
Long-sequence streaming 3D reconstruction remains a significant open challenge. Existing autoregressive models often fail when processing long sequences. They t...
Software source code often harbours 'hotspots': small portions of the code that change far more often than the rest of the project and thus concentrate maintena...
With the advancement of face recognition (FR) systems, privacy-preserving face recognition (PPFR) systems have gained popularity for their accurate recognition,...
The rapid growth of decentralized systems in the Web3 ecosystem has introduced numerous challenges, particularly in ensuring data security, privacy, and scalabil...
This paper presents a hybrid obstacle avoidance architecture that integrates Optimal Control under clearance with a Fuzzy Rule Based System (FRBS) to enable ada...
Large language models (LLMs) now sit in the critical path of search, assistance, and agentic workflows, making semantic caching essential for reducing inference...
Rapidly evolving cyberattacks demand incident response systems that can autonomously learn and adapt to changing threats. Prior work has extensively explored th...
There has been a growing interest in using neural networks, especially message-passing neural networks (MPNNs), to solve hard combinatorial optimization problem...
Large Language Model (LLM) unlearning aims to remove targeted knowledge from a trained model, but practical deployments often require post-training quantization...
Graph neural network (GNN) potentials such as SchNet improve the accuracy and transferability of molecular dynamics (MD) simulation by learning many-body intera...
Language identification (LID) is an essential step in building high-quality multilingual datasets from web data. Existing LID tools (such as OpenLID or GlotLID)...
Template-free retrosynthesis methods treat the task as black-box sequence generation, limiting learning efficiency, while semi-template approaches rely on rigid...
Assumption-based Argumentation (ABA) is a well-established form of structured argumentation. ABA frameworks with an underlying atomic language are widely studie...
Binary Neural Networks (BNNs) offer a low-complexity and energy-efficient alternative to traditional full-precision neural networks by constraining their weight...
Living languages are shaped by a host of conflicting internal and external evolutionary pressures. While some of these pressures are universal across languages ...
Large language models (LLMs) are increasingly used as judges to replace costly human preference labels in pairwise evaluation. Despite their practicality, LLM j...
In recent years, there has been growing interest in understanding neural architectures' ability to learn to execute discrete algorithms, a line of work often re...
Using NLP to analyze authentic learner language helps to build automated assessment and feedback tools. It also offers new and extensive insights into the devel...
Large reasoning models with reasoning capabilities achieve state-of-the-art performance on complex tasks, but their robustness under multi-turn adversarial pres...
Detecting anomalies in images and video is an essential task for multiple real-world problems, including industrial inspection, computer-assisted diagnosis, and...
The distinction between genuine grassroots activism and automated influence operations is collapsing. While policy debates focus on bot farms, a distinct threat...
Competency modeling is widely used in human resource management to select, develop, and evaluate talent. However, traditional expert-driven approaches rely heav...
Memory-efficient backpropagation (MeBP) has enabled first-order fine-tuning of large language models (LLMs) on mobile devices with less than 1GB memory. However...
Task-based chatbots are software, typically embedded in real-world applications, that assist users in completing tasks through a conversational interface. As ch...
This paper presents a novel approach, Spectral-Interpretable and -Enhanced Transformer (SIEFormer), which leverages spectral analysis to reinterpret the attenti...
Image generative models are known to duplicate images from the training data as part of their outputs, which can lead to privacy concerns when used for medical ...
We present a complete classification of the distributed computational complexity of local optimization problems in directed cycles for both the deterministic an...
The proactive Asset Administration Shell (AAS) enables bidirectional communication between assets. It uses the Language for I4.0 Components in VDI/VDE 2193 to f...
In this paper, we present a unified framework for various bio-inspired models to better understand their structural and functional differences. We show that liq...
Jhana advanced concentration absorption meditation (ACAM-J) is related to profound changes in consciousness and cognitive processing, making the study of their ...
Understanding how and why large language models (LLMs) fail is becoming a central challenge as models rapidly evolve and static evaluations fall behind. While a...
Understanding what drives code instability is essential for effective software maintenance, as unstable classes require larger or more frequent edits and increa...
Event stream-based Visual Place Recognition (VPR) is an emerging research direction that offers a compelling solution to the instability of conventional visible...
As self-driving technology advances toward widespread adoption, determining safe operational thresholds across varying environmental conditions becomes critical...
In this paper, we present a 2-local proof labeling scheme with labels in {0,1,2} for leader election in anonymous meshed graphs. Meshed graphs form a general c...
The Microservices Architecture (MSA) design pattern has become a staple for modern applications, allowing functionalities to be divided across fine-grained micr...
As mobile application (app) functionalities grow increasingly complex and their iterations accelerate, ensuring high reliability presents significant challenges...
The explainable AI (XAI) research community has proposed numerous technical methods, yet deploying explainability as systems remains challenging: Interactive ex...
This article introduces a metamodel for the Business Model Canvas (BMC) using the Unified Modelling Language (UML), together with a dedicated Domain-Specific Mo...
Homomorphic encryption (HE) is a promising technology for confidential cloud computing, as it allows computations on encrypted data. However, HE is computationa...
The long-standing vision of general-purpose robots hinges on their ability to understand and act upon natural language instructions. Vision-Language-Action (VLA...
Visual illusions traditionally rely on spatial manipulations such as multi-view consistency. In this work, we introduce Progressive Semantic Illusions, a novel ...