[Paper] Learning to Forget: Continual Learning with Adaptive Weight Decay
Continual learning agents with finite capacity must balance acquiring new knowledge with retaining the old. This requires controlled forgetting of knowledge tha...
Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competi...
Breakthrough progress in vision-based navigation through unknown environments has been achieved by using multimodal large language models (MLLMs). These models ...
We introduce ProcFunc, a library for Blender-based procedural 3D generation in Python. ProcFunc provides a library of easy-to-use Python functions, which stream...
We introduce Hyper Input Convex Neural Networks (HyCNNs), a novel neural network architecture designed for learning convex functions. HyCNNs combine the princip...
Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger ...
Vision-language models (VLMs) have shown strong performance on static visual understanding, yet they still struggle with dynamic spatial reasoning that requires...
The Alternating Direction Method of Multipliers (ADMM) is a widely used method for structured convex optimization, and its practical performance depends strongl...
In Orabona and Pál [2016], we introduced the shifted KT potentials, to remove the ln ln T factor in the parameter-free learning with expert bound. In this short...
LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two ex...
Learning curves are a fundamental primitive in supervised learning, describing how an algorithm's performance improves with more data and providing a quantitati...
The task of capturing and rendering 3D dynamic scenes from 2D images has become increasingly popular in recent years. However, most conventional cameras are ban...
Can Neural Assemblies -- groups of neurons that fire together and strengthen through co-activation -- learn the direction of causal influence between variables?...
Recent advances in 4D content generation have attracted increasing attention, yet creating high-quality animated 3D models remains challenging due to the comple...
Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these enviro...
This paper provides a concise yet comprehensive review of recent advancements in millimeter-wave (mm-wave) oscillators below 100 GHz and sub-terahertz (sub-THz/...
We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with MultiLayer Perceptron (MLP) blocks to ...
Fine-grained RGBT image semantic segmentation is crucial for all-weather unmanned aerial vehicle (UAV) scene understanding. However, UAV RGBT semantic segmentat...
This paper extends and explains the Multiple Additive Neural Networks (MANN) methodology, an enhancement to the traditional Gradient Boosting framework, utilizi...
Synthesizing a target concept from a single reference image is challenging in diffusion-based personalized text-to-image generation, particularly for sticker pe...
Mixture-of-Experts (MoE) models offer high capacity with efficient inference cost by activating a small subset of expert models per input. However, deploying Mo...
Patient portals now give individuals direct access to their electronic health records (EHRs), yet access alone does not ensure patients understand or act on the...
We propose UAPAR, an Uncertainty-Aware Pedestrian Attribute Recognition framework. To the best of our knowledge, this is the first EDL-based uncertainty-aware f...
Title: Friendly AI Chatbots May Trade Warmth for Accuracy _Last year, researchers at the Oxford Internet Institute began testing five artificial‑intelligencehtt...
We present KAYRA, an end-to-end karyotyping system that operates inside the operational constraints of a clinical cytogenetic laboratory. KAYRA is architected a...
Existing 3D anomaly detection methods are built on a rigid prior: normal geometry is pose-invariant and can be canonicalized through registration or alignment. ...
Large language models (LLMs) acquire most of their factual knowledge during the pre-training stage, through next token prediction. Subsequent stages of post-tra...
When generative AI (genAI) systems are used in high-stakes decision-making, their recommended role is to aid, rather than replace, human decision-making. However,...
Many of the thousands of attested languages share common configurations of features, creating a spectrum from typologically very rare (e.g., object-verb-subject...
When do language diffusion models memorize their training data, and how can their true generative regime be quantitatively assessed? We address these questions by sho...
We introduce HalluCiteChecker, a toolkit for detecting and verifying hallucinated citations in scientific papers. While AI assistant technologies have transform...
This paper presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) missions motivated by search-and-rescue (SAR) scenarios under lim...
I propose the Random Cloud method, a training-free approach to neural architecture search that discovers minimal feedforward network topologies through stochast...
We present a Spatially Embedded Evolutionary Algorithm where robot individuals exist in a physically simulated 2D environment, must navigate to encounter potent...
Transformer-based architectures have established a dominant paradigm in global semantic perception; however, they remain fundamentally constrained by the profou...
RL post-training of frontier language models is increasingly bottlenecked by autoregressive rollout generation, making rollout acceleration a central systems ch...
In a continual learning setting, we require a model to be plastic enough to learn a new task and stable enough to not disturb previously learned capabilities. W...
Paper • 2603.17074 • Published Mar 17 • 1 /papers/2603.17074...
Why friendly AI chatbots might be less trustworthy AI chatbots trained to be warm and friendly when interacting with users may also be more prone to inaccuraci...
Stargate Overview Stargate is OpenAI’s long‑term effort to build the compute foundation required to deliver the benefits of AGI broadly and reliably to the wor...
This paper presents an application of the biologically realistic JASTAP neural network model to classification tasks. The JASTAP neural network model is present...
Sensory‑First Intelligence: An Agent‑Driven Approach to Brain‑Inspired Neural Architectures The dominant approach in artificial intelligence today is scaling —...
The increasing deployment of Large Language Model (LLM) inference on edge AI systems demands efficient execution under tight memory budgets. A key challenge ari...
This paper investigates efficient methods for utilizing text-only data to improve speech recognition, focusing on encoder-dominated models that facilitate faste...
The following is a joint announcement by the MIT Schwarzman College of Computing and IBM. IBM and MIT today announced the launch of the MIT‑IBM Computing Resear...
TL;DR: Try the three‑year ChatOn AI Assistant Premium Plan, now on sale for $59.50 (regularly $119.99) with code CHAT30. That works out to less than $20 per year. Ov...
TL;DR - The U.S. Department of Defense is reportedly tapping Gemini for class...
Problem Parallel AI coding feels magical until both agents start maintaining their own version of reality. One agent remembers a rule from chat history, while...