[Paper] In Pursuit of Pixel Supervision for Visual Pre-training
At the most basic level, pixels are the source of the visual information through which we perceive the world. Pixels contain information at all levels, ranging ...
In recent multimodal research, the diffusion paradigm has emerged as a promising alternative to the autoregressive paradigm (AR), owing to its unique decoding a...
Interpreting the internal activations of neural networks can produce more faithful explanations of their behavior, but is difficult due to the complex structure...
We present Gaussian Pixel Codec Avatars (GPiCA), photorealistic head avatars that can be generated from multi-view images and efficiently rendered on mobile dev...
This paper proposes a dual-engine AI architectural method designed to address the complex problem of exploring potential trajectories in the evolution of art. W...
Foundation models are vital tools in various Computer Vision applications. They take as input a single RGB image and output a deep feature representation that i...
Active Speaker Detection (ASD) aims to identify who is currently speaking in each frame of a video. Most state-of-the-art approaches rely on late fusion to comb...
In a mathematical model of interacting biological organisms, where external interventions may alter behavior over time, traditional models that assume fixed par...
Early-Exit (EE) is a Large Language Model (LLM) architecture that accelerates inference by allowing easier tokens to be generated using only a subset of the mod...
Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. While recent w...
Evaluations of image compression performance which include human preferences have generally found that naive distortion functions such as MSE are insufficiently...
We introduce FrontierCS, a benchmark of 156 open-ended problems across diverse areas of computer science, designed and reviewed by experts, including CS PhDs an...
The misuse of AI-driven video generation technologies has raised serious social concerns, highlighting the urgent need for reliable AI-generated video detectors...
Prevailing Vision-Language-Action Models (VLAs) for robotic manipulation are built upon vision-language backbones pretrained on large-scale, but disconnected st...
Semantic communication aims to transmit information most relevant to a task rather than raw data, offering significant gains in communication efficiency for app...
Future AI agents might run autonomously with elevated privileges. If these agents are misaligned, they might abuse these privileges to cause serious damage. The...
Reinforcement learning has become essential for strengthening the reasoning abilities of large language models, yet current exploration mechanisms remain fundam...
This paper presents a unified framework for the detection, classification, and preliminary localization of anomalies in water distribution networks using multi...
Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. ...
With the push towards Exascale computing and data-driven methods, problem sizes have increased dramatically, increasing the computational requirements of the un...
This paper proposes a training data augmentation pipeline that combines synthetic image data with neural style transfer in order to address the vulnerability of...
Large language model (LLM) activations are notoriously difficult to understand, with most existing techniques using complex, specialized methods for interpretin...
Large language models (LLMs) exhibit remarkable capabilities, yet their reasoning remains opaque, raising safety and trust concerns. Attribution methods, which ...
Human beings solve complex problems through critical thinking, where reasoning and evaluation are intertwined to converge toward correct solutions. However, mos...
Raft is a leading consensus algorithm for replicating writes in distributed databases. However, distributed databases also require consistent reads. To guarante...
Continual learning remains a fundamental challenge in machine learning, requiring models to learn from a stream of tasks without forgetting previously acquired ...
State space models (SSMs) are a promising alternative to transformers for language modeling because they use fixed memory during inference. However, this fixed ...
The computational and memory overheads associated with expanding the context window of LLMs severely limit their scalability. A noteworthy solution is vision-te...
Large language models are increasingly adapted to downstream tasks through fine-tuning. Full supervised fine-tuning (SFT) and parameter-efficient fine-tuning (P...
LLMs (Large Language Models) are increasingly used in text processing pipelines to intelligently respond to a variety of inputs and generation tasks. This raise...
Working memory enables the brain to integrate transient information for rapid decision-making. Artificial networks typically replicate this via recurrent or par...
Psychological defenses are strategies, often automatic, that people use to manage distress. Rigid use or overuse of defenses is negatively linked to mental health a...
Bloom filters are a fundamental data structure for approximate membership queries, with applications ranging from data analytics to databases and genomics. Seve...
We introduce Bolmo, the first family of competitive fully open byte-level language models (LMs) at the 1B and 7B parameter scales. In contrast to prior research...
Many business processes currently depend on web services, often using REST APIs for communication. REST APIs expose web service functionality through endpoints,...
The success of large language models for code relies on vast amounts of code data, including public open-source repositories, such as GitHub, and private, confi...
The use of large language models like ChatGPT in code review offers promising efficiency gains but also raises concerns about correctness and safety. Existing e...
In manufacturing, digital twins, realized as Asset Administration Shells (AAS), have emerged as a prevalent practice. These digital replicas, often utilized as ...
Reusable software components, typically distributed as packages, are a central paradigm of modern software development. The JavaScript ecosystem serves as a pri...
Ensuring the safety and reliability of Automated Driving Systems (ADS) remains a critical challenge, as traditional verification methods such as large-scale on-...
Modern data centers contain thousands of servers, making them major consumers of electricity. To minimize their environmental impact, it is critical that we use ...
We present LLMQ, an end-to-end CUDA/C++ implementation for medium-sized language-model training, e.g. 3B to 32B parameters, on affordable, commodity GPUs. These...
Digital twin (DT) technology integrates heterogeneous data and models, along with semantic technologies, to create a multi-layered digital representation of physic...
The increasing and widespread use of BPMN business processes, also embodying DMN tables, requires tools and methodologies to verify their correctness. However, ...
Data-driven evolutionary algorithms have shown surprising results in addressing expensive optimization problems through robust surrogate modeling. Though promisi...
The pursuit of high-performance data transfer often focuses on raw network bandwidth, and international links of 100 Gbps or higher are frequently considered th...
The increasing computational demands of modern AI systems have exposed fundamental limitations of digital hardware, driving interest in alternative paradigms fo...
The innovative agriculture system is revolutionizing how we farm, making it one of the most critical innovations of our time! Yet it faces significant connectiv...