[Paper] FMplex: Model Virtualization for Serving Extensible Foundation Models
Foundation models (FMs) are increasingly used as backbones for downstream tasks across language, vision, time-series, and multimodal applications. Yet existing ...
1378 posts from this source
Foundation models (FMs) are increasingly used as backbones for downstream tasks across language, vision, time-series, and multimodal applications. Yet existing ...
Foundation models (FMs) are increasingly used as backbones for downstream tasks across language, vision, time-series, and multimodal applications. Yet existing ...
Personas are widely used in software engineering to support requirements elicitation, design, and validation, but their manual creation is costly, time-consumin...
Ensuring the reliability of Large Language Models (LLMs) under distribution drift requires inference-time adaptation. While inference-time alignment methods suc...
Court simulation bridges legal education and judicial practice, yet human-based simulations are costly and difficult to scale. Large language models (LLMs) offe...
Provenance trees are append-only directed acyclic graphs of artifact registrations anchored on a public blockchain, recently introduced as the data substrate of...
Provenance trees are append-only directed acyclic graphs of artifact registrations anchored on a public blockchain, recently introduced as the data substrate of...
Large language models (LLMs) are increasingly deployed as code generators, where silently wrong programs pose real safety and reliability risks. Reliable uncert...
The Traveling Salesman Problem (TSP) is a classical NP-hard combinatorial optimization problem that aims to find the shortest Hamiltonian cycle visiting each ci...
The Traveling Salesman Problem (TSP) is a classical NP-hard combinatorial optimization problem that aims to find the shortest Hamiltonian cycle visiting each ci...
Dynamic analysis of Android's application layer typically relies on physical devices, limiting scalability and reproducibility. To compensate, we introduce a sy...
Local search is a well-known heuristic method used in optimization. In this thesis, we explore its capabilities on the vertex coloring problem, an NP-hard probl...
Local search is a well-known heuristic method used in optimization. In this thesis, we explore its capabilities on the vertex coloring problem, an NP-hard probl...
Robot middleware faces a new role in the era of Physical AI. Learned policies, planners, and vision-language-action (VLA) models now enter deployed robots as ca...
LLM-generated outputs in software engineering rarely exist in isolation. They must plug into toolchains, APIs, and data pipelines that impose strict, often orga...
Polymer simulations are among the most computationally demanding workloads in soft-matter research, often requiring days of execution and high energy consumptio...
Polymer simulations are among the most computationally demanding workloads in soft-matter research, often requiring days of execution and high energy consumptio...
The list ranking problem is one of the classical problems of parallel computing, with nontrivial algorithms and many applications as a subroutine for solving ot...
The list ranking problem is one of the classical problems of parallel computing, with nontrivial algorithms and many applications as a subroutine for solving ot...
Covariance matrix adaptation evolution strategy (CMA-ES) is a state-of-the-art black-box optimization algorithm. In general, CMA-ES uses a portfolio of multiple...
Covariance matrix adaptation evolution strategy (CMA-ES) is a state-of-the-art black-box optimization algorithm. In general, CMA-ES uses a portfolio of multiple...
The rapid growth of large-scale machine learning (ML) has made distributed training across multiple GPUs a fundamental component of modern ML systems. As model ...
The rapid growth of large-scale machine learning (ML) has made distributed training across multiple GPUs a fundamental component of modern ML systems. As model ...
Large Language Models (LLMs) are increasingly used in AI-based software engineering, but their limitations in complex task execution and multi-tool coordination...
Recently, mobile edge computing (MEC)-enabled collaborative deep neural network (DNN) inference has emerged as a promising approach for delivering intelligent s...
This paper proposes a co-optimization framework that jointly optimizes SRAM architecture and transistor sizing using equivalent circuit models. The framework si...
This paper proposes a co-optimization framework that jointly optimizes SRAM architecture and transistor sizing using equivalent circuit models. The framework si...
Cloud network infrastructure at hyperscale presents unique operational challenges where traditional human-driven incident response cannot keep pace with the vol...
Recent Byzantine Fault Tolerant (BFT) protocols achieve strong performance by combining the low-latency advantages of leader-based BFT protocols with the high-t...
The official C++ MPI bindings were removed from the standard in 2008, leaving a gap that numerous third-party libraries have attempted to fill. However, existin...
Trusted Execution Environments (TEEs) have enabled confidential Byzantine Fault-Tolerant (BFT) consensus systems with confidentiality and improved scalability. ...
Developers increasingly provide AI coding assistants with persistent context through configuration files such as CLAUDE.md, AGENTS.md, and .cursorrules. These f...
Modern enterprises face an accelerating onslaught of API-targeted threats amid a rapidly expanding attack surface. Record volumes of software vulnerabilities co...
As large language models (LLMs) are increasingly deployed with highly heterogeneous workloads, chunked-prefill execution has emerged as a mainstream serving arc...
Exploits are widely used to check whether library vulnerabilities appear in different versions and to mark affected version ranges. Exploit-based checks sometim...
Vector databases have been designed and optimized for cloud environments; however, emerging scientific AI workloads (e.g., molecular search, meteorological traj...
Closed-loop network monitoring and orchestration increasingly require semantic interpretations of live telemetry beyond raw counter collection. However, dynamic...
Exploiting parallelism in modern CPU architectures remains a longstanding challenge in optimizing SMT solvers. We introduce a novel parallel framework that dyna...
Code generation models are typically compared using compact execution benchmarks and aggregate pass rates, but such summaries obscure how performance varies acr...
Incrementality is a fundamental design principle to master the complexity of large, long-lived software systems. This principle has been embraced by agile devel...
We present HNTL (Hierarchical No-pointer Tangent-Local), the core vector indexing and candidate generation framework of the Aperon vector memory system. Proximi...
Artificial Intelligence (AI) and Large Language Models (LLMs) are increasingly used in autonomous software testing; however, AI-generated test artifacts often s...
The quality of software engineering is still under a challenge due to disjointed processes between requirements, testing, and production, which hinders the oppo...
W4A4 quantization promises full utilization of INT4 Tensor Cores, yet group dequantization overhead on CUDA Cores has driven existing systems to mixed-precision...
Modernization of legacy scientific codes is often necessary to keep up with the ever-evolving changes in the compute resource ecosystem. Parallelization and mig...
AI coding assistants have significantly improved developer productivity by automatically suggesting code that aligns with user intent, and many of these tools a...
Prefill-decode (PD) disaggregation decouples prompt processing from token generation, but it also turns the key-value (KV) cache into a network payload. Existin...
We present a multilingual fact-checking system deployed at Factiverse, designed for high-throughput and low-latency operation across diverse languages. The syst...