[Paper] Rectangular Matrix Multiplication in the Low-Bandwidth Model
We study rectangular matrix multiplication in the low-bandwidth model of distributed computing. There are n computers, and the input matrices are initially distributed.
LLM serving frameworks are quickly evolving with a complex software stack and a vast number of optimizations. The rapid development process can introduce silent...
Speculative inference (SPIN) was originally developed as an efficient architecture to accelerate Large Language Models (LLMs). In this work, we propose its dist...
This study aims to determine whether the application of Deep Reinforcement Learning (DRL) as a specialized execution overlay can enhance pair trading in highly ...
Offline multi-objective optimization (Offline MOO) aims to discover new Pareto-optimal designs from static datasets, without costly environment interactions.
Speculative decoding accelerates autoregressive large language model inference by drafting multiple tokens and verifying them in a single target-model forward p...
Modern AI serving increasingly relies on NPUs for conventional inference and large language model serving. However, current NPU deployments commonly expose phys...
LZ77-based codecs exhibit a fundamental sequential bottleneck in decoding: each back-reference depends on previously decompressed data, preventing multi-core sc...
This paper investigates 'free lunch' strategies to boost the performance of lidar semantic scene completion (SSC) without requiring complex architectural redesi...
Reconstructing interactive, simulation-ready 3D scenes from a single image is a critical bottleneck for robotic manipulation. While recent single-image lifters ...
We investigate whether neuron populations within neural networks evolve predictably with scale, extending scaling laws beyond macroscopic observables such as lo...
Images composed of 2D pixel arrays are the standard input to computer vision algorithms, yet many underlying computations can be distributed across pixels. Tran...
Previous work has evaluated physics reasoning in foundation models using synthetic or semi-synthetic scenes and visual question-answering tasks. However, these ...
The ability to train spiking neural networks is essential for modeling biological neural networks and for performing neuromorphic computing. However, for scalability…
Urban traffic simulation is a critical tool for infrastructure planning, including the placement of electric vehicle charging stations. However, realistic traff...
Signed spiking neurons convey richer information than standard spiking neurons. This work proposes a compact magnetic tunnel junction (MTJ)-based neuron for signed spiking.
Equilibrium Propagation (EP) is a physics-based training framework that has mainly been used for energy-based models, including continuous Hopfield networks.
The prefrontal cortex (PFC) maintains goal information for action planning, but how recurrent circuits preserve it across behavioral time in a form usable for action…
The large sizes of Spiking Vision Transformers (SViTs) still hinder their embedded implementation, highlighting the need for model compression. State-of-the-art...
The 2026 disproof of Erdős's unit-distance conjecture and Sawin's subsequent explicit quantitative refinement show that the maximum number u(n) of unit distance...
Spiking Vision Transformer (SViT) models are promising low-power ViT models for solving vision-based tasks with state-of-the-art performance. However, their lar...
Neural-guided Ant Colony Optimization (ACO) suffers from a fundamental training-inference misalignment: policies are typically trained to generate static priors...
Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and ...
Recent multimodal large language models have demonstrated strong reasoning ability, yet their reliability as automated evaluators remains limited by a critical ...
Scaling robot learning requires large-scale, diverse demonstrations, yet real-world data collection via teleoperation remains prohibitively expensive and time-c...
Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to continually acquire n...
Autoregressive world models have emerged as a powerful paradigm for interactive video generation, allowing users to navigate dynamically generated environments ...
Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video ML...
Clinical practice is not the selection of an answer from enumerated options: a physician gathers heterogeneous information incrementally and commits to sequenti...
Heterogeneous Differential Privacy (HDP) in Federated Learning (FL) allows clients to select individual privacy budgets ($\varepsilon_i$) according to institutiona...
Autonomous robots that interact with people must make safe and efficient decisions under human-induced uncertainty, such as their preferences, goals, competency...
Post-training compression of Large Language Models (LLMs) removes entire architectural components, either deleting them or replacing them with fitted modules. E...
Despite advances in depth estimation, flying points remain a persistent failure mode: near object boundaries, depth estimators often predict spurious 3D points ...
Diffusion large language models (dLLMs) have recently emerged as a promising alternative to autoregressive (AR) LLMs, offering faster inference through parallel...
Text files such as skill files, memory files, and behavioral configuration files play a central role in defining how modern agents act. Through edits by humans ...
Aligning Large Language Models (LLMs) with human values often degrades their general capabilities, termed the alignment tax. Existing methods mitigate this by b...
Long-tailed recognition poses a significant challenge for deep learning. The two-stage decoupling paradigm, which separates representation learning from classif...
Agentic systems entering production typically operate as partially integrated assemblies where structural defects, not task-level errors, dominate the failure l...
By listing the components included in an application, Software Bills of Materials (SBOMs) are intended to support the timely identification of vulnerable compon...
Large language models (LLMs) are increasingly integrated into high-performance computing (HPC) workflows, accelerating scientific discovery through diverse pers...
The complexity of biomolecular simulations has substantially increased the demand for High-Performance Computing (HPC) infrastructures, particularly in molecula...
The edge-cloud paradigm improves service delivery by orchestrating resources across edge nodes and cloud data centres. These environments consist of heterogeneo...
Timed-Arc Petri net (TAPN) is a timed extension of the classical Petri net model where tokens have their age and input arcs are associated with time intervals r...
Reproducibility in empirical software engineering relies on complete, accessible, and reusable research artifacts, yet artifact evaluation remains largely manua...
We consider LLM-based algorithm development through a case study on contraction order optimisation for tensor networks with OpenEvolve. We pay particular attenti...
Background: Developers increasingly review multi-file code changes generated by LLM-based agents, yet no validated end-to-end workflow or IDE tooling design exi...
Deployers of online LLM services usually seek to maximize cluster-wide performance given a fixed number of GPUs. Tensor parallelism (TP) is necessary to fit mod...
Machine learning systems consist of general-purpose code as well as machine-learning-specific code. While ML-specific code smells have been identified, their co...