Source

arXiv

5752 posts from this source

2 months ago · ai · - · -

[Paper] UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iterati...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] AttentionRetriever: Attention Layers are Secretly Long Document Retrievers

Retrieval augmented generation (RAG) has been widely adopted to help Large Language Models (LLMs) to process tasks involving long documents. However, existing r...

#retrieval #transformers #long-document #attention #information-retrieval
2 months ago · ai · - · -

[Paper] Agentic Test-Time Scaling for WebAgents

Test-time scaling has become a standard way to improve performance and boost reliability of neural network models. However, its behavior on agentic, multi-step ...

#test-time scaling #web agents #LLM uncertainty #resource allocation #AI research
2 months ago · ai · - · -

[Paper] On-Policy Context Distillation for Language Models

Context distillation enables language models to internalize in-context knowledge into their parameters. In our work, we propose On-Policy Context Distillation (...

#research #paper #ai #nlp
2 months ago · ai · - · -

[Paper] Function-Space Decoupled Diffusion for Forward and Inverse Modeling in Carbon Capture and Storage

Accurate characterization of subsurface flow is critical for Carbon Capture and Storage (CCS) but remains challenged by the ill-posed nature of inverse problems...

#diffusion models #neural operators #carbon capture #inverse modeling #generative AI
2 months ago · ai · - · -

[Paper] Learning to Control: The iUzawa-Net for Nonsmooth Optimal Control of Linear PDEs

We propose an optimization-informed deep neural network approach, named iUzawa-Net, aiming for the first solver that enables real-time solutions for a class of ...

#optimal control #partial differential equations #deep learning #neural network architecture #numerical optimization
2 months ago · ai · - · -

[Paper] MonarchRT: Efficient Attention for Real-Time Video Generation

Real-time video generation with Diffusion Transformers is bottlenecked by the quadratic cost of 3D self-attention, especially in real-time regimes that are both...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Creative Ownership in the Age of AI

Copyright law focuses on whether a new work is 'substantially similar' to an existing one, but generative AI can closely imitate style without copying content, ...

#generative-ai #copyright #legal-framework #machine-learning #research
2 months ago · ai · - · -

[Paper] CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use

AI agents are increasingly used to solve real-world tasks by reasoning over multi-turn user interactions and invoking external tools. However, applying reinforc...

#reinforcement learning #large language models #tool-use agents #checklist rewards #RLHF
2 months ago · ai · - · -

[Paper] Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data

Self-supervised learning (SSL) is a powerful paradigm for learning from unlabeled time-series data. However, popular methods such as masked autoencoders (MAEs) ...

#research #paper #ai #machine-learning
2 months ago · ai · - · -

[Paper] T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization

Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their i...

#research #paper #ai #machine-learning #nlp
2 months ago · it · - · -

[Paper] Legitimate Overrides in Decentralized Protocols

Decentralized protocols claim immutable, rule-based execution, yet many embed emergency mechanisms such as chain-level freezes, protocol pauses, and account qua...

#blockchain #decentralized governance #emergency overrides #on-chain security #protocol design
2 months ago · ai · - · -

[Paper] Think like a Scientist: Physics-guided LLM Agent for Equation Discovery

Explaining observed phenomena through symbolic, interpretable formulas is a fundamental goal of science. Recently, large language models (LLMs) have emerged as ...

#research #paper #ai #machine-learning
2 months ago · ai · - · -

[Paper] On the implicit regularization of Langevin dynamics with projected noise

We study Langevin dynamics with noise projected onto the directions orthogonal to an isometric group action. This mathematical model is introduced to shed new l...

#research #paper #ai #machine-learning
2 months ago · software · - · -

[Paper] Automated Test Suite Enhancement Using Large Language Models with Few-shot Prompting

Unit testing is essential for verifying the functional correctness of code modules (e.g., classes, methods), but manually writing unit tests is often labor-inte...

#unit testing #large language models #few-shot prompting #test generation #software quality
2 months ago · ai · - · -

[Paper] Is Online Linear Optimization Sufficient for Strategic Robustness?

We consider bidding in repeated Bayesian first-price auctions. Bidding algorithms that achieve optimal regret have been extensively studied, but their strategic...

#online learning #auction theory #strategic robustness #regret minimization #machine learning
2 months ago · education · - · -

[Paper] A technical curriculum on language-oriented artificial intelligence in translation and specialised communication

This paper presents a technical curriculum on language-oriented artificial intelligence (AI) in the language and translation (L&T) industry. The curriculum ...

#language-models #translation-technology #AI-curriculum #nlp #upskilling
2 months ago · ai · - · -

[Paper] Community Concealment from Unsupervised Graph Learning-Based Clustering

Graph neural networks (GNNs) are designed to use attributed graphs to learn representations. Such representations are beneficial in the unsupervised learning of...

#graph neural networks #privacy #unsupervised clustering #community concealment #perturbation
2 months ago · ai · - · -

[Paper] 'Sorry, I Didn't Catch That': How Speech Models Miss What Matters Most

Despite speech recognition systems achieving low word error rates on standard benchmarks, they often fail on short, high-stakes utterances in real-world deploym...

#research #paper #ai #machine-learning #nlp
2 months ago · ai · - · -

[Paper] ExtractBench: A Benchmark and Evaluation Methodology for Complex Structured Extraction

Unstructured documents like PDFs contain valuable structured information, but downstream systems require this data in reliable, standardized formats. LLMs are i...

#LLM evaluation #structured extraction #benchmark #PDF-to-JSON #extractbench
2 months ago · ai · - · -

[Paper] Moonshine v2: Ergodic Streaming Encoder ASR for Latency-Critical Speech Applications

Latency-critical speech applications (e.g., live transcription, voice commands, and real-time translation) demand low time-to-first-token (TTFT) and high transc...

#research #paper #ai #machine-learning #nlp
2 months ago · ai · - · -

[Paper] Olmix: A Framework for Data Mixing Throughout LM Development

Data mixing -- determining the ratios of data from different domains -- is a first-order concern for training language models (LMs). While existing mixing metho...

#research #paper #ai #machine-learning #nlp
2 months ago · ai · - · -

[Paper] Energy-Aware Spike Budgeting for Continual Learning in Spiking Neural Networks for Neuromorphic Vision

Neuromorphic vision systems based on spiking neural networks (SNNs) offer ultra-low-power perception for event-based and frame-based cameras, yet catastrophic f...

#spiking neural networks #neuromorphic computing #continual learning #energy efficiency #event-based vision
2 months ago · ai · - · -

[Paper] Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation

Efficient long-context processing remains a crucial challenge for contemporary large language models (LLMs), especially in resource-constrained environments. So...

#token compression #retrieval-augmented generation #overflow detection #LLM #NLP
2 months ago · ai · - · -

[Paper] Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training

Supervised fine-tuning (SFT) is computationally efficient but often yields inferior generalization compared to reinforcement learning (RL). This gap is primaril...

#research #paper #ai #machine-learning #computer-vision
2 months ago · ai · - · -

[Paper] Best of Both Worlds: Multimodal Reasoning and Generation via Unified Discrete Flow Matching

We propose UniDFlow, a unified discrete flow-matching framework for multimodal understanding, generation, and editing. It decouples understanding and generation...

#multimodal #flow-matching #transformers #research-paper #vision-language
2 months ago · ai · - · -

[Paper] DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Current unified multimodal models for image generation and editing typically rely on massive parameter scales (e.g., >10B), entailing prohibitive training co...

#multimodal-model #image-generation #diffusion-transformer #deep-learning #computer-vision
2 months ago · ai · - · -

[Paper] ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images

Enterprise documents, such as forms and reports, embed critical information for downstream applications like data archiving, automated workflows, and analytics....

#information extraction #vision-language models #document AI #benchmark dataset #structured output
2 months ago · ai · - · -

[Paper] Visual Reasoning Benchmark: Evaluating Multimodal LLMs on Classroom-Authentic Visual Problems from Primary Education

AI models have achieved state-of-the-art results in textual reasoning; however, their ability to reason over spatial and relational structures remains a critica...

#research #paper #ai #machine-learning #nlp
2 months ago · software · - · -

[Paper] Unknown Attack Detection in IoT Networks using Large Language Models: A Robust, Data-efficient Approach

The rapid evolution of cyberattacks continues to drive the emergence of unknown (zero-day) threats, posing significant challenges for network intrusion detectio...

#research #paper #software
2 months ago · ai · - · -

[Paper] EO-VAE: Towards A Multi-sensor Tokenizer for Earth Observation Data

State-of-the-art generative image and video models rely heavily on tokenizers that compress high-dimensional inputs into more efficient latent representations. ...

#variational autoencoder #earth observation #multisensor tokenization #remote sensing #generative AI
2 months ago · ai · - · -

[Paper] DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation

Recent advancements in foundation models have revolutionized joint audio-video generation. However, existing approaches typically treat human-centric tasks incl...

#multimodal generation #diffusion models #audio-video synthesis #research paper
2 months ago · ai · - · -

[Paper] TexSpot: 3D Texture Enhancement with Spatially-uniform Point Latent Representation

High-quality 3D texture generation remains a fundamental challenge due to the view-inconsistency inherent in current mainstream multi-view diffusion pipelines. ...

#research #paper #ai #computer-vision
2 months ago · devops · - · -

[Paper] OServe: Accelerating LLM Serving via Spatial-Temporal Workload Orchestration

Serving Large Language Models (LLMs) can benefit immensely from parallelizing both the model and input requests across multiple devices, but incoming workloads ...

#research #paper #devops
2 months ago · ai · - · -

[Paper] On the Adoption of AI Coding Agents in Open-source Android and iOS Development

AI coding agents are increasingly contributing to software development, yet their impact on mobile development has received little empirical attention. In this ...

#research #paper #ai #machine-learning
2 months ago · software · - · -

[Paper] PPTAM$η$: Energy Aware CI/CD Pipeline for Container Based Applications

Modern container-based microservices evolve through rapid deployment cycles, but CI/CD pipelines still rarely measure energy consumption, even though prior work...

#research #paper #software
2 months ago · software · - · -

[Paper] Performance Antipatterns: Angel or Devil for Power Consumption?

Performance antipatterns are known to degrade the responsiveness of microservice-based systems, but their impact on energy consumption remains largely unexplore...

#research #paper #software
2 months ago · it · - · -

[Paper] Contention Resolution, With and Without a Global Clock

In the Contention Resolution problem n parties each wish to have exclusive use of a shared resource for one unit of time. The problem has been studied since the...

#contention-resolution #distributed-algorithms #global-clock #backoff-protocol #latency-analysis
2 months ago · ai · - · -

[Paper] ModelWisdom: An Integrated Toolkit for TLA+ Model Visualization, Digest and Repair

Model checking in TLA+ provides strong correctness guarantees, yet practitioners continue to face significant challenges in interpreting counterexamples, unders...

#research #paper #ai #machine-learning
2 months ago · ai · - · -

[Paper] An Empirical Study of the Imbalance Issue in Software Vulnerability Detection

Vulnerability detection is crucial to protect software security. Nowadays, deep learning (DL) is the most promising technique to automate this detection task, l...

#software vulnerability detection #class imbalance #deep learning #security ML #empirical study
2 months ago · ai · - · -

[Paper] PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving

Multi-agent systems increasingly orchestrate multiple specialized language models to solve complex real-world problems, often invoking them over a shared contex...

#research #paper #ai #machine-learning
2 months ago · devops · - · -

[Paper] An Auction-Based Mechanism for Optimal Task Allocation and Resource Aware Containerization

Distributed computing has enabled cooperation between multiple computing devices for the simultaneous execution of resource-hungry tasks. Such execution also pl...

#auction scheduling #docker-swarm #edge-computing #resource-aware containers #IoT workloads
2 months ago · ai · - · -

[Paper] Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md, by either manually or automatic...

#coding agents #LLM evaluation #software engineering benchmarks #AGENTS.md
2 months ago · software · - · -

[Paper] Studying Quality Improvements Recommended via Manual and Automated Code Review

Several Deep Learning (DL)-based techniques have been proposed to automate code review. Still, it is unclear the extent to which these approaches can recommend ...

#research #paper #software
2 months ago · software · - · -

[Paper] Improving Code Generation via Small Language Model-as-a-judge

Large language models (LLMs) have shown remarkable capabilities in automated code generation. While effective for mainstream languages, they may underperform on...

#research #paper #software
2 months ago · ai · - · -

[Paper] MUSE: Multi-Tenant Model Serving With Seamless Model Updates

In binary classification systems, decision thresholds translate model scores into actions. Choosing suitable thresholds relies on the specific distribution of t...

#research #paper #ai #machine-learning
2 months ago · devops · - · -

[Paper] Designing Scalable Rate Limiting Systems: Algorithms, Architecture, and Distributed Solutions

Designing a rate limiter that is simultaneously accurate, available, and scalable presents a fundamental challenge in distributed systems, primarily due to the ...

#rate limiting #redis #distributed systems #sliding window algorithm #scalability
2 months ago · devops · - · -

[Paper] GORGO: Maximizing KV-Cache Reuse While Minimizing Network Latency in Cross-Region LLM Load Balancing

Distributing LLM inference across geographical regions can improve Time-to-First-Token (TTFT) by regionalizing service deployments. While existing multi-region ...

#LLM inference #KV-cache reuse #cross-region load balancing #network latency #cost-aware routing
