[Paper] Evolving Excellence: Automated Optimization of LLM-based Agents
Agentic AI systems built on large language models (LLMs) offer significant potential for automating complex workflows, from software development to customer sup...
Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world model...
Novel View Synthesis (NVS) has traditionally relied on models with explicit 3D inductive biases combined with known camera parameters from Structure-from-Motion...
Understanding and reconstructing the complex geometry and motion of dynamic scenes from video remains a formidable challenge in computer vision. This paper intr...
We introduce two new benchmarks, REST and REST+ (Render-Equivalence Stress Tests), to enable systematic evaluation of cross-modal inconsistency in multimodal large...
Text-Aware Image Restoration (TAIR) aims to recover high-quality images from low-quality inputs containing degraded textual content. While diffusion models pro...
Human video demonstrations provide abundant training data for learning robot policies, but video alone cannot capture the rich contact signals critical for mast...
Quantum Error Correction (QEC) decoding faces a fundamental accuracy-efficiency tradeoff. Classical methods like Minimum Weight Perfect Matching (MWPM) exhibit ...
Nighttime environments pose significant challenges for camera-based perception, as existing methods passively rely on the scene lighting. We introduce Lighting-...
In empirical software engineering (SE) research, researchers have considerable freedom to decide how to process data, what operationalizations to use, and which...
Generating high-quality, textured 3D scenes from a single image remains a fundamental challenge in vision and graphics. Recent image-to-3D generators recover re...
Content-aware layout generation is a critical task in graphic design automation, focused on creating visually appealing arrangements of elements that seamlessly...
Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality,...
Kernel density estimation is a key component of a wide variety of algorithms in machine learning, Bayesian inference, stochastic dynamics and signal processing....
While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance has been ...
Retrieval-Augmented Generation (RAG) improves the factuality of large language models (LLMs) by grounding outputs in retrieved evidence, but faithfulness failur...
Visual reasoning is challenging, requiring both precise object grounding and understanding complex spatial relationships. Existing methods fall into two camps: ...
Rotation invariance is essential for precise, object-level segmentation in UAV aerial imagery, where targets can have arbitrary orientations and exhibit fine-sc...
Industrial maintenance is being transformed by the Internet of Things and edge computing, generating continuous data streams that demand real-time, adaptive dec...
The rise of space AI is reshaping government and industry through applications such as disaster detection, border surveillance, and climate monitoring, powered ...
Vision-language models (VLMs) are emerging as powerful generalist tools for remote sensing, capable of integrating information across diverse tasks and enabling...
Real-world datasets often exhibit temporal dynamics characterized by evolving data distributions. Disregarding this phenomenon, commonly referred to as concept ...
Large Language Models (LLMs) have recently demonstrated remarkable performance in generating high-quality tabular synthetic data. In practice, two primary appro...
Image captioning is essential in many fields including assisting visually impaired individuals, improving content management systems, and enhancing human-comput...
LLM agents are widely deployed in complex interactive tasks, yet privacy constraints often preclude centralized optimization and co-evolution across dynamic env...
The Development Knowledge Question Answering (Dev Knowledge QA) task aims to provide natural language answers to knowledge-seeking questions during software dev...
Gradually growing the depth of Transformers during training can not only reduce training cost but also lead to improved reasoning performance, as shown by MIDAS...
Understanding human personality is crucial for web applications such as personalized recommendation and mental health assessment. Existing studies on personalit...
As AI-based code generation becomes widespread, researchers are investigating the calibration of code LLMs - ensuring their confidence scores faithfully represe...
Despite advancements in machine learning for security, rule-based detection remains prevalent in Security Operations Centers due to the resource intensiveness a...
Foundation models pretrained on large data have demonstrated remarkable zero-shot generalization capabilities across domains. Building on the success of TabPFN ...
Document shadow removal is essential for enhancing the clarity of digitized documents. Preserving high-frequency details (e.g., text edges and lines) is critica...
This paper addresses the challenge of aligning large language models (LLMs) with diverse human preferences within federated learning (FL) environments, where st...
We propose a post-training method for lower-resource languages that preserves fluency of language models even when aligned by disfluent reward models. Preferenc...
In recent years, high-performance computer vision models have achieved remarkable success in medical imaging, with some skin lesion classification systems even ...
Automatic Sign Language Recognition (ASLR) has emerged as a vital field for bridging the gap between deaf and hearing communities. However, the problem of sign-...
Multigrid methods have been a popular approach for solving linear systems arising from the discretization of partial differential equations (PDEs) for several d...
In this paper, we investigate the potential of spatial and temporal cloud workload shifting to reduce carbon, water, and land-use footprints. Specifically, we p...
This paper introduces the first publicly available dataset for Automatic Essay Scoring (AES) and feedback generation in Basque, targeting the CEFR C1 proficienc...
With this paper, we introduce RESTifAI, an LLM-driven approach for generating reusable, CI/CD ready REST API tests, following the happy-path approach. Unlike ex...
Designing and implementing distributed systems correctly can be quite challenging. Although these systems are often accompanied by formal specifications that ar...
Clinical communication is central to patient outcomes, yet large-scale human annotation of patient-provider conversation remains labor-intensive, inconsistent, ...
ML-Enabled Systems (MLES) are inherently complex since they require multiple components to achieve their business goal. This experience report showcases the sof...
We introduce QSTN, an open-source Python framework for systematically generating responses from questionnaire-style prompts to support in-silico surveys and ann...
Efficient edge caching reduces latency and alleviates backhaul congestion in modern networks. Traditional caching policies, such as Least Recently Used (LRU) an...
Predicting the outcomes of professional basketball games, particularly in the National Basketball Association (NBA), has become increasingly important for coach...
Traditionally, multithreaded data structures have been designed for access by the threads of Operating Systems (OS). However, implementations for access by prog...
Multigrid solvers are among the most efficient methods for solving the Poisson equation, which is ubiquitous in computational physics. For example, in the conte...