[Paper] Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs
Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safe...
We study the minimax sample complexity of multicalibration in the batch setting. A learner observes n i.i.d. samples from an unknown distribution and must outpu...
We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We ...
As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate mod...
We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud. Specifically, given an...
Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are n...
Scientific workflow systems automate execution -- scheduling, fault tolerance, resource management -- but not the semantic translation that precedes it. Scienti...
Humans and modern vision models can reach similar classification accuracy while making systematically different kinds of mistakes - differing not in how often t...
Low-rank adaptation (LoRA) has emerged as the de facto standard for parameter-efficient fine-tuning (PEFT) of foundation models, enabling the adaptation of bill...
In recent years, significant progress has been made in both image generation and generated image detection. Despite their rapid, yet largely independent, develo...
Deep-learning video super-resolution has progressed rapidly, but climate applications typically super-resolve (increase resolution) either space or time, and jo...
As model sizes continue to grow, parameter-efficient fine-tuning has emerged as a powerful alternative to full fine-tuning. While LoRA is widely adopted among t...
Analyses of legislative behavior often rely on voting records, overlooking the rich semantic and rhetorical content of political speech. In this paper, we ask t...
This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of ...
Geographic context is often considered relevant to motor insurance risk, yet public actuarial datasets provide limited location identifiers, constraining how this...
Maintaining instantaneous balance between electricity supply and demand is critical for reliability and grid stability. System operators achieve this through ...
Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed d...
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massi...
Event extraction is essential for event understanding and analysis. It supports tasks such as document summarization and decision-making in emergency scenarios....
Understanding what kinds of factual knowledge large language models (LLMs) memorize is essential for evaluating their reliability and limitations. Entity-based ...
The ability of generative AI (GenAI) methods to photorealistically alter camera images has raised awareness about the authenticity of images shared online. Inte...
Physical video understanding requires more than naming an event correctly. A model can answer a question about pouring, sliding, or collision from textual regul...
Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-suppor...
STEM education researchers are often interested in identifying moments of students' mechanistic reasoning for deeper analysis, but have limited capacity to sear...
Deep reinforcement learning (RL) for quantum circuit optimization faces three fundamental bottlenecks: replay buffers that ignore the reliability of temporal-di...
Parametrically driven oscillators provide a natural platform for neuromorphic computation, where nonlinear mode coupling and intrinsic dynamics enable both memo...
Capsule endoscopy (CE) enables non-invasive gastrointestinal screening, but current CE research remains largely limited to frame-level classification and detect...
Data is a central resource for modern enterprises, and data validation is essential for ensuring the reliability of downstream applications. However, existing a...
The capabilities of AI-assisted coding are progressing at breakneck speed. Chat-based vibe coding has evolved into fully fledged AI-assisted, agentic software d...
Prior work evaluates code generation bias primarily through simple conditional statements, which represent only a narrow slice of real-world programming and rev...
The choice of activation function plays a crucial role in the optimization and performance of deep neural networks. While the Rectified Linear Unit (ReLU) remai...
GeForce NOW is doubling down on what matters most: gamers. This week’s upgrades bring smarter libraries, making it easier than ever for gamers to turn a PC coll...
From Rainforests to Recycling Plants: 5 Ways NVIDIA AI Is Protecting the Planet
We compare lightweight automata-based models (n-grams) with neural architectures (LSTM, Transformer) for next-activity prediction in streaming event logs. Exper...
Reservoir computing (RC) is an emerging recurrent neural network architecture that has attracted growing attention for its low training cost and modest hardware...
Machine Learning (ML) Engineering is a growing field that necessitates an increase in the rigor of ML development. It draws many ideas from software engineering...
Last week, Anthropic announced Project Glasswing, an AI model so effective at discovering software vulnerabilities that they took the extraordinary step of post...
OpenAI GPT‑5.5 is a new model designed for complex, real‑world work, including writing code, researching online, analyzing information, creating documents and s...
Plugins Plugins help Codex connect to other tools and sources of information. For example, a plugin might help Codex reference files in Google Drive, scan your...
Overview Codex is an AI agent you can delegate real work to. While ChatGPT excels at asking questions, brainstorming, and drafting in conversation, Codex is bu...
Overview When you open Codex, you’ll see a few core elements: a sidebar menu, projects, settings, and a chat window. You don’t need to understand everything ri...
Overview Codex can automatically run tasks on a schedule, making it proactive. Instead of waiting for you to ask for an update, Codex can return at the schedul...
Getting Started Make Codex work the way you want, with fewer interruptions. You can access settings from the menu in the bottom‑left corner of Codex. Key Setti...
Getting Started with Codex Tips to set up Codex, create your first project, and start completing real tasks. Start by downloading the Codex desktop app https://...
Local Optima Networks (LONs) represent the global structure of search spaces as graphs, but their construction requires iterative execution of a search algorith...
Self-supervised learning (SSL) is a standard approach for representation learning in aerial imagery. Existing methods enforce invariance between augmented views...
When we attribute a statement, a position, or a quote to a named source, that material comes from direct engagement with interviews, transcripts, published stat...
Vision Graph Neural Networks (ViGs) represent an image as a graph of patch tokens, enabling adaptive, feature-driven neighborhoods. Unlike CNNs with fixed grid ...