Google’s Cloud AI leads on the three frontiers of model capability
As a product VP at Google Cloud, Michael Gerstenhaber works mostly on Vertex AI, the company’s unified platform for deploying enterprise AI. It gives him a high...
Unified multimodal models can both understand and generate visual content within a single architecture. Existing models, however, remain data-hungry and too hea...
We propose tttLRM, a novel large 3D reconstruction model that leverages a Test-Time Training (TTT) layer to enable long-context, autoregressive 3D reconstructio...
Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence i...
Current feed-forward 3D/4D reconstruction systems rely on dense geometry and pose supervision -- expensive to obtain at scale and particularly scarce for dynami...
LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applicatio...
We study post-calibration uncertainty for trained ensembles of classifiers. Specifically, we consider both aleatoric (label noise) and epistemic (model) uncerta...
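The abstract above distinguishes aleatoric (label noise) from epistemic (model) uncertainty for ensembles. Whether the paper uses this exact construction is not stated, but a minimal sketch of the standard entropy-based decomposition for an ensemble of classifiers (total = entropy of the mean prediction; aleatoric = mean member entropy; epistemic = their difference, the mutual information) looks like:

```python
import numpy as np

def ensemble_uncertainty(probs):
    """Decompose ensemble predictive uncertainty for one input.

    probs: array of shape (n_members, n_classes), one predictive
    distribution per ensemble member. Returns (total, aleatoric,
    epistemic) in nats.
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12
    mean_p = probs.mean(axis=0)
    # Total uncertainty: entropy of the averaged prediction.
    total = -np.sum(mean_p * np.log(mean_p + eps))
    # Aleatoric: average of each member's own predictive entropy.
    aleatoric = -np.sum(probs * np.log(probs + eps), axis=1).mean()
    # Epistemic: disagreement between members (mutual information).
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Members that agree -> epistemic ~ 0; members that disagree -> epistemic > 0.
agree = [[0.5, 0.5], [0.5, 0.5]]
disagree = [[0.9, 0.1], [0.1, 0.9]]
print(ensemble_uncertainty(agree)[2])
print(ensemble_uncertainty(disagree)[2])
```

Note that confident-but-conflicting members yield high epistemic uncertainty even when each member's own entropy (the aleatoric part) is low.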
Inspired by behavioral science, we propose Behavior Learning (BL), a novel general-purpose machine learning framework that learns interpretable and identifiable...
Conformal risk control is an extension of conformal prediction for controlling risk functions beyond miscoverage. The original algorithm controls the expected v...
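Conformal risk control generalizes conformal prediction beyond miscoverage; as a reference point for the special case it extends, here is a minimal sketch of standard split-conformal prediction sets (function names and the example scores are illustrative, not from the paper):

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))/n empirical
    quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(np.asarray(cal_scores), min(q, 1.0), method="higher")

def prediction_set(probs, threshold):
    """Include every class whose nonconformity score (1 - p) is <= threshold,
    so the set covers the true label with probability >= 1 - alpha."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

rng = np.random.default_rng(0)
# Hypothetical calibration scores: 1 - probability assigned to the true label.
cal_scores = rng.uniform(0, 1, size=1000)
tau = conformal_threshold(cal_scores, alpha=0.1)
print(prediction_set([0.7, 0.2, 0.1], tau))
```

Conformal risk control replaces the miscoverage indicator with a general monotone loss, but the calibrate-then-threshold structure is the same.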
Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Regrettably, existing methods stru...
We present AgentOptics, an agentic AI framework for high-fidelity, autonomous optical system control built on the Model Context Protocol (MCP). AgentOptics inte...
Mean Field Games (MFGs) provide a principled framework for modeling interactions in large population models: at scale, population dynamics become deterministic,...
Data visualization rules, derived from decades of research in design and perception, ensure trustworthy chart communication. While prior work has shown that large...
With their rapid rise, large language models (LLMs) have become instrumental in applications such as Retrieval-Augmented Generation (RAG). Yet evaluating these ...
Epidemiological models increasingly rely on self-reported behavioral data such as vaccination status, mask usage, and social distancing adherence to forecast di...
The paradigm of automated program generation is shifting from one-shot generation to inference-time search, where Large Language Models (LLMs) function as seman...
Current reinforcement learning objectives for large-model reasoning primarily focus on maximizing expected rewards. This paradigm can lead to overfitting to dom...
Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining...
Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) approaches, enabling parallel token generation beyond a...
How do large language models (LLMs) know what they know? Answering this question has been difficult because pre-training data is often a 'black box' -- unknown ...
Solving long-horizon tasks requires robots to integrate high-level semantic reasoning with low-level physical interaction. While vision-language models (VLMs) a...
Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising approach for training reasoning language models (RLMs) by leveraging supervisio...
Research in machine unlearning (MU) has gained strong momentum: MU is now widely regarded as a critical capability for building safe and fair AI. In parallel, r...
We study online learning in the adversarial injection model introduced by [Goel et al. 2017], where a stream of labeled examples is predominantly drawn i.i.d. f...
The dependence on expert annotation has long constituted the primary rate-limiting step in the application of artificial intelligence to biomedicine. While supe...
Overview: U.S. artificial‑intelligence startup Anthropic said three Chinese AI companies set up more than 24,000 fraudulent accounts with its Claude AI model to...
BabyLM aims to dissolve the boundaries between cognitive modeling and language modeling. We call for workshop papers and invite researchers to join the 4th Ba...
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by conditioning generation on retrieved external documents, but the effect of retriev...
Edge-based representations are fundamental cues for visual understanding, a principle rooted in early vision research and still central today. We extend this pr...
The challenge of wrangling a deep learning model is often understanding why it does what it does: whether it’s xAI’s repeated struggle sessions to fine‑tune Gro...
Large Language Models (LLMs) play a critical role in how humans access information. While their core use relies on comprehending written requests, our understan...
In this study, the output of large language models (LLMs) is considered an information source generating an unlimited sequence of symbols drawn from a finite alp...
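Treating LLM output as a symbol source over a finite alphabet invites standard information-theoretic estimators. As a hedged illustration (the paper's actual estimator is not stated), the plug-in entropy estimate of a symbol stream is:

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """Plug-in (maximum-likelihood) entropy estimate, in bits per symbol,
    of a sequence over a finite alphabet."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(empirical_entropy("aabb"))  # -> 1.0 (two equally likely symbols)
```

The plug-in estimator is biased low for short sequences, which is one reason source-coding treatments of generated text need care.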
- Video: AI is destroying open source, and it's not even good yet https://www.youtube.com/watch?v=bZJ7A1QoUEI
- Discussion: Hacker News comments https://news.ycom...
Multi‑Token Prediction (MTP): Boosting Throughput for Agentic AI Workflows. As agentic AI workflows multiply the cost and latency of long reasoning chains, a tea...
Modern code intelligence agents operate in contexts exceeding 1 million tokens--far beyond the scale where humans manually locate relevant files. Yet agents con...
Large language models are being deployed in complex socio-technical systems, which exposes limits in current alignment practice. We take the position that the d...
After initially testing its AI‑powered “Prompted Playlist” feature in New Zealand https://techcrunch.com/2025/12/10/spotify-tests-more-personalized-ai-powered-pr...
Large language models (LLMs) offer substantial promise for automating clinical text summarization, yet maintaining factual consistency remains challenging due t...
Climate scientists trying to predict how much hotter the planet will get have long grappled with a surprisingly stubborn problem—clouds, which both reflect sunl...
The Growing Cybersecurity Challenge for OT & ICS: As technologies become more digitalized and globally connected, operational technology (OT) environments and ind...
Progress towards reliable deepfake labelling tech is sluggish, despite all the “help” from AI providers. (Image: Cath Virginia / The Verge, Getty Images.) As 2025...
Anthropic points its most advanced AI model, Claude Opus 4.6, at production open‑source codebases and finds a plethora of security holes: more than 500 high‑sev...
Legal developments - A U.S. court last year found that Anthropic’s training of large language models on some copyrighted content could be considered fair use b...
Image Credits: Getty Images/Alexander Spatari https://techcrunch.com/wp-content/uploads/2020/04/GettyImages-1167037043-1.jpg?w=1024. By Russell Brandom https://tech...
By Okafor Ogbonna Pascal. Losing a dear person is never easy. But losing someone to a condition that could have been caught early — that stays with you different...
Background: Defense Secretary Pete Hegseth is summoning Anthropic CEO Dario Amodei to the Pentagon on Tuesday morning to discuss the military use of Claude, ac...
The narrative that AI spending has been singlehandedly propping up the U.S. economy—a claim that captivated Silicon Valley, Wall Street, and Washington over the...