Granite 4.0 1B Speech: Compact, Multilingual, and Built for the Edge
Overview: We’re excited to share Granite 4.0 1B Speech (https://huggingface.co/ibm-granite/granite-4.0-1b-speech), the latest addition to IBM's Granite Speech coll...
Diffusion models degrade images through noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a simil...
CLIP-based prompt tuning enables pretrained Vision-Language Models (VLMs) to efficiently adapt to downstream tasks. Although existing studies have made signific...
Recent advances in time-series forecasting increasingly rely on pre-trained foundation-style models. While these models often claim broad generalization, existi...
Training large language models (LLMs) as autonomous agents often begins with imitation learning, but it only teaches agents what to do without understanding why...
Large language models are increasingly used for financial analysis and investment research, yet systematic evaluation of their financial reasoning capabilities ...
Autoregressive (AR) diffusion offers a promising framework for generating videos of theoretically infinite length. However, a major challenge is maintaining tem...
As AI tools continue to grow in popularity, Google’s Gemi...
The rapid advancement of artificial intelligence (AI) technologies presents both unprecedented opportunities and significant challenges for sustainable economic...
Can we find a network architecture for ML model training so as to optimize training loss (and thus, accuracy) in Split Federated Learning (SFL)? And can this ar...
Autoregressive 'language' models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leav...
We introduce structural causal bottleneck models (SCBMs), a novel class of structural causal models. At the core of SCBMs lies the assumption that causal effect...
Single-stage multi-person pose estimation aims to jointly perform human localization and keypoint prediction within a unified framework, offering advantages in ...
The celebrated Myerson–Satterthwaite theorem shows that in bilateral trade, no mechanism can be simultaneously fully efficient, Bayesian incentive compatible (...
Maximum marginal likelihood estimation (MMLE) can be formulated as the optimization of a free energy functional. From this viewpoint, the Expectation-Maximisati...
We tackle the challenging task of generating complete 3D facial animations for two interacting, co-located participants from a mixed audio stream. While existin...
In the forthcoming years the LHC experiments are going to be upgraded to benefit from the substantial increase of the LHC instantaneous luminosity, which will l...
Problem: We were spending $198/month on AI agent API costs. The majority of the spend wasn’t coming from the LLM calls we expected, but from a forgotten agent l...
Recent advancements in 3D Gaussian Splatting (3DGS) have shifted the focus toward balancing reconstruction fidelity with computational efficiency. In this work,...
Unsupervised reinforcement learning with verifiable rewards (URLVR) offers a pathway to scale LLM training beyond the supervision bottleneck by deriving rewards...
The emergence of large reasoning models demonstrates that scaling inference-time compute significantly enhances performance on complex tasks. However, it often ...
In this paper, we present a context-free unsupervised approach based on a self-conditioned GAN to learn different modes from 2D trajectories. Our intuition is t...
We introduce OfficeQA Pro, a benchmark for evaluating AI agents on grounded, multi-document reasoning over a large and heterogeneous document corpus. The corpus...
Unified Multimodal Models (UMMs) have recently made significant progress in text-to-image (T2I) generation, particularly through the integration of Chai...
As video content creation shifts toward long-form narratives, composing short clips into coherent storylines becomes increasingly important. However, prevailing...
Template-free animatable head avatars can achieve high visual fidelity by learning expression-dependent facial deformation directly from a subject's capture, av...
AI agents have become surprisingly proficient at software engineering over the past year, largely due to improvements in reasoning capabilities. This raises a d...
Ensuring trustworthiness in open-world visual recognition requires models that are interpretable, fair, and robust to distribution shifts. Yet modern vision sys...
Streaming video understanding often involves time-sensitive scenarios where models need to answer exactly when the supporting visual evidence appears: answering...
Deployed machine learning systems face distribution drift, yet most monitoring pipelines stop at alarms and leave the response underspecified under labeling, co...
TL;DR - A leak suggests Google may add options to show creator avata...
Large language models (LLMs) can answer religious knowledge queries fluently, yet they often hallucinate and misattribute sources, which is especially consequen...
Selecting an optimization algorithm requires comparing candidates across problem instances, but the computational budget for deployment is often unknown at benc...
Fighting a Powerful AI – What It Feels Like: The US has been acting powerful recently and it reminded me of this question: What does it feel like to fight again...
ABB Robotics and NVIDIA today announced a breakthrough partnership that brings industrial‑grade physical AI https://www.nvidia.com/en-us/glossary/generative-phys...
This report documents the work of our group (named SymBa) at the ALICE 2026 workshop in Copenhagen. Inspired by the pioneering work by Nils Aall Barricelli on s...
The quadratic complexity of the attention mechanism and the substantial memory footprint of the Key-Value (KV) cache present severe computational and memory cha...
Translations often carry traces of the source language, a phenomenon known as translationese. We introduce the first freely available English-to-Swedish dataset...
Large language model (LLM)-based AI systems have shown promise for patient-facing diagnostic and management conversations in simulated settings. Translating the...
Overview: Real estate photography has been revolutionized by AI. Agents and proptech developers can now enhance listings at a fraction of the traditional cost....
Visual entity tracking is an innate cognitive ability in humans, yet it remains a critical bottleneck for Vision-Language Models (VLMs). This deficit is often o...
Understanding how structured sequence information can be represented and generalized in neural systems is key to modeling the transition from acoustic input to ...
Why Benchmarking Your AI Search Matters: For nearly a decade, I’ve been asked, “How do we know if our current AI setup is optimized?” The honest answer? Lots of...
AI tools picking up and repeating your habits isn’t new. ChatGPT does it by design—it mirrors your tone, adapts to your preferences, and learns what you respond...
Aircraft engine blade maintenance relies on inspection records shared across manufacturers, airlines, maintenance organizations, and regulators. Yet current sys...
How Real Machine‑Learning Products Run at Scale in Major Tech Companies: If you’re curious about how machine‑learning products actually operate in large tech or...
Accurate and interpretable mortality risk prediction in intensive care units (ICUs) remains a critical challenge due to the irregular temporal structure of elec...
The multiple-choice knapsack problem (MCKP) is a classic combinatorial optimization problem with wide practical applications. This paper investigates a significant yet ...