AI research — Page 2

2 weeks ago · ai · - · -

Google’s new Gemini Pro model has record benchmark scores — again

Image Credits: Jagmeet Singh / TechCrunch. In B...

#Google #Gemini #LLM #large language model #benchmark scores #AI research #machine learning
2 weeks ago · ai · - · -

Study: Self-generated Agent Skills are useless

Authors: Xiangyi Li, Wenbo Chen, Yimin Liu...

#self-generated skills #agent-based learning #skill discovery #reinforcement learning #AI research #arxiv
2 weeks ago · ai · - · -

🦄 Peter Steinberger (creator of OpenClaw) is joining OpenAI to help build the next generation of personal agents. OpenClaw will move into a foundation and stay open-source, with continued support.

#OpenAI #personal agents #OpenClaw #open-source #AI research
2 weeks ago · ai · - · -

Linear Representations and Superposition

As LLMs become larger, more capable, and more ubiquitous, the field of mechanistic interpretability (https://en.wikipedia.org/wiki/Mechanistic_interpretability), th...

#mechanistic interpretability #linear representation hypothesis #superposition #LLM #transformer circuits #AI research
2 weeks ago · ai · - · -

Lost in the Middle: Why Bigger Context Windows Don’t Always Improve LLM Performance

Overview: Putting everything into one long prompt and hoping it works is a common practice, but it often backfires. Adding more context can actually degrade the...

#LLM #context windows #prompt engineering #AI research #long context performance
3 weeks ago · ai · - · -

Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy

Dynamic Memory Sparsification (DMS): Researchers at NVIDIA have introduced Dynamic Memory Sparsification (DMS), a technique that can cut the memory cost of large‑la...

#Nvidia #large language models #dynamic memory sparsification #KV cache compression #LLM reasoning efficiency #memory optimization #AI research
3 weeks ago · ai · - · -

[Paper] Agentic Test-Time Scaling for WebAgents

Test-time scaling has become a standard way to improve performance and boost reliability of neural network models. However, its behavior on agentic, multi-step ...

#test-time scaling #web agents #LLM uncertainty #resource allocation #AI research
3 weeks ago · ai · - · -

GLM-5: From Vibe Coding to Agentic Engineering

Article URL: https://z.ai/blog/glm-5 Comments URL: https://news.ycombinator.com/item?id=46977210 Points: 94 Comments: 34...

#GLM-5 #large-language-models #agentic-engineering #vibe-coding #AI-research
3 weeks ago · ai · - · -

[Paper] FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

Agents powered by large language models (LLMs) are increasingly adopted in the software industry, contributing code as collaborators or even autonomous develope...

#LLM coding agents #benchmark #software feature development #evaluation #AI research
3 weeks ago · ai · - · -

What is RAG? Retrieval-Augmented Generation Explained

TL;DR: RAG (Retrieval‑Augmented Generation) combines language models with real‑time data retrieval to provide accurate, up‑to‑date responses. Key benefit: reduces...

#retrieval-augmented generation #RAG #large language models #LLM #hallucination reduction #knowledge retrieval #AI research
3 weeks ago · ai · - · -

The Machine Learning Lessons I’ve Learned Last Month

as the years before: fireworks across the globe. People greeted the new year with new resolutions and new goals. Someone, somewhere, surely said: “2026 is going...

#machine learning #ICML #research productivity #deadline season #flow state #data science #AI research
3 weeks ago · ai · - · -

[Paper] Evaluating and Calibrating LLM Confidence on Questions with Multiple Correct Answers

Confidence calibration is essential for making large language models (LLMs) reliable, yet existing training-free methods have been primarily studied under singl...

#LLM confidence calibration #multiple-answer evaluation #MACE benchmark #semantic confidence aggregation #AI research
