GGML and llama.cpp join HF to ensure the long-term progress of Local AI
Source: Hugging Face Blog
We are super happy to announce that GGML, creators of llama.cpp, are joining Hugging Face in order to keep future AI open. 🔥
Georgi Gerganov and his team are joining Hugging Face with the goal of scaling and supporting the community behind ggml and llama.cpp as Local AI continues to make exponential progress in the coming years.
We've been working with Georgi and the team for quite some time (we even have awesome core contributors to llama.cpp like Son and Alek on the team already), so this has been a very natural process.
llama.cpp is the fundamental building block for local inference, and transformers is the fundamental building block for model definition, so this is basically a match made in heaven. ❤️

What will change for llama.cpp, the open source project and the community?
Not much: Georgi and the team will continue to dedicate 100% of their time to maintaining llama.cpp and retain full autonomy and leadership over its technical direction and community.
Hugging Face is providing the project with long-term sustainable resources, improving the chances for the project to grow and thrive. The project will remain 100% open-source and community-driven as it is today.
Technical focus
- Seamless integration – We will work on making it as easy as possible (almost "single-click") to ship new models in llama.cpp from the transformers library, which serves as the "source of truth" for model definitions.
- Packaging & user experience โ As local inference becomes a meaningful and competitive alternative to cloud inference, we will improve and simplify the way casual users deploy and access local models. Our goal is to make llama.cpp ubiquitous and readily available everywhere.
Our long-term vision
Our shared goal is to provide the community with the building blocks to make open-source superintelligence accessible to the world over the coming years.
We will achieve this together with the growing Local AI community, as we continue to build the ultimate inference stack that runs as efficiently as possible on our devices.