Llama.cpp
Highly optimized LLM inference engine in pure C++
About Llama.cpp
Llama.cpp is a highly optimized inference engine for running Llama-family and other LLMs in pure C++ with minimal dependencies. It enables fast inference on CPUs via quantization, supports GPU offloading, and powers many local AI tools under the hood.
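As a sketch of what quantization and GPU offloading look like in practice (binary names and flags follow recent llama.cpp builds; the model filenames are hypothetical placeholders):

```shell
# Quantize a full-precision GGUF model to 4-bit (Q4_K_M) to cut memory use
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# Run inference, offloading 32 transformer layers to the GPU via -ngl;
# -n limits the number of tokens generated
./llama-cli -m model-q4_k_m.gguf -p "Explain quantization briefly." -n 128 -ngl 32
```

Leaving out `-ngl` (or setting it to 0) runs everything on the CPU, which is where the quantized formats pay off most.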
Pros
- Extremely efficient
- CPU and GPU support
- Powers many other tools
Cons
- Command-line focused
- Setup requires technical knowledge
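The setup hurdle is mostly building from source; a minimal build sketch using the project's standard CMake workflow (the CUDA flag shown is one common GPU option and assumes an NVIDIA toolchain) looks like:

```shell
# Clone the repository and build with CMake
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Add -DGGML_CUDA=ON to the configure step for NVIDIA GPU support
cmake -B build
cmake --build build --config Release
```

The resulting binaries (such as llama-cli) land under the build directory, after which the only remaining step is downloading a GGUF model file.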