Llama.cpp

Highly optimized LLM inference engine in pure C++

About Llama.cpp

Llama.cpp is a highly optimized inference engine for running Llama-family and other LLMs in pure C++ with minimal dependencies. It enables fast inference on CPUs via quantization, supports offloading model layers to the GPU, and powers many local AI tools under the hood.
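As a rough sketch of the typical workflow: build the project with CMake, then run a quantized GGUF model with the bundled CLI. The model filename below is a placeholder; a quantized model must be obtained separately.

```shell
# Clone and build llama.cpp (CMake is the supported build system)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run a quantized GGUF model from the CLI.
# -m: path to the model file (placeholder here)
# -p: prompt, -n: number of tokens to generate
# -ngl: number of layers to offload to the GPU, if one is available
./build/bin/llama-cli -m models/model.gguf -p "Hello" -n 64 -ngl 32
```

With `-ngl 0` (the default when no GPU backend is compiled in), inference runs entirely on the CPU, which is where the quantized formats pay off.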

Pros

  • Extremely efficient
  • CPU and GPU support
  • Powers many other tools

Cons

  • Command-line focused
  • Setup requires technical knowledge

Related Tools