llama.cpp Introduction
Inference of Meta's LLaMA model (and others) in pure C/C++. The primary goal of llama.cpp is to enable LLM inference on a wide range of hardware (on-premises and in the cloud) with minimal setup and state-of-the-art performance.
- Pure C/C++ implementation with no dependencies
- Apple silicon is a first-class citizen – optimized with the ARM NEON, Accelerate, and Metal frameworks
- AVX, AVX2, AVX-512, and AMX support for x86 architectures
- 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory usage
- Custom CUDA kernels for running LLMs on NVIDIA GPUs (AMD GPUs via HIP, Moore Threads GPUs via MUSA)
- Vulkan and SYCL backend support
- CPU+GPU hybrid inference, partially accelerating models larger than the total VRAM capacity
The source code is hosted in the llama.cpp GitHub repository; prebuilt binaries can be downloaded from its Releases page.
Download llama.cpp
First, download the llama.cpp build that matches your computer's hardware configuration, as shown in the figure below:
AVX supports 256-bit wide operations. AVX2 also uses 256-bit registers, but adds integer operations and several additional instructions. AVX-512 supports 512-bit wide operations, providing greater parallelism and performance, especially when processing large amounts of data or floating-point workloads.
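The practical effect of register width is how many values one instruction can process at once. A quick back-of-the-envelope check (the instruction-set names and widths are from the paragraph above; the lane counts simply divide the register width by the 32-bit size of a float):

```shell
# SIMD lanes per instruction = register width (bits) / 32-bit float
awk 'BEGIN {
  n = split("AVX:256 AVX2:256 AVX-512:512", sets, " ")
  for (i = 1; i <= n; i++) {
    split(sets[i], kv, ":")
    printf "%s: %d-bit registers -> %d fp32 lanes per instruction\n", kv[1], kv[2], kv[2] / 32
  }
}'
```

So an AVX-512 machine can, in principle, operate on twice as many floats per instruction as an AVX2 machine, which is why matching the build to your CPU matters.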
My computer runs inference on the CPU only and supports the AVX-512 instruction set, so I downloaded the "llama-b4658-bin-win-avx512-x64" build and unzipped it to the D:\llama-b4658-bin-win-avx512-x64 directory.
Download the DeepSeek-R1 model
This article uses "DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_L.gguf" as an example.
Download the quantization that suits your own configuration: the more bits per weight a quantization keeps (for example, Q8 versus Q3), the larger the file and the closer the model's accuracy stays to the original.
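The size trade-off can be sketched with a back-of-the-envelope calculation (a rough sketch only: real GGUF files mix quantization types per tensor and carry metadata, so actual sizes differ; the 1.5e9 parameter count is that of the Distill-Qwen-1.5B model):

```shell
# Approximate file size = parameters x bits-per-weight / 8 bytes
awk 'BEGIN {
  params = 1.5e9
  for (bits = 2; bits <= 8; bits++)
    printf "Q%d: ~%.2f GB\n", bits, params * bits / 8 / 1e9
}'
```

This is why an 8-bit quantization of the same model is several times larger than a 2-bit one, and why lower-bit files fit on smaller machines at the cost of accuracy.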
llama.cpp Deploy the DeepSeek-R1 model
Run the following command in the directory containing the DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_L.gguf file:
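A typical invocation looks like this (a sketch: it assumes the llama-server binary bundled with the Windows release and its standard -m/--port flags; adjust the model filename to the quantization you actually downloaded):

```shell
# Start the llama.cpp HTTP server on port 8080 (the default)
# On Windows builds such as b4658, the binary is llama-server.exe
llama-server -m DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_L.gguf --port 8080
```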
As shown below:
Open http://127.0.0.1:8080/ in a browser to test it, as shown below:
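Besides the browser UI, the server can also be queried from the command line (a minimal sketch, assuming the server from the previous step is running on port 8080 and exposes llama-server's /completion endpoint):

```shell
# Ask the running server for a short completion
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, my name is", "n_predict": 32}'
```

The server also exposes an OpenAI-compatible /v1/chat/completions endpoint, so existing OpenAI client libraries can be pointed at it.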
The full list of run-time parameters is documented in the llama-server section of the llama.cpp repository.