Setting up llama.cpp in LXC Container on Proxmox

This guide documents the complete process of setting up llama.cpp in an LXC container on Proxmox with Intel GPU support and OpenAI-compatible API endpoints.

## Overview

- **Goal:** Replace Ollama with llama.cpp for better performance and lower resource usage
- **Hardware:** Intel N150 integrated GPU (OpenCL support)
- **Container:** Debian 12 LXC on Proxmox
- **API:** OpenAI-compatible endpoints on port 11434 (Ollama's default, so existing clients keep working)

## Container Creation

### 1. Create LXC Container

```bash
# Download Debian 12 template
pveam download local debian-12-standard_12.12-1_amd64.tar.zst

# Create container
pct create 107 local:vztmpl/debian-12-standard_12.12-1_amd64.tar.zst \
  --hostname llama-cpp \
  --memory 8192 \
  --swap 512 \
  --cores 4 \
  --rootfs local-lvm:32 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --features keyctl=1,nesting=1 \
  --unprivileged 1 \
  --onboot 1 \
  --tags ai
```

### 2. Add GPU Passthrough

```bash
# Get GPU group IDs
stat -c '%g' /dev/dri/card0        # Output: 44
stat -c '%g' /dev/dri/renderD128   # Output: 104

# Add GPU devices to container
pct set 107 --dev0 /dev/dri/card0,gid=44 --dev1 /dev/dri/renderD128,gid=104

# Start container
pct start 107
```

## Software Installation

### 3. Install Dependencies

```bash
# Update package list
pct exec 107 -- apt update

# Install build tools and dependencies
pct exec 107 -- apt install -y \
  build-essential \
  cmake \
  git \
  curl \
  pkg-config \
  libssl-dev \
  python3 \
  python3-pip \
  libcurl4-openssl-dev

# Install OpenCL support for Intel GPU
pct exec 107 -- apt install -y \
  opencl-headers \
  ocl-icd-opencl-dev \
  intel-opencl-icd
```

### 4. Compile llama.cpp

```bash
# Clone repository
pct exec 107 -- bash -c "cd /opt && git clone https://github.com/ggerganov/llama.cpp.git"

# Configure with OpenCL support
pct exec 107 -- bash -c "cd /opt/llama.cpp && mkdir build && cd build && cmake .. -DGGML_OPENCL=ON -DCMAKE_BUILD_TYPE=Release"

# Compile server binary (single quotes so $(nproc) expands inside the container, not on the Proxmox host)
pct exec 107 -- bash -c 'cd /opt/llama.cpp/build && make -j$(nproc) llama-server'
```

## Model Setup

### 5. Download Models

```bash
# Create models directory
pct exec 107 -- mkdir -p /opt/llama.cpp/models

# Download Qwen2.5-1.5B model (Q4_0 quantized)
pct exec 107 -- bash -c "cd /opt/llama.cpp/models && curl -L -o qwen2.5-1.5b-q4_0.gguf https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_0.gguf"
```

## Service Configuration

### 6. Create systemd Service

```bash
# Create service file
pct exec 107 -- bash -c "cat > /etc/systemd/system/llama-cpp.service <<'EOF'
[Unit]
Description=llama.cpp Server
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/opt/llama.cpp/build/bin/llama-server --host 0.0.0.0 --port 11434 --threads 4 --model /opt/llama.cpp/models/qwen2.5-1.5b-q4_0.gguf --ctx-size 8192 --batch-size 512
Restart=always
RestartSec=3
User=root
Group=root
Environment=HOME=/root
StandardOutput=journal
StandardError=journal
SyslogIdentifier=llama-cpp

[Install]
WantedBy=multi-user.target
EOF"

# Enable and start service
pct exec 107 -- systemctl daemon-reload
pct exec 107 -- systemctl enable llama-cpp.service
pct exec 107 -- systemctl start llama-cpp.service
```

## Testing and Verification

### 7. Verify Service Status

```bash
# Check service status
pct exec 107 -- systemctl status llama-cpp.service

# Check port binding
pct exec 107 -- ss -tlnp | grep :11434
```

### 8. Test API

```bash
# Test OpenAI-compatible API
pct exec 107 -- curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5-1.5b-q4_0","messages":[{"role":"user","content":"Hello, how are you?"}],"max_tokens":50}'
```

Expected response: ...
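Since llama-server speaks the OpenAI chat-completions protocol, the standard OpenAI client libraries can talk to it directly. Here is a minimal sketch using the `openai` Python package; the container address `192.168.1.107` is a placeholder, and the model name simply echoes the server configuration above, so substitute your own values.

```python
# Minimal sketch: querying llama-server through the OpenAI Python SDK.
# Assumes `pip install openai`; the host address below is a placeholder.
from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.1.107:11434/v1",  # placeholder container IP
    api_key="not-needed",  # llama-server requires no API key by default
)

response = client.chat.completions.create(
    model="qwen2.5-1.5b-q4_0",  # llama-server serves the model it was started with
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=50,
)
print(response.choices[0].message.content)
```

Because the service binds to 0.0.0.0 on Ollama's default port, clients that previously targeted Ollama's OpenAI-compatible endpoint should work against llama.cpp without reconfiguration.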
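For scripted monitoring, llama-server also exposes a plain `GET /health` endpoint. A tiny probe, assuming the service from step 6 is running locally, might look like this:

```python
# Health probe for llama-server; exits 0 when the endpoint reports healthy.
# Assumes the systemd service above is listening on localhost:11434.
import sys
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:11434/health", timeout=5) as resp:
        sys.exit(0 if resp.status == 200 else 1)
except OSError:
    # Connection refused, timeout, or a non-2xx HTTP status
    sys.exit(1)
```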

October 15, 2025 · 4 min · 798 words · Dmitry Konovalov

Ollama LXC Setup with GPU Acceleration and Web Interface

This guide covers setting up Ollama (Open Large Language Model) in a Proxmox LXC container with GPU passthrough and creating a simple web interface for easy interaction.

## Overview

We'll deploy Ollama in a resource-constrained LXC environment with:

- Intel UHD Graphics GPU acceleration
- llama3.2:1b model (~1.3GB)
- Lightweight Python web interface (see the API sketch below)
- Auto-starting services

## Prerequisites

- Proxmox VE host
- Intel integrated graphics (UHD Graphics)
- At least 4GB RAM allocated to the LXC
- 40GB+ storage for the container

## Step 1: Container Setup

### Check Available GPU Resources

First, verify GPU availability on the Proxmox host: ...
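Under the hood, the lightweight web interface boils down to HTTP calls against Ollama's REST API. An illustrative sketch, assuming Ollama is listening on its default port with the llama3.2:1b model pulled:

```python
# Minimal sketch of the request a lightweight web UI would send to Ollama.
# Assumes Ollama listens on its default port 11434 with llama3.2:1b pulled.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.2:1b",
    "prompt": "Why is the sky blue?",
    "stream": False,  # one JSON object instead of a token stream
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```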

August 11, 2025 · 8 min · 1502 words · Dmitry Konovalov

Configuring GPU Hardware Transcoding for Plex in Proxmox LXC Container

Complete guide to enable Intel GPU hardware transcoding for Plex Media Server running in a Proxmox LXC container using VA-API.

August 9, 2025 · 5 min · 1055 words · Dmitry Konovalov