Llama.cpp + OpenWebUI Setup on Proxmox
This document details the complete setup of a llama.cpp server with OpenWebUI interface running in a Proxmox container with HTTPS access.
Container Specifications
- Container ID: #106
- Name: llama-ai
- RAM: 8GB
- CPU: 4 cores
- Storage: 32GB (local-lvm)
- MAC Address: BC:24:11:15:F2:3A
- IP Address: 172.16.32.135
- OS: Debian 12 (Bookworm)
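These values can be confirmed from the Proxmox host with pct (shown for container 106; adjust the ID if yours differs):
pct config 106
pct status 106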
Services Overview
llama.cpp Server
- Port: 8080
- Model: Qwen2.5-1.5B-Instruct (Q4_0 quantization)
- Context Window: 16,384 tokens (~12,000-13,000 words)
- Service: llama-cpp.service
- Status: Auto-start enabled
OpenWebUI
- Port: 3000
- Interface: Web-based chat interface
- Service: open-webui.service
- Status: Auto-start enabled
- PyTorch: CPU version installed
NGINX Reverse Proxy
- HTTP Port: 80 (redirects to HTTPS)
- HTTPS Port: 443
- Domain: https://llama-ai.<yourdomain.com>
- SSL: Let’s Encrypt with Cloudflare DNS challenge
- Auto-renewal: Enabled
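Once setup is complete, a quick way to confirm that all four ports are listening (ss ships with Debian 12 as part of iproute2):
ss -tlnp | grep -E ':(80|443|3000|8080)\s'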
Installation Steps
1. Create Proxmox Container
pct create 106 local:vztmpl/debian-12-standard_12.12-1_amd64.tar.zst \
--hostname llama-ai \
--memory 8192 \
--cores 4 \
--rootfs local-lvm:32 \
--net0 name=eth0,bridge=vmbr0,hwaddr=BC:24:11:15:F2:3A,ip=dhcp \
--unprivileged 1 \
--onboot 1
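Start the container and open a shell in it; the remaining steps run inside the container:
pct start 106
pct enter 106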
2. Install Dependencies
apt update && apt upgrade -y
apt install -y build-essential cmake git curl wget python3 python3-pip pkg-config libcurl4-openssl-dev
3. Build llama.cpp
cd /opt
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build
cmake .. && make -j$(nproc)
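A quick sanity check that the build produced the server binary in the expected location:
ls -lh /opt/llama.cpp/build/bin/llama-server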
4. Download Models
mkdir -p /opt/llama.cpp/models
cd /opt/llama.cpp/models
# Qwen2.5 0.5B (fastest, 409MB)
wget -O qwen2.5-0.5b-instruct-q4_0.gguf \
https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_0.gguf
# Qwen2.5 1.5B (most capable, 1017MB)
wget -O qwen2.5-1.5b-instruct-q4_0.gguf \
https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_0.gguf
# Llama 3.2 1B (balanced, 738MB)
wget -O llama3.2-1b-instruct-q4_0.gguf \
https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_0.gguf
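After the downloads finish, confirm that all three files are present and that their sizes roughly match the figures in the comments above:
ls -lh /opt/llama.cpp/models/*.gguf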
5. Create llama.cpp Service
# /etc/systemd/system/llama-cpp.service
[Unit]
Description=Llama.cpp Server
After=network.target
[Service]
Type=simple
User=root
WorkingDirectory=/opt/llama.cpp
ExecStart=/opt/llama.cpp/build/bin/llama-server \
--model /opt/llama.cpp/models/qwen2.5-1.5b-instruct-q4_0.gguf \
--host 0.0.0.0 \
--port 8080 \
--ctx-size 16384
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
6. Install OpenWebUI
python3 -m venv /opt/openwebui-venv
source /opt/openwebui-venv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install open-webui
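Before creating the service, a quick check that the CLI is available inside the venv and which version was installed:
/opt/openwebui-venv/bin/open-webui --help
/opt/openwebui-venv/bin/pip show open-webui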
7. Create OpenWebUI Service
# /etc/systemd/system/open-webui.service
[Unit]
Description=OpenWebUI
After=network.target
[Service]
Type=simple
User=root
WorkingDirectory=/opt/openwebui-venv
Environment=OPENAI_API_BASE_URL=http://127.0.0.1:8080/v1
Environment=OPENAI_API_KEY=sk-dummy
Environment=WEBUI_AUTH=false
ExecStart=/opt/openwebui-venv/bin/open-webui serve --port 3000 --host 0.0.0.0
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
8. Setup HTTPS with Let’s Encrypt
Install NGINX and Certbot
apt install -y nginx python3-certbot-nginx python3-certbot-dns-cloudflare
Configure Cloudflare Credentials
mkdir -p /etc/letsencrypt/credentials
chmod 700 /etc/letsencrypt/credentials
echo "dns_cloudflare_api_token = YOUR_CLOUDFLARE_TOKEN" > /etc/letsencrypt/credentials/cloudflare.ini
chmod 600 /etc/letsencrypt/credentials/cloudflare.ini
Obtain SSL Certificate
certbot certonly \
--dns-cloudflare \
--dns-cloudflare-credentials /etc/letsencrypt/credentials/cloudflare.ini \
-d llama-ai.<yourdomain.com> \
--non-interactive \
--agree-tos \
--email [email protected]
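Renewal is handled by certbot.timer (see Maintenance below), but it is worth verifying up front that the DNS challenge also succeeds unattended:
certbot renew --dry-run
systemctl list-timers | grep certbot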
Configure NGINX
# /etc/nginx/sites-available/llama-ai.<yourdomain.com>
server {
listen 80;
server_name llama-ai.<yourdomain.com>;
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name llama-ai.<yourdomain.com>;
ssl_certificate /etc/letsencrypt/live/llama-ai.<yourdomain.com>/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/llama-ai.<yourdomain.com>/privkey.pem;
location / {
proxy_pass http://127.0.0.1:3000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
9. Enable Services
systemctl daemon-reload
systemctl enable --now llama-cpp.service
systemctl enable --now open-webui.service
systemctl enable --now nginx
ln -s /etc/nginx/sites-available/llama-ai.<yourdomain.com> /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx
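With all services up, a minimal end-to-end test of the OpenAI-compatible API exposed by llama-server (the same endpoint OpenWebUI talks to; no model field is required since the server has a single model loaded):
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say hello in five words."}],"max_tokens":32}'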
Switching Models
llama.cpp runs as a systemd service with a single model loaded at start (set via --model in llama-cpp.service). To change models, update the service file, reload systemd, and restart the service.
- Check the currently active model
grep -- "--model" /etc/systemd/system/llama-cpp.service | sed 's/.*--model //; s/ --host.*//'
systemctl is-active llama-cpp
- Switch to Qwen2.5 0.5B (fastest)
systemctl stop llama-cpp
sed -i "s|--model /opt/llama.cpp/models/.*\.gguf|--model /opt/llama.cpp/models/qwen3-0.6b-q4_0.gguf|" /etc/systemd/system/llama-cpp.service
systemctl daemon-reload
systemctl start llama-cpp
- Switch to Qwen2.5 1.5B Instruct (most capable)
systemctl stop llama-cpp
sed -i "s|--model /opt/llama.cpp/models/.*\.gguf|--model /opt/llama.cpp/models/qwen2.5-1.5b-instruct-q4_0.gguf|" /etc/systemd/system/llama-cpp.service
systemctl daemon-reload
systemctl start llama-cpp
- Switch to Llama 3.2 1B Instruct (balanced)
systemctl stop llama-cpp
sed -i "s|--model /opt/llama.cpp/models/.*\.gguf|--model /opt/llama.cpp/models/llama3.2-1b-instruct-q4_0.gguf|" /etc/systemd/system/llama-cpp.service
systemctl daemon-reload
systemctl start llama-cpp
Notes:
- After switching, OpenWebUI will automatically use the new model (it connects to the same llama.cpp endpoint). Refresh the UI if needed.
- If you prefer one-click switching, create small scripts (e.g., /usr/local/bin/switch-to-qwen-1.5b) that run the commands above; a minimal sketch of such a helper follows below.
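A minimal sketch of such a helper (the name /usr/local/bin/switch-model and its argument convention are illustrative choices, not part of the setup above); it rewrites the service file the same way the sed commands in this section do:
#!/usr/bin/env bash
# /usr/local/bin/switch-model (hypothetical helper): switch the llama.cpp model and restart the service
# Usage: switch-model qwen2.5-1.5b-instruct-q4_0.gguf
set -euo pipefail

MODEL_DIR=/opt/llama.cpp/models
SERVICE=/etc/systemd/system/llama-cpp.service
MODEL="${1:?usage: switch-model <model-file.gguf>}"

# Refuse to switch to a model that is not on disk
[ -f "$MODEL_DIR/$MODEL" ] || { echo "model not found: $MODEL_DIR/$MODEL" >&2; exit 1; }

# Point the service at the new model, then reload systemd and restart
systemctl stop llama-cpp
sed -i "s|--model $MODEL_DIR/.*\.gguf|--model $MODEL_DIR/$MODEL|" "$SERVICE"
systemctl daemon-reload
systemctl start llama-cpp
echo "llama-cpp is now serving $MODEL"
Make it executable with chmod +x /usr/local/bin/switch-model; one-click wrappers such as switch-to-qwen-1.5b can then simply call it with a fixed filename.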
Access URLs
- Primary Interface: https://llama-ai.<yourdomain.com>
- Direct OpenWebUI: http://172.16.32.135:3000
- llama.cpp API: http://172.16.32.135:8080
Key Features
- Large Context Window: 16,384 tokens for long conversations
- HTTPS Security: Valid SSL certificate with auto-renewal
- WebSocket Support: Real-time chat interface
- Auto-startup: All services start automatically on boot
- CPU Optimized: PyTorch CPU version for efficient resource usage
Model Details
Multiple models are available for switching; the service starts with Qwen2.5 1.5B Instruct by default.
Qwen2.5 0.5B-Instruct (qwen2.5-0.5b-instruct-q4_0.gguf)
- Parameters: 630 million
- File Size: 409MB
- Speed: Fastest inference
- Use Case: Quick responses, simple tasks
- Training Context: 32,768 tokens
Qwen2.5 1.5B-Instruct (qwen2.5-1.5b-instruct-q4_0.gguf)
- Parameters: 1.78 billion
- File Size: 1017MB
- Speed: Moderate inference
- Use Case: Most capable, complex reasoning
- Training Context: 32,768 tokens
Llama 3.2 1B-Instruct (llama3.2-1b-instruct-q4_0.gguf)
- Parameters: 1.24 billion
- File Size: 738MB
- Speed: Good balance
- Use Case: Balanced performance/speed
- Training Context: 131,072 tokens
Common Features:
- Quantization: Q4_0 (4-bit) for all models
- Running Context: 16,384 tokens (can be adjusted in the service config; see the example below)
- Language Support: Multilingual capabilities
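For example, to change the running context, adjust --ctx-size in the service file and restart (a sketch; 8,192 is just an example value, and larger contexts use more RAM):
# example: drop the running context to 8,192 tokens
sed -i 's|--ctx-size [0-9]*|--ctx-size 8192|' /etc/systemd/system/llama-cpp.service
systemctl daemon-reload
systemctl restart llama-cpp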
Resource Usage
- RAM Usage: ~1.1GB (llama.cpp) + ~500MB (OpenWebUI)
- Storage: ~2GB total (model + applications)
- CPU: Moderate load during inference
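These figures can be spot-checked inside the container; memory and disk usage vary with the loaded model and context size:
free -h
ps -o pid,rss,comm -C llama-server
du -sh /opt/llama.cpp/models /opt/openwebui-venv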
Maintenance
- Certificate Renewal: Automatic via certbot.timer
- Service Logs: journalctl -u <service-name>
- Model Switching: Use the commands from the “Switching Models” section above
- Model Updates: Download new GGUF files to /opt/llama.cpp/models/
- OpenWebUI Updates: pip install --upgrade open-webui (inside the venv)
- Check Current Model: grep -- "--model" /etc/systemd/system/llama-cpp.service
Troubleshooting
Service Status
systemctl status llama-cpp.service
systemctl status open-webui.service
systemctl status nginx
Health Checks
curl http://127.0.0.1:8080/health
curl -I http://127.0.0.1:3000
curl -I https://llama-ai.<yourdomain.com>
Logs
journalctl -u llama-cpp.service -f
journalctl -u open-webui.service -f
tail -f /var/log/nginx/access.log
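Two further checks cover the pieces not exercised above (nginx errors and certificate status):
tail -f /var/log/nginx/error.log
certbot certificates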
Setup completed: October 16, 2025
Container IP: 172.16.32.135
Public URL: https://llama-ai.<yourdomain.com>