Overview
This guide covers GPU passthrough setup in Proxmox, the network interface issues caused by switching to the Q35 machine type, and the complete deployment of Plex Media Server with Intel QSV hardware transcoding on a Talos Kubernetes cluster.
Part 1: GPU Passthrough Setup
Problem
Need to grant a Proxmox VM direct access to a GPU for hardware acceleration or AI workloads.
Solution Steps
Enable IOMMU in host BIOS/UEFI
- Intel: Enable VT-d
- AMD: Enable AMD-Vi
Configure host kernel parameters
# Edit GRUB configuration
nano /etc/default/grub

# Add to GRUB_CMDLINE_LINUX_DEFAULT:
#   Intel: intel_iommu=on iommu=pt
#   AMD:   amd_iommu=on iommu=pt

update-grub
reboot
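Once the host is back up, it is worth confirming that IOMMU actually initialized before going further; a quick check of the kernel log:

# Confirm IOMMU initialized after reboot
dmesg | grep -e DMAR -e IOMMU
# Intel hosts typically log "DMAR: IOMMU enabled"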
Identify GPU PCI ID
# Find the GPU's PCI address (e.g. 01:00.0)
lspci | grep -i vga
# Look up its vendor:device ID for that address
lspci -n -s <pci_address>
Configure VM for GPU passthrough
# Set machine type to Q35 (required for PCIe passthrough)
qm set <vmid> -machine q35

# Add GPU to VM
qm set <vmid> -hostpci0 <pci_id>,pcie=1
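Before attaching the device, it also helps to confirm the GPU sits in its own IOMMU group (or shares one only with functions you also intend to pass through); a common check:

# List IOMMU group membership for all devices
find /sys/kernel/iommu_groups/ -type l | sort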
Part 2: Network Interface Issues After Q35 Switch
Problem Discovered
After running the following command, VMs lost network connectivity:
qm set 103 -machine q35
Root Cause Analysis
The Q35 machine type change caused network interface names to change due to different PCI topology:
- Before (pc-i440fx): interface named ens18
- After (Q35): interface named enp6s18
Why this happens:
- Q35 chipset provides modern PCIe topology vs. legacy PCI in i440fx
- The predictable network naming scheme generates names from the PCI bus location (the derivation can be inspected, as shown below)
- Different PCI slots result in different interface names
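The derivation is inspectable from inside the guest; udev's net_id builtin shows how the predictable name is built from the PCI path:

# Ask udev how it names a given link
udevadm test-builtin net_id /sys/class/net/enp6s18
# Look for ID_NET_NAME_PATH=enp6s18 (enp<bus>s<slot> from the PCI address)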
Diagnosis Commands
# Inside affected VMs
ip link show
# Shows: enp6s18 state DOWN instead of expected ens18
Part 3: Solutions by VM Type
Ubuntu VMs (using netplan)
Identify the new interface:
ip link show
# Expected output shows enp6s18 in DOWN state
Update netplan configuration:
sudo nano /etc/netplan/00-installer-config.yaml
Change from:
network:
  version: 2
  ethernets:
    ens18:
      dhcp4: true
To:
network:
  version: 2
  ethernets:
    enp6s18:
      dhcp4: true
Apply configuration:
sudo netplan try    # Test configuration
sudo netplan apply  # Apply permanently
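A more durable option is to match the NIC by MAC address so the name no longer depends on PCI topology at all; a minimal sketch (the MAC and the lan0 name are placeholders; read the real MAC from ip link show):

# Pin the config to the NIC's hardware address and give it a stable name
cat <<'EOF' | sudo tee /etc/netplan/00-installer-config.yaml
network:
  version: 2
  ethernets:
    lan0:
      match:
        macaddress: "52:54:00:12:34:56"
      set-name: lan0
      dhcp4: true
EOF
sudo netplan apply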
Talos Kubernetes Nodes
Challenge: Talos nodes have no shell access and are unreachable over the network.
Solutions:
Via local console (if accessible):
- Access VM console through Proxmox
- Use talosctl commands locally on the node
Via configuration update:
- Update the original Talos machine configs to specify enp6s18
- Reset the VMs and reapply the updated configurations
Reset and reinstall approach:
# Reset Talos node
qm reset <vmid>
# Boot from installer and apply corrected config
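For the reapply step, a minimal sketch, assuming a DHCP-configured node whose original config file is at hand (the file names are placeholders):

# Patch updating the interface name to the Q35 layout
cat <<'EOF' > network-patch.yaml
machine:
  network:
    interfaces:
      - interface: enp6s18
        dhcp: true
EOF

# With the node reset into maintenance mode, apply the corrected config
talosctl apply-config --insecure --nodes <node-ip> \
  --file controlplane.yaml --config-patch @network-patch.yaml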
Part 4: Plex Media Server Deployment
Prerequisites
After fixing network issues, deploy Plex with Intel QSV hardware transcoding support.
Challenge: Missing GPU Drivers
Initial deployment failed because /dev/dri didn't exist on the Talos nodes: GPU drivers weren't included in the Talos image.
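The gap was easy to confirm from the workstation (the same command that later verifies the fix):

talosctl -n <node-ip> ls /dev/dri
# Fails while the i915 driver is missing from the image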
Solution: Talos Upgrade
Upgraded Talos from v1.10.5 to v1.10.6 using a factory image with GPU support:
# Upgrade each node with factory image containing GPU drivers
talosctl upgrade --nodes <node-ip> --image factory.talos.dev/metal-installer/d3dc673627e9b94c6cd4122289aa52c2484cddb31017ae21b75309846e257d30:v1.10.6
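After each node reboots, confirm it came back on the new release, and optionally inspect which system extensions the factory image carries:

talosctl -n <node-ip> version
talosctl -n <node-ip> get extensions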
Plex Configuration
Node Selection
# Label the GPU node
kubectl label node n2 gpu=intel-qsv
Deployment Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: plex
  namespace: plex
spec:
  template:
    spec:
      nodeSelector:
        gpu: intel-qsv  # Ensure scheduling on GPU node
      containers:
        - name: plex
          volumeMounts:
            - name: dev-dri
              mountPath: /dev/dri  # Intel GPU device access
      volumes:
        - name: dev-dri
          hostPath:
            path: /dev/dri
            type: Directory
Namespace Security
apiVersion: v1
kind: Namespace
metadata:
  name: plex
  labels:
    # Allow privileged containers for GPU access
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
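Once the pod schedules, a quick way to confirm the device nodes made it into the container (assuming the Deployment is named plex):

kubectl exec -n plex deploy/plex -- ls -l /dev/dri
# Expect card0 and renderD128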
Flux GitOps Integration
- Added Plex to the production cluster kustomization (see the sketch below)
- Removed NGINX configuration snippets (not allowed by default)
- Automated deployment via Git commits
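A sketch of what the Flux Kustomization entry might look like; the repository layout, names, and paths here are assumptions, not the actual manifest:

# clusters/production/plex-kustomization.yaml (hypothetical path)
cat <<'EOF' > clusters/production/plex-kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: plex
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/plex
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
EOF
git add clusters/production/plex-kustomization.yaml
git commit -m "Deploy Plex to production"
git push  # Flux reconciles the commit automatically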
Part 5: Verification and Results
GPU Detection
# Verify Intel GPU is detected
talosctl -n <node-ip> dmesg | grep -i intel
# Shows: Intel Alder Lake-N GPU (8086:46d1)
# Verify i915 driver loaded
talosctl -n <node-ip> read /proc/modules | grep i915
# Shows: i915 driver and dependencies loaded
# Verify GPU devices available
talosctl -n <node-ip> ls /dev/dri
# Shows: card0, renderD128
Final Status
✅ GPU Passthrough: Intel GPU successfully passed to n2 VM
✅ Talos Upgrade: All nodes upgraded to v1.10.6 with GPU support
✅ Network Fixed: All VMs networking restored after Q35 switch
✅ Plex Deployed: Running with Intel QSV hardware transcoding
✅ Flux Integration: Automated deployment pipeline working
Part 6: Prevention and Best Practices
Before Changing Machine Types:
Document current interface names
ip link show
Update network configurations proactively
- Modify configs to use new predicted interface names
- Test on non-critical VMs first
Plan GPU driver requirements
- Ensure OS/container runtime supports target hardware
- Verify driver availability before deployment (for Talos, see the factory schematic sketch below)
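For Talos specifically, driver availability is a property of the image, so a factory schematic can be requested ahead of time; a hedged sketch (the extension name is an assumption; check factory.talos.dev for the current list):

# Request a schematic ID that bundles the Intel GPU extension
curl -s -X POST https://factory.talos.dev/schematics --data-binary @- <<'EOF'
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915
EOF
# The returned ID becomes the <hash> in the installer image URL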
Recovery Commands
Temporary network fix (Ubuntu):
sudo ip link set enp6s18 up
sudo dhclient enp6s18
Revert machine type if needed:
qm set <vmid> -machine pc-i440fx-8.1
Check Talos GPU status:
talosctl -n <node-ip> ls /dev/dri
talosctl -n <node-ip> read /proc/modules | grep i915
Key Lessons Learned
- Q35 machine type is required for GPU passthrough but changes PCI topology
- Interface naming changes are predictable but must be planned for
- Talos GPU support requires newer versions with appropriate factory images
- Pod Security Standards must be configured for privileged GPU access
- GitOps workflows can automate complex infrastructure deployments
- Always test infrastructure changes on non-critical systems first
- Talos factory images provide pre-built configurations with specific hardware support
Quick Reference
Common Interface Name Mappings (i440fx → Q35):
- eth0 → enp0s3
- ens18 → enp6s18
- ens19 → enp6s19
Essential Commands:
# Check machine type
qm config <vmid> | grep machine
# Change machine type
qm set <vmid> -machine q35
# Check network interfaces
ip link show
# Apply netplan changes
sudo netplan apply
# Upgrade Talos with GPU support
talosctl upgrade --nodes <ip> --image factory.talos.dev/metal-installer/<hash>:<version>
# Verify Plex deployment
kubectl get pods -n plex -o wide
This setup demonstrates the integration of multiple technologies: Proxmox virtualization, GPU passthrough, Talos Kubernetes, Flux GitOps, and containerized media services, all working together to provide a robust, scalable infrastructure platform.