Overview

This guide covers GPU passthrough setup in Proxmox, the network interface issues caused by switching to the Q35 machine type, and the deployment of Plex Media Server with Intel QSV hardware transcoding on a Talos Kubernetes cluster.

Part 1: GPU Passthrough Setup

Problem

Need to grant a Proxmox VM direct access to a GPU for hardware acceleration or AI workloads.

Solution Steps

  1. Enable IOMMU in host BIOS/UEFI

    • Intel: Enable VT-d
    • AMD: Enable AMD-Vi
  2. Configure host kernel parameters (see the verification sketch after these steps)

    # Edit GRUB configuration
    nano /etc/default/grub
    
    # Add to GRUB_CMDLINE_LINUX_DEFAULT:
    # For Intel: intel_iommu=on iommu=pt
    # For AMD: amd_iommu=on iommu=pt
    
    update-grub
    reboot
    
  3. Identify GPU PCI ID

    lspci | grep -i vga        # find the GPU's PCI slot, e.g. 01:00.0
    lspci -n -s <pci_slot>     # numeric vendor:device ID, e.g. 8086:46d1
    
  4. Configure VM for GPU passthrough

    # Set machine type to Q35 (required for PCIe passthrough)
    qm set <vmid> -machine q35
    
    # Add GPU to VM
    qm set <vmid> -hostpci0 <pci_id>,pcie=1
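
Before attaching the GPU, it's worth confirming that IOMMU actually came on after the reboot and that the GPU sits in its own IOMMU group. A minimal check on the Proxmox host (a sketch; exact log wording varies by kernel):

# Confirm IOMMU is active (look for DMAR/IOMMU or AMD-Vi messages)
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi

# List IOMMU groups and their devices; the GPU should be isolated in its own group
for d in /sys/kernel/iommu_groups/*/devices/*; do
  g=${d#/sys/kernel/iommu_groups/}; g=${g%%/*}
  echo -n "group $g: "; lspci -nns "${d##*/}"
done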
    

Part 2: Network Interface Issues After Q35 Switch

Problem Discovered

After switching VM 103 to the Q35 machine type, VMs lost network connectivity. The command that triggered the issue:

qm set 103 -machine q35

Root Cause Analysis

The Q35 machine type change caused network interface names to change due to different PCI topology:

  • Before (pc-i440fx): Interface named ens18
  • After (Q35): Interface named enp6s18

Why this happens:

  • Q35 chipset provides modern PCIe topology vs. legacy PCI in i440fx
  • Predictable network naming scheme generates names based on PCI bus location
  • Different PCI slots result in different interface names
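
The new name can be decoded directly: in enp6s18, en means Ethernet, p6 is PCI bus 6, and s18 is slot 18. On a systemd-based guest, udev can show the candidate names it derives for an interface (a sketch):

udevadm test-builtin net_id /sys/class/net/enp6s18
# Look for ID_NET_NAME_PATH=enp6s18, the PCI-path-based name systemd selected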

Diagnosis Commands

# Inside affected VMs
ip link show
# Shows: enp6s18 state DOWN instead of expected ens18
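
If the QEMU guest agent is installed in the VM, the interface list can also be pulled from the Proxmox host without console access (a sketch; requires the agent to be running in the guest):

# From the Proxmox host, via the QEMU guest agent
qm guest cmd 103 network-get-interfaces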

Part 3: Solutions by VM Type

Ubuntu VMs (using netplan)

  1. Identify the new interface:

    ip link show
    # Expected output shows enp6s18 in DOWN state
    
  2. Update netplan configuration (a MAC-match alternative is sketched after these steps):

    sudo nano /etc/netplan/00-installer-config.yaml
    

    Change from:

    network:
      ethernets:
        ens18:
          dhcp4: true
    

    To:

    network:
      ethernets:
        enp6s18:
          dhcp4: true
    
  3. Apply configuration:

    sudo netplan try    # Test configuration
    sudo netplan apply  # Apply permanently
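
A more durable alternative is to match the NIC by MAC address rather than by name, so future PCI topology changes cannot break connectivity. A minimal sketch (the MAC is a placeholder, and lan0 is an arbitrary stable name):

# Replace the placeholder MAC with the address shown by `ip link show`
sudo tee /etc/netplan/00-installer-config.yaml >/dev/null <<'EOF'
network:
  version: 2
  ethernets:
    lan0:
      match:
        macaddress: "aa:bb:cc:dd:ee:ff"
      set-name: lan0
      dhcp4: true
EOF
sudo netplan try   # rolls back automatically if connectivity is lost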
    

Talos Kubernetes Nodes

Challenge: Talos nodes have no shell access and are unreachable over the network.

Solutions:

  1. Via local console (if accessible):

    • Access the VM console through Proxmox
    • Check interface state from the console dashboard (Talos has no shell; talosctl runs from a workstation against the node's API)
  2. Via configuration update:

    • Update original Talos machine configs to specify enp6s18 (patch sketch after this list)
    • Reset VMs and reapply updated configurations
  3. Reset and reinstall approach:

    # Reset Talos node
    qm reset <vmid>
    # Boot from installer and apply corrected config
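
For option 2, a machine-config patch keeps the change minimal. A sketch (node IP and file names are placeholders), applied while the node is in maintenance mode after the reset:

# iface-patch.yaml: point Talos at the renamed interface
cat > iface-patch.yaml <<'EOF'
machine:
  network:
    interfaces:
      - interface: enp6s18
        dhcp: true
EOF

# Reapply the stored machine config with the patch (maintenance mode, hence --insecure)
talosctl apply-config --insecure --nodes <node-ip> \
  --file controlplane.yaml --config-patch @iface-patch.yaml

Newer Talos releases also support selecting NICs via deviceSelector (matching on hardware address or bus path), which avoids name-based configs entirely.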
    

Part 4: Plex Media Server Deployment

Prerequisites

With networking fixed, the next step is deploying Plex with Intel QSV hardware transcoding support.

Challenge: Missing GPU Drivers

Initial deployment failed because /dev/dri didn’t exist on the Talos nodes: GPU drivers weren’t included in the Talos image.

Solution: Talos Upgrade

Upgraded Talos from v1.10.5 to v1.10.6 using factory image with GPU support:

# Upgrade each node with factory image containing GPU drivers
talosctl upgrade --nodes <node-ip> --image factory.talos.dev/metal-installer/d3dc673627e9b94c6cd4122289aa52c2484cddb31017ae21b75309846e257d30:v1.10.6
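
The hash in that URL is an Image Factory schematic ID. For reference, a schematic bundling Intel GPU support can be requested from the factory API roughly as follows (the extension names are assumptions; check the factory's current extension list):

# Request a schematic ID from the Talos Image Factory
curl -X POST https://factory.talos.dev/schematics --data-binary @- <<'EOF'
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915
      - siderolabs/intel-ucode
EOF
# The returned ID is the <hash> in factory.talos.dev/metal-installer/<hash>:<version>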

Plex Configuration

  1. Node Selection

    # Label the GPU node
    kubectl label node n2 gpu=intel-qsv
    
  2. Deployment Configuration

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: plex
      namespace: plex
    spec:
      selector:
        matchLabels:
          app: plex
      template:
        metadata:
          labels:
            app: plex
        spec:
          nodeSelector:
            gpu: intel-qsv  # Ensure scheduling on GPU node
          containers:
          - name: plex
            image: plexinc/pms-docker  # Official Plex image
            securityContext:
              privileged: true  # Needed for /dev/dri access under the privileged namespace
            volumeMounts:
            - name: dev-dri
              mountPath: /dev/dri  # Intel GPU device access
          volumes:
          - name: dev-dri
            hostPath:
              path: /dev/dri
              type: Directory
    
  3. Namespace Security

    apiVersion: v1
    kind: Namespace
    metadata:
      name: plex
      labels:
        # Allow privileged containers for GPU access
        pod-security.kubernetes.io/enforce: privileged
        pod-security.kubernetes.io/audit: privileged
        pod-security.kubernetes.io/warn: privileged
    

Flux GitOps Integration

  • Added Plex to production cluster kustomization
  • Removed NGINX Ingress configuration-snippet annotations (disallowed by the controller by default)
  • Automated deployment via Git commits

Part 5: Verification and Results

GPU Detection

# Verify Intel GPU is detected
talosctl -n <node-ip> dmesg | grep -i intel
# Shows: Intel Alder Lake-N GPU (8086:46d1)

# Verify i915 driver loaded
talosctl -n <node-ip> read /proc/modules | grep i915
# Shows: i915 driver and dependencies loaded

# Verify GPU devices available
talosctl -n <node-ip> ls /dev/dri
# Shows: card0, renderD128
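
The same devices should be visible from inside the Plex pod (a sketch, assuming the Deployment above):

# Confirm the container sees the GPU nodes through the hostPath mount
kubectl -n plex exec deploy/plex -- ls -l /dev/dri
# Expected: card0 and renderD128, matching the host

Once a stream transcodes, the Plex dashboard marks it with "(hw)" to confirm QSV is in use.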

Final Status

  • GPU Passthrough: Intel GPU successfully passed through to the n2 VM
  • Talos Upgrade: all nodes upgraded to v1.10.6 with GPU support
  • Network Fixed: networking restored on all VMs after the Q35 switch
  • Plex Deployed: running with Intel QSV hardware transcoding
  • Flux Integration: automated deployment pipeline working

Part 6: Prevention and Best Practices

Before Changing Machine Types:

  1. Document current interface names (snapshot/diff sketch after this list)

    ip link show
    
  2. Update network configurations proactively

    • Modify configs to use new predicted interface names
    • Test on non-critical VMs first
  3. Plan GPU driver requirements

    • Ensure OS/container runtime supports target hardware
    • Verify driver availability before deployment
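
A simple way to make step 1 actionable: snapshot the link table before the change, then diff it after booting into the new machine type (a sketch; the file path is arbitrary):

# Before the machine-type change: record names, state, and MAC addresses
ip link show > /root/links-before.txt

# After the change: see exactly which interface names moved
diff /root/links-before.txt <(ip link show)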

Recovery Commands

Temporary network fix (Ubuntu):

sudo ip link set enp6s18 up
sudo dhclient enp6s18

Revert machine type if needed:

qm set <vmid> -machine pc-i440fx-8.1

Check Talos GPU status:

talosctl -n <node-ip> ls /dev/dri
talosctl -n <node-ip> read /proc/modules | grep i915

Key Lessons Learned

  1. Q35 machine type is required for GPU passthrough but changes PCI topology
  2. Interface naming changes are predictable but must be planned for
  3. Talos GPU support requires newer versions with appropriate factory images
  4. Pod Security Standards must be configured for privileged GPU access
  5. GitOps workflows can automate complex infrastructure deployments
  6. Always test infrastructure changes on non-critical systems first
  7. Talos factory images provide pre-built configurations with specific hardware support

Quick Reference

Common Interface Name Mappings (i440fx → Q35):

  • eth0 → enp0s3
  • ens18 → enp6s18
  • ens19 → enp6s19

Essential Commands:

# Check machine type
qm config <vmid> | grep machine

# Change machine type
qm set <vmid> -machine q35

# Check network interfaces
ip link show

# Apply netplan changes
sudo netplan apply

# Upgrade Talos with GPU support
talosctl upgrade --nodes <ip> --image factory.talos.dev/metal-installer/<hash>:<version>

# Verify Plex deployment
kubectl get pods -n plex -o wide

This setup brings together Proxmox virtualization, GPU passthrough, Talos Kubernetes, Flux GitOps, and containerized media services into a robust, scalable infrastructure platform.