Overview

This guide covers GPU passthrough setup in Proxmox, the network interface issues caused by switching to the Q35 machine type, and the deployment of Plex Media Server with Intel QSV hardware transcoding on a Talos Kubernetes cluster.

Part 1: GPU Passthrough Setup

Problem

Need to grant a Proxmox VM direct access to a GPU for hardware acceleration or AI workloads.

Solution Steps

  1. Enable IOMMU in host BIOS/UEFI

    • Intel: Enable VT-d
    • AMD: Enable AMD-Vi
  2. Configure host kernel parameters (see the verification sketch after these steps)

    # Edit GRUB configuration
    nano /etc/default/grub
    
    # Add to GRUB_CMDLINE_LINUX_DEFAULT:
    # For Intel: intel_iommu=on iommu=pt
    # For AMD: amd_iommu=on iommu=pt
    
    update-grub
    reboot
    
  3. Identify GPU PCI ID

    lspci | grep -i vga        # find the GPU's PCI slot, e.g. 01:00.0
    lspci -n -s <pci_slot>     # numeric vendor:device ID, e.g. 8086:46d1
    
  4. Configure VM for GPU passthrough

    # Set machine type to Q35 (required for PCIe passthrough)
    qm set <vmid> -machine q35
    
    # Add GPU to VM
    qm set <vmid> -hostpci0 <pci_id>,pcie=1
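
Before attaching the GPU, it's worth confirming that IOMMU actually came on after the reboot and that the GPU sits in its own IOMMU group. A minimal check on the Proxmox host (a sketch; exact log wording varies by kernel):

# Confirm IOMMU is active (look for DMAR/IOMMU or AMD-Vi messages)
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi

# List IOMMU groups and their devices; the GPU should be isolated in its own group
for d in /sys/kernel/iommu_groups/*/devices/*; do
  g=${d#/sys/kernel/iommu_groups/}; g=${g%%/*}
  echo -n "group $g: "; lspci -nns "${d##*/}"
done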
    

Part 2: Network Interface Issues After Q35 Switch

Problem Discovered

After switching VM 103 to the Q35 machine type, VMs lost network connectivity. The command that triggered the issue:

qm set 103 -machine q35

Root Cause Analysis

The Q35 machine type change caused network interface names to change due to different PCI topology:

  • Before (pc-i440fx): Interface named ens18
  • After (Q35): Interface named enp6s18

Why this happens:

  • Q35 chipset provides modern PCIe topology vs. legacy PCI in i440fx
  • Predictable network naming scheme generates names based on PCI bus location
  • Different PCI slots result in different interface names
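
The new name can be decoded directly: in enp6s18, en means Ethernet, p6 is PCI bus 6, and s18 is slot 18. On a systemd-based guest, udev can show the candidate names it derives for an interface (a sketch):

udevadm test-builtin net_id /sys/class/net/enp6s18
# Look for ID_NET_NAME_PATH=enp6s18, the PCI-path-based name systemd selected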

Diagnosis Commands

# Inside affected VMs
ip link show
# Shows: enp6s18 state DOWN instead of expected ens18
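
If the QEMU guest agent is installed in the VM, the interface list can also be pulled from the Proxmox host without console access (a sketch; requires the agent to be running in the guest):

# From the Proxmox host, via the QEMU guest agent
qm guest cmd 103 network-get-interfaces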

Part 3: Solutions by VM Type

Ubuntu VMs (using netplan)

  1. Identify the new interface:

    ip link show
    # Expected output shows enp6s18 in DOWN state
    
  2. Update netplan configuration (a MAC-match alternative is sketched after these steps):

    sudo nano /etc/netplan/00-installer-config.yaml
    

    Change from:

    network:
      ethernets:
        ens18:
          dhcp4: true
    

    To:

    network:
      ethernets:
        enp6s18:
          dhcp4: true
    
  3. Apply configuration:

    sudo netplan try    # Test configuration
    sudo netplan apply  # Apply permanently
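
A more durable alternative is to match the NIC by MAC address rather than by name, so future PCI topology changes cannot break connectivity. A minimal sketch (the MAC is a placeholder, and lan0 is an arbitrary stable name):

# Replace the placeholder MAC with the address shown by `ip link show`
sudo tee /etc/netplan/00-installer-config.yaml >/dev/null <<'EOF'
network:
  version: 2
  ethernets:
    lan0:
      match:
        macaddress: "aa:bb:cc:dd:ee:ff"
      set-name: lan0
      dhcp4: true
EOF
sudo netplan try   # rolls back automatically if connectivity is lost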
    

Talos Kubernetes Nodes

Challenge: Talos nodes have no shell access and are unreachable over the network.

Solutions:

  1. Via local console (if accessible):

    • Access the VM console through Proxmox
    • Check interface state from the console dashboard (Talos has no shell; talosctl runs from a workstation against the node's API)
  2. Via configuration update:

    • Update original Talos machine configs to specify enp6s18 (patch sketch after this list)
    • Reset VMs and reapply updated configurations
  3. Reset and reinstall approach:

    # Reset Talos node
    qm reset <vmid>
    # Boot from installer and apply corrected config
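
For option 2, a machine-config patch keeps the change minimal. A sketch (node IP and file names are placeholders), applied while the node is in maintenance mode after the reset:

# iface-patch.yaml: point Talos at the renamed interface
cat > iface-patch.yaml <<'EOF'
machine:
  network:
    interfaces:
      - interface: enp6s18
        dhcp: true
EOF

# Reapply the stored machine config with the patch (maintenance mode, hence --insecure)
talosctl apply-config --insecure --nodes <node-ip> \
  --file controlplane.yaml --config-patch @iface-patch.yaml

Newer Talos releases also support selecting NICs via deviceSelector (matching on hardware address or bus path), which avoids name-based configs entirely.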
    

Part 4: Plex Media Server Deployment

Prerequisites

With networking fixed, the next step is deploying Plex with Intel QSV hardware transcoding support.

Challenge: Missing GPU Drivers

Initial deployment failed because /dev/dri didn’t exist on the Talos nodes: GPU drivers weren’t included in the Talos image.

Solution: Talos Upgrade

Upgraded Talos from v1.10.5 to v1.10.6 using factory image with GPU support:

# Upgrade each node with factory image containing GPU drivers
talosctl upgrade --nodes <node-ip> --image factory.talos.dev/metal-installer/d3dc673627e9b94c6cd4122289aa52c2484cddb31017ae21b75309846e257d30:v1.10.6
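
The hash in that URL is an Image Factory schematic ID. For reference, a schematic bundling Intel GPU support can be requested from the factory API roughly as follows (the extension names are assumptions; check the factory's current extension list):

# Request a schematic ID from the Talos Image Factory
curl -X POST https://factory.talos.dev/schematics --data-binary @- <<'EOF'
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915
      - siderolabs/intel-ucode
EOF
# The returned ID is the <hash> in factory.talos.dev/metal-installer/<hash>:<version>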

Plex Configuration

  1. Node Selection

    # Label the GPU node
    kubectl label node n2 gpu=intel-qsv
    
  2. Deployment Configuration

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: plex
      namespace: plex
    spec:
      selector:
        matchLabels:
          app: plex
      template:
        metadata:
          labels:
            app: plex
        spec:
          nodeSelector:
            gpu: intel-qsv  # Ensure scheduling on GPU node
          containers:
          - name: plex
            image: plexinc/pms-docker  # Official Plex image
            securityContext:
              privileged: true  # Needed for /dev/dri access under the privileged namespace
            volumeMounts:
            - name: dev-dri
              mountPath: /dev/dri  # Intel GPU device access
          volumes:
          - name: dev-dri
            hostPath:
              path: /dev/dri
              type: Directory
    
  3. Namespace Security

    apiVersion: v1
    kind: Namespace
    metadata:
      name: plex
      labels:
        # Allow privileged containers for GPU access
        pod-security.kubernetes.io/enforce: privileged
        pod-security.kubernetes.io/audit: privileged
        pod-security.kubernetes.io/warn: privileged
    

Flux GitOps Integration

  • Added Plex to production cluster kustomization
  • Removed NGINX Ingress configuration-snippet annotations (disallowed by the controller by default)
  • Automated deployment via Git commits

Part 5: Verification and Results

GPU Detection

# Verify Intel GPU is detected
talosctl -n <node-ip> dmesg | grep -i intel
# Shows: Intel Alder Lake-N GPU (8086:46d1)

# Verify i915 driver loaded
talosctl -n <node-ip> read /proc/modules | grep i915
# Shows: i915 driver and dependencies loaded

# Verify GPU devices available
talosctl -n <node-ip> ls /dev/dri
# Shows: card0, renderD128
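
The same devices should be visible from inside the Plex pod (a sketch, assuming the Deployment above):

# Confirm the container sees the GPU nodes through the hostPath mount
kubectl -n plex exec deploy/plex -- ls -l /dev/dri
# Expected: card0 and renderD128, matching the host

Once a stream transcodes, the Plex dashboard marks it with "(hw)" to confirm QSV is in use.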

Final Status

  • GPU Passthrough: Intel GPU successfully passed through to the n2 VM
  • Talos Upgrade: all nodes upgraded to v1.10.6 with GPU support
  • Network Fixed: networking restored on all VMs after the Q35 switch
  • Plex Deployed: running with Intel QSV hardware transcoding
  • Flux Integration: automated deployment pipeline working

Part 6: Prevention and Best Practices

Before Changing Machine Types:

  1. Document current interface names (snapshot/diff sketch after this list)

    ip link show
    
  2. Update network configurations proactively

    • Modify configs to use new predicted interface names
    • Test on non-critical VMs first
  3. Plan GPU driver requirements

    • Ensure OS/container runtime supports target hardware
    • Verify driver availability before deployment
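
A simple way to make step 1 actionable: snapshot the link table before the change, then diff it after booting into the new machine type (a sketch; the file path is arbitrary):

# Before the machine-type change: record names, state, and MAC addresses
ip link show > /root/links-before.txt

# After the change: see exactly which interface names moved
diff /root/links-before.txt <(ip link show)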

Recovery Commands

Temporary network fix (Ubuntu):

sudo ip link set enp6s18 up
sudo dhclient enp6s18

Revert machine type if needed:

qm set <vmid> -machine pc-i440fx-8.1

Check Talos GPU status:

talosctl -n <node-ip> ls /dev/dri
talosctl -n <node-ip> read /proc/modules | grep i915

Key Lessons Learned

  1. Q35 machine type is required for GPU passthrough but changes PCI topology
  2. Interface naming changes are predictable but must be planned for
  3. Talos GPU support requires newer versions with appropriate factory images
  4. Pod Security Standards must be configured for privileged GPU access
  5. GitOps workflows can automate complex infrastructure deployments
  6. Always test infrastructure changes on non-critical systems first
  7. Talos factory images provide pre-built configurations with specific hardware support

Quick Reference

Common Interface Name Mappings (i440fx → Q35):

  • eth0 → enp0s3
  • ens18 → enp6s18
  • ens19 → enp6s19

Essential Commands:

# Check machine type
qm config <vmid> | grep machine

# Change machine type
qm set <vmid> -machine q35

# Check network interfaces
ip link show

# Apply netplan changes
sudo netplan apply

# Upgrade Talos with GPU support
talosctl upgrade --nodes <ip> --image factory.talos.dev/metal-installer/<hash>:<version>

# Verify Plex deployment
kubectl get pods -n plex -o wide

This setup brings together Proxmox virtualization, GPU passthrough, Talos Kubernetes, Flux GitOps, and containerized media services into a robust, scalable infrastructure platform.