PCIe Hot Reset on Linux

Resets in PCI Express are a bit complex. There are two main types of reset: conventional reset and function-level reset. Conventional resets are further divided into fundamental and non-fundamental resets. See the PCI Express specification for all of the details.

A 'cold reset' is a fundamental reset that takes place after power is applied to a PCIe device. There appears to be no standard way of triggering a cold reset, save for turning the system off and back on again. Slots whose power can be controlled from software show up under /sys/bus/pci/slots, but on my machines that directory is empty.
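
To check whether a machine exposes any software-controllable slots, something like the following sketch may help. It assumes that hotplug-capable slots appear as subdirectories of /sys/bus/pci/slots containing a `power` file, which depends on a hotplug driver such as pciehp being loaded. The function name `list_power_slots` and its optional directory argument are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every slot that has a "power" file.
# The directory argument defaults to the real sysfs location and is only
# overridable so the function can be tried against a scratch directory.
list_power_slots() {
    local slots_dir=${1:-/sys/bus/pci/slots}
    local slot
    for slot in "$slots_dir"/*/; do
        # With no subdirectories the glob stays unexpanded, so test the file.
        if [ -e "${slot}power" ]; then
            basename "$slot"
        fi
    done
}
```

Calling `list_power_slots` with no argument prints nothing when the directory is empty. Where a slot is listed, writing 0 and then 1 to its `power` file should come close to a cold reset of the device in that slot, but treat that as untested.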

A 'warm reset' is a fundamental reset that is triggered without disconnecting power from the device. There appears to be no standard way of triggering a warm reset.

A 'hot reset' is a conventional, non-fundamental reset that is triggered across a PCI Express link. It is triggered either when a link is forced into electrical idle or by sending TS1 and TS2 ordered sets with the hot reset bit set. Software can initiate a hot reset by setting and then clearing the secondary bus reset bit in the bridge control register in the PCI configuration space of the bridge port upstream of the device.
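
The set-then-clear sequence comes down to a little bit arithmetic on the 16-bit bridge control register. The sketch below uses mask 0x40 for the secondary bus reset bit, the same value the script further down uses, and assumes register values are passed around as the bare hex strings that setpci prints. The function names are invented for illustration.

```shell
#!/bin/bash
SBR_MASK=0x40   # secondary bus reset bit in the bridge control register

# Return the register value (4 hex digits) with the reset bit set.
sbr_assert() {
    printf '%04x\n' $(( 0x$1 | SBR_MASK ))
}

# Return the register value (4 hex digits) with the reset bit cleared.
sbr_deassert() {
    printf '%04x\n' $(( 0x$1 & ~SBR_MASK & 0xffff ))
}

# Succeed if the reset bit is currently set in the given value.
sbr_is_set() {
    (( (0x$1 & SBR_MASK) != 0 ))
}
```

On real hardware this could be combined with setpci along the lines of `setpci -s "$port" BRIDGE_CONTROL="$(sbr_assert "$bc")"`, where `$bc` is the value read back from the same register. Keeping the arithmetic in small functions makes it easy to check without touching a device.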

A 'function-level reset' (FLR) is a reset that affects only a single function of a PCI Express device; it must not reset the entire device. The PCIe specification does not require functions to implement FLR. An FLR is initiated by setting the initiate function-level reset bit in the function's device control register, which is part of the PCI Express capability structure in the PCI configuration space.
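
Whether a function implements FLR can be read from its PCI Express capability. The sketch below assumes the following layout: the function-level reset capability flag is bit 28 of the device capabilities register (offset 0x04 in the capability), and the initiate bit is bit 15 of the device control register (offset 0x08). Check these offsets against the specification before relying on them. The helper names are hypothetical.

```shell
#!/bin/bash
FLR_CAP_MASK=0x10000000   # device capabilities bit 28 (assumed layout)
FLR_INIT_MASK=0x8000      # device control bit 15 (assumed layout)

# Succeed if a device capabilities value (8 hex digits) advertises FLR.
has_flr() {
    (( (0x$1 & FLR_CAP_MASK) != 0 ))
}

# Return a device control value (4 hex digits) with the initiate bit set.
flr_initiate_value() {
    printf '%04x\n' $(( 0x$1 | FLR_INIT_MASK ))
}

# Example use on real hardware (as root), left commented out on purpose:
#   devcap=$(setpci -s "$dev" CAP_EXP+4.l)
#   devctl=$(setpci -s "$dev" CAP_EXP+8.w)
#   has_flr "$devcap" && setpci -s "$dev" CAP_EXP+8.w="$(flr_initiate_value "$devctl")"
```

After initiating an FLR this way, the function should be left alone for a short while (on the order of 100 ms) before it is accessed again.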

Linux exposes function resets through /sys/bus/pci/devices/$dev/reset. Writing a 1 to this file resets the corresponding function only, not the whole device. The kernel uses a function-level reset when the function supports one and can otherwise fall back to another reset method it knows for that device; the file is absent when the kernel has no usable method.
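
A minimal sketch of using that file from a script is shown below. The function name `reset_function` and its optional directory argument are made up for illustration. On newer kernels a neighbouring `reset_method` file, where present, names the mechanisms the kernel will use, so the sketch prints it if it can.

```shell
#!/bin/bash
# Hypothetical helper: reset one function through its sysfs "reset" file.
# The second argument defaults to the real sysfs location and can point at a
# scratch directory for a dry run.
reset_function() {
    local dev=$1
    local devices_dir=${2:-/sys/bus/pci/devices}
    local reset_file="$devices_dir/$dev/reset"

    if [ ! -e "$reset_file" ]; then
        echo "Error: $dev has no reset file (no usable reset method?)" >&2
        return 1
    fi

    # Newer kernels also list the methods they would use; show it if present.
    if [ -r "$devices_dir/$dev/reset_method" ]; then
        echo "Reset methods: $(cat "$devices_dir/$dev/reset_method")"
    fi

    echo 1 > "$reset_file"
}
```

Run as root, `reset_function 0000:01:00.0` would reset that one function; the address is only an example.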

I am not aware of any 'nice' method for triggering a hot reset (there is no sysfs entry for that). However, it is possible to use setpci to do so with the following script:

pcie_hot_reset.sh
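#!/bin/bash
# Trigger a PCIe hot reset by toggling the secondary bus reset bit on the
# port upstream of a device.
# Usage: pcie_hot_reset.sh <device>   (e.g. 0000:01:00.0 or 01:00.0), as root

dev=$1

if [ -z "$dev" ]; then
    echo "Error: no device specified" >&2
    exit 1
fi

# Accept addresses given without a PCI domain by assuming domain 0000.
if [ ! -e "/sys/bus/pci/devices/$dev" ]; then
    dev="0000:$dev"
fi

if [ ! -e "/sys/bus/pci/devices/$dev" ]; then
    echo "Error: device $dev not found" >&2
    exit 1
fi

# The sysfs entry is a symlink into the device tree; the parent directory of
# its target is the upstream port.
port=$(basename "$(dirname "$(readlink "/sys/bus/pci/devices/$dev")")")

if [ ! -e "/sys/bus/pci/devices/$port" ]; then
    echo "Error: device $port not found" >&2
    exit 1
fi

echo "Removing $dev..."

echo 1 > "/sys/bus/pci/devices/$dev/remove"

echo "Performing hot reset of port $port..."

# Save the current bridge control value so it can be restored afterwards.
bc=$(setpci -s "$port" BRIDGE_CONTROL)

echo "Bridge control: $bc"

# Set the secondary bus reset bit (0x40), hold it briefly, then restore the
# original value.
setpci -s "$port" BRIDGE_CONTROL="$(printf "%04x" $((0x$bc | 0x40)))"
sleep 0.01
setpci -s "$port" BRIDGE_CONTROL="$bc"
# Give the link time to come back up before rescanning.
sleep 0.5

echo "Rescanning bus..."

echo 1 > "/sys/bus/pci/devices/$port/rescan"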

Ensure that all attached drivers are unloaded before running this script. The script removes the PCIe device, commands the upstream switch port to issue a hot reset, and then rescans the PCIe bus below that port. It has only been tested on single-function devices. A hot reset affects every function behind the port, while the script removes only the one function it was given, so it may need some reworking for devices with multiple functions.
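
As a starting point for multi-function devices, the sketch below lists every function that shares a domain:bus:device address with the given one and checks whether a driver is still bound. It is untested on real multi-function hardware. The helper names and the optional directory argument are hypothetical, and it assumes a bound driver shows up as a `driver` entry in the function's sysfs directory.

```shell
#!/bin/bash
# Hypothetical helpers; the directory argument defaults to the real sysfs
# location and exists so the functions can be tried on a scratch directory.

# Print every function that shares a domain:bus:device address with $1.
list_functions() {
    local dev=$1
    local devices_dir=${2:-/sys/bus/pci/devices}
    local path
    # Strip the ".function" suffix and match all functions of that device.
    for path in "$devices_dir/${dev%.*}".*; do
        [ -e "$path" ] && basename "$path"
    done
    return 0
}

# Succeed if a driver is currently bound to function $1.
has_driver() {
    local devices_dir=${2:-/sys/bus/pci/devices}
    [ -e "$devices_dir/$1/driver" ]
}

# Possible use before a hot reset, left commented out on purpose:
#   for fn in $(list_functions "$dev"); do
#       has_driver "$fn" && echo "Warning: $fn still has a driver bound"
#   done
```

The same loop could be used to write 1 to each function's `remove` file before the reset, instead of removing only the function named on the command line.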

Answer on Stack Exchange