Bad network performance after suspend/resume
I am observing several issues with suspend/resume on my Lenovo Thinkpad L15 G2 (Intel) running Ubuntu 21.04 (currently with kernel 5.11.0-25). One of these issues is that, after the machine wakes up from suspend, the wired network speed is very slow. The Vodafone speed check (speedtest.vodafone.de) showed a download rate of only 2 to 4 MBit/s, where it really should be above 300 MBit/s. In this article I describe some of the steps I took to track this down.
First of all, I made sure that the issue is really reproducible in a simple scenario. And indeed, after a reboot the network speed was fine until I did a suspend/resume cycle. After wakeup, the network speed was bad, and the only immediate solution was to reboot, which fixed the network speed until the next wakeup.
In the next step, I tried bringing the network down and up again. For that, we first need to find out the interface name of the wired network device:
$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 38:f3:ab:48:8e:2b brd ff:ff:ff:ff:ff:ff
3: wlp9s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000
    link/ether c0:3c:59:29:40:5f brd ff:ff:ff:ff:ff:ff
Here enp0s31f6 is the wired LAN interface. Now we can bounce the network interface (note that this requires root access):
# ip link set dev enp0s31f6 down ; ip link set dev enp0s31f6 up
However, this did not fix the issue. The network speed was still slow.
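Picking the wired interface out of the listing can also be scripted. The following is a minimal sketch, assuming the usual sysfs layout in which hardware-backed interfaces have a device entry and Wi-Fi interfaces additionally have a wireless or phy80211 entry. The function name list_wired_interfaces and its optional SYSFS_ROOT parameter are made up for this illustration.

```shell
#!/bin/sh
# list_wired_interfaces [SYSFS_ROOT]   (hypothetical helper, not a standard tool)
# Prints every network interface that is backed by a hardware device and is
# not a Wi-Fi interface. SYSFS_ROOT defaults to /sys; the parameter only
# exists so that the function can be tried against a mock directory tree.
list_wired_interfaces() {
    sysfs_root="${1:-/sys}"
    for dir in "$sysfs_root"/class/net/*; do
        # lo and other purely virtual interfaces have no "device" entry
        [ -e "$dir/device" ] || continue
        # Wi-Fi interfaces carry a "wireless" and/or "phy80211" entry
        if [ -e "$dir/wireless" ] || [ -e "$dir/phy80211" ]; then
            continue
        fi
        basename "$dir"
    done
}
```

On the laptop described here, calling list_wired_interfaces without arguments would be expected to print enp0s31f6, which can then be passed to ip link set.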
The next attempt was to reload the module which contains the device driver for the network device. In order to do this, I first had to check which network controller is actually used in the laptop. A simple way is to search the list of PCI devices:
# lspci | grep -i eth
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (13) I219-V (rev 20)
So, the network controller is an Intel I219-V. A quick Google search (“Intel I219-V linux driver”) revealed that the kernel module is called e1000e. Alternatively, the hwinfo command can be used, which directly shows the driver in use for a particular device:
# hwinfo
...
88: None 00.0: 10701 Ethernet
  [Created at net.126]
  Unique ID: 23b5.ndpeucax6V1
  Parent ID: AhzA.zr6+Yth08U0
  SysFS ID: /class/net/enp0s31f6
  SysFS Device Link: /devices/pci0000:00/0000:00:1f.6
  Hardware Class: network interface
  Model: "Ethernet network interface"
  Driver: "e1000e"
  Driver Modules: "e1000e"
  Device File: enp0s31f6
  HW Address: 38:f3:ab:48:8e:2b
  Permanent HW Address: 38:f3:ab:48:8e:2b
  Link detected: yes
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #34 (Ethernet controller)
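The same driver name can also be read from sysfs without any extra tool, because the device directory of an interface contains a driver symlink that points at the bound kernel driver. The following is a small sketch of that lookup; the function name driver_of_interface and the optional SYSFS_ROOT parameter are invented for illustration.

```shell
#!/bin/sh
# driver_of_interface IFACE [SYSFS_ROOT]   (hypothetical helper)
# Prints the name of the kernel driver bound to a network interface by
# following the "driver" symlink in sysfs. SYSFS_ROOT defaults to /sys and
# is only a parameter so that the function can be tried on a mock tree.
driver_of_interface() {
    iface="$1"
    sysfs_root="${2:-/sys}"
    link="$sysfs_root/class/net/$iface/device/driver"
    # No symlink means: unknown interface, or no driver bound to the device
    [ -L "$link" ] || return 1
    # The last path component of the link target is the driver name
    basename "$(readlink "$link")"
}
```

For the interface above, driver_of_interface enp0s31f6 should print e1000e, matching the Driver line in the hwinfo output.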
Now I was able to reload the module:
# lsmod | grep e1000e
e1000e 270336 0
# rmmod e1000e
# lsmod | grep e1000e
# modprobe e1000e
It is always a good idea to double check if a particular command has the expected result – in this case that the module has really been unloaded.
Unfortunately, this also did not fix the speed issue.
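The "is the module really gone?" check can be made a bit stricter than lsmod | grep, which also matches modules whose name merely contains the search string. The sketch below compares the first field of /proc/modules exactly. The function name module_loaded and the optional MODULES_FILE parameter are assumptions made for this example.

```shell
#!/bin/sh
# module_loaded NAME [MODULES_FILE]   (hypothetical helper)
# Succeeds (exit status 0) if a kernel module with exactly this name is
# loaded. MODULES_FILE defaults to /proc/modules and is only a parameter
# so that the function can be tried against a sample file.
module_loaded() {
    name="$1"
    modules_file="${2:-/proc/modules}"
    # The first field of every line in /proc/modules is the module name
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Possible use as root (left as comments so nothing is unloaded by accident):
#   rmmod e1000e
#   module_loaded e1000e && echo "e1000e is still loaded" >&2
#   modprobe e1000e
```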
Now, I decided to dig further into the network infrastructure. Since the Vodafone speed test is not always reliable (in fact, they had another outage on the very day I was investigating this speed issue), I first set up a local environment where I could reproduce the issue without relying on any Internet connection. I have another computer available in my local network, so I used iperf to do some performance checks. I started an iperf server on the other node:
otherNode$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
and used the iperf client on the laptop to start the measurement:
thinkpad$ iperf -c otherNode
------------------------------------------------------------
Client connecting to otherNode, TCP port 5001
TCP window size: 620 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.56 port 35608 connected with 192.168.1.57 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.09 GBytes 936 Mbits/sec
This looks really good, and after suspend/resume I still got the same values! This certainly means that the issue is in another layer of the network stack, right? Perhaps it is related to HTTP only, or perhaps it is even a browser-related issue …
But no, the explanation was much simpler: by default, iperf -s (the server) is the data sink, and the client (iperf -c) is the side which generates and sends data. Hence, what I was testing with the above commands was the upload speed and not the download speed (and the upload speed was still fine even in the aforementioned speed test). So, in order to check the download speed, I had to switch the roles of client and server. After starting iperf -s on the laptop and using iperf -c on the other node, I could immediately reproduce the issue. It now became clear that this is most likely related to the network interface / network controller, which apparently fails to wake up properly after the resume.
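To compare both directions without reading the reports by eye, the bandwidth figure can be extracted from the iperf client output. The following sketch assumes the report format shown earlier, where the bandwidth is the last value and carries a Kbits/sec, Mbits/sec or Gbits/sec unit. The function name iperf_mbits is made up for this example.

```shell
#!/bin/sh
# iperf_mbits   (hypothetical helper)
# Reads an iperf client report on stdin and prints the bandwidth of the
# last report line in Mbit/s. Assumes the unit is written as bits/sec,
# Kbits/sec, Mbits/sec or Gbits/sec; prints 0 if no such line is found.
iperf_mbits() {
    awk '
        /bits\/sec/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^[KMG]?bits\/sec$/) {
                    value = $(i - 1)          # number right before the unit
                    unit = substr($i, 1, 1)   # K, M, G, or b for plain bits
                }
            }
        }
        END {
            if (unit == "K") value /= 1000
            else if (unit == "G") value *= 1000
            else if (unit == "b") value /= 1000000
            print value + 0
        }
    '
}
```

With iperf -s running on the laptop, the download direction could then be measured from the other node with something like iperf -c thinkpad | iperf_mbits, where thinkpad stands in for whatever host name or IP address the laptop has in the local network.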
With that information, I did another Google search and finally found this issue: Massive network problems with I219-V on Thinkpad T14 Gen2 (e1000e). Besides some other issues (which I did NOT see on my machine), it also mentioned some hibernation-related issues. Luckily, a workaround was posted as well, which indicated that the problem might be related to the Intel Management Engine Interface (MEI) and suggested changing a particular setting of the MEI device:
$ lspci -vvnns 16 # Check that config 16 is the number of MEI device
00:16.0 Communication controller [0780]: Intel Corporation Tiger Lake-LP Management Engine Interface [8086:a0e0] (rev 20)
    Subsystem: Lenovo Tiger Lake-LP Management Engine Interface [17aa:508f]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 147
    IOMMU group: 8
    Region 0: Memory at 601d189000 (64-bit, non-prefetchable) [size=4K]
    Capabilities: <access denied>
    Kernel driver in use: mei_me
    Kernel modules: mei_me
# echo on > /sys/bus/pci/devices/0000\:00\:16.0/power/control
And indeed, this setting fixed the issue! With the “power/control” property of the MEI device set to “on”, the network speed remains fine after suspend and wakeup.
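Since a mistyped PCI address would make the echo fail or hit the wrong device, the write can be wrapped in a small function that checks the path first and reads the value back afterwards. This is only a sketch: the function name set_power_control and the optional SYSFS_ROOT parameter are invented here, and the address 0000:00:16.0 is the one from my machine and may differ on others.

```shell
#!/bin/sh
# set_power_control PCI_ADDRESS VALUE [SYSFS_ROOT]   (hypothetical helper)
# Writes VALUE ("on" or "auto") to the power/control attribute of a PCI
# device and verifies the result. SYSFS_ROOT defaults to /sys and is only
# a parameter so that the function can be tried on a mock directory tree.
set_power_control() {
    pci_address="$1"
    value="$2"
    sysfs_root="${3:-/sys}"
    control="$sysfs_root/bus/pci/devices/$pci_address/power/control"
    if [ ! -w "$control" ]; then
        echo "cannot write $control (wrong address, or not root?)" >&2
        return 1
    fi
    echo "$value" > "$control" || return 1
    # Read the value back to confirm that it was accepted
    [ "$(cat "$control")" = "$value" ]
}

# Possible use as root (address taken from the lspci output of my machine):
#   set_power_control 0000:00:16.0 on
```

Values written to sysfs do not survive a reboot, so the call has to be repeated after every boot for the workaround to stay in effect.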