ESXi 6.5 Update 1 PSOD on HPE 460c Gen9 after Ixgben driver update

Today I upgraded a customer's servers to ESXi 6.5 Update 1, but unfortunately some of them purple-screened on reboot after the update.

Affected Servers so far

  • HPE BL460c Gen9
  • HPE DL360p Gen8 (Reported by anonymous user)
  • HPE DL380 Gen9 (Reported by Bernhard)
  • HPE DL380 Gen8 (Reported by Ralf)
  • HPE DL380p Gen9 (Reported by Victor)

PSOD Error

PSOD: #PF Exception 14 in world 68297:sfcb-intelcim IP 0x41801b704d8f addr 0x443919649c000

In short, the customer was running ESXi 5.5, and I reinstalled the servers using the HPE ESXi 6.5 Update 1 OEM image found here: https://my.vmware.com/web/vmware/details?downloadGroup=OEM-ESXI65U1-HPE&productId=614

I configured the servers and found that VMware Update Manager had 3 additional updates:

[Screenshot: VUM Updates]

Oddly, it still wanted to install Update 1. I had already updated some Gen8 servers and had no issues installing all the updates there, so I went ahead and installed all three updates. This time, however, the server never came back online. Instead it kept hitting a PSOD on boot:

[Screenshot: PSOD]
#PF Exception 14 in world 68297:sfcb-intelcim IP 0x41801b704d8f addr 0x443919649c000

After some investigation I found the cause to be the ixgben driver updates; ixgben is a driver for Intel network adapters. That also matches the PSOD, which was raised in the sfcb-intelcim world.

Anyway, I contacted HPE Support. The support engineer had just heard of a similar case, so he asked for some logs and promised to get back to me ASAP.

HPE Support returned a little later and told me that they had no record of this kernel exception, so I was probably among the first to experience the error, if not the first. Hooray, I was the first!

Workaround

HPE came back and stated that the two ixgben driver updates should simply be excluded, since the driver is delivered by Intel and does not work with the HPE firmware.

You should also avoid the VMware ESXi 6.5 Update 1 package in Update Manager. I guess it also contains an ixgben driver from Intel.

To quickly return to a working installation without reinstalling, you can press Shift+R during startup to roll back to the previous working configuration. Thank you to my colleague André Briand de Crévecoeur for bringing this to my attention. https://kb.vmware.com/kb/1033604
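Once the host is back up, it can help to check which of the Intel packages involved are actually installed. The snippet below is only a minimal sketch: `filter_intel_vibs` is a hypothetical helper name, and the pattern assumes the relevant packages have `ixgbe` (which also matches `ixgben`) or `intelcim` in their names.

```shell
#!/bin/sh
# Hypothetical helper: reduce a VIB listing (read from stdin) to the
# Intel network driver and CIM provider packages discussed in this post.
# Assumption: the package names contain "ixgbe" or "intelcim".
filter_intel_vibs() {
    grep -i -E 'ixgbe|intelcim'
}

# On an ESXi host with SSH enabled, the listing would come from esxcli:
#   esxcli software vib list | filter_intel_vibs
```

Any line the filter prints is a package worth comparing against the updates that Update Manager wants to install.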

23 thoughts on “ESXi 6.5 Update 1 PSOD on HPE 460c Gen9 after Ixgben driver update”

  1. I was able to work around this issue by removing the ixgben metadata and VIB files inside the installer package.

    It still shows it needs updated in the vSphere Update Manager, but using vmware -lv shows the correct information now.

  2. +1

    DL380G8, installed with VMware-ESXi-6.5.0-Update1-5969303-HPE-650.U1.10.1.0.14-Jul2017.iso.

    Updated with VUM, which installed U1 again and two ixgben packages. -> PSOD

    Excluding the packages in VUM is not my favorite solution, but I guess there is still no other way.

  3. Hi, we have the same issue on a DL380 G9. I would like to point out that the NICs we use are 561FLR-T adapters. Oddly enough, we have another cluster with 561T adapters that does not show these symptoms.

  4. We had the same problem on 2 DL380 Gen9 with 10GB NIC’s present.
    Thank you for sharing this workaround.

    Does everyone, who has the problem, have 10GB NIC’s in their systems? Or is it also happening when there are no 10GB NIC’s present in the systems?

  5. It's happening with or without a 10Gb NIC. It's related to the Intel adapters. If you don't have Intel adapters (HP servers have integrated Broadcom adapters), you don't run into this issue.

  6. Same problem and a thousand others, I’ll never and never buy another *deleted word* hp server, in my career I have never experienced such a lot of headaches with servers as with hp. Long life to DELL and other brands.

  7. I have the same issue. After installing from the HP 6.5 custom ISO and then updating, the server hit a PSOD. I contacted VMware support and, after a day of investigation, the tech told me that it might be the net-ixgbe drivers. I tried that and still got a PSOD. I removed everything but Update 1 and still got a PSOD. So I removed the two drivers and Update 1, and it worked. When VMware support gets back to me I will let them know and see if I can get this fast-tracked.
    Who remembers esXpress backup solution??? 😉

    1. From HP:
      PCIe NIC FW/Drivers Details:
      NIC Model: HP Ethernet 10Gb 2-port 560SFP+ Adapter (Slot 3)
      Driver: ixgbe Version: 4.5.2-iov
      Firmware Version: 0x80000835, 1.1618.0

      As per PSOD footprint, this issue could be related to intelcim-provider installed on server in question.

      Intel_bootbank_intelcim-provider_0.5-3.3:
      Name: intelcim-provider
      Version: 0.5-3.3
      Type: bootbank
      Vendor: Intel

      1. Remove intelcim-provider from server in question.

      2. Make sure the ESXi host is in maintenance or tech support mode before making this change.
      3. Connect to the ESXi host over SSH (for example with PuTTY).
      4. Once connected, run the commands below to stop the CIM provider service and check its status.
      /etc/init.d/sfcbd-watchdog stop
      /etc/init.d/sfcbd status
      5. Next, remove intelcim-provider from the ESXi host in question.
      esxcli software vib remove -n=intelcim-provider
      6. Then run the command below to start sfcbd again.
      /etc/init.d/sfcbd-watchdog start
      7. Reboot the host once for the changes to take effect.
      8. Reconnect to the ESXi host over SSH and run the commands below to confirm that sfcbd is running.
      chkconfig sfcbd-watchdog
      chkconfig sfcbd
      /etc/init.d/sfcbd status
      /etc/init.d/sfcbd-watchdog status
      9. Run the command below to confirm that intelcim-provider is no longer present.
      esxcli software vib list | grep intelcim-provider (no output confirms that intelcim-provider is no longer present)
      Run Update Manager and install Update 1.
      10. Take the ESXi host out of maintenance or tech support mode and monitor the server for 24 – 48 hrs.

      This worked for me.
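For anyone who prefers to script the SSH part of the steps above, here is a minimal sketch rather than anything HPE supplied. The names `run`, `remove_intelcim_provider`, `intelcim_absent` and the `DRY_RUN` variable are hypothetical. Entering maintenance mode and rebooting are deliberately left to the operator, and the sketch uses the space-separated form of `-n`.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the CIM service, remove the Intel CIM provider VIB, start the service.
# Assumption: the host is already in maintenance mode; reboot it afterwards.
remove_intelcim_provider() {
    run /etc/init.d/sfcbd-watchdog stop &&
    run esxcli software vib remove -n intelcim-provider &&
    run /etc/init.d/sfcbd-watchdog start
}

# Succeeds (exit 0) when the VIB listing on stdin has no intelcim-provider line.
intelcim_absent() {
    ! grep -q 'intelcim-provider'
}
```

Setting `DRY_RUN=1` before calling `remove_intelcim_provider` only prints the three commands, which is a cheap way to review them first. After the reboot, `esxcli software vib list | intelcim_absent && echo "intelcim-provider removed"` mirrors the check in step 9.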

      1. I can confirm that Ed Donelley’s workaround of removing the problematic VIB solved my issues. I loaded up two new DL360p Gen 9 servers and encountered this very same issue. I rolled back using the SHIFT+R method, removed the VIB following Ed’s instructions and, after rebooting, applied all available updates to the server again via Update Manager, and all was good.

        Thanks for all of your suggestions and tips…It saved me a call to VMware support and/or HP support.

        Happy Holidays to everyone!

  8. Removing the Intel-cim provider VIB worked for me, too. After removing the VIB and rebooting, I was able to install all the patches through VUM.

    I am curious, though: what effect will removing that VIB have? What functionality will be lost (if any) by not having the intel-cim provider installed?

  9. Ridiculous that this is still an outstanding issue 1/11/2018. I re-installed an HP twice not knowing about the SHIFT+R trick. I wonder if Dell/EMC just doesn’t mind leaving this broken since HP is a competitor. It’s funny right Dell/EMC?

  10. I agree with MG, I’ve been waiting to update some servers until this was fixed. Guess I have no choice other than to work around the problem now.

    Still seeing the same problem on 1/29/2018

  11. I just upgraded some HPE BL460c Gen9 servers today, and I did not have this problem. Installed update 1 and ixgben driver without problems.
