Diagnose Network Problems In A Container Using The Filesystem

So you need to figure out why your application is not working, but your toolset is very limited because you are on a minimal Linux installation or inside a container. This technique works in both cases, as long as the shell is bash.

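You can use the /dev/tcp pseudo-device to check network connections. Bash handles these paths itself when they are used in a redirection, so nothing has to exist under /dev.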

Example 1:
You would like to know if you can reach the host 172.16.0.10 on port 443.

ip=172.16.0.10
port=443
if (echo > "/dev/tcp/$ip/$port") &>/dev/null
then echo "TCP port $port is open"
else echo "TCP port $port is closed"
fi

The above command will give you the answer.

You can also write this as a one-liner:

ip=172.16.0.10;port=443; (echo > /dev/tcp/$ip/$port) &>/dev/null && echo "TCP port $port is open" || echo "TCP port $port is closed"
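
If a firewall silently drops the packets instead of refusing the connection, the check above can hang for a long time. Here is a minimal sketch of a wrapper with a timeout. It assumes that bash and the timeout utility (coreutils or BusyBox) are present in the container. The function name check_tcp_port and the 3 second default are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# check_tcp_port HOST PORT [TIMEOUT_SECONDS]   (hypothetical helper name)
# Returns 0 if a TCP connection could be opened, 1 if it could not,
# and 2 if HOST or PORT is missing.
check_tcp_port() {
    local host=$1 port=$2 wait=${3:-3}
    if [ -z "$host" ] || [ -z "$port" ]; then
        echo "usage: check_tcp_port HOST PORT [TIMEOUT_SECONDS]" >&2
        return 2
    fi
    # Run the probe in a child bash so that 'timeout' can kill it
    # when the connection attempt blocks.
    if timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

Usage: check_tcp_port 172.16.0.10 443 && echo "TCP port 443 is open" || echo "TCP port 443 is closed"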

Example 2:

You would like to know what connections are being made to and from the container or host, but you do not have netstat.
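
One way to do this without netstat is to read /proc/net/tcp, where the Linux kernel lists IPv4 TCP sockets with addresses and ports in hexadecimal. The sketch below is an assumption about how this can be done, not necessarily the method used in the full post. It assumes bash and a little-endian machine (x86 or ARM), and the function names decode_addr and list_tcp_connections are hypothetical.

```shell
#!/usr/bin/env bash
# decode_addr HEXIP:HEXPORT   (hypothetical helper name)
# Converts an entry such as 0A0010AC:01BB to dotted notation.
# On little-endian machines the IPv4 bytes are stored in reverse order.
decode_addr() {
    local hex_ip=${1%%:*} hex_port=${1##*:}
    printf '%d.%d.%d.%d:%d\n' \
        "0x${hex_ip:6:2}" "0x${hex_ip:4:2}" \
        "0x${hex_ip:2:2}" "0x${hex_ip:0:2}" \
        "0x${hex_port}"
}

# list_tcp_connections [FILE]   (hypothetical helper name)
# Prints one line per socket. The state is left as the raw hex code,
# for example 01 = ESTABLISHED and 0A = LISTEN.
list_tcp_connections() {
    local file=${1:-/proc/net/tcp}
    local sl local_addr remote_addr state rest
    while read -r sl local_addr remote_addr state rest; do
        [ "$sl" = "sl" ] && continue   # skip the header line
        printf '%s -> %s state=%s\n' \
            "$(decode_addr "$local_addr")" \
            "$(decode_addr "$remote_addr")" \
            "$state"
    done < "$file"
}
```

Calling list_tcp_connections with no argument reads /proc/net/tcp. IPv6 sockets are listed separately in /proc/net/tcp6 and use a longer address format that this sketch does not decode.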


Cannot export ISO from vLCM cluster image

When you try to export an ISO file in VMware vCenter from a cluster that uses a single cluster image with vLCM, you get the following error:

A general system error occurred: Error occurred while exporting ESXi image and/or image document.

The error is accompanied by an entry in the vmware-vum-server-#.log file in the /var/log/vmware/vmware-updatemgr/vum-server directory, like the following:

2023-06-14T12:21:23.882Z error vmware-vum-server[09453] [Originator@6876 sub=VumVapi::Lib::Utils] [ExportTask 92] Failed to export cluster image from depot. errorCode: 99
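
To check whether your vCenter logs contain the same entry, you can search the log directory from the appliance shell. This is a minimal sketch that assumes the log files match the pattern vmware-vum-server-*.log in the directory mentioned above. The function name find_export_errors is hypothetical.

```shell
#!/usr/bin/env bash
# find_export_errors [LOG_DIR]   (hypothetical helper name)
# Prints every log line that reports a failed cluster image export.
find_export_errors() {
    local log_dir=${1:-/var/log/vmware/vmware-updatemgr/vum-server}
    grep -h "Failed to export cluster image" \
        "$log_dir"/vmware-vum-server-*.log 2>/dev/null
}
```

Run find_export_errors without arguments on the vCenter appliance, or pass another directory if you have copied the logs elsewhere.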

In my case I was able to export the image as a ZIP bundle, and the corresponding JSON configuration file exported successfully as well.

The problem lies with vendor signatures. Unfortunately, VMware does not currently have a fix for this, but it normally helps to remove the vendor packages attached to the cluster.

https://kb.vmware.com/s/article/91237

More information is available here: https://communities.vmware.com/t5/vCenter-Server-Discussions/Cannot-export-vLCM-image-if-you-use-a-custom-SSL-cert-Non/td-p/2881200/page/2

List VMs with Secure Boot enabled on Windows Server 2022

Since Microsoft released KB5022842, a lot of customers have experienced Windows Server 2022 not being able to boot. On vSphere 7 this might be a problem if you have installed the patch and enabled Secure Boot for the server.

More information is available here: VMware KB90947

If you need to find VMs that are running Windows Server 2022 and have Secure Boot enabled, it is not that easy.

The problem is that you cannot always be sure that the OS selected for the VM is the OS actually installed in the VM. If, for instance, you installed Windows Server 2022 before it was officially supported in vSphere, you might have chosen Windows Server 2019. So you will need to use the OS name that VMware Tools is reporting.

But what if VMware Tools is not running? That’s a problem.

The following script will find VMs with Secure Boot enabled that are running Windows Server 2022, but also VMs where we are not certain because VMware Tools is not running.


NSX-T Troubleshooting IDFW rules

So you have migrated to NSX-T 3.2 and you are using IDFW rules to allow users to dynamically gain access when they log in to any physical device in the domain.

The only trouble is that now it is not really working, and VMware has not yet implemented a way in the GUI to see the effective members of groups that contain Active Directory members.

Well, there is a way you can at least see who is in the group, but it takes a couple of steps.

How to find the effective group members

Step one is to identify the rule you are troubleshooting. Make a note of the rule id.

Next, find the host the destination VM is running on. You can do this manually in vCenter or use PowerShell. That’s up to you.


Automating VMware Workstation LAB

I am often working with quite large test environments. Powering on ESXi hosts with nested VMs can be a pain when you need to get it running quickly.

Here are some of my tricks for automating VMware Workstation.


Nested or Native

Should you buy dedicated hardware or an OP workstation for your next testing environment? If you are not sharing it with others, this might be useful for you.

History

For many years now VMware Workstation has been my secret weapon and daily tool for just about everything: customer remote connections, test environments and so on.

Recently I needed to do some advanced testing with NSX-V and NSX-T. This required a lot more power than what I normally use so I needed to upgrade my testing platform.

The consideration with these things is always the same: how much are you going to invest, and what are the benefits? For a long time I have been considering buying four Intel NUC PCs for doing these tests, but the problem is that you need to invest a lot to get a real setup. It is also not very flexible, as you need to maintain them and reinstall them every time you want to play with a newer or older version.


vCenter services not starting after 6.7 Update 3f upgrade

After upgrading an Enhanced Linked Mode environment with three vCenter servers, a customer experienced the following error:

Server is at a higher functional level (1) than partner (<partner vCenter server>)(0) and cannot perform at a lower level.

When checking the domain functional level of each of the three servers, they all state that they are at level 1. The other two servers start normally.

One server is not starting the vmdir service, and it is also that service that is reporting the error. Most of the other services on a vCenter depend on vmdir.

I do not have a solution as of now. I might have a reason for the issue, and I might have a workaround. VMware is currently trying to figure out how to fix this.


ESXi 6.7 PSOD with qfle3 driver version above 1.0.69.1

I had an ESXi PSOD today. That does not happen very often, so I was quite surprised to find out that the root cause was not a hardware issue.

VMware did an analysis of the memory dump, and it turned out to be a faulty driver. That made sense, since a PSOD often comes from drivers or agents when it is not a hardware issue.

The PSOD I got was the following:

#PF Exception 14 in World xxxxxxx:vmnicX-pollw IP xxxxxxxxxx addr xxxxxxxx
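
To see which qfle3 version a host is running, you can use esxcli software vib list | grep qfle3 in the ESXi shell; the second column of the output is the version. Below is a minimal sketch for comparing that version against 1.0.69.1 from the title. It assumes bash on your workstation (the ESXi shell is not bash), and the function name version_gt is hypothetical.

```shell
#!/usr/bin/env bash
# version_gt A B   (hypothetical helper name)
# Succeeds when the dotted numeric version A is strictly greater than B.
# Anything after the first '-' (the build suffix of a VIB version) is ignored.
version_gt() {
    local IFS=.
    local -a a=(${1%%-*}) b=(${2%%-*})
    local i x y
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        # 10# forces base 10 so that a part like "08" is not read as octal
        ((10#$x > 10#$y)) && return 0
        ((10#$x < 10#$y)) && return 1
    done
    return 1   # the versions are equal
}
```

Usage: version_gt "$installed" 1.0.69.1 && echo "qfle3 is newer than 1.0.69.1", where installed holds the version string copied from the host.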

Error: cannot install the vcenter agent service. cannot upload agent after vCSA upgrade

I was just updating a vCenter server and some ESXi hosts, but after running the vCenter update I found vCenter full of failed HA agent installs. To stop this failure loop, I turned off VMware HA while figuring out what was wrong.

Error: cannot install the vcenter agent service. cannot upload agent after vCSA upgrade

Update Manager ELX_bootbank_elx-esx-libelxima.so driver conflict

I just provisioned the HPE ESXi 6.7 Update 3 custom OEM image onto some HP DL560 Gen10 servers.

After I updated the servers using Update Manager and the HPE vibsdepot, I ran into problems. It turns out there is a conflict between the VMware-provided driver and the HPE-provided driver.

The result is that I cannot install all the updates needed to satisfy compliance.

Checking the esxupdate.log file on the ESXi hosts, I get the following error:

ValueError: VIBs ELX_bootbank_elx-esx-libelxima.so_12.0.1108.0-03 and ELX_bootbank_elx-esx-libelxima.so_12.0.1108.0-03 have unequal values of the 'payloads' attribute: '[elx-esx-libelxi: 1602.936 KB]' != '[elx-esx-libelxi: 1493.833 KB]'