Useful NSX Troubleshooting Commands


Having just completed an NSX 6.2 ICM (Install, Configure, Manage) course, I wanted to share some commonly used troubleshooting commands that Simon Reynolds shared with us.

ESXi Host CLI Commands

Here are some useful CLI examples to run in the ESXi shell for logical switch information:

# esxcli network vswitch dvs vmware vxlan network ….

# esxcli network vswitch dvs vmware vxlan network arp list --vds-name=Compute_VDS --vxlan-id=5001

# /bin/net-vdl2 -M arp -s Compute_VDS -n 5001

# esxcli network vswitch dvs vmware vxlan network mac list --vds-name=Compute_VDS --vxlan-id=5001

# /bin/net-vdl2 -M mac -s Compute_VDS -n 5001

# esxcli network vswitch dvs vmware vxlan network vtep list --vds-name=Compute_VDS --vxlan-id=5001

# /bin/net-vdl2 -M vtep -s Compute_VDS -n 5001
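Each esxcli command above has a net-vdl2 shorthand that takes the same VDS name (-s) and VNI (-n). As a minimal sketch, assuming you just want copy-paste strings for a different switch or VNI, the Python helper below builds both forms; host_table_commands and TABLES are hypothetical names, not part of any VMware tool.

```python
# Hypothetical helper (not part of NSX or ESXi): builds the long esxcli form
# and the net-vdl2 shorthand for one of the per-VNI tables a host keeps.
TABLES = ("arp", "mac", "vtep")


def host_table_commands(vds_name, vni, table):
    """Return (esxcli command, net-vdl2 command) for one table of one VNI."""
    if table not in TABLES:
        raise ValueError(f"table must be one of {TABLES}, got {table!r}")
    esxcli = (
        f"esxcli network vswitch dvs vmware vxlan network {table} list "
        f"--vds-name={vds_name} --vxlan-id={vni}"
    )
    net_vdl2 = f"/bin/net-vdl2 -M {table} -s {vds_name} -n {vni}"
    return esxcli, net_vdl2


if __name__ == "__main__":
    # Print all six commands for the example switch and VNI used above
    for t in TABLES:
        for cmd in host_table_commands("Compute_VDS", 5001, t):
            print(cmd)
```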

a) Show vDS info:

# net-dvs -l

b) Show the separate IP stack (netstack) used for VXLAN

# esxcli network ip netstack list

c) Raise the netcpa logging level to verbose (logs ESXi-to-controller messages in more detail)

# /etc/init.d/netcpad stop

# chmod +wt /etc/vmware/netcpa/netcpa.xml

# vi /etc/vmware/netcpa/netcpa.xml

Change info to verbose between the <level> tags, save the file and then start the netcpa daemon again:

# /etc/init.d/netcpad start
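If you would rather script the vi step, the change can be made with a small text substitution run between the stop and start commands. This is a minimal sketch, assuming the level is stored as <level>info</level> as described above (nothing else about netcpa.xml's layout is assumed) and that Python 3.6 or later is available; set_log_level is a hypothetical helper name.

```python
import re
import sys

# Assumption: the log level appears as <level>info</level>; only the text
# between the first pair of <level> tags is changed.
LEVEL_RE = re.compile(r"(<level>)[^<]*(</level>)")


def set_log_level(xml_text, new_level="verbose"):
    """Replace the text of the first <level> element, e.g. info -> verbose."""
    updated, count = LEVEL_RE.subn(rf"\g<1>{new_level}\g<2>", xml_text, count=1)
    if count == 0:
        raise ValueError("no <level>...</level> element found")
    return updated


if __name__ == "__main__" and len(sys.argv) == 2:
    # Run with netcpad stopped, e.g.:
    #   python set_log_level.py /etc/vmware/netcpa/netcpa.xml
    path = sys.argv[1]
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_log_level(text))
```

Passing "info" as new_level puts the default back once you have finished troubleshooting.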

d) Packet capture commands

# pktcap-uw --uplink vmnic2 -o unencap.pcap --dir=1 --stage=0

# tcpdump-uw -enr unencap.pcap

Or stream the capture straight into tcpdump-uw without writing a file:

# pktcap-uw --uplink vmnic2 --dir=1 --stage=0 -o - | tcpdump-uw -enr -

(--dir=1 implies “outbound”, --dir=0 implies “inbound”, --stage=0 implies “before the capture point”, --stage=1 implies “after the capture point”)

Show VXLAN-encapsulated frames:

# pktcap-uw --uplink vmnic2 --dir=1 --stage=1 -o - | tcpdump-uw -enr -

Show frames at a VM's switchport connection:

# pktcap-uw -o - --switchport <port_id> --dir 1 | tcpdump-uw -enr -

(replace <port_id> with the VM's port ID, which you can get from the esxtop network view)
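The capture pipelines above all share one shape, so the --dir/--stage meanings can be encoded once. A minimal sketch, assuming you only capture at an uplink or a switchport as shown; pktcap_pipeline, DIRECTIONS and STAGES are hypothetical names, and the strings it returns are meant to be pasted into the ESXi shell.

```python
# Hypothetical helper (not part of ESXi): assembles the capture pipelines
# above and rejects --dir/--stage values other than the documented ones.
DIRECTIONS = {0: "inbound", 1: "outbound"}
STAGES = {0: "before the capture point", 1: "after the capture point"}


def pktcap_pipeline(point, target, direction, stage=None):
    """Build 'pktcap-uw ... -o - | tcpdump-uw -enr -'.

    point: "uplink" (target is a vmnic name) or "switchport" (target is a port ID).
    """
    if point not in ("uplink", "switchport"):
        raise ValueError("point must be 'uplink' or 'switchport'")
    if direction not in DIRECTIONS:
        raise ValueError("direction must be 0 (inbound) or 1 (outbound)")
    args = ["pktcap-uw", f"--{point}", str(target), "--dir", str(direction)]
    if stage is not None:
        if stage not in STAGES:
            raise ValueError("stage must be 0 (before) or 1 (after)")
        args += ["--stage", str(stage)]
    # "-o -" writes the capture to stdout; "tcpdump-uw -enr -" reads it from stdin
    return " ".join(args + ["-o", "-"]) + " | tcpdump-uw -enr -"


# Outbound VXLAN-encapsulated frames on vmnic2
print(pktcap_pipeline("uplink", "vmnic2", 1, 1))
```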


Test VXLAN connectivity between hosts:

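# ping ++netstack=vxlan -d -s 1572 -I vmk3 <remote_VTEP_IP>

(replace vmk3 with your VTEP vmkernel interface and <remote_VTEP_IP> with the VTEP address of the other host)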
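The 1572 in the ping is simple arithmetic. Assuming a 1600-byte MTU on the VTEP vmknic (the value commonly recommended for NSX), 1600 minus a 20-byte IPv4 header minus an 8-byte ICMP header leaves 1572 bytes of payload, and -d sets don't-fragment so an undersized path MTU makes the ping fail rather than fragment. The sketch below works through that sum and the standard VXLAN overhead; the function names are illustrative only.

```python
# Header sizes in bytes (IPv4 without options, no 802.1Q tag on the inner frame).
IPV4_HEADER = 20
ICMP_HEADER = 8
UDP_HEADER = 8
VXLAN_HEADER = 8
ETHERNET_HEADER = 14


def max_df_ping_payload(vmk_mtu):
    """Largest ping -s value that fits in one unfragmented IPv4 packet."""
    return vmk_mtu - IPV4_HEADER - ICMP_HEADER


def min_underlay_mtu(inner_mtu=1500):
    """Smallest transport MTU that carries a full inner frame inside VXLAN.

    The whole inner Ethernet frame (header + payload) rides inside
    VXLAN + UDP + outer IPv4; the outer Ethernet header is not counted in MTU.
    """
    return inner_mtu + ETHERNET_HEADER + VXLAN_HEADER + UDP_HEADER + IPV4_HEADER


print(max_df_ping_payload(1600))  # the -s value used above
print(min_underlay_mtu())         # what a 1500-byte VM MTU needs underneath
```

The 50 bytes of encapsulation mean a 1500-byte VM MTU needs at least 1550 on the transport network, which is why 1600 leaves some headroom.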


NSX Controller Commands:

# show control-cluster status

# show control-cluster logical-switches vni 5000

# show control-cluster logical-switches connection-table 5000

You need to be on the controller that manages the VNI for the next commands (the vni output above shows which controller that is):

# show control-cluster logical-switches vtep-table 5001

# show control-cluster logical-switches mac-table 5001

# show control-cluster logical-switches arp-table 5001



To TPS, or not to TPS, that is the question?


Something that crops up again and again when discussing vSphere designs with customers is whether or not they should enable (Inter-VM) TPS (Transparent Page Sharing), as VMware has disabled it by default since the end of 2014.

To understand whether you should enable TPS, you first need to understand what it does. TPS is quite a misunderstood beast: many people attribute a 20-30% memory overcommitment to TPS, when in reality it's not even close to that. That is because TPS only really comes into effect when the ESXi host is close to memory exhaustion; ESXi backs VM memory with 2 MB large pages, which are not shared, and only breaks them into shareable 4 KB pages once the host comes under memory pressure. TPS is the first trick in the box that ESXi uses to reduce memory pressure on the host; if it cannot do this, the host then starts ballooning, closely followed by compression and finally swapping, none of which are cool guys!
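The escalation order can be pictured as a loop that only moves on to the next technique while a shortfall remains. This is a deliberately simplified sketch, not ESXi's real memory-state logic or thresholds; engaged_techniques and the MB figures are made up for illustration.

```python
# Simplified illustration of the escalation order described above, NOT ESXi's
# real memory-state thresholds; the MB figures passed in are made up.
RECLAMATION_ORDER = ("page sharing (TPS)", "ballooning", "compression", "swapping")


def engaged_techniques(shortfall_mb, reclaimable_mb):
    """Return the techniques used, in order, until the shortfall is covered."""
    engaged = []
    remaining = shortfall_mb
    for technique in RECLAMATION_ORDER:
        if remaining <= 0:
            break
        engaged.append(technique)
        remaining -= reclaimable_mb.get(technique, 0)
    return engaged


# TPS alone covers a small shortfall...
print(engaged_techniques(500, {"page sharing (TPS)": 600}))
# ...but when it falls short, the model moves on to ballooning and beyond.
print(engaged_techniques(1000, {"page sharing (TPS)": 200, "ballooning": 300}))
```

The point of the model is the argument above: the more TPS can reclaim, the less often the later, more expensive techniques are reached.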

So should I enable (Inter-VM) TPS? Well, as that great IT saying goes… it depends!

The reason VMware disabled (Inter-VM) TPS in the first place was their stronger stance on security (their Secure by Default stance): their concern was that a side-channel attack could be launched against pages shared between VMs to extract information from another VM. So in a nutshell, you need to consider the risk of enabling (Inter-VM) TPS. If you are running and offering a public cloud solution from your vSphere environment, it may be best to leave TPS disabled, as you could argue you are at greater risk of attack and have a greater responsibility for your customers' data.

If, however, you are running a private vSphere compute environment, then chances are only IT admins have access to the VMs, so the risk is much lower. Therefore, to reduce the risk of running into performance issues caused by ballooning and swapping, you may want to consider enabling TPS, which would help mitigate both.




Migrating from a vDS to a VSS

So you need to remove an NSX-enabled host from a cluster, but first you need to migrate your host networking from the vDS to a VSS. There are a few ways to do this, but by far the easiest is to migrate your vmkernel portgroups rather than recreate them manually. Doing it this way preserves your IP config and saves you having to set up each of the vmkernel interfaces (Management, vMotion, iSCSI).

Note: this method assumes you have more than one NIC in your host/vDS.

First, we need to identify the NICs we will use to stage our portgroups during the migration (in our case, two NICs) and remove them from the vDS so they are available to the VSS. If you are tagging different VLANs over different NICs (e.g. NICs just for iSCSI), make sure you know which NICs carry which VLAN.

Screenshot 2015-12-23 10.42.51

We have removed NICs 0 and 2 from our vDS. To do this, find the host you want to remove in the vSphere Client and click “Manage Physical Adapters”, then on the next screen click Remove next to the adapters you wish to remove. (It’s good practice to make sure your teaming and failover orders are set not to use the NICs you will remove.)
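Before removing anything, it can help to write down each portgroup's active and standby uplinks and check them against the NICs you plan to pull. A minimal sketch, assuming you record the failover orders by hand (on a vDS they are set against dvUplinks, so map those to the host's vmnics first); the portgroup names, uplinks and portgroups_still_using are hypothetical.

```python
# Hypothetical pre-check: list portgroups whose failover order still uses a
# NIC you plan to remove. The portgroup names and uplinks below are made up.
def portgroups_still_using(teaming, nics_to_remove):
    """Map portgroup -> NICs from nics_to_remove still in its failover order."""
    conflicts = {}
    for portgroup, order in teaming.items():
        in_use = [nic for nic in order.get("active", []) + order.get("standby", [])
                  if nic in nics_to_remove]
        if in_use:
            conflicts[portgroup] = in_use
    return conflicts


teaming = {
    "Management": {"active": ["vmnic0"], "standby": ["vmnic1"]},
    "vMotion": {"active": ["vmnic1"], "standby": ["vmnic3"]},
    "iSCSI": {"active": ["vmnic3"], "standby": []},
}
print(portgroups_still_using(teaming, {"vmnic0", "vmnic2"}))
```

An empty result means no portgroup will lose an active or standby uplink when those NICs leave the vDS.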

Screenshot 2016-01-04 14.40.56

Once the NICs have been removed, we need to create the standard vSwitches (two in our case) ready to receive the migrated vmkernel portgroups.

Screenshot 2016-01-04 14.46.22

We can then click “Manage Virtual Adapters” next to where we clicked “Manage Physical Adapters” above.

Screenshot 2015-12-23 10.42.58

We then select the vmkernel interface we wish to migrate and click migrate.

Screenshot 2015-12-23 10.43.05

Select the switch the vmkernel portgroup will be migrated to and click Next.

Screenshot 2015-12-23 10.43.10

Give the new portgroup a name, e.g. Management or vMotion (your existing vDS portgroup name can be used here).

Screenshot 2015-12-23 10.43.16

If you are using VLAN trunking then specify the applicable VLAN as well.

Screenshot 2015-12-23 10.43.29

Now click finish and the wizard will create the new VSS portgroup for you.

Screenshot 2015-12-23 10.43.35

Repeat this process for each portgroup you need to migrate; as you can see below, I have migrated three portgroups.

Screenshot 2015-12-23 10.47.17


If you have any VMs running on this host, now is the time to create your VM portgroups on the VSS and migrate the VMs onto them.

If we edit one of the portgroups, we can see that it's using the NIC we migrated over previously.

Screenshot 2015-12-23 10.47.36

Make sure to set the MTU where needed (on both the standard vSwitch and the vmkernel port), as migrating the portgroups does not always carry over the previous MTU value.
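A simple way to catch a lost MTU is to note each vmkernel interface's MTU before you start (for example from esxcli network ip interface list) and compare it with the values afterwards. A minimal sketch, assuming you paste those numbers in by hand; mtu_mismatches and the example values are hypothetical.

```python
# Hypothetical check: compare vmkernel MTUs noted before the migration with
# the values seen afterwards. The interface names and MTUs below are made up.
def mtu_mismatches(before, after):
    """Return {vmk: (old_mtu, new_mtu)} for interfaces whose MTU changed or vanished."""
    return {
        vmk: (old, after.get(vmk))
        for vmk, old in before.items()
        if after.get(vmk) != old
    }


before = {"vmk0": 1500, "vmk1": 9000, "vmk2": 9000}
after = {"vmk0": 1500, "vmk1": 1500, "vmk2": 9000}
print(mtu_mismatches(before, after))  # {'vmk1': (9000, 1500)}
```

A None in the second position means the interface was not found after the migration at all.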

Screenshot 2015-12-23 10.47.48

Once you have migrated all your vmkernel portgroups, go back and remove the remaining NICs from the vDS and assign them to the appropriate standard vSwitch (VSS).

Make sure to set the correct teaming and failover order, and to go back and change the teaming and failover order on your vDS if you changed it at the start of the process.

Screenshot 2015-12-23 10.50.39

You can now safely remove the vDS from the host.

Screenshot 2015-12-23 10.49.18