vCNS to NSX Upgrades – Host Upgrades

Host Upgrades

  1. Place DRS into manual mode (do not disable DRS).
  2. Click Networking & Security and then click Installation.
  3. Click the Host Preparation tab.

All clusters in your infrastructure are displayed.

  1. For each cluster, click Update or Install in the Installation Status column.
  2. Each host in the cluster receives the new logical switch software.

The host upgrade initiates a host scan. The old VIBs are removed (though they are not completely deleted until after the reboot). New VIBs are installed on the altboot partition. To view the new VIBs on a host that has not yet rebooted, you can run the esxcli software vib list --rebooting-image | grep esx command.
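
As a quick before-and-after comparison on a prepared host (the grep pattern is only illustrative; the NSX VIB names, e.g. esx-vxlan and esx-vsip on 6.2.x, depend on your exact build):

# esxcli software vib list | grep esx-v

# esxcli software vib list --rebooting-image | grep esx-v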

  1. Monitor the installation until the Installation Status column displays a green check mark.
  2. After manually evacuating the hosts, select the cluster and click Resolve. The Resolve action attempts to complete the upgrade and reboot all hosts in the cluster. If the host reboot fails for any reason, the Resolve action halts. Check the hosts in the Hosts and Clusters view, make sure the hosts are powered on, connected, and contain no running VMs, then retry the Resolve action.
  3. You may have to repeat the above process for each host.
  4. You can confirm connectivity by performing the following checks:
    1. Verify that VXLAN segments are functional. Make sure to set the packet size correctly and include the don't fragment bit (the sizes used here are explained after this list).
    2. Ping between two VMs that are on the same virtual wire but on two different hosts (one host that has been upgraded and one host that has not).
      1. From a Windows VM: ping -l 1472 -f <dest VM>
      2. From a Linux VM: ping -s 1472 -M do <dest VM>
    3. Ping between two hosts' VTEP interfaces.
      1. ping ++netstack=vxlan -d -s 1572 <dest VTEP IP>
    4. All virtual wires from your 5.5 infrastructure are renamed to NSX logical switches, and the VXLAN column for the cluster says Enabled.
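
A quick note on the packet sizes used in the checks above: 1472 bytes of ICMP payload plus 8 bytes of ICMP header and 20 bytes of IP header is exactly 1500 bytes, i.e. a full standard MTU frame sent with the don't fragment bit set. Likewise 1572 + 28 = 1600 bytes, which exercises the jumbo MTU the VTEP underlay needs in order to carry the roughly 50 bytes of VXLAN encapsulation overhead.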

For more information in this series, please continue on to the next part.

vCNS to NSX Upgrades – vShield Manager Upgrade Steps

vShield Manager Upgrade Steps

1. Confirm the steps in 2.1.2 have been actioned.

2. Take an FTP backup of the upgraded vCNS and shut down the VM.

3. Check Snapshot has been taken of vShield Manager.

4. Check Support Bundle has been taken.

5. Shut down vShield Manager and check the appliance has 4 vCPUs and 16GB of memory.

6. Power on vShield Manager.

7. Upload the vShield-to-NSX upgrade bundle, apply it and reboot (the upgrade file is 2.4 GB!).

8. Check you can log in to NSX Manager A once the upgrade has completed.

9. You may need to restart the vSphere Web Client service in order to see the plugin in the vSphere Web Client.

10. Check the SSO (single sign-on) configuration in NSX Manager. You may need to re-register.

11. Configure Segment IDs and the Multicast Address range (recorded earlier from vShield Manager).

12. Configure backup to FTP location – take backup of NSX Manager A.

13. Create Snapshot on NSX Manager A.

14. Shutdown NSX Manager A.

15. Deploy new NSX Manager B from OVF with the same IP as A.

16. Restore the FTP backup taken from NSX Manager A.

17. Check vCenter registration and NSX Manager login (see the example check after this list).

18. Check NSX Manager for the list of Edges and Logical Switches.

19. Once you are happy that connectivity is functioning correctly, continue with the upgrade.
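
For steps 17 and 18, a quick way to confirm the restored NSX Manager B still holds the vCenter registration is to query the REST API. The vcconfig endpoint below is the call I would normally use against NSX-v; treat it as an assumption, confirm it against the NSX API guide for your version, and substitute your own manager address and credentials:

curl -k -u admin https://<nsx-manager>/api/2.0/services/vcconfig

A response containing the registered vCenter Server details indicates the registration survived the restore; the Edges and Logical Switches can then be checked in the UI as per step 18.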

For more information in this series, please continue on to the next part.

vCNS to NSX Upgrades – Pre Upgrade Steps

Pre-Upgrade Steps

One or more days before the upgrade, do the following:

  1. Verify that vCNS is at least the required minimum version (see point 8 below).
  2. Check you are running one of the following recommended builds: vSphere 5.5 U3 or vSphere 6.0 U2.
  3. Verify that all required ports are open (please see appendix A)
  4. Verify that your vSphere environment has sufficient resources for the NSX components.
  1. Verify that all your applicable vSphere clusters have sufficient resources to allow DRS to migrate running workloads during the host preparation stage (n+1).
  2. Verify that you can retrieve uplink port name information for vSphere Distributed Switches. See VMware KB 2129200 for further information. (Note: this is not applicable to NHS Manchester as we are expected to upgrade to NSX 6.2.4.)
  3. Ensure that forward and reverse DNS, NTP and the Lookup Service are all working (see the example below this list).
  4. If any vShield Endpoint partner services are deployed, verify compatibility before upgrading:
    • Consult the VMware Compatibility Guide for Networking and Security.
    • Consult the partner documentation for compatibility and upgrade details.
  5. If you have Data Security in your environment, uninstall it before upgrading vShield Manager.
  6. Check that all running Edges are on the same (latest) version as the vShield Manager.
  7. Verify that the vShield Manager vNIC adapter is VMXNET3. This should be the case on later vShield Manager versions; however, the E1000 vNIC may have been retained if you have previously upgraded the vShield Manager. To replace the vNIC, follow the steps in KB 2114813, which in part involve deploying a fresh vShield Manager and restoring the configuration. See Appendix C or http://kb.vmware.com/kb/2114813.
  8. Increase the vShield Manager memory to 16GB.
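
For the name resolution check above, a minimal test from an admin workstation (or the vShield Manager console) is a forward and a reverse lookup for each component involved in the upgrade; the names and addresses below are placeholders:

nslookup <nsx-manager FQDN>

nslookup <nsx-manager IP address>

Both lookups should return matching records for the vShield/NSX Manager, vCenter Server and each ESXi host.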

Pre-Upgrade Validation Steps

Immediately before you begin the upgrade, do the following to validate the existing installation.

  1. Identify administrative user IDs and passwords.
  2. Verify that forward and reverse name resolution is working for all components.
  3. Verify you can log in to all vSphere and vShield components.
  4. Note the current versions of vShield Manager, vCenter Server, ESXi and the other vShield components.
  5. Check that the multicast address ranges are valid (the recommended multicast address range starts at 239.0.1.0/24 and excludes 239.128.0.0/24).
  6. Verify that VXLAN segments are functional. Make sure to set the packet size correctly and include the don’t fragment bit.
    1. Ping between two VMs that are on the same virtual wire but on two different hosts.
      1. From a Windows VM: ping -l 1472 -f <dest VM>
      2. From a Linux VM: ping -s 1472 -M do <dest VM>
    2. Ping between two hosts’ VTEP interfaces.
      1. ping ++netstack=vxlan -d -s 1572 <dest VTEP IP>
    3. Validate North-South connectivity by pinging out from a VM.
    4. Visually inspect the vShield environment to make sure all status indicators are green, normal, or deployed.
    5. Verify that syslog is configured.
    6. If possible, in the pre-upgrade environment, create some new components and test their functionality.
      1. Validate netcpad and vsfwd user-world agent (UWA) connections.
        1. On an ESXi host, run esxcli network vswitch dvs vmware vxlan network list --vds-name=<vds name> and check the controller connection state.
        2. On vShield Manager, run the show tech-support save session command, and search for "5671" to ensure that all hosts are connected to vShield Manager.
      2. Check firewall functionality via Telnet or netcat to confirm the edge firewalls are working as expected (see the example below).
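
For the firewall check in the last item, a simple netcat probe from a client VM towards a service the edge firewall should allow (and one it should block) is usually enough; the address and port below are placeholders for your own environment:

nc -zv <protected server IP> 443

A quick "open"/"succeeded" for permitted flows and a timeout or refusal for blocked flows confirms the edge firewall rules are behaving as expected.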

Pre-Upgrade Steps

Immediately before you begin the upgrade, do the following:

  1. Verify that you have a current backup of the vShield Manager, vCenter and other vCloud Networking and Security components. See Appendix B for the necessary steps to accomplish this.
  2. Purge old logs from the vShield Manager CLI with "purge log manager" and "purge log system".
  3. Take a snapshot of the vShield Manager, including its virtual memory.
  4. Take a backup of the vDS.
  5. Create a Tech Support Bundle.
  6. Record the Segment IDs and multicast address ranges in use.
  7. Increase the vShield Manager resources to 16 GB of memory and 4 vCPUs.
  8. Ensure that forward and reverse domain name resolution is working, using the nslookup command.
  9. If VUM is in use in the environment, ensure that the bypassVumEnabled flag is set to true in vCenter. This setting allows EAM to install the VIBs directly on the ESXi hosts even when VUM is installed and/or unavailable.
  10. Download and stage the upgrade bundle and validate it with md5sum (see the example after this list).
  11. Do not power down or delete any vCloud Networking and Security components or appliances until instructed to do so.
  12. VMware recommends performing the upgrade work in a maintenance window as defined by your company.
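
For step 10, a minimal md5sum check looks like this (the bundle filename is a placeholder; compare the output against the checksum published on the VMware download page):

md5sum <vShield-to-NSX-upgrade-bundle>.tar.gz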

For more information in this series, please continue on to the next part.

NSX Edge becomes unmanageable after upgrading to NSX 6.2.3

If you are in the process of upgrading to NSX 6.2.3 and hit the above issue, have a look at the following KB.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2145887

This issue occurs when serverSsl or clientSsl is configured in the load balancer, but the ciphers value is set to NULL in the previous version.

VMware NSX for vSphere 6.2.3 introduced an approved ciphers list in NSX Manager and does not allow the ciphers to be NULL for serverSsl and clientSsl in the load balancer.

Note: the default ciphers value is NULL in NSX 6.2.x.

Since the ciphers value defaults to NULL in the earlier version, if it has not been explicitly set, NSX Manager 6.2.3 considers the value invalid, which is why this issue occurs.

There is a workaround in the above KB which involves making a POST API call to set the ciphers value from NULL to DEFAULT.
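
Roughly, the shape of that change via the NSX REST API looks like the sketch below. Treat it as illustrative only: the exact method, URI and payload are in KB 2145887, and the loadbalancer/config endpoint and DEFAULT cipher value shown here are assumptions based on the NSX-v load balancer API rather than a copy of the KB steps.

# Retrieve the current load balancer configuration for the affected edge
curl -k -u admin https://<nsx-manager>/api/4.0/edges/<edge-id>/loadbalancer/config -o lb-config.xml

# Edit lb-config.xml so the clientSsl/serverSsl sections contain <ciphers>DEFAULT</ciphers>, then push it back
curl -k -u admin -H "Content-Type: application/xml" -X PUT -d @lb-config.xml https://<nsx-manager>/api/4.0/edges/<edge-id>/loadbalancer/config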

NSX Design & Deploy – Course Notes Part 2

Following on from the last post, here are my remaining key points:

  • L2 Bridging – only a single vDS is supported
  • DLRs – don't place two DLR uplinks in the same Transport Zone
  • OSPF – consider setting the OSPF timers to an aggressive setting if you have a requirement to do so. The lowest setting is Hello 1 sec / Dead 3 secs; set this on both sides, i.e. on the physical router as well, but note the physical side might limit how aggressive a timer you can set. (10 sec Hello / 40 sec Dead is the default protocol timeout.) See the verification commands after this list.
  • The recommendation is to use anti-affinity rules to prevent deploying the DLR Control VM on the same ESXi host as an active ESG – to avoid losing both the DLR Control VM and the Edge holding the summary routes (if you have set them – you should have! 😉)
  • When applying DFW rules on the cluster object, if you move VMs between clusters there is a few seconds' outage in protection while the new rules are applied
  • Security policy weights – the higher number takes priority, e.g. 2000 is higher than 1000
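
If you do tune the OSPF timers, you can confirm the adjacency and routes survive from the ESG or DLR Control VM CLI; these are standard NSX Edge show commands, but verify the exact output on your version:

show ip ospf neighbor

show ip route

The neighbor output should show the adjacency in Full state (with the dead time you configured counting down), and the routing table should still hold the routes learned over that adjacency.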

NSX Design & Deploy – Course Notes

So I have just finished a week in Munich on an internal NSX design & deploy course which had some great content. I have made a few notes and wanted to very briefly blog those, mainly for my benefit but also in the hope they may be useful or interesting to others as well.

In no particular order:

  • vSphere 5.5 – when upgrading the vDS from 5.1 to 5.5 you need to upgrade the vDS to Advanced mode after the vDS upgrade itself. This enables features such as LACP and NIOC v3 and is a requirement for NSX to function correctly.
  • When considering multicast mode with VTEPs in different subnets, be aware that PIM is not included in the Cisco standard licence and requires the advanced one! PIM is a requirement when using multiple VTEPs in different subnets.
  • When using dynamic routing with an HA-enabled DLR, the total failover time could be up to 120 seconds:
    • 15 sec dead time (6 sec is the minimum; 9–12 sec is recommended, and 6 is not recommended as it is pinging every 3 seconds)
    • claim interfaces / start services
    • ESXi update
    • route reconvergence
  • Consider setting a static route (summary route) with the correct weight, lower than the dynamic route, so traffic isn't disrupted during DLR failover.
  • Different subnets for virtual and physical workloads let you have one summary route for the virtual world (for the above event – this is likely not possible in a brownfield deployment).
  • Note that during a DLR failover the routing adjacency is lost, so the upstream routing table entry is removed during this period, which loses the route. This can be avoided by increasing the OSPF timers upstream so the neighbour doesn't drop the route.
  • If you fail to spread the transport zone across all the applicable hosts that are part of the same vDS, the port groups will exist on those hosts but won't be able to communicate (no traffic flow if someone accidentally selects that port group). A quick way to confirm a host actually has a given logical switch is shown below.
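
For the transport zone point above, a per-host check with a command already used in this series will show which VNIs the host has actually instantiated (the vDS name is a placeholder):

# esxcli network vswitch dvs vmware vxlan network list --vds-name=<vds name>

If the VNI of a logical switch never appears on a host even when a VM on that host is attached to it, check whether the cluster is actually included in the transport zone.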

Part two coming soon!

Useful NSX Troubleshooting Commands


Having just completed an NSX 6.2 ICM course, I wanted to share with you some commonly used troubleshooting commands which Simon Reynolds shared with us.

ESXi Host CLI Commands

Here are some useful CLI examples to run in ESXi shell for logical switch info:

# esxcli network vswitch dvs vmware vxlan network ….

# esxcli network vswitch dvs vmware vxlan network arp list --vds-name=Compute_VDS --vxlan-id=5001

# /bin/net-vdl2 -M arp -s Compute_VDS -n 5001

# esxcli network vswitch dvs vmware vxlan network mac list --vds-name=Compute_VDS --vxlan-id=5001

# /bin/net-vdl2 -M mac -s Compute_VDS -n 5001

# esxcli network vswitch dvs vmware vxlan network vtep list --vds-name=Compute_VDS --vxlan-id=5001

# /bin/net-vdl2 -M vtep -s Compute_VDS -n 5001

a) Show vDS info:

# net-vds -l

b) Show the separate IP stack for VXLAN:

# esxcli network ip netstack list

c) Raise the netcpa logging level to verbose (logs ESXi to controller messages in more detail)

# /etc/init.d/netcpad stop

# chmod +wt /etc/vmware/netcpa/netcpa.xml

# vi /etc/vmware/netcpa/netcpa.xml

and change info to verbose between the <level> tags, then save the file and restart the netcpa daemon:

# /etc/init.d/netcpad start
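
Once netcpad has been restarted, it is worth confirming the daemon is running and that the host has re-established its controller sessions (netcpa talks to the controllers over TCP port 1234 in NSX-v):

# /etc/init.d/netcpad status

# esxcli network ip connection list | grep 1234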

d) Packet capture commands

# pktcap-uw --uplink vmnic2 -o unencap.pcap --dir=1 --stage=0

# tcpdump-uw -enr unencap.pcap

or

# pktcap-uw --uplink vmnic2 --dir=1 --stage=0 -o - | tcpdump-uw -enr -

(--dir=1 implies "outbound", --dir=0 implies "inbound", --stage=0 implies "before the capture point", --stage=1 implies "after the capture point")

Show vxlan encapsulated frames:

# pktcap-uw --uplink vmnic2 --dir=1 --stage=1 -o - | tcpdump-uw -enr -

Show frames at a vm switchport connection:

# pktcap-uw -o - --switchport <switchport id> --dir 1 | tcpdump-uw -enr -

(get vm port id from esxtop network view)


Test VXLAN connectivity between hosts:

ping ++netstack=vxlan -d -s 1572 -I vmk3 xxx.xxx.xxx.xxx
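
To work out which vmkernel interface is the VTEP on a given host (vmk3 above is only an example), list the vmknics and look for the one bound to the vxlan netstack, checking at the same time that its MTU is large enough for the encapsulated frames:

# esxcfg-vmknic -l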


NSX Controller Commands:

# show control-cluster status

# show control-cluster logical-switches vni 5000

# show control-cluster logical-switches connection-table 5000

You need to be on the controller that manages the VNI for the next commands:

# show control-cluster logical-switches vtep-table 5001

# show control-cluster logical-switches mac-table 5001

# show control-cluster logical-switches arp-table 5001