A Multi-Site vSphere/SDDC Upgrade – A Step-by-Step Guide

I wanted to share the steps required to upgrade a typical multi-site vSphere/SDDC implementation. This particular implementation included the following products:

 

| Product | Current Version | Planned Version |
|---|---|---|
| vCenter Server | 6.0 U3 | 6.5 U1 |
| ESXi | 6.0 U3 | 6.5 U1 |
| vSAN | 6.2 | 6.6.1 |
| NSX | 6.2.4 | 6.3.5 |
| vROPS | 6.3.0 | 6.6.1 |
| vRLI | 3.3.1 | 4.5 |
| vDP | 6.1.0.173 | 6.1.5 |
| vSphere Replication | 6.1.1 | 6.5.1 |
| VMware Tools | 6.0 U3 | 6.5 U1 |

 

To understand the upgrade order, here is some background on this fictitious environment.

The environment has a Production site and a DR site. NSX is deployed across both sites using universal objects, SRM and vSphere Replication are in use, and vSAN is used locally at each site. vROPS and vRLI are also deployed at both sites.

Note: the following should be taken as a rough guide only. Please check the appropriate VMware guides/KBs for your product versions before commencing any upgrade.

 

| # | Action | Site | Impact to vSphere | Required | VM Downtime |
|---|---|---|---|---|---|
| 1 | Carry out a health check of VC & PSC before starting (go/no-go) | Both Sites | N/A | Yes | No |
| 2 | Back up all components | Both Sites | N/A | Yes | No |
| 3 | Back up PSCs with vDP & Data Protect | Both Sites | N/A | Yes | No |
| 4 | Deploy a second PSC at each site and configure replication – see KB 2131191 for justification | Both Sites | vCenter management of ESXi hosts unavailable during upgrade | Yes | No |
| 5 | Disable vSphere Replication | Both Sites | No protection of VMs (backup is the only method of restoring) | Yes | No |
| 6 | Upgrade the external Platform Services Controllers from 6.0.x to 6.5 at both sites | Both Sites | vCenter management of ESXi hosts unavailable during upgrade | Yes (if using an external Platform Services Controller) | No |
| 7 | Upgrade vDP at both sites | Both Sites | Backups unavailable during upgrade | Yes, in order to restore into 6.5 during the upgrade | No |
| 8 | Upgrade NSX Manager at the DR site | DR Site | No changes to DR NSX during upgrade | Yes | No |
| 9 | Upgrade the NSX Controller cluster at the DR site | DR Site | NSX reverts to read-only mode; change window required | Yes | Yes – no disruption as long as VMs don't move and no changes are made |
| 10 | Upgrade NSX host preparation at the DR site | DR Site | Hosts require a reboot | Yes | No |
| 11 | Upgrade NSX DLRs at the DR site | DR Site | Disruption to service | Yes | Yes |
| 12 | Upgrade NSX Edges at the DR site | DR Site | Outage required while each edge is redeployed and upgraded | Yes | Yes |
| 13 | Upgrade vCenter from 6.0.x to 6.5 at the DR site | DR Site | vCenter management of ESXi hosts unavailable during upgrade | Yes | No |
| 14 | Upgrade vROPS | Both Sites | N/A | Yes | No |
| 15 | Upgrade Log Insight | Both Sites | N/A | Yes | No |
| 16 | Use vSphere Update Manager to scan and remediate an ESXi host | DR Site | 1 host unavailable, reduced capacity | Yes | No |
| 17 | Repeat for the remaining hosts in the cluster | DR Site | One host will always be unavailable while it is being upgraded with vSphere Update Manager | Yes | No |
| 18 | Upgrade vSAN at the DR site | DR Site | Possible performance impact and increased risk of failure during the upgrade – the upgrade could take several days | Yes | No |
| 19 | Upgrade NSX Manager at the Prod site | Prod Site | No changes to Prod NSX during upgrade | Yes | No |
| 20 | Upgrade the NSX Controller cluster at the Prod site | Prod Site | NSX reverts to read-only mode; change window required | Yes | Yes – no disruption as long as VMs don't move and no changes are made |
| 21 | Upgrade NSX host preparation at the Prod site | Prod Site | Hosts require a reboot | Yes | No |
| 22 | Upgrade NSX DLRs at the Prod site | Prod Site | Disruption to service | Yes | Yes |
| 23 | Upgrade NSX Edges at the Prod site | Prod Site | Outage required while each edge is redeployed and upgraded | Yes | Yes |
| 24 | Upgrade vCenter from 6.0.x to 6.5 at the Prod site | Prod Site | vCenter management of ESXi hosts unavailable during upgrade | Yes | No |
| 25 | Upgrade vSphere Replication at both sites | Both Sites | Replication is unavailable until this point; it has to be disabled before starting the upgrade process | Yes | No vSphere Replication protection during upgrade |
| 26 | Use vSphere Update Manager to scan and remediate an ESXi host | Prod Site | 1 host unavailable, reduced capacity | Yes | No |
| 27 | Repeat for the remaining hosts in the cluster | Prod Site | One host will always be unavailable while it is being upgraded with vSphere Update Manager | Yes | No |
| 28 | Upgrade vSAN at the Prod site | Prod Site | Possible performance impact and increased risk of failure during the upgrade – the upgrade could take several days | Yes | No |
| 29 | Optional: update VMware Tools on each VM with a vSphere Update Manager baseline | Prod Site | Virtual machine reboot | Recommended | Reboot |
| 30 | Optional: update virtual hardware on each VM with a vSphere Update Manager baseline | Prod Site | Virtual machine shutdown, 1+ reboots | No | Yes; upgrade during shutdown |

 

Further Upgrade Information

vSphere

 

  1. Upgrade all external Platform Services Controller servers running 6.0.x to 6.5.
  2. Upgrade vCenter Server 6.0.x to vCenter 6.5.

This step includes the upgrade of all components and the installation of new components that were not previously present.

During the upgrade, vCenter Server is unavailable to perform any provisioning operations or functions such as vSphere vMotion and vSphere DRS. Once started, the vCenter database is upgraded first, followed by the vCenter binaries. After the upgrade, the hosts are automatically reconnected.

  3. Upgrade vSphere Update Manager (Windows only; on vSphere 6.5, Update Manager is included with the vCenter Server Appliance). As with vCenter, the database is upgraded first, followed by the binaries. In vSphere 6.5, vSphere Update Manager is configured and administered through the vSphere Web Client.
  4. Use vSphere Update Manager to create a baseline, scan the hosts, and then upgrade the ESXi hosts to version 6.5. A copy of the ESXi 6.5 installation media must be available and added to vSphere Update Manager.
  5. Use vSphere Update Manager to create a baseline for VMware Tools, scan the virtual machines, and remediate to upgrade VMware Tools.
  6. Upgrade VMware virtual machine hardware to version 13 (if not already completed) to take advantage of new features in vSphere 6.5.
    • This step should not be performed until all hosts have been upgraded to ESXi 6.5, because virtual machine hardware version 13 cannot run on older hosts.
  7. If applicable, address VMFS volumes on any ESXi host that has already been upgraded but is not yet running VMFS6.
    • Note that a VMFS5 datastore cannot be upgraded in place to VMFS6; a new VMFS6 datastore must be created and workloads migrated to it.
    • Only VMFS3 and above can be upgraded in place to VMFS5 on an ESXi 6.x host. Verify that older volumes are upgraded to at least VMFS3 before upgrading to ESXi 6.x.
    • Once a VMFS file system upgrade has been completed, it cannot be rolled back. Hosts not running ESXi 5.x or 6.x will be unable to read the upgraded volumes.
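Before touching datastores it is worth confirming which VMFS versions are actually in use. As a rough sketch (the sample output below is illustrative rather than taken from a live host), the output of `esxcli storage filesystem list` can be filtered for older volumes:

```shell
# Illustrative sample of `esxcli storage filesystem list` output (not from a real host).
# Columns: Mount Point, Volume Name, UUID, Mounted, Type, Size, Free
sample='/vmfs/volumes/5a1b-01  datastore1  5a1b-01  true  VMFS-5  299573968896  95108399104
/vmfs/volumes/4f2c-02  legacy-lun  4f2c-02  true  VMFS-3  107374182400  2147483648'

# List any volume still running VMFS-3, i.e. a candidate for an in-place upgrade to VMFS-5.
old=$(printf '%s\n' "$sample" | awk '$5 == "VMFS-3" {print $2, $5}')
echo "$old"
```

On a real host you would pipe `esxcli storage filesystem list` straight into the awk filter instead of using sample data.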

 

 

vSAN

 

Before starting the vSAN upgrade process, ensure that the following requirements are met:

  1. The vSphere environment is up to date:
    • The vCenter Server managing the hosts must be at an equal or higher patch level than the ESXi hosts it manages.
    • All hosts should be running the same build of ESXi before the vSAN cluster upgrade is started.
      • If the ESXi host versions do not match, patch the hosts to the same build before upgrading.
  2. All vSAN disks should be healthy.
    • No disk should be failed or absent.
    • This can be determined via the vSAN Disk Management view in the vSphere Web Client.
  3. There should be no inaccessible vSAN objects.
    • This can be verified with the vSAN Health Service in vSAN 6.0 and above, or with the Ruby vSphere Console (RVC) in all releases.
  4. There should not be any active resync at the start of the upgrade process.
    • Some resync activity is expected during the upgrade process, as data needs to be synchronized following host reboots.
  5. Ensure that there are no known compatibility issues between your current vSAN version and the desired target vSAN version. For information on upgrade requirements, see vSAN upgrade requirements (KB 2145248).
    • If required, update the vSAN cluster to the required build before undertaking the upgrade process to avoid compatibility concerns.
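Several of these checks can be driven from the Ruby vSphere Console on the vCenter Server. As a sketch (the cluster path below is illustrative), the usual commands are:

```shell
# From an RVC session on the vCenter Server (cluster path is illustrative)
vsan.cluster_info ~/computers/Cluster01      # basic cluster and disk-group layout
vsan.check_state ~/computers/Cluster01       # flags inaccessible or invalid objects
vsan.disks_stats ~/computers/Cluster01       # per-disk health and capacity usage
vsan.resync_dashboard ~/computers/Cluster01  # should show no active resync before starting
```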

Host Preparation

Ensure you choose the right maintenance mode option. When you move a host into maintenance mode in vSAN, you have three options to choose from:

  • Ensure availability
    If you select Ensure availability, vSAN allows you to move the host into maintenance mode faster than Full data migration, and the virtual machines in the environment remain accessible.
  • Full data migration
    If you select Full data migration, vSAN evacuates all data from the host to the other hosts in the cluster. This is the slowest option, but the safest if the host will be out of the cluster for an extended period.
  • No data migration
    If you select No data migration, vSAN does not evacuate any data from this host. If you power off or remove the host from the cluster, some virtual machines might become inaccessible.
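For reference, the same choice can be made from the ESXi shell; a sketch, assuming the esxcli maintenance mode namespace on a vSAN host:

```shell
# Enter maintenance mode keeping vSAN objects accessible (the "Ensure availability" option)
esxcli system maintenanceMode set -e true -m ensureObjectAccessibility
# Other vSAN modes: evacuateAllData (Full data migration), noAction (No data migration)

# Exit maintenance mode once the host work is complete
esxcli system maintenanceMode set -e false
```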

Exit maintenance mode and resync

  • When the ESXi host has been upgraded and moved out of maintenance mode, a resync will occur. You can monitor this through the Web Client.
  • Ensure the resync is complete before moving on to the next host. The resync occurs because the updated host can now contribute to the vSAN datastore again; it is vital to wait until it completes to ensure there is no data loss.

 

 

NSX

 

For detailed information please refer to the NSX Upgrade Guide: https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.3/nsx_63_upgrade.pdf

 

vROPS

 

For detailed information please refer to the vROPS Upgrade guide.

https://docs.vmware.com/en/vRealize-Operations-Manager/6.6/vrealize-operations-manager-66-vapp-deploy-guide.pdf

 

vRLI

 

For detailed information please refer to the vRLI Upgrade guide.

https://docs.vmware.com/en/vRealize-Log-Insight/4.5/com.vmware.log-insight.administration.doc/GUID-4F6ACCE8-73F4-4380-80DE-19885F96FECB.html

 

vDP

 

For detailed information please refer to the vDP Upgrade guide.

https://docs.vmware.com/en/VMware-vSphere/6.5/vmware-data-protection-administration-guide-61.pdf

 

 

vSphere 6.5 PSC’s – Multisite SSO Domains & Failures

Since vSphere 6.5 came along we have deprecated a PSC/SSO topology that I have seen customers deploy quite frequently, so I wanted to quickly explain what has changed and why.

As of vSphere 6.5 it is no longer possible to re-point a vCenter Server between SSO sites – see VMware KB 2131191.

This has a big knock-on effect for some customers, as traditionally they deployed only a single external PSC and a single vCenter at each site. In the event of a failure of the PSC at site 1, the vCenter Server at site 1 could be re-pointed to the PSC in site 2. This would get you back up and running until you recovered the failed PSC.

However, in vSphere 6.5 this is no longer possible. If you have two SSO sites – let's call them Site 1 and Site 2 – you cannot re-point a vCenter Server between the two. So if you only had a single PSC (think back to the above example and vSphere 6.0), you would not be able to recover that site successfully.

So how do I get round this?

Deploy two PSCs at each site, within the same SSO site/domain (see the diagram below). You don't even need to use a load balancer, as you can manually re-point your vCenter to the other PSC running at the same site in the event of a failure.

Therefore, when deploying multiple SSO sites and PSCs across physical sites, consider the above and make sure you are deploying a supported and recoverable topology.

For a list of supported topologies in vSphere 6.5, and for further reading, please see VMware KB 2147672.

 

vCNS to NSX Upgrades – vShield Manager Upgrade Steps

vShield Manager Upgrade Steps

1. Confirm the steps in 2.1.2 have been actioned.

2. Back up vCNS to FTP and shut down the VM.

3. Check a snapshot has been taken of vShield Manager.

4. Check a support bundle has been taken.

5. Shut down vShield Manager and check that the appliance has 4 vCPUs and 16GB of memory.

6. Power on vShield Manager.

7. Upload the vShield-to-NSX upgrade bundle, apply it and reboot (the upgrade file is 2.4GB!).

8. Check you can log in to NSX Manager A once the upgrade has completed.

9. You may need to restart the vCenter Web Client service in order to see the plugin in the vSphere Web Client.

10. Check single sign-on (SSO) in the NSX Manager configuration. You may need to re-register.

11. Configure segment IDs and multicast addresses (recorded from vShield Manager).

12. Configure a backup to an FTP location and take a backup of NSX Manager A.

13. Create a snapshot of NSX Manager A.

14. Shut down NSX Manager A.

15. Deploy a new NSX Manager B from OVF with the same IP as A.

16. Restore the FTP backup from NSX Manager A.

17. Check vCenter registration and NSX Manager login.

18. Check NSX Manager for the list of Edges and Logical Switches.

19. Once you are happy connectivity is functioning correctly, continue with the upgrade.

For more information in this series please continue on to the next part

vCNS to NSX Upgrades – Pre Upgrade Steps

Pre-Upgrade Steps

One or more days before the upgrade, do the following:

  1. Verify that vCNS is at least the required version (see point 8 below).
  2. Check you are running one of the following recommended builds: vSphere 5.5 U3 or vSphere 6.0 U2.
  3. Verify that all required ports are open (see Appendix A).
  4. Verify that your vSphere environment has sufficient resources for the NSX components.
  5. Verify that all applicable vSphere clusters have sufficient resources to allow DRS to migrate running workloads during the host preparation stage (n+1).
  6. Verify that you can retrieve uplink port name information for vSphere Distributed Switches. See VMware KB 2129200 for further information. (Note: this is not applicable to NHS Manchester as we are expected to upgrade to NSX 6.2.4.)
  7. Ensure that forward and reverse DNS, NTP and the Lookup Service are working.
  8. If any vShield Endpoint partner services are deployed, verify compatibility before upgrading:
    • Consult the VMware Compatibility Guide for Networking and Security.
    • Consult the partner documentation for compatibility and upgrade details.
  9. If you have Data Security in your environment, uninstall it before upgrading vShield Manager.
  10. Check all running edges are on the same latest version as the vShield Manager.
  11. Verify that the vShield Manager vNIC adapter is VMXNET3. This should be the case on recent vShield Manager versions; however, the e1000 vNIC may have been retained if you have previously upgraded the vShield Manager. To replace the vNIC, follow the steps in KB 2114813, which in part involves deploying a fresh vShield Manager and restoring the configuration. See Appendix C or http://kb.vmware.com/kb/2114813.
  12. Increase the vShield Manager memory to 16GB.

Pre-Upgrade Validation Steps

Immediately before you begin the upgrade, do the following to validate the existing installation.

  1. Identify administrative user IDs and passwords.
  2. Verify that forward and reverse name resolution is working for all components.
  3. Verify you can log in to all vSphere and vShield components.
  4. Note the current versions of vShield Manager, vCenter Server, ESXi and vShield.
  5. Check the multicast address ranges are valid (the recommended multicast address range starts at 239.0.1.0/24 and excludes 239.128.0.0/24).
  6. Verify that VXLAN segments are functional. Make sure to set the packet size correctly and set the don't-fragment bit.
    1. Ping between two VMs that are on the same virtual wire but on two different hosts.
      • From a Windows VM: ping -l 1472 -f <dest VM>
      • From a Linux VM: ping -s 1472 -M do <dest VM>
    2. Ping between two hosts' VTEP interfaces: ping ++netstack=vxlan -d -s 1572 <dest VTEP IP>
    3. Validate North-South connectivity by pinging out from a VM.
  7. Visually inspect the vShield environment to make sure all status indicators are green, normal, or deployed.
  8. Verify that syslog is configured.
  9. If possible, in the pre-upgrade environment, create some new components and test their functionality.
  10. Validate netcpad and vsfwd user-world agent (UWA) connections.
    • On an ESXi host, run esxcli network vswitch dvs vmware vxlan network list --vds-name= and check the controller connection state.
    • On vShield Manager, run the show tech-support save session command, and search for "5671" to ensure that all hosts are connected to vShield Manager.
  11. Check firewall functionality via Telnet or netcat to confirm the edge firewalls are working as expected.
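The payload sizes used in the ping tests above come from subtracting the IP and ICMP headers from the MTU; a quick sketch of the arithmetic:

```shell
# ICMP payload = MTU - IP header (20 bytes) - ICMP header (8 bytes)
icmp_payload() {
  echo $(( $1 - 20 - 8 ))
}

icmp_payload 1500   # standard guest MTU -> 1472, as used in the VM-to-VM pings
icmp_payload 1600   # VTEP vmkernel MTU -> 1572, as used in the ++netstack=vxlan ping
```

If either ping fails with the don't-fragment bit set, the underlying transport MTU is too small for VXLAN.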

Final Pre-Upgrade Steps

Immediately before you begin the upgrade, do the following:

  1. Verify that you have a current backup of the vShield Manager, vCenter and other vCloud Networking and Security components. See Appendix B for the necessary steps.
  2. Purge old logs from the vShield Manager using the purge log manager and purge log system commands.
  3. Take a snapshot of the vShield Manager, including its virtual memory.
  4. Take a backup of the vDS.
  5. Create a tech support bundle.
  6. Record the segment IDs and multicast address ranges in use.
  7. Increase the vShield Manager resources to 16GB of memory and 4 vCPUs.
  8. Ensure that forward and reverse domain name resolution is working, using the nslookup command.
  9. If VUM is in use in the environment, ensure that the bypassVumEnabled flag is set to true in vCenter. This setting configures EAM to install the VIBs directly to the ESXi hosts even when VUM is installed or unavailable.
  10. Download and stage the upgrade bundle, and validate it with md5sum.
  11. Do not power down or delete any vCloud Networking and Security components or appliances until instructed to do so.
  12. VMware recommends doing the upgrade work in a maintenance window as defined by your company.
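Step 10's checksum validation can be scripted so the bundle is never applied if the hash doesn't match; a minimal sketch (the bundle name and expected hash are placeholders, not real values):

```shell
# Validate a staged upgrade bundle against a published checksum.
# The bundle name and hash below are placeholders, not real values.
bundle="upgrade-bundle.tar.gz"
expected="5d41402abc4b2a76b9719d911017c592"   # published MD5 from the download page

printf 'hello' > "$bundle"                    # stand-in file so the sketch is runnable
actual=$(md5sum "$bundle" | awk '{print $1}')

if [ "$actual" = "$expected" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH - do not proceed" >&2
fi
```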

For more information in this series please continue on to the next part

VMware VMworld 2016 Content Preview Video – The Future of Network Virtualization

Check out the highlights from one of our VMworld 2015 star sessions – "The Future of Network Virtualization with VMware NSX". Join us at VMworld 2016 to continue the conversation with CTO Bruce Davie as he presents cutting-edge content in "The Architectural Future of Network Virtualization" [NET8193R]. Now that you've seen the highlights, want to see the entire video?


VMware Social Media Advocacy

vSphere Authentication Proxy – No Support for 2012/2012R2 Domains

Update: after checking this internally, it turns out that vSphere Authentication Proxy is in fact supported with 2012 and 2012 R2 functional levels, and the KB will be updated.

Surprisingly vSphere Authentication Proxy currently has no support for domains that have a functional level of 2012 or 2012R2.

The product will install correctly and register the CAM account but will be unable to authenticate the ESXi host with the domain.

 

Domain Functional Level (numbers in parentheses refer to the notes below):

| ESXi Version | Windows 2000 native | Windows Server 2003 | Windows Server 2008 | Windows Server 2008 R2 | Windows Server 2012 | Windows Server 2012 R2 |
|---|---|---|---|---|---|---|
| 4.x | Yes | Yes | Yes | No | No (1,2) | No (1,2) |
| 5.0 | Yes | Yes | Yes | Yes | Yes (2,3) | No (1,2) |
| 5.1 | No | Yes | Yes | Yes | Yes (2) | Yes (2,4) |
| 5.5 | No | Yes | Yes | Yes | Yes (2) | Yes (2,5) |
| 6.0 | No | Yes | Yes | Yes | Yes (2) | Yes (2) |

 

Notes:
  1. The most recent revisions of this ESXi version were released before this Domain Functional Level existed, so the combination is untested.
  2. Due to limitations in the vSphere Authentication Proxy, this version of Active Directory will not work with it. vSphere Authentication Proxy only works with Windows Server 2008 R2 or lower. For more information, see the Install or Upgrade vSphere Authentication Proxy section in the vSphere Installation Guide. If you are not using vSphere Authentication Proxy, this can be ignored.
  3. As of vSphere 5.0 Update 3, this Active Directory Domain Functional Level is supported.
  4. As of vSphere 5.1 Update 3, this Active Directory Domain Functional Level is supported.
  5. As of vSphere 5.5 Update 1, this Active Directory Domain Functional Level is supported.

Source: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2113023

 

SRM 6.1.1 Released

What’s New in Site Recovery Manager 6.1.1

So SRM 6.1.1 was released yesterday, and I thought I would share with you what's new!

VMware Site Recovery Manager 6.1.1 delivers new bug fixes described in the Resolved Issues section.

VMware Site Recovery Manager 6.1.1 provides the following new features:

  • Support for VMware vSphere 6.0 update 2.
  • Support for Two-factor authentication with RSA SecurID for vCenter Server 6.0U2.
  • Support for Smart Card (Common Access Card) authentication for vCenter Server 6.0U2.
  • Site Recovery Manager 6.1.1 now supports the following external databases:
    • Microsoft SQL Server 2012 Service Pack 3
    • Microsoft SQL Server 2014 Service Pack 1
  • Localization support for the Spanish language.

See the release notes for further information.

 

 

NSX Design & Deploy – Course Notes

So I have just finished a week in Munich on an internal NSX design & deploy course which had some great content. I have made a few notes and wanted to very briefly blog those, mainly for my benefit but also in the hope they may be useful or interesting to others as well.

In no particular order:

  • vSphere 5.5 – when upgrading a vDS from 5.1 to 5.5, you need to switch the vDS to Advanced mode after the vDS upgrade. This enables features such as LACP and NIOC v3 and is a requirement for NSX to function correctly.
  • When considering multicast mode with VTEPs in different subnets, be aware that PIM is not included in the Cisco standard licence and requires the advanced licence! PIM is a requirement when using multiple VTEPs in different subnets.
  • When using dynamic routing with an HA-enabled DLR, the total failover time could be up to 120 seconds:
    • 15 sec dead time (6 sec minimum, 9–12 recommended; 6 is not recommended as it pings every 3 seconds)
    • claiming interfaces / starting services
    • ESXi update
    • route reconvergence
  • Consider setting a static (summary) route with a weight lower than the dynamic route, so traffic isn't disrupted during DLR failover.
  • Using different subnets for virtual and physical workloads lets you have one summary route for the virtual world (for the above event – this is likely not possible in a brownfield deployment).
  • Note that during a DLR failover the routing adjacency is lost – the upstream routing table entry is removed during this period, which loses the route. This can be avoided by increasing the OSPF timers upstream so the route is not dropped.
  • If you fail to spread a transport zone across all the applicable hosts that are part of the same vDS, the port groups will exist on every host but VMs on the missing hosts will not be able to communicate (no traffic flow if someone accidentally selects that port group).

Part two coming soon!

Don’t forget the logs

Scratch Partition

A design consideration when using ESXi hosts which have no local storage, or boot from an SD/USB device under 5.2GB, is: where should I place my scratch location?

You may be aware that if you use Auto Deploy, or an SD or USB device, to boot your hosts, then the scratch partition runs on the ramdisk, which is lost after a host reboot for obvious reasons, as it is non-persistent.

Consider configuring a dedicated NFS or VMFS volume to store the scratch logs, making sure it has the I/O characteristics to handle this. You also need to create a separate folder for each ESXi host.

The scratch location can be specified by configuring the ScratchConfig.ConfiguredScratchLocation advanced setting on each ESXi host.
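As a sketch, the scratch location can also be set from the ESXi shell (the datastore path below is illustrative); the host must be rebooted before the new location takes effect:

```shell
# Point scratch at a persistent, per-host folder (path is illustrative)
vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/datastore1/.locker-esx01
# Reboot the host for the change to take effect
```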

 

 

Syslog Server

Specifying a remote syslog server allows ESXi to ship its logs to a remote location, so it can help mitigate some of the issues we have just discussed above, however the location of the syslog server is just as important as remembering to configure it in the first place.

You don’t want to place your syslog server in the same cluster as the ESXi hosts your configuring it on or on the same primary storage for that matter, consider placing it inside your management cluster or on another cluster altogether.

The syslog location can be specified by configuring the Syslog.global.logHost advanced setting on each ESXi host.
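As a sketch, the remote syslog target can also be configured from the ESXi shell (the hostname and port below are illustrative):

```shell
# Ship logs to a remote syslog server (hostname/port are illustrative)
esxcli system syslog config set --loghost='udp://syslog.mgmt.local:514'
esxcli system syslog reload
# Allow outbound syslog through the ESXi firewall
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true
```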

Disabling vSphere vDS Network Rollback

There are certain situations when temporarily disabling network rollback might be a good idea – notice the use of the word temporarily, as it's not something you would want to leave disabled.

Let's say you need to change the VLAN configuration of a switch port which carries vSphere management traffic from an access port to a trunk port. Before doing this you would need to change the vDS port group to tag the management traffic VLAN; however, in doing so the host would lose management network access until the correct configuration was applied on the physical switch port.

The vDS would detect this change and attempt to roll back the configuration to untagged. Obviously we don't want this to happen in this instance, so VMware allows us to disable the rollback feature.

We can do this by configuring the following settings.

  1. In the vSphere Web Client, navigate to a vCenter Server Instance.
  2. On the Manage tab, click Settings.
  3. Select Advanced Settings and click Edit.
  4. Select the config.vpxd.network.rollback key, and change the value to false. If the key is not present, you can add it and set the value to false.
  5. Click OK.
  6. Restart the vCenter Server to apply the changes.

Remember to revert the change once you are finished.