VMware Announces General Availability of…

VMware Announces General Availability of vSphere 6.5 Update 1

VMware Announces General Availability of…

vSphere 6.5 Update 1 is the Update You’ve Been Looking For! Today, VMware is excited to announce the general availability of vSphere 6.5 Update 1. This is the first major update to the well-received vSphere 6.5 that was released in November of 2016. With this update release, VMware builds upon the already robust industry-leading virtualization The post VMware Announces General Availability of vSphere 6.5 Update 1 appeared first on VMware vSphere Blog .


VMware Social Media Advocacy

vRA 7 & NSX 6.3 – The Security Tag Gotcha!

Lets assume you’re wishing to deploy a vRA 7.x blueprint into an environment where NSX 6.3.x has been deployed, and the DFW default rule is set to deny. During the provisioning of the vRA VMs they will of course need firewall access for services such as Active Directory and DNS to allow them to customise successfully, and here in lies the problem.

You might assume you could go about creating your security policies and security groups as normal and simply include the security tag within the blueprint to grant access to these services. However; vRA won’t assign the security tag until after the machine has finished customizing. So that creates us a potential issue as the VM won’t have access to the applicable network resources such as AD & DNS to finish customizing successfully as the default DFW is set to deny.

So to design around this you need to consider having some shared services rules at the top of the DFW rule table which allow services such as Active Directory and DNS access to these VMs, this will allow the vRA VMs to have the necessary network access to finish deploying & customizing successfully. You could achieve is in a number of ways such as creating a security group based on OS name of “Windows” and VM Name that equals the name of your vRA VM’s. Therefore as soon as vRA creates the VM object it will be assigned to the correct shared services security group and given the correct access, you can then layer in additional services using security tags as originally intended.

vSphere 6.5 PSC’s – Multisite SSO Domains & Failures

Since vSphere 6.5 came along we have deprecated a PSC/SSO topology that I have seen customers deploy quite frequently, therefore I wanted to quickly explain what has changed and why.

Since vSphere 6.5 it is no longer possible to re-point a vCenter server between SSO sites – See VMware KB 2131191

This has a big knock on effect for some customers as traditionally they only deployed a single PSC (Externally) and a single vCenter at each site. In the event of a failure of the PSC at site 1 the vCenter server at site 1 could be re-pointed to the PSC in Site 2. This would then get you back up and running until you recovered the failed PSC.

However in vSphere 6.5 this is no longer possible – If you have two SSO sites lets call them Site 1 and Site 2 you cannot re-point a vCenter server between the two sites. So if you only had a single PSC (think back to the above example and vSphere 6.0) you would not be able to recover that site successfully.

So how do I get round this?

Deploy two PSC’s at site and within the same SSO site/domain (See the below diagram) you don’t even need to use a load balancer, as you can manually re-point your vCenter to the other PSC running at the same site in the event of a failure.

Therefore when deploying multiple SSO sites and PSC’s across physical sites consider the above and make sure you are deploying a supported and recoverable topology.

For a list of supported topologies in vSphere 6.5 and for further reading please see VMware KB 2147672

 

vRealize Log Insight – Tracking SSH Logins to Edge Devices

Here is an example of a really simple but cool query that can be setup in vRealize Log Insight to track accepted and failed SSH logins to Edge devices.

Query:

Match ALL:

appname contains “sshd”

text contains “failed password” (This can be changed to “accepted password” to track accepted logins)

hostname contains “hostname”

 

vDS is NOT a pre-requisite for NSX Guest Introspection

vDS is NOT a pre-requisite for NSX Guest Introspection

This seems to be a common misconception both from customers and third party vendors, the vDS is also not licensed as part of NSX in the “NSX for Endpoint” license. Therefore a normal VSS standard vSwitch is fully supported for deploying NSX Guest Introspection to use with third party vendors i.e. for Anti Virus.

Please refer to the NSX 6.x Installation Guide on how to use the “Specified on host” option when deploying Guest Introspection

Extract from the NSX 6.x Installation Guide Below…

 

NSX Troubleshooting Commands

Initial Troubleshooting – Start with the basics in troubleshooting – Transport Network and Control Plane

Identifying Controller Deployment Issues:

  • View the vCenter Tasks and Events logs during deployment.

Verify connectivity from NSX Manager to vCenter:

  • Run command ping x.x.x.x or show arp or show ip route or debug connection <IP-or hostname> on NSX Manager to verify network connectivity.
  • On NSX Manager run the command show log follow then connect to vCenter and look in log for connection failure.
  • Verify network configuration on NSX Manager by running command show running-config.

Identify EAM common issues:

  • Check vSphere ESXi Agent Manager for errors:
  1. vCenter home > vCenter Solutions Manager >vSphere ESX Agent Manager.
  2. Check status of Agencies prefixed with _VCNS_153.
  3. Log into the ESXi host as root that is experiencing the issue and run command tail /var/log/esxupdate.log.

NOTE: You can also access the Managed Object Browser by accessing address:
https://<VCIP or hostname>/eam/mob/

UWA’s – vsfwd or netcpa – not functioning correctly? This manifest itself as firewall showing a bad status or the control plane between hypervisor(s) and the controllers being down.

  • To find fault on the messaging infrastructure, check if Messaging Infrastructure is down on host in NSX Manager System Events.
  • More than one ESXi host affected? Check message bus service on NSX Manager Appliance web UI under the Summary Tab.
  • If RabbitMQ is stopped, then restart it.

Common Deployment Issues:

1. Connecting NSX to vCenter

  • DNS/NTP incorrectly configured on NSX Manager and vCenter.
  • User account without vCenter Administrator role used to connect NSX Manager to vCenter.
  • No network connectivity between NSX Manager and vCenter Server.
  • User logging into vCenter with an account with no role on NSX Manager.

2. Controller Deployment

  • Insufficient resources, available to host controllers.
  • NTP on ESXi hosts and NSX Manager not in sync.
  • IP connectivity issues between NSX Manager and the NSX Controllers.

3. Host Preparation

  • EAM fails to deploy VIBs because of mis-configured DNS on ESXi hosts.
  • EAM fails to deploy VIBs because of firewall blocking required ports between ESXi, NSX Manager and vCenter.
  • A Previous VIB of an older version is already installed requiring user intervention to reboot the hosts.

4. VXLAN

  • Incorrect teaming method selected for VXLAN.
  • VTEPs does not have connectivity between each other.
  • DHCP selected to assign VTEP IPs but unavailable.
  • If a vmknic has a bad IP, configure manually via vCenter.
  • MTU on Transport Network not set to 1600 bytes or greater.

CLI

NSX Controller CLI VXLAN Commands:

  • show control-cluster logical-switches vni <vni>
  • show control-cluster logical-switches connection-table <vni>
  • show control-cluster logical-switches vtep-table <vni>
  • show control-cluster logical-switches mac-table <vni>
  • show control-cluster logical-switches arp-table <vni>
  • show control-cluster logical-switches vni-stats <vni>

NSX Controller CLI cluster status and health:

  • show control-cluster status
  • show control-cluster startup-nodes
  • show control-cluster roles
  • show control-cluster connections
  • show control-cluster core stats
  • show network <arg>
  • show log cloudnet/cloudnet_java-vnet-controller. <start-time-stamp>.log
  • sync control-cluster <arg>

VXLAN namespace for esxcli:

  • esxcli network vswitch dvs vmware vxlan list
  • esxcli network vswitch dvs vmware vxlan network list –vds-name=<vds>
  • esxcli network vswitch dvs vmware vxlan network mac list –vds-name =<vds> –vxlan-id=<vni>
  • esxcli network vswitch dvs vmware vxlan network arp list –vds-name –vxlan-id=
  • esxcli network vswitch dvs vmware vxlan network port list –vds-name <vds> –vxlan-id=<vni>
  • esxcli network vswitch dvs vmware vxlan network stats list –vds-name <vds> –vxlan-id=<vni>

Troubleshooting Components – Understand the component interactions to narrow the problem focus

Controller Issues:

No connectivity for new VM’s, increased BUM traffic (ARP cache misses).

General NSX Controller troubleshooting steps:

  • Verify Controller cluster status and roles.
  • Verify Controller node network connectivity.
  • Check Controller API service.
  • Validate VXLAN and Logical Router mapping table entries to ensure they are consistent.
  • Review source and destination netcpa logs and CLI to determine control plane connectivity issues between ESXi hosts and NSX Controller.

Example:

Verify VTEPs have sent network information to Controllers.

On Controller:

  • show control-cluster logical-switches vni <vni.no>
  • show control-cluster logical-switches vtep-table <vni.no>

User World Agent (UWA) issues:

  • Logical switching not functioning (netcpa)
  • Firewall rules not being updated (vsfwd).

General UWA troubleshooting steps:

  • Start UWA if not running.

/etc/init.d/netcpad [status/start]

/etc/init.d/vShield-Stateful-Firewall [status/start]
  • tail netcpa logs: /var/log/netcpa.log
  • tail vsfwd logs: /var/log/vsfwd.log

Check if UWA’s are connected to NSX Manager and Controllers.

esxcli network ip connection list |grep 5671 (Message bus TCP connection)

esxcli network ip connection list |grep 1234 (Controller TCP connection)

Check the configuration file /etc/vmware/netcpa/config-by-vsm.xml on the ESXi host that has the settings under UserVars/Rmq* (In particular UserVars/RmqipAddress).

The list of UserVars needed for the message bus currently are:

a. RmqClientPeerName

b. RmqHostId

c. RmqClientResponseQueue

d. RmqClientExchange

e. RmqSslCertSha1ThumbprintBase64

f. RmqHostVer

g. RmqClientId

h. RmqClientToken

i. RmqClientRequestQueue

j. RmqVsmExchange

k. RmqPort

l. RmqVsmRequestQueue

m. RmqVHost

n. RmqPassword

o. RmqUserna

p. RmqIpAddress

NVS Issues – Limited/Intermittent connectivity for VM’s on the same Logical switch.

General VXLAN troubleshooting steps:

  • Check for incorrect MTU on Physical Network for VXLAN traffic.
  • Incorrect IP route configured during VXLAN configuration.
  • Physical network connectivity issues.

Verify connectivity between VTEPs:

  • ping from VXLAN dedicated TCP/IP stack ping ++netstack=vxlan -l vmk1 <ip address>
  • View ARP table of VXLAN dedicated TCP/IP stack esxcli network ip route ipv4 list -N vxlan
  • Ping succeeds between VM’s but TCP seems intermittent. Possible reason is due to incorrect MTU on physical network or VDS on vSphere. Check the MTU configured on VDS in use by VXLAN.

Verify VXLAN component:

  • Verify VXLAN vib is installed and the correct version esxcli software vib get -vibname esx-vxlan
  • Verify VXLAN kernel module vdl2 is loaded vmkload_mod -l |grep vdl2 (logs to /var/log/vmkernel.log – prefix VXLAN)
  • Verify Control Plane is up and active for Logical Switch on the ESXi host esxcli network vswitch dvs vmware vxlan network list –vds-name
  • Verify VM information. Verify that ESXi host has learned MAC addresses of remote VM’s esxcli network vswitch dvs vmware vxlan network mac list –vds-name
  • List active ports and VTEP they are mapped to esxcli network vswitch dvs vmware vxlan network port list –vds-name –vxlan-id=
  • Verify host has locally cached ARP entry for remote VM’s esxcli network vswitch dvs vmware vxlan network arp list –vds-name –vxlan-id=

Distributed Firewall issues – Flow Monitoring provides vNIC level visibility of VM traffic flow.

  • Detailed Flow Data for both Allow and Block flows.
  • Global flow collection disabled by default – click the Enable button.
  • By default NSX excludes own VMs: NSX Manager, Controllers and Edges.

Note: Add a VM to the Exclusion List to remove it from the DFW. This allows you to determine if it’s a DFW issue. If there is still a problem, then it is not DFW-related.

  • In vSphere Web client, go to Network and Security and select the NSX Manager. From Manage > Exclusion List, click the + and select the VM.
  • Verify the DFW vib (esx-vsip) is installed on the ESXi host esxcli software vib list |grep vsip
  • Verify that the DFW kernel module is installed on the ESXi host vmkload_mod -l |grep vsip
  • Verify the vsfwd service daemon is running on the ESXi host ps |grep vsfwd
  • To start/stop the vsfwd daemon on the ESXi host /etc/init.d/vShield-stateful-Firewall[stop|start|status|restart]

LOGS

NSX Manager Log (collected via WEB UI)

  • Select Download Tech Support Log

Edge (VDR/ESG) Log (collected via WEB UI)

  • From the vSphere Web Client, right click vCenter Server, select All vCenter Actions -> Export System Logs.
  • You can also generate ESXi host logs on the ESXi host cli: vm-support.

NSX Controller Logs

  • From the vSphere Web Client, go to Networking & Security -> Installation -> NSX Conroller Nodes.
  • Select the Controller and select Download Tech Support logs.

Issues and Corresponding Logs

Installation/upgrade related issues

  • NSX Manager Log.
  • vCenter Support Bundle: /var/log/vmware/vpx/EAM.log and /var/log/esxupdate.log.

VDR issues

  • NSX Manager log.
  • VDR log from the affected VDR.
  • VM support bundle: var/log/netcpa.log and /var/log/vsfwd.log.
  • Controller logs.

Edge Services Gateway issues

  • NSX Manager log.
  • Edge log for the affected ESG.

NSX Manager issues

  • NSX Manager log.

VXLAN/Controller/Logical Switch

  • NSX Manager log.
  • vCenter Support bundle.
  • VM support bundle.

VXLAN data plane: /var/log/vmkernel.log.

VXLAN control plane: /var/log/netcpa.log.

Management plane: /var/log/vsfwd.log and /var/log/netcpa.log.

Distributed Firewall (DFW) issues

  • NSX Manager log.
  • VM support bundle: /var/log/vsfwd.log, /var/log/vmkernel.log.
  • VC support bundle.

Implementing a multi-tenant networking platform…

Implementing a multi-tenant networking platform with NSX

Implementing a multi-tenant networking platform…

So we have covered the typical challenges of a multi-tenant network and designed a solution to one of these, it’s time to get down to the bones of it and do some configuration! Let’s implement it in the lab, I have set up an NSX ESG Cust_1-ESG and an NSX DLR control VM Cust_1-DLR with the below IP configuration:


VMware Social Media Advocacy