In this post I’ll cover a recent design decision we had to make: whether or not to inject a default route from your on-premises network into VMConAWS.
At first glance this may not sound like a big deal. However, depending on the customer’s topology and footprint (for example, if they have existing on-premises locations), it is something you need to consider carefully.
For example, egress costs for internet connectivity directly out of AWS are charged at a higher rate than egress across a Direct Connect connection. Therefore, if your customer already has an on-premises presence and infrastructure in place, it may well be more cost-effective to route traffic to that site and egress from there instead.
The internet breakout within VMConAWS via the IGW is also unfiltered and uninspected; we only have the NSX L4 firewalls to protect us. If you did want to egress directly out of AWS, you would need to stand up a transit VPC and deploy your own Layer 7 next-generation firewalls to inspect that traffic for you.
We also have an AWS limitation: AWS can only receive 100 routes from your network over BGP. Depending on the scale and topology of your network, summarising it may be very difficult, and you may well go over the 100 routes. If you do exceed the 100-route limit, the BGP session is terminated and connectivity over the Direct Connect is lost, so it is something you need to design against.
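A quick way to sanity-check the route budget is to collapse your advertised prefixes and count what is left. The sketch below is a minimal Python example using the standard `ipaddress` module; the prefix list is hypothetical, and it only merges contiguous, aligned blocks (it will not invent a broader summary such as 10.0.0.0/8 that covers address space you may not own).

```python
import ipaddress

# Hypothetical on-premises IPv4 prefixes; replace with what you actually advertise.
ONPREM_PREFIXES = [
    "10.10.0.0/24", "10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24",
    "10.20.0.0/25", "10.20.0.128/25",
    "172.16.5.0/24",
]

# Route limit described above for routes received by AWS over BGP.
AWS_BGP_ROUTE_LIMIT = 100


def summarise(prefixes):
    """Collapse contiguous or overlapping IPv4 prefixes into the fewest networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(networks))


def check_route_budget(prefixes, limit=AWS_BGP_ROUTE_LIMIT):
    """Return the summarised prefixes and whether they fit within the limit."""
    summary = summarise(prefixes)
    return summary, len(summary) <= limit


if __name__ == "__main__":
    summary, fits = check_route_budget(ONPREM_PREFIXES)
    print(f"{len(ONPREM_PREFIXES)} prefixes -> {len(summary)} after summarising")
    for net in summary:
        print(" ", net)
    print("within limit" if fits else "OVER LIMIT: BGP session at risk")
```

Here the seven example prefixes collapse to three (10.10.0.0/22, 10.20.0.0/24 and 172.16.5.0/24). A network built from many non-contiguous ranges will not shrink like this, which is exactly the case where the limit bites.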
This particular customer had already invested in standing up an on-premises SDDC as well as the VMConAWS SDDCs. From a connectivity perspective, this meant they had already invested in L7 firewalls and security appliances for internet connectivity, so in the immediate term we would route traffic across the Direct Connect and out of the on-premises internet breakout. This can always be revisited in the future as the VMC instance outgrows the on-premises deployments.
The above diagram covers how this would look from a traffic flow perspective, with internet traffic originating in VMC being routed over the Direct Connect and out of the on-premises egress point.
However, that design decision raised a question about how it might affect VPN connectivity. This is especially important if you want to use a route-based IPsec VPN as a backup connectivity method. You have to understand that internet connectivity is terminated on the IGW (Internet Gateway) inside the shadow VPC and not directly on the Tier-0 (terminating it on the Tier-0 looks to be a future release item). Therefore, if we receive a default route from on-premises, we would not be able to route out to the IGW from the Tier-0 for VPN connectivity, as all traffic would be sent down the Direct Connect.
VMware has a solution for this little issue: inject a /32 route into the Tier-0 for the destination VTI (Virtual Tunnel Interface). For example, traffic to the destination VTI1, which is on-premises, goes via the Tier-0 and out through the IGW rather than over the Direct Connect, where the default route would otherwise send it.
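The reason a single /32 is enough is longest-prefix match: when several routes cover a destination, the most specific one wins, so the /32 beats the 0.0.0.0/0 learned over the Direct Connect. The Python sketch below models that lookup; it is an illustration, not NSX code, and the routing table is hypothetical (203.0.113.10 is a documentation address standing in for the on-premises VPN endpoint, and the next-hop labels are made up).

```python
import ipaddress

# Hypothetical Tier-0 routing table: (prefix, next-hop label).
ROUTES = [
    ("0.0.0.0/0", "direct-connect"),   # default route learned from on-premises
    ("10.0.0.0/8", "direct-connect"),  # hypothetical summarised on-prem range
    ("203.0.113.10/32", "igw"),        # injected /32 for the on-prem VPN endpoint
]


def lookup(destination, routes=ROUTES):
    """Return the next hop for destination using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The most specific route (largest prefix length) wins.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


if __name__ == "__main__":
    for dst in ("203.0.113.10", "203.0.113.11", "8.8.8.8", "10.1.2.3"):
        print(dst, "->", lookup(dst))
```

Only the exact endpoint address leaves via the IGW; every other internet destination, including the neighbouring 203.0.113.11, still follows the default route over the Direct Connect.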
I must admit I didn’t know about this until I stumbled upon it: vSphere 6.7 has added a new feature called Quick Boot.
Quick Boot enables the ESXi host to restart straight into the ESXi loading screen, skipping the initial POST and memory test screens. This is particularly useful when you are applying ESXi updates, as the reboot time is dramatically shortened, as shown below.
So how does it work?
When a reboot is necessary, devices are shut down and the hypervisor restarts.
Hardware initialization and memory tests are not performed.
This improves host availability and shortens maintenance windows.
Requirements and limitations:
Supported server hardware (currently a shortlist of Dell and HPE systems)
Native device drivers only – no vmklinux driver support
Secure Boot not supported
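The three prerequisites above work as a simple checklist. The Python sketch below encodes them as a hypothetical pre-check; `HostInfo` and the platform list are illustrative placeholders, not a VMware API, and the real supported-hardware list comes from VMware rather than from this code.

```python
from dataclasses import dataclass, field


@dataclass
class HostInfo:
    """Hypothetical summary of a host's properties (not a VMware object)."""
    vendor: str
    model: str
    driver_types: list = field(default_factory=list)  # e.g. ["native", "vmklinux"]
    secure_boot_enabled: bool = False


# Placeholder entries only; substitute the platforms VMware actually supports.
SUPPORTED_PLATFORMS = {("Dell", "ExampleModel-A"), ("HPE", "ExampleModel-B")}


def quick_boot_blockers(host, supported=SUPPORTED_PLATFORMS):
    """Return the reasons Quick Boot would not be available (empty list = OK)."""
    reasons = []
    if (host.vendor, host.model) not in supported:
        reasons.append("server platform not on the supported list")
    if "vmklinux" in host.driver_types:
        reasons.append("vmklinux drivers in use (native drivers only)")
    if host.secure_boot_enabled:
        reasons.append("Secure Boot is enabled")
    return reasons


if __name__ == "__main__":
    host = HostInfo("Dell", "ExampleModel-A", ["native"], secure_boot_enabled=True)
    print(quick_boot_blockers(host) or "Quick Boot prerequisites met")
```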
How do you enable it?
Quick Boot is primarily intended to benefit Update Manager workflows, and there are stringent compatibility checks, according to the following files:
I recently completed an SDDC deployment with my good friend @simoneady, and while deploying NSX-V 6.4.5 we noticed we couldn’t ping any VXLAN-backed networks connected to the newly deployed DLR.
It later transpired to be a firewall issue on the DLR, and after some digging we were able to pinpoint it to the options selected in the deployment wizard. Let me explain further.
Log in to NSX via the HTML5 UI and create a DLR or ESG using the HTML5 wizard, as below.
Continue on to the next page.
Leave auto rule generation ticked, which is the default option. Now this is where it gets interesting, so let me explain why. I would normally enable the firewall on the screen below, set the default rule to accept, and then set the firewall back to disabled.
The rationale is that if someone later enabled the firewall by accident, the default rule would already be set to accept, so it wouldn’t cause an outage. However, it appears that if you do this in the HTML5 UI, once you toggle the enable button the firewall stays enabled and, more importantly, it ignores the accept setting, so all traffic is denied.
You can see why this may catch you out for a while during deployment: you believe you disabled the firewall, but in fact it is not only enabled, it is enabled with a default deny.
Workflow is below…
What’s even more confusing is that the firewall section didn’t make it into the HTML5 UI, so at first glance you may assume it’s still disabled because it isn’t visible. You need to open the Flex (Flash) UI to see the Edge Firewall section.
Note that it does say “Firewall configured”.
And bingo, we can see the firewall is started with a default deny rule.
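If you would rather not depend on the Flex UI, the same check can be scripted against the edge’s firewall configuration. The Python sketch below only parses the XML; it assumes the response comes from the NSX-V endpoint `GET /api/4.0/edges/<edgeId>/firewall/config` and has roughly the shape shown in `SAMPLE_RESPONSE`. Both the path and the element names are assumptions to verify against the NSX-V API guide for your version.

```python
import xml.etree.ElementTree as ET

# Assumed shape of an NSX-V edge firewall config response (verify for your version).
SAMPLE_RESPONSE = """
<firewall>
  <enabled>true</enabled>
  <defaultPolicy>
    <action>deny</action>
    <loggingEnabled>false</loggingEnabled>
  </defaultPolicy>
</firewall>
"""


def firewall_state(xml_text):
    """Return (enabled, default_action) parsed from the firewall config XML."""
    root = ET.fromstring(xml_text.strip())
    enabled = root.findtext("enabled", default="false").strip().lower() == "true"
    action = root.findtext("defaultPolicy/action", default="unknown").strip().lower()
    return enabled, action


def is_silently_blocking(xml_text):
    """True when the firewall is running with a default deny, as described above."""
    enabled, action = firewall_state(xml_text)
    return enabled and action == "deny"


if __name__ == "__main__":
    print(firewall_state(SAMPLE_RESPONSE))
    print("default deny active!" if is_silently_blocking(SAMPLE_RESPONSE) else "OK")
```

Running this after each DLR or ESG deployment would flag the enabled, default-deny state straight away, instead of waiting for the first failed ping.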
Hopefully this helps others. I plan to raise an internal SR to make engineering aware of this issue, as I do not believe it’s specific to this customer.