There’s some speculation in here, but first up: some concrete lessons learned from deploying SDN in Windows Server 2016.
Ok, so when Azure Stack ships it will be as a turnkey appliance – a black box of wonder that arrives at your datacentre, gets racked, powered on and immediately delivers the glory of Azure to you locally. Right?
As with any appliance, there are backend integrations which will need to be done – identity, billing and chargeback, integration into existing network infrastructure, that sort of thing. It’s the latter we’ve been working on recently, and for which we have some useful lessons learned to share.
While multi-node Azure Stack infrastructures aren’t yet available for most of us to work on this integration piece, and the 1-node Azure Stack implementation hides behind a BGPNAT VM, we still have options for making sure we’re as well prepared as possible.
Specifically, the Software Defined Networking implementation in Azure Stack is the same Azure-inspired SDN that ships in Windows Server 2016. That means if we can deploy the end-to-end SDN stack on a Hyper-V 2016 cluster, then in theory much of the required physical network configuration above the ToR level should be identical for Azure Stack.
Regardless, Hyper-V 2016 has a long and illustrious future ahead of it as a resilient, cost-effective, and ultra-secure IaaS platform which can be managed through the Azure Stack portal in its own right, so having consistent SDN implemented in Hyper-V 2016 isn’t a nice-to-have; for me it’s an absolute necessity.
This blog doesn’t seek to document the step-by-step process for deploying SDN, but rather to showcase a few of the lessons learned along the way which can help inform the integration of Azure Stack into an existing physical infrastructure.
There is a plethora of documentation available for SDN, and while it can at first glance be a little overwhelming, it’s really important to read and understand the entire documentation set before proceeding with deployment.
This is the documentation set we have followed in order to deploy SDN successfully on a Hyper-V 2016 cluster managed by VMM 2016.
Even though we’re deploying SDN in a clustered environment, this implementation is really useful to read through, as it has step-by-step screenshots which are extremely useful to reference during deployment.
This is one of the most important pages to reference in an SDN deployment. Pretty much every single cmdlet and test documented therein is valuable in understanding the status of your SDN deployment and figuring out where any errors lie.
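To give a flavour of what that looks like in practice, here’s a minimal health-check pass run from a Network Controller node. This is a hedged sketch: the REST endpoint name is a placeholder, and exact parameter names can vary between builds, so check them against the diagnostics documentation for your deployment.

```powershell
# Placeholder REST endpoint name - substitute your own NC REST FQDN.
$ncRestName = "nc.contoso.local"

# Confirm the Network Controller cluster nodes are all Up.
Get-NetworkControllerNode

# Confirm the Service Fabric replicas backing each NC service are healthy.
Get-NetworkControllerReplica

# Walk every resource in the NC REST API and report any that are failing
# to converge - in our experience the single most useful cmdlet for
# pinpointing where an SDN deployment error lies.
Debug-NetworkControllerConfigurationState -NcIpAddress $ncRestName
```

If `Debug-NetworkControllerConfigurationState` comes back clean but tenant traffic still isn’t flowing, the problem is usually below the Network Controller – in the physical fabric – which is exactly where the next lesson comes in.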
There are a number of useful resources in the SDN GitHub repo; in particular the SwitchConfigExamples and the Diagnostics scripts are invaluable.
Lesson the first…
The SDN documentation includes a good amount of information about how to configure down to the ToR level and how traffic flows within that boundary; how you integrate this into your existing physical network, though, will vary significantly depending on what hardware you have and how it’s set up.
Public Azure Stack documentation to date uses the above image to show how separate cluster fault domains will connect through their TORs and AGGs, but naturally doesn’t go into any detail above that level.
Typically above this level we would find a set of hardware firewalls, and from there a series of core through to edge network devices. Early on, though, we questioned the firewall placement in this scenario, the thought process being that tenant traffic would benefit from bypassing the physical firewall and making use of the SDN distributed firewall instead. This removes typical firewall bottlenecks, and enables the full automation power of the SDN infrastructure from the Hyper-V switch to the edge.
It’s still critical to secure management traffic in the traditional way, however, so our implementation splits management and tenant traffic out via VRFs.
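As a rough illustration of that split, the ToR config ends up looking something like the fragment below. This is a hedged, Cisco IOS-style sketch; the VRF names, VLAN IDs and addresses are illustrative assumptions, not taken from our deployment, and your vendor’s syntax will differ.

```
! Two VRFs: management stays on the firewalled path, tenant bypasses it.
vrf definition MGMT
 address-family ipv4
!
vrf definition TENANT
 address-family ipv4
!
interface Vlan100
 description Management network (routes via AGG/firewalls)
 vrf forwarding MGMT
 ip address 10.10.100.2 255.255.255.0
!
interface Vlan200
 description Tenant/PA network (bypasses the physical firewall)
 vrf forwarding TENANT
 ip address 10.10.200.2 255.255.255.0
```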
Per host, management and SDN traffic run through a pair of 10Gbps Mellanox cards, while SMB/RDMA storage traffic is split out onto separate Chelsio NICs. Expanding the above image, it then starts to look more like this – yikes! This is not a pretty image, but it’s accurate.
Public VIPs route via the core network over the tenant VRF, while private VIPs route via the AGG/firewalls, and all works joyously. BGP runs from RRAS/SLB to the ToR, then OSPF carries tenant traffic to the core and out to the edge.
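On the ToR, that BGP-to-OSPF handoff might be sketched as follows. Again this is a hedged, Cisco IOS-style illustration only: the ASNs, process IDs and peer addresses are invented for the example, and the real peers are the VIP-advertising endpoints of your SLB MUXes and RRAS gateways.

```
router bgp 64628
 address-family ipv4 vrf TENANT
  ! eBGP peerings to the SDN VIP advertisers (illustrative addresses)
  neighbor 10.10.200.10 remote-as 64629   ! SLB MUX
  neighbor 10.10.200.11 remote-as 64629   ! RRAS gateway
!
router ospf 10 vrf TENANT
 ! Carry the BGP-learned VIP routes onward to the core and edge
 redistribute bgp 64628 subnets
 network 10.10.201.0 0.0.0.255 area 0
```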
Is this how it’ll be in Azure Stack? I don’t know! One thing’s for sure, learning lessons on how to integrate SDN 2016 into your physical network now can only benefit your Azure Stack deployment in the future.