Azure Upgrade and Fault domains
As the website admin, I noticed that this particular blog post receives a lot of views, and since it was written two years ago, it was time to provide an update 🙂
Azure provides an SLA of 99.9% for single-instance VMs on premium storage. For VMs on standard storage, you can get an SLA of 99.95% when you place them in an availability set (you need at least two VMs).
So if you are worried about the uptime and availability of your Azure VMs and the applications hosted on them, you will have to do some planning to mitigate downtime. Fortunately, Azure gives us the means to make this possible.
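To put those percentages in perspective, the arithmetic below converts an SLA into the downtime it still permits. It is only a rough sketch: the function name and the 30-day period are assumptions made for illustration, not the way Azure measures its SLA.

```python
def allowed_downtime_minutes(sla_percent, period_days=30):
    """Minutes of downtime that still fit within an SLA over the period.

    The 30-day default is an assumption for illustration only.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


for sla in (99.9, 99.95):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% SLA allows about {minutes:.1f} minutes of downtime per 30 days")
```

Under these assumptions, 99.9% leaves roughly 43 minutes per 30 days and 99.95% roughly 22 minutes.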
Update domains deal with planned maintenance. From time to time Azure needs to update the fabric. While this mostly has no impact on the VMs running on it, there may be times when your VM needs to be rebooted to apply updates to the platform. You will be notified weeks in advance of when this will occur.
Fault domains are for the ‘shit hits the fan’ scenarios: things can break, and while Azure can often heal itself, there are cases where this is not possible. In such a scenario your VM could go down without any warning.
How to prevent your VM from being unavailable?
When you deploy your VM, either in the portal's VM creation wizard or in a PowerShell script, you can specify an availability set. VMs in an availability set provide redundancy for your application: when one VM goes down, another VM can still handle the workload, and you can achieve the 99.95% SLA.
As soon as you place a VM in an availability set, the Azure platform automatically assigns it an update (upgrade) domain and a fault domain.
By default, an availability set gets five update domains, which you do not configure yourself (in Resource Manager deployments this can be increased to up to 20). Update domains indicate groups of virtual machines and underlying physical hardware that can be rebooted at the same time. When more than five virtual machines are configured within a single availability set, the sixth virtual machine is placed into the same update domain as the first, the seventh into the same update domain as the second, and so on. Update domains may not be rebooted in sequential order during planned maintenance, but only one update domain is rebooted at a time.
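The wrap-around placement described above can be modelled in a few lines. This is a simplified sketch of the behaviour as described in this post, not an Azure API: the function name and the VM names are hypothetical.

```python
def assign_update_domains(vm_names, update_domain_count=5):
    """Map each VM to an update domain index, wrapping around in creation order.

    Simplified model of the placement described in the text, not an Azure call.
    """
    if update_domain_count < 1:
        raise ValueError("update_domain_count must be at least 1")
    return {name: index % update_domain_count
            for index, name in enumerate(vm_names)}


vms = [f"vm{n}" for n in range(1, 8)]  # vm1 .. vm7, hypothetical names
placement = assign_update_domains(vms)
for name, domain in placement.items():
    print(f"{name} -> update domain {domain}")
```

With seven VMs and five update domains, vm6 ends up in the same update domain as vm1, and vm7 in the same one as vm2, so those pairs would be rebooted together.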
Fault domains define the group of virtual machines that share a common power source and network switch.
By default, the virtual machines configured within your availability set are separated across up to three fault domains for Resource Manager deployments (two fault domains for Classic).
While placing your virtual machines into an availability set does not protect your application from operating system or application-specific failures, it does limit the impact of potential physical hardware failures, network outages, or power interruptions.
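To see why spreading VMs over fault domains limits the impact of a hardware failure, here is a small sketch that checks which VMs stay up when one fault domain fails. The placement dictionary and the function name are made up for illustration; Azure decides the real placement.

```python
def survivors_after_fault(fault_domain_of, failed_domain):
    """Return the VMs that stay up when one fault domain loses power or network."""
    return sorted(vm for vm, domain in fault_domain_of.items()
                  if domain != failed_domain)


# Hypothetical placement of four VMs across three fault domains.
fault_domain_of = {"vm1": 0, "vm2": 1, "vm3": 2, "vm4": 0}

for failed in sorted(set(fault_domain_of.values())):
    up = survivors_after_fault(fault_domain_of, failed)
    print(f"fault domain {failed} fails -> still running: {up}")
```

Whichever single fault domain fails in this example, at least two VMs keep running. A bug in the operating system or the application would still take down every VM, which is the limitation mentioned above.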
The new Managed Disks feature ensures that your VM disks are also placed in fault domains. Previously, VM disks were not aligned with the fault domain of the VM, which could lead to downtime when the storage became unavailable. Depending on the region, you will have two or three fault domains available for your managed disks.
Without Managed Disks, you will need to create separate storage accounts and follow these best practices:
1. Keep all disks (OS and data) associated with a VM in the same storage account.
2. Use a separate storage account for each VM in an availability set. Multiple VMs in the same availability set must NOT share storage accounts. It is acceptable for VMs in different availability sets to share storage accounts, as long as the preceding best practices are followed.
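The two storage account rules above are easy to check against an inventory of your VMs. The sketch below assumes a hypothetical inventory format (a list of dicts with name, availability_set and disk_accounts keys). It does not query Azure; you would have to fill in the data yourself.

```python
from collections import defaultdict


def storage_violations(vms):
    """Check an unmanaged-disk layout against the two storage account rules.

    `vms` is a list of dicts with hypothetical keys: name, availability_set,
    and disk_accounts (the storage account holding each disk of the VM).
    """
    problems = []
    users = defaultdict(set)  # (availability set, account) -> VM names
    for vm in vms:
        accounts = set(vm["disk_accounts"])
        # Rule 1: all disks of a VM belong in one storage account.
        if len(accounts) > 1:
            problems.append(f"{vm['name']}: disks spread over {sorted(accounts)}")
        for account in accounts:
            users[(vm["availability_set"], account)].add(vm["name"])
    # Rule 2: VMs in the same availability set must not share an account.
    for (av_set, account), names in sorted(users.items()):
        if len(names) > 1:
            problems.append(
                f"{av_set}: {sorted(names)} share storage account {account}")
    return problems


vms = [
    {"name": "web1", "availability_set": "web-as", "disk_accounts": ["stweb1", "stweb1"]},
    {"name": "web2", "availability_set": "web-as", "disk_accounts": ["stweb1", "stweb2"]},
    {"name": "app1", "availability_set": "app-as", "disk_accounts": ["stweb2"]},
]
for problem in storage_violations(vms):
    print(problem)
```

In this made-up inventory, web2 breaks both rules: its disks are split over two accounts, and it shares stweb1 with web1 in the same availability set. app1 sharing stweb2 with web2 is not reported, because they are in different availability sets.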
If you have an application with multiple tiers, it's recommended that you place each tier in a different availability set.
If you place two different tiers in the same availability set, all virtual machines in the same application tier can be rebooted at once.
By configuring at least two virtual machines in an availability set for each tier, you guarantee that at least one virtual machine in each tier is available.
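The tier advice can be illustrated with the same wrap-around model of update domains described earlier in this post. Again, this is a simplified sketch with hypothetical VM names and helper functions, not Azure's actual placement logic.

```python
from collections import defaultdict


def assign_update_domains(vm_names, update_domain_count=5):
    """Simplified model: wrap VMs around the update domains in creation order."""
    return {name: index % update_domain_count
            for index, name in enumerate(vm_names)}


def tiers_at_risk(placement, tier_of):
    """Return the tiers whose VMs all sit in one single update domain."""
    domains_per_tier = defaultdict(set)
    for vm, domain in placement.items():
        domains_per_tier[tier_of[vm]].add(domain)
    return sorted(tier for tier, domains in domains_per_tier.items()
                  if len(domains) == 1)


tier_of = {"web1": "web", "web2": "web",
           "app1": "app", "app2": "app", "app3": "app", "app4": "app"}

# Both tiers in ONE availability set, in this (hypothetical) creation order.
shared = assign_update_domains(["web1", "app1", "app2", "app3", "app4", "web2"])
print("shared set, tiers at risk:", tiers_at_risk(shared, tier_of))

# One availability set per tier.
web_set = assign_update_domains(["web1", "web2"])
app_set = assign_update_domains(["app1", "app2", "app3", "app4"])
print("web set, tiers at risk:", tiers_at_risk(web_set, tier_of))
print("app set, tiers at risk:", tiers_at_risk(app_set, tier_of))
```

In the shared set, web2 is the sixth VM and wraps around into the same update domain as web1, so the whole web tier can be rebooted at once. With one availability set per tier, the two web VMs land in different update domains.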
Combine the Azure Load Balancer with an availability set to get the most application resiliency. The Azure Load Balancer distributes traffic between multiple virtual machines.
If the load balancer is not configured to balance traffic across multiple virtual machines, then any planned maintenance event affects the only traffic-serving virtual machine, causing an outage for your application tier. Placing multiple virtual machines of the same tier under the same load balancer and availability set enables traffic to be served continuously by at least one instance.
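As a rough illustration of that last point, the sketch below routes flows only to backends that are currently healthy. It is a simplified stand-in and not the Azure Load Balancer's actual distribution algorithm or health probing; the VM names, flow keys and function name are hypothetical.

```python
import zlib


def pick_backend(flow_key, backends, healthy):
    """Choose a healthy backend for a flow using a stable hash of its key.

    Simplified stand-in for a load balancer, not Azure's implementation.
    """
    candidates = [vm for vm in backends if vm in healthy]
    if not candidates:
        raise RuntimeError("no healthy backend: the tier is down")
    return candidates[zlib.crc32(flow_key.encode()) % len(candidates)]


backends = ["web1", "web2"]  # hypothetical VMs in one availability set
flows = [f"10.0.0.{n}:443" for n in range(1, 6)]  # hypothetical client flows

# web1 is rebooting for planned maintenance; web2 keeps serving.
for flow in flows:
    print(flow, "->", pick_backend(flow, backends, healthy={"web2"}))
```

With only one VM in the backend list, the same maintenance event would leave no healthy candidate and every request would fail, which is the outage described above.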