How to keep your Azure infrastructure highly available - Configuring data redundancy by Tobias Quante

This post is part of my 'Learning in public' journey for Microsoft Azure. We're exploring data redundancy as part of keeping a software infrastructure highly available.

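Data redundancy is synonymous with keeping data replicas. It plays a key part in highly available infrastructure by ensuring data is not lost when a drive, a data center, or even a whole region fails. Keep in mind that replicas alone do not protect against data being accidentally deleted, corrupted, or encrypted by malware, because such changes are replicated as well. Features like soft delete, versioning, and backups cover those cases. This article focuses on redundancy in storage accounts.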

Resources this concept can be used on

All services that are part of an Azure storage account can be replicated. However, not every replication strategy is available for every SKU (Stock Keeping Unit). Premium SKUs, for example, are limited to local and zone redundancy. Here are the current SKU types for storage accounts and their respective replication options.
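The combinations can be made concrete with a small lookup table. The sketch below records which replication SKUs each account kind accepts. The kind and SKU strings (`StorageV2`, `BlockBlobStorage`, `FileStorage`, `Standard_RAGZRS`, ...) follow the names Azure Resource Manager uses. The table is a simplified illustration, not an authoritative matrix: it deliberately ignores regional availability, legacy account kinds, and premium page blobs.

```python
# Simplified sketch: which replication SKUs each storage account kind accepts.
# Assumption: regional availability, legacy kinds (Storage, BlobStorage) and
# premium page blobs are deliberately left out.
SUPPORTED_SKUS: dict[str, set[str]] = {
    # Standard general-purpose v2 supports every redundancy option.
    "StorageV2": {
        "Standard_LRS",
        "Standard_ZRS",
        "Standard_GRS",
        "Standard_RAGRS",
        "Standard_GZRS",
        "Standard_RAGZRS",
    },
    # Premium block blobs and premium file shares: local or zonal only.
    "BlockBlobStorage": {"Premium_LRS", "Premium_ZRS"},
    "FileStorage": {"Premium_LRS", "Premium_ZRS"},
}


def is_supported(kind: str, sku: str) -> bool:
    """Return True if the (kind, sku) pair appears in the simplified table."""
    return sku in SUPPORTED_SKUS.get(kind, set())


print(is_supported("StorageV2", "Standard_RAGZRS"))     # True
print(is_supported("BlockBlobStorage", "Premium_ZRS"))   # True
print(is_supported("BlockBlobStorage", "Standard_GRS"))  # False
```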

Types of redundancy

Locally redundant storage

Locally redundant storage (LRS) keeps three copies of your data in a single data center. It's an option for non-critical scenarios. The standard SKU is supported by all standard storage account types and is available in most, if not all, regions.

Premium LRS is available only for more specialized types of storage, such as page blobs, block blobs, and file shares. This SKU is backed by SSDs and optimized for low latency and fast I/O operations.

Premium SKUs are generally available only with LRS or zone-redundant storage (ZRS). Whether premium ZRS is offered depends on the region.

Zone redundant storage

Zone-redundant storage (ZRS) synchronously replicates your data across three availability zones in a single region. It's a compromise between durability and low latency. If an entire availability zone fails, your data remains safely stored in the other two.

Its premium SKU may not be available in all regions.

Geo-redundant storage

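If protection within a single region is not enough, you can replicate your data to a second region. Geo-redundant storage (GRS) keeps three copies in one data center of the primary region, just like LRS, and additionally copies the data to a secondary region, typically the primary region's pair. This protects against outages that take down an entire region, including all three of its availability zones. Geo-zone-redundant storage (GZRS) combines the same geo-replication with ZRS in the primary region.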
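A toy model helps show what each option survives. The sketch below places copies according to each strategy:

- LRS keeps three copies in one data center, modeled as a single zone.
- ZRS places one copy in each of three zones.
- GRS and GZRS (ZRS in the primary region plus geo-replication) add three LRS copies in a secondary region.

The region and zone names are hypothetical, and the model ignores replication lag. It only answers two questions: does any copy survive the outage, and can the primary region still serve the data without a failover?

```python
from dataclasses import dataclass

PRIMARY, SECONDARY = "primary-region", "secondary-region"  # hypothetical names


@dataclass(frozen=True)
class Replica:
    region: str
    zone: str  # an LRS data center is modeled as sitting in a single zone


def place_replicas(strategy: str) -> list[Replica]:
    """Toy placement of copies for each redundancy option."""
    local = [Replica(PRIMARY, "zone-1") for _ in range(3)]      # LRS: one data center
    zonal = [Replica(PRIMARY, f"zone-{i}") for i in (1, 2, 3)]  # ZRS: one copy per zone
    remote = [Replica(SECONDARY, "zone-1") for _ in range(3)]   # secondary region uses LRS
    placements = {
        "LRS": local,
        "ZRS": zonal,
        "GRS": local + remote,
        "GZRS": zonal + remote,
    }
    return placements[strategy]


def copy_survives(strategy: str, failed: set[tuple[str, str]]) -> bool:
    """True if at least one copy sits outside the failed (region, zone) pairs."""
    return any((r.region, r.zone) not in failed for r in place_replicas(strategy))


def primary_still_serves(strategy: str, failed: set[tuple[str, str]]) -> bool:
    """True if the primary region keeps a copy, so no failover is needed."""
    return any(
        r.region == PRIMARY and (r.region, r.zone) not in failed
        for r in place_replicas(strategy)
    )


ZONE_OUTAGE = {(PRIMARY, "zone-1")}
REGION_OUTAGE = {(PRIMARY, f"zone-{i}") for i in (1, 2, 3)}

for name in ("LRS", "ZRS", "GRS", "GZRS"):
    print(
        f"{name:>4}: zone outage -> copy left: {copy_survives(name, ZONE_OUTAGE)}, "
        f"primary serves: {primary_still_serves(name, ZONE_OUTAGE)}; "
        f"region outage -> copy left: {copy_survives(name, REGION_OUTAGE)}"
    )
```

The split between the two checks mirrors the difference between GRS and GZRS. Both keep a copy through a zone outage. With GRS, however, the primary's only data center may be the one that failed, so reaching the data can require a failover.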

Replication to the secondary region is asynchronous, and the replicated data cannot be accessed unless the account fails over to the secondary region. That is, unless you configure RA-GRS.
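Because geo-replication is asynchronous, the secondary region lags behind the primary. Azure reports this lag as the account's Last Sync Time. Primary writes made before that time are guaranteed to be in the secondary region, while later writes may not have arrived yet. The sketch below uses hypothetical write timestamps and a hypothetical five-minute lag. It shows which writes would be at risk if the account failed over at that moment.

```python
from datetime import datetime, timedelta, timezone


def writes_at_risk(write_times: list[datetime], last_sync_time: datetime) -> list[datetime]:
    """Writes newer than the Last Sync Time may be missing from the secondary."""
    return [t for t in write_times if t > last_sync_time]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # hypothetical point in time
writes = [now - timedelta(minutes=m) for m in (30, 10, 2)]
last_sync = now - timedelta(minutes=5)  # hypothetical: secondary is 5 minutes behind

at_risk = writes_at_risk(writes, last_sync)
print(len(at_risk), "write(s) could be lost on failover")
```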

With read-access geo-redundant storage (RA-GRS), data in the secondary region becomes available for read-only access. This strategy is useful for providing low-latency access to replicated files, for example, if you have a branch office in the secondary region.
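With RA-GRS or RA-GZRS, the secondary region gets its own read-only endpoint, formed by adding a `-secondary` suffix to the account name. The sketch below derives the endpoints a client could read from, primary first. The account name `contosodata` is hypothetical. The public-cloud domain `core.windows.net` is an assumption, since sovereign clouds use different suffixes.

```python
READ_ACCESS_SKUS = {"Standard_RAGRS", "Standard_RAGZRS"}


def read_endpoints(account: str, sku: str, service: str = "blob") -> list[str]:
    """Endpoints a client could read from, in order of preference."""
    # Assumption: public Azure cloud endpoint suffix.
    primary = f"https://{account}.{service}.core.windows.net"
    if sku in READ_ACCESS_SKUS:
        # The read-only secondary endpoint adds "-secondary" to the account name.
        return [primary, f"https://{account}-secondary.{service}.core.windows.net"]
    return [primary]


print(read_endpoints("contosodata", "Standard_RAGRS"))
print(read_endpoints("contosodata", "Standard_GRS"))
```

A branch office in the secondary region could prefer the second endpoint for reads. Writes always go to the primary.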

When to use what replication strategy

While there are only a handful of redundancy options, things get complicated when you combine them with account types. Below, I'll cover only a few useful combinations. Which strategy you choose depends heavily on your company's requirements.

Standard general-purpose v2 LRS

In many cases, LRS with a standard general-purpose v2 account will already suit your needs. It keeps three copies of your data in a single data center and is designed for at least 99.999999999% (11 nines) durability of objects over a given year. While not optimized for I/O-intensive workloads, it offers good latency within your chosen region and gives you plenty of options to store files and other data.
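To put eleven nines into perspective, you can turn the durability figure into an expected number of lost objects. The sketch below uses a simplifying assumption: the figure is treated as an independent, per-object, per-year loss probability. The count of 10 million objects is hypothetical. For comparison, the last line uses the 16 nines that the geo-redundant options are designed for.

```python
from fractions import Fraction


def annual_loss_probability(nines: int) -> Fraction:
    """Turn a durability of N nines into a per-object annual loss probability.

    Simplifying assumption: 11 nines (99.999999999 %) is read as an independent
    10**-11 chance of losing any given object within one year.
    """
    return Fraction(1, 10**nines)


objects = 10_000_000  # hypothetical number of stored objects

lrs_losses_per_year = objects * annual_loss_probability(11)  # LRS: 11 nines
geo_losses_per_year = objects * annual_loss_probability(16)  # GRS/GZRS: 16 nines

print(lrs_losses_per_year)      # 1/10000
print(1 / lrs_losses_per_year)  # 10000 -> roughly one lost object every 10,000 years
print(1 / geo_losses_per_year)  # 1000000000
```

Using `Fraction` instead of floats keeps the tiny probabilities exact, so the results are whole numbers rather than rounding noise.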

Use cases include:

  • Provisioning of standard VHDs for your company, for example, to share files on a virtual drive
  • Storing data of a virtual machine or a set of VMs
  • Storing default OS image files as page blobs
  • Storing non-regulatory and non-mission-critical files where a loss will not result in huge legal fees

Premium block blobs ZRS

In the age of large language models, data is the new gold. Training models requires large amounts of it, and training is also demanding in terms of computing power.

Machine learning workloads, such as training jobs that repeatedly read large datasets, generally benefit from low-latency storage. If you spend lots of time and money training models (or accessing them), this SKU is a good choice.

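At the same time, ZRS keeps your models and training data safe even if an entire availability zone fails. Note that premium block blob accounts do not support geo-redundant replication, so GRS is not an option here. While the additional cost may seem over the top, reducing the remaining risk of losing your model and its training data may be worth it.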

Generally, use cases here include:

  • Secure provisioning of mission-critical, unstructured data of large sizes
  • Low latency access to training data
  • Provisioning data for big-data analytics

Standard general-purpose v2 RA-GZRS

With the highest durability Azure Storage offers (at least 99.99999999999999%, or 16 nines, over a given year), GZRS with read access solves many problems at once. Besides keeping your data safe from what would have to be several simultaneous natural disasters, it helps you reduce latency for colleagues in the secondary region. This is especially useful for regulatory or compliance information needed in both locations. That information stays readable from the secondary region even during a primary-region outage, although the most recent writes may not have replicated yet.

Use cases here might include:

  • Provisioning of mission-critical legal, compliance, or government data
  • The most durable option for long-term storage of data in the cold access tier
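The three combinations above can be condensed into a small, hypothetical decision helper. It encodes only two rules from this post: premium accounts are limited to LRS and ZRS, and read access to the secondary region requires RA-GRS or RA-GZRS. Cost, regional availability, and access tiers are ignored, so treat it as a starting point for discussion rather than a sizing tool.

```python
def recommend_sku(
    *,
    premium: bool = False,
    survive_zone_outage: bool = False,
    survive_region_outage: bool = False,
    read_from_secondary: bool = False,
) -> str:
    """Hypothetical helper mapping requirements to a storage account SKU name."""
    if premium:
        # Premium accounts only offer local or zone redundancy.
        if survive_region_outage or read_from_secondary:
            raise ValueError("premium SKUs support only LRS and ZRS")
        return "Premium_ZRS" if survive_zone_outage else "Premium_LRS"
    if read_from_secondary:
        # Read access to the secondary region implies geo-replication.
        return "Standard_RAGZRS" if survive_zone_outage else "Standard_RAGRS"
    if survive_region_outage:
        return "Standard_GZRS" if survive_zone_outage else "Standard_GRS"
    return "Standard_ZRS" if survive_zone_outage else "Standard_LRS"


# The three combinations discussed in this post:
print(recommend_sku())                                        # Standard_LRS
print(recommend_sku(premium=True, survive_zone_outage=True))  # Premium_ZRS
print(recommend_sku(survive_zone_outage=True, survive_region_outage=True,
                    read_from_secondary=True))                # Standard_RAGZRS
```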