Immutable Backups with AWS Backup

Implementing AWS Backup allows us to create immutable backups in AWS. It also provides a set of tools for testing backups and verifying that our restores actually work. You can also use it to share backups across regions and across accounts, though there are issues to work through, especially around KMS and IAM, to make that work.

One of the key reasons I wanted to implement this was as protection against ransomware attacks. Of course, immutable backups are also useful for a host of other reasons, the biggest being preventing accidental deletion of backups. Being able to systematically prevent whole categories of mistakes is important for making systems more reliable, since it's impossible to totally remove humans from the loop in the disaster recovery process.

Another critical feature of AWS Backup for making disaster recovery easier is the ability to create policies and automated tests for your backups. This has two important effects. First, having policies encoded as automation makes guaranteeing compliance much easier. If you need to keep monthly backups in cold storage for three years, you can simply set up a policy that does that. Instead of having to remember to do something, or building your own version of it, you declare the policy like most other infrastructure as code.
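
For instance, a rule along these lines (a minimal sketch; the vault it references is the one we create later in this post, and the exact numbers are illustrative) would take a monthly backup, push it to cold storage after 30 days, and keep it for roughly three years:

resource "aws_backup_plan" "monthly_cold_storage" {
  name = "monthly_cold_storage_plan"

  rule {
    rule_name         = "monthly_to_cold_storage"
    target_vault_name = aws_backup_vault.example.name
    # First day of every month at midnight UTC
    schedule          = "cron(0 0 1 * ? *)"

    lifecycle {
      cold_storage_after = 30   # move to cold storage after a month
      delete_after       = 1095 # delete after roughly three years
    }
  }
}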

Automated testing is another big part of making sure backups are working. There is a reason for the old sysadmin wisdom that a backup isn't a backup until you've successfully restored it. Unfortunately, this turns out to be done less in practice than it should be. Automated testing of these backups can be built using Lambda functions by following guidance from AWS; sadly, it's not fully built in.
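
As a rough sketch of what that can look like, the Terraform below wires EventBridge up to forward AWS Backup restore-job events to a validation function. The Lambda itself (aws_lambda_function.validate_restore) is hypothetical and not shown here; it would be responsible for checking that the restored resource actually contains what you expect.

# Listen for AWS Backup restore-job state changes on EventBridge.
resource "aws_cloudwatch_event_rule" "restore_job_events" {
  name = "backup-restore-job-events"
  event_pattern = jsonencode({
    source      = ["aws.backup"]
    detail-type = ["Restore Job State Change"]
  })
}

# Hand matching events to the (hypothetical) validation Lambda.
resource "aws_cloudwatch_event_target" "validate_restore" {
  rule = aws_cloudwatch_event_rule.restore_job_events.name
  arn  = aws_lambda_function.validate_restore.arn
}

# EventBridge needs permission to invoke the Lambda.
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.validate_restore.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.restore_job_events.arn
}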

We can also create policies around sharing snapshots between accounts, or to guarantee that every account in our AWS organization has a backup policy. This is especially useful in complex organizations where we are trying to enable teams to do operations work on their own. Often in that situation the most important thing is making it easy for teams to do the right thing.
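
As a sketch of the organization-level piece (assuming Organizations is managed from this account; the policy JSON file and the OU variable are placeholders introduced for illustration), an org-wide backup policy can be defined and attached like this:

resource "aws_organizations_policy" "org_backup" {
  name = "org-wide-backup-policy"
  type = "BACKUP_POLICY"
  # The backup policy document itself is written separately and not shown here.
  content = file("${path.module}/backup_policy.json")
}

resource "aws_organizations_policy_attachment" "org_backup" {
  policy_id = aws_organizations_policy.org_backup.id
  target_id = var.organizational_unit_id # hypothetical OU id variable
}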

Now that we have covered why we want to use AWS Backup, let's get into the basics of getting started with it. To do that, we'll cover three things:

  1. Creating a backup vault
  2. Creating a backup plan
  3. Creating a vault lock to make the backups immutable

Creating a Backup Vault

The first step to using AWS Backup is creating a backup vault. Your account has one by default; however, that usually isn't what you want to use for your backups. Each vault has a KMS key that is used to encrypt the backups inside it, and vaults let us attach policies and locks on top of them. To start, we will just create a vault using Terraform:

resource "aws_kms_key" "example" {
  description = "example KMS key for backup vault. NOTE needs a policy if you are doing this for real"
}


resource "aws_backup_vault" "example" {
  name        = "example_backup_vault"
  kms_key_arn = aws_kms_key.example.arn
}

The kms_key_arn is actually optional; if we don't set it, the vault is created with the default AWS Backup KMS key. Having the backup vault alone does not do much for us, because we are still lacking an IAM role that can create the backups, and as things stand anyone in the account with sufficient permissions can delete the backups.

Once we have a backup vault, we need an IAM role that AWS Backup can assume to create backups. In practice you'll probably want a tighter IAM policy than this, but for the purposes of this post I'm going to use the AWS managed policy AWSBackupServiceRolePolicyForBackup. To create the IAM role and attach the policy we will use the following Terraform:

data "aws_iam_policy" "backup_policy" {
  arn = "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForBackup"
}

resource "aws_iam_role" "backup_service_role" {
  name = "backup_service_role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "backup.amazonaws.com"
        }
      },
    ]
  })
}

Finally, we attach the policy to the role with the following:

resource "aws_iam_role_policy_attachment" "backup-attach" {
  role       = aws_iam_role.backup_service_role.name
  policy_arn = data.aws_iam_policy.backup_policy.arn
}

Once we have created our IAM role and vault we can test things out by creating a manual snapshot. This step is mostly to prove that we set up the IAM role and vault correctly; feel free to skip ahead to the next section if you want. Since this is just a quick validation test, I'm going to do it manually. First, we log into the AWS console and find AWS Backup. Then, in the dashboard, we hit the button to create an on-demand (manual) backup.

At this point we fill in the form, choosing our IAM role instead of the default one, and pick the resource type and resource that we want to back up. After doing this the job should kick off, or we will see an error on the IAM role indicating that we need to adjust our permissions.

Making Backup Plans

Manual backups are all well and good, but if you want to keep taking backups over the long term it makes much more sense to automate them. Creating a backup plan allows us to run backups on a schedule.

For the purposes of this example, let's say we want a plan that backs up our resources daily. We also want backups moved to cold storage after 30 days and deleted after a year. To do that we create the following backup plan:

resource "aws_backup_plan" "example" {
  name = "example_backup_plan"

  rule {
    rule_name         = "example_backup_rule"
    target_vault_name = aws_backup_vault.example.name
    schedule          = "cron(0 0 * * ? *)"

    lifecycle {
      cold_storage_after = 30
      delete_after       = 365
    }
  }
}

Now that we have a backup plan, we need to make sure there are resources assigned to it. By default the plan has no resources assigned; all it does is define a rule for how backups are to be performed. For the plan to be useful we need to assign resources, which can be done either directly by resource ARN or through tags. To associate resources with a backup plan we use an aws_backup_selection. For this example we will select by tag; to select by ARN you swap the selection_tag block for a resources = ["<arn>", ...] list, as sketched after the example below.

resource "aws_backup_selection" "example" {
  iam_role_arn = aws_iam_role.backup_service_role.arn
  name         = "example_backup_selection"
  plan_id      = aws_backup_plan.example.id

  selection_tag {
    type  = "STRINGEQUALS"
    key   = "backup"
    value = "True"
  }
}
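
For completeness, here is roughly what the ARN-based variant looks like; the DynamoDB table ARN is just a placeholder for whatever you actually want backed up:

resource "aws_backup_selection" "example_by_arn" {
  iam_role_arn = aws_iam_role.backup_service_role.arn
  name         = "example_backup_selection_by_arn"
  plan_id      = aws_backup_plan.example.id

  resources = [
    "arn:aws:dynamodb:us-east-1:123456789012:table/example-table" # placeholder ARN
  ]
}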

Governance and Making Backups Immutable

Now that we are backing up our resources, it is time to look at how to make sure they either can't be tampered with at all or can only be modified by known parties. I'll start with a word of warning: if you set up a compliance-mode vault lock following this guide, snapshots can get very expensive, because no one can delete them. Now that you've been warned and won't go running the Terraform blindly without tearing it down right away, let's dive into how to do this.

There are a couple of options for denying access to backups and preventing deletion: vault policies, which are more tunable but can make compliance more complicated, and vault locks, which are much simpler to work with.

Vault locks come in two flavors: governance mode and compliance mode (the latter is what the original post calls a "legal lock"). The distinction isn't really about governance versus legal proceedings; the practical difference is that an administrator can remove a governance-mode lock, whereas no one can remove a compliance-mode lock once its initial grace period has passed.

Both kinds of lock are created with the same Terraform resource. We'll start with a governance-mode lock, and then I'll show how to change the Terraform to make it a compliance-mode lock.

Governance Mode Lock

resource "aws_backup_vault_lock_configuration" "test" {
  backup_vault_name   = aws_backup_vault.example.name
  max_retention_days  = 365
  min_retention_days  = 7
}

A quick note here: our backup plan's retention period needs to fall inside the min/max window of our vault lock, or backup jobs will start failing. With that finished, we can move on to creating a compliance-mode lock, which looks mostly the same.

Compliance Mode Lock (Actually Immutable Backups)

A warning before the code: if you leave this in place for more than the 3-day grace period, it will get expensive, since you will be retaining a year's worth of undeletable backups.

resource "aws_backup_vault_lock_configuration" "test" {
  backup_vault_name   = aws_backup_vault.example.name
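  # Once these 3 days pass, the lock can no longer be changed or removed.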
  changeable_for_days = 3
  max_retention_days  = 365
  min_retention_days  = 7
}

The final thing we need to talk about when it comes to securing vaults is vault policies. These are resource-based policies, essentially IAM policies attached to a backup vault, that restrict access to it. They let us choose who can read from and write to the vault. We could also use one to restrict who can delete snapshots, but that has all the problems of a governance-mode lock while being more complicated to set up. Even so, we might want a vault policy so that only certain users or accounts can access our vault.

data "aws_iam_policy_document" "example" {
  statement {
    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = var.identifiers
    }

    actions = [
      "backup:DescribeBackupVault",
      "backup:DeleteBackupVault",
      "backup:PutBackupVaultAccessPolicy",
      "backup:DeleteBackupVaultAccessPolicy",
      "backup:GetBackupVaultAccessPolicy",
      "backup:StartBackupJob",
      "backup:GetBackupVaultNotifications",
      "backup:PutBackupVaultNotifications",
    ]

    resources = [aws_backup_vault.example.arn]
  }
}

resource "aws_backup_vault_policy" "example" {
  backup_vault_name = aws_backup_vault.example.name
  policy            = data.aws_iam_policy_document.example.json
}

Final Notes and Where To Go From Here

We've now covered how to set up immutable backups in AWS and the process for creating them. From here there are quite a few areas left to explore, including obvious ones like testing, as well as compliance, for example using frameworks to enforce compliance across multiple accounts. The other major area we haven't touched on is cross-region and cross-account backups, which are commonly used so that restoration can happen somewhere unrelated to the main environment.
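
As a minimal sketch of the cross-region case, a backup plan rule can include a copy_action that sends each recovery point to a second vault. This assumes a destination vault (aws_backup_vault.example_dr here, created with a provider alias for the other region) already exists:

resource "aws_backup_plan" "example_with_copy" {
  name = "example_backup_plan_with_copy"

  rule {
    rule_name         = "example_backup_rule_with_copy"
    target_vault_name = aws_backup_vault.example.name
    schedule          = "cron(0 0 * * ? *)"

    # Copy each recovery point to a vault in another region (or account).
    copy_action {
      destination_vault_arn = aws_backup_vault.example_dr.arn

      lifecycle {
        delete_after = 365
      }
    }

    lifecycle {
      delete_after = 365
    }
  }
}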

AWS Backup isn't the only approach for handling disaster recovery on AWS. As always with AWS, there are many ways to reach the same goal, which means there are alternative approaches you could consider, such as shipping snapshots to S3 and enabling WORM (Object Lock) on the bucket. There are also many third-party backup options, and it's common to need or want one of them, or to build your own, especially if you are shipping backups to another cloud provider.

Note: if you want the code for this example, you can find it at https://github.com/jrottersman/awsbackupguide.