My automated backups to AWS S3 with client-side AES256


I'm very paranoid when it comes to my data. For me, the only backup solution I could possibly use must be simple and must have client-side AES256 encryption with a passphrase that only I know.

Image credit: The bank vault in NoMad Downtown LA. Image by Benoit Linero.

What I was looking for. Requirements

The reason I didn't just start using some backup software with built-in encryption is a pure lack of trust. So, if you're fine with someone else managing your backups, you might think the approach described here is total overkill. And that's okay. But there are people out there with the same trust issues I have, and they might find this article useful, so I'm writing it for them, and also for myself, as a write-up of what I did.

Here are my personal requirements for backup automation:

  • The storage back-end must be highly available
  • The storage back-end must have versioning
  • The client must not have permission to destroy or corrupt data
  • The client must be transparent and obvious in how it works
  • The client must be smart enough to synchronize backups — it should upload only changes
  • The encryption of my backups must happen on my machine and should prompt for a passphrase during the process. It should be technically impossible for someone except me to access the backups.

I know one can argue about the last one, because the passphrase is a shared secret created by a human, but I can assure you nobody else knows my passphrase (unless somebody installs key-logger software on my computer) and it is really strong (long, with special characters, mixed casing, numbers, etc.).

3-2-1 Backup Rule

There is a common rule of backup that says you must be:

  • Making at least 3 copies of data, located on physically different storage media;
  • Keeping these copies in no less than 2 different formats;
  • Always storing at least 1 of these backups off-site (e.g. on the commercial cloud account).

I don't know exactly where this rule came from, but you can definitely find mentions of it on the internet (e.g. here and here).

This rule makes sense to me, so I follow it as much as I can.

So, for me it would be:

  • 3 copies of data: 1 on AWS, 1 on a microSD card, 1 on a USB thumb drive
  • 2 formats: I guess this is where I make an exception, since I store only AES256-encrypted TAR files
  • 1 off-site: 1 backup is on AWS, so technically it's off-site relative to my laptop. Or I can say that my thumb drive or microSD card is off-site, whatever.

Before we talk about how I actually store backups, let’s talk about the encryption I use.

First Step: Encryption

Some time ago I wrote a bash script for encrypting a given directory. It's very simple and is based on GPG's symmetric mode with the AES256 algorithm. The script creates a TAR archive and pipes it to GPG, which prompts for a passphrase and performs the encryption.

You can find the source code here on GitHub.
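
The whole idea boils down to a few lines. A simplified sketch (file names here are illustrative; the real script is in the repository) looks like this:

#!/usr/bin/env bash
# Sketch: archive a directory and encrypt it with GPG in symmetric AES256 mode
set -euo pipefail

DIR="$1"                           # directory to encrypt
OUT="$(basename "$DIR").tar.enc"   # name of the encrypted archive

# tar the directory and pipe the archive straight into gpg,
# which prompts for a passphrase and encrypts with AES256
tar -cf - "$DIR" | gpg --symmetric --cipher-algo AES256 --output "$OUT"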

This script produces *.enc files that I can upload or store anywhere without fear that somebody can get access to the content.

For me, it's the simplest and most reliable solution: GPG is a battle-tested tool, and because the script is so small I understand 100% of what it is doing.

Second Step: Storage in S3

We're not going to talk about my physical devices; that's pretty much manual work. What I want to talk about is how I use the AWS CLI to sync the local directory where I store backups with a folder in an S3 bucket.

Obviously, you need an AWS account for everything that follows.

To prepare everything, we need to:

  • Create a user in IAM for running AWS CLI
  • Configure your AWS CLI
  • Create an S3 bucket and a folder for backups
  • Create a policy for the user, so it has write-only access to the bucket folder

After all this preparation, we can just use another bash script.

Create a user

  • Go to the IAM console and click Add user.
  • Type in a desirable user name and check only Programmatic access. Click Next: Permissions
  • Don't add the user to any groups, just skip the step by clicking Next: Tags
  • No tags are required either, just skip by clicking Next: Review
  • Review the user and submit with Create user. It's okay to see the warning This user has no permissions, we will address it later.
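
If you already have the CLI configured with an administrative profile, the same user can be created from the terminal instead. This is just an alternative sketch, and the user name below is only an example:

aws iam create-user --user-name backup-uploader
aws iam create-access-key --user-name backup-uploader

The second command prints an Access key ID and Secret access key, which you will need in the next step.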

Configure your AWS CLI

  • Install the AWS CLI following the official instructions, e.g. on Debian it's just sudo apt install awscli
  • Go to IAM console
  • Click on your user you created
  • Go to Security credentials tab
  • Click Create access key and you'll see a dialog with the Access key ID and Secret access key. Don't close this dialog yet.
  • Go to your terminal and type aws configure
  • Follow the official instructions; you'll need the Access key ID and Secret access key from the previous step

Now you should have a ready-to-work AWS CLI setup on your computer.
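
A quick way to verify the setup is to ask AWS who you are (this works with any valid credentials):

aws sts get-caller-identity

It should print the account ID and the ARN of the user you just created.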

Create S3 bucket

  • Go to the S3 console and click Create bucket
  • Type in a desirable name, select a region and click Next
  • On the next Configure options step check Keep all versions of an object in the same bucket and click Next.
  • On the next Set permissions step make sure it says Block all public access and click Next.
  • After reviewing and creating the bucket, browse to it and create a folder for backups inside
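
For reference, roughly the same bucket setup can be done with the CLI. The bucket name and region below are placeholders, and this needs credentials with more permissions than our write-only user has:

aws s3api create-bucket --bucket <bucket-name> --region eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1
aws s3api put-bucket-versioning --bucket <bucket-name> \
    --versioning-configuration Status=Enabled
aws s3api put-public-access-block --bucket <bucket-name> \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

Note that an S3 "folder" is just a key prefix, so if you go the CLI route you don't strictly have to create it: aws s3 sync will create the prefix on the first upload.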

Create a policy

Now it's time to allow our CLI user to upload new backups; for this we need to create a policy:

  • Go to the IAM console and click on the user you created earlier
  • Click Add inline policy
  • AWS shows a visual editor by default, but it's easier to switch to the JSON tab and paste this example than to describe where to click:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListBackups",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<bucket-name>"
        },
        {
            "Sid": "UploadBackups",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<bucket-name>/<backup-folder>/*"
        }
    ]
}

Replace <bucket-name> and <backup-folder> with your values. The s3:ListBucket permission is required for the aws s3 sync command that we're going to use, so it can compare file sizes and modification dates.
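
Alternatively, if you save the JSON above to a file, the inline policy can be attached from the terminal; the policy name and file name here are arbitrary:

aws iam put-user-policy --user-name <cli-user> \
    --policy-name backup-write-only \
    --policy-document file://backup-policy.json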

Use the backup bash script

I wrote a simple bash script that finds all files with the .enc extension in the current directory and its sub-directories and uploads them to the given S3 location.

You can view and copy the script from my repository on GitHub.
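
The heart of it is a single aws s3 sync call restricted to *.enc files. A minimal sketch of the idea (the real script in the repository differs in details) looks roughly like this:

#!/usr/bin/env bash
# Sketch: sync *.enc files from the current directory (and sub-directories) to S3
set -euo pipefail

DEST="$1"   # e.g. s3://<bucket-name>/<backup-folder>

# Exclude everything, then include only *.enc; sync skips files that already
# match the remote copy by size and modification time
aws s3 sync . "$DEST" --exclude "*" --include "*.enc"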

Now that our S3 bucket exists and our IAM user has a policy that allows writing to the backup folder in the bucket, we can finally run the script:

backup s3://<bucket-name>/<backup-folder>

The script uses the aws s3 sync command, which checks whether a local file matches the remote one by size and modification date; if they match, the file is not uploaded again.

Fulfilled Requirements

As you might remember, I listed some strict requirements at the beginning. How does this approach fulfill them? Let's have a look:

  • The storage back-end must be highly available — it’s S3
  • The storage back-end must have versioning — we turned it on when creating the bucket
  • The client must not have permission to destroy or corrupt data — our policy ensures the script can only put objects; it can't read or delete them. The user can upload a file with the same name, but that won't corrupt anything, it will just create a new version of the object
  • The client must be transparent and obvious in how it works — just small bash scripts that you can read and understand
  • The client must be smart enough to synchronize backups — aws s3 sync uploads only changed or new files to the bucket
  • The encryption of my backups must happen on my machine — the first short script also fulfills this requirement for me.

I admit this setup is not for everyone and it's far from simple, but if you're already working with AWS it might suit you.

I hope you learned something from this article. Take care.

P.S. Please plan your expenses using S3 on AWS in advance. I’m not responsible for any unexpected costs.

