My Automated Backups to AWS S3 With Client-Side AES256

Author: Denis Rechkunov

I'm very paranoid when it comes to my data. For me, the only backup solution I could possibly use must be simple and must have client-side AES256 encryption with a passphrase that only I know.

Image credit: The bank vault in NoMad Downtown LA. Image by Benoit Linero.

What I Was Looking For: Requirements #

The reason I didn’t just start using some backup software with built-in encryption is a plain lack of trust. So, if you’re fine with someone else managing your backups, you might think the approach described here is total overkill. And that’s okay. But there are people out there who have the same trust issues that I do, and they might find this article useful, so I’m writing it for them, and also for myself, as a write-up of what I did.

Here are my personal requirements for backup automation:

  • The storage back-end must be highly available
  • The storage back-end must have versioning
  • The client must not have permission to destroy or corrupt data
  • The client must be transparent and obvious in how it works
  • The client must be smart enough to synchronize backups — uploading only changes
  • The encryption of my backups must happen on my machine and should prompt for a passphrase during the process. It should be technically impossible for someone except me to access the backups.

I know one can argue about the last one, because this passphrase is a shared secret created by a human, but I can assure you nobody else knows my passphrase (unless somebody installs key-logger software on my computer) and the passphrase is really strong (long, with special characters, mixed casing, numbers, etc.).

3-2-1 Backup Rule #

There is a common backup rule that says you must be:

  • Making at least 3 copies of data, located on physically different storage media;
  • Keeping these copies in no less than 2 different formats;
  • Always storing at least 1 of these backups off-site (e.g. on a commercial cloud account).

I don’t know exactly where this rule came from, but you can definitely find mentions of it on the internet (e.g. here and here).

This rule makes sense to me, so I follow it as much as I can.

So, for me it would be:

  • 3 copies of data: 1 on AWS, 1 on a microSD card, 1 on a USB thumb drive
  • 2 formats: I admit this is where I make an exception, since I store only AES256-encrypted TAR files
  • 1 off-site: 1 backup is on AWS, so technically it’s off-site relative to my laptop. Or I can say that my thumb drive or microSD card is off-site, whatever.

Before we talk about how I actually store backups, let’s talk about the encryption I use.

First Step: Encryption #

Some time ago I wrote a bash script for encrypting a given directory. It’s very simple and is based on GPG’s symmetric mode and the AES256 algorithm. The script creates a TAR archive and pipes it to GPG, which prompts for a passphrase and performs the encryption.

You can find the source code here on GitHub.
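In case you just want the gist of it, here is a minimal sketch of the idea (the file name is mine for illustration; the actual script lives in the repository):

#!/usr/bin/env bash
# encrypt-dir.sh: a sketch of the idea, not the exact script from the repository.
# Creates a TAR archive of the given directory and pipes it to GPG in symmetric
# mode with AES256; GPG prompts for a passphrase and writes <directory>.enc.
set -euo pipefail

dir="${1:?Usage: encrypt-dir.sh <directory>}"
name="$(basename "$dir")"

tar -cf - "$dir" | gpg --symmetric --cipher-algo AES256 --output "${name}.enc"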

This script produces *.enc files that I can upload or store anywhere without fear that somebody can get access to the content.

For me, it’s the simplest and most reliable solution: GPG is a very reliable tool, and I understand 100% of what my script is doing because it’s so simple.

Second Step: Storage in S3 #

We’re not going to talk about my physical devices; that’s pretty much manual work. What I want to talk about is how I use the AWS CLI to sync the local directory where I store backups with a bucket folder in S3.

Obviously, you need an AWS account for everything that follows.

To prepare everything, we need to:

  • Create a user in IAM for running AWS CLI
  • Configure your AWS CLI
  • Create an S3 bucket and a folder for backups
  • Create a policy for the user, so it has write-only access to the bucket folder

After the preparation, we can just use another bash script.

Create a user #

  • Go to the IAM console and click Add user.
  • Type in a desirable user name and check only Programmatic access. Click Next: Permissions
  • Don’t add the user to any groups; just skip the step by clicking Next: Tags
  • No tags are required; just skip by clicking Next: Review
  • Review the user and submit with Create user. It’s okay to see the warning This user has no permissions, we will address it later.
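If you already have an admin-configured AWS CLI somewhere, roughly the same can be done from the terminal (the user name below is just an example):

# Create the user (the console equivalent of the steps above)
aws iam create-user --user-name backups-cli

# Create an access key pair for it; note the AccessKeyId and SecretAccessKey
# from the output, you'll need them when configuring the CLI
aws iam create-access-key --user-name backups-cli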

Configure your AWS CLI #

  • Install the AWS CLI following the official instructions, e.g. on Debian it’s just sudo apt install awscli.
  • Go to IAM console
  • Click on your user you created
  • Go to Security credentials tab
  • Click Create access key and you’ll see a dialog with the Access key ID and Secret access key. Don’t close this dialog yet.
  • Go to your terminal and type aws configure
  • Follow the official instructions; you’ll need the Access key ID and Secret access key from the previous step

Now you should have a ready-to-work AWS CLI setup on your computer.
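To double-check that the credentials actually work, you can ask AWS who you are. This call requires no permissions, so it works even for a user that has no policies attached yet:

# Prints the UserId, Account and Arn of the IAM user the CLI is configured with
aws sts get-caller-identity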

Create S3 bucket #

  • Go to the S3 console and click Create bucket
  • Type in a desirable name, select a region and click Next
  • On the next Configure options step check Keep all versions of an object in the same bucket and click Next.
  • On the next Set permissions step make sure it says Block all public access and click Next.
  • After reviewing and creating the bucket, browse to it and create a folder for backups in it
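For completeness, the same bucket setup can be scripted with the lower-level s3api commands, run with your admin credentials (the bucket name and region below are examples; for us-east-1, omit --create-bucket-configuration):

aws s3api create-bucket --bucket my-backups-bucket --region eu-central-1 \
    --create-bucket-configuration LocationConstraint=eu-central-1

# Keep all versions of an object in the same bucket
aws s3api put-bucket-versioning --bucket my-backups-bucket \
    --versioning-configuration Status=Enabled

# Block all public access
aws s3api put-public-access-block --bucket my-backups-bucket \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

An S3 "folder" is just a key prefix, so it will show up on its own once the first backup is uploaded under it.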

Create a policy #

Now it’s time to allow our CLI user to upload new backups; for this we need to create a policy:

  • Go to the IAM console and click on the user you created earlier
  • Click Add inline policy
  • AWS shows a visual editor by default, but it’s easier to put the JSON example here than to describe where to click:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListBackups",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<bucket-name>"
        },
        {
            "Sid": "UploadBackups",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<bucket-name>/<backup-folder>/*"
        }
    ]
}

Replace <bucket-name> and <backup-folder> with your values. The s3:ListBucket permission is required for the aws s3 sync command that we’re going to use, so it can compare file sizes and modification dates.
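If you prefer the terminal, you can also save this JSON to a file (for example backup-policy.json) and attach it as an inline policy using your admin credentials (backups-cli is the example user name from before):

aws iam put-user-policy --user-name backups-cli \
    --policy-name BackupUpload \
    --policy-document file://backup-policy.json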

Use the backup bash script #

I wrote a simple bash script that finds all files with the .enc extension in the current directory and its sub-directories and uploads them to the given location on S3.

You can view and copy the script from my repository on GitHub.
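Here is a minimal sketch of what such a script can look like (not necessarily the exact script from the repository):

#!/usr/bin/env bash
# backup.sh: a sketch of the idea, not necessarily the exact script from the repository.
# Syncs every *.enc file under the current directory to the given S3 location.
set -euo pipefail

target="${1:?Usage: backup.sh s3://<bucket-name>/<backup-folder>}"

# Exclude everything first, then re-include *.enc; aws s3 sync skips files whose
# size and modification time already match the remote copy.
aws s3 sync . "$target" --exclude "*" --include "*.enc"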

Now that our S3 bucket exists and our IAM user for the CLI has a policy allowing it to write to the backup folder in the bucket, we can finally run the script:

backup s3://<bucket-name>/<backup-folder>

This script uses the aws s3 sync command, which checks whether a local file matches the remote file by size and modification date; if they match, it does not upload the local file at all.

Fulfilled Requirements #

As you might remember, I had some very strict requirements at the beginning. How does this approach fulfill them? Let’s have a look:

  • The storage back-end must be highly available — it’s S3
  • The storage back-end must have versioning — we turned it on when creating the bucket
  • The client must not have permission to destroy or corrupt data — our policy makes sure the script is only allowed to put objects; it can’t read or delete them. The user is allowed to upload a file with the same name, but that won’t corrupt data, it will just create a new version of it
  • The client must be transparent and obvious in how it works — just small bash scripts that you can read and understand
  • The client must be smart enough to synchronize backups — aws s3 sync uploads only changed or new files to the bucket
  • The encryption of my backups must happen on my machine — the first short script also fulfills this requirement for me.

I admit, this setup is not for everyone and it’s far from simple, but if you’re already working with AWS it might suit you.

I hope you learned something from this article. Take care.

P.S. Please plan your AWS S3 expenses in advance. I’m not responsible for any unexpected costs.