My automated backups to AWS S3 with client-side AES256
What I was looking for. Requirements
The reason I didn't just start using some backup software with built-in encryption is a pure lack of trust. So, if you're fine with someone else managing your backups, you might find the approach described here to be complete overkill, and that's okay. But there are people out there who share my trust issues and might find this article useful, so I'm writing it for them, and also for myself, as a write-up of what I did.
Here are my personal requirements for backup automation:
- The storage back-end must be highly available
- The storage back-end must have versioning
- The client must not have permission to destroy or corrupt data
- The client must be transparent and obvious in how it works
- The client must be smart enough to synchronize backups — uploads changes only
- The encryption of my backups must happen on my machine and should prompt for a passphrase during the process. It should be technically impossible for someone except me to access the backups.
I know, one can argue about the last one because this passphrase is a shared secret created by a human, but I can assure you nobody else knows my passphrase (unless somebody installs key-logger software on my computer), and the passphrase is really strong (long, with special characters, mixed casing, numbers, etc.).
3-2-1 Backup Rule
There is a common backup rule that says you should be:
- Making at least 3 copies of data, located on physically different storage media;
- Keeping these copies in no less than 2 different formats;
- Always storing at least 1 of these backups off-site (e.g. in a commercial cloud account).
I don't know where exactly this rule came from, but you can definitely find mentions of it on the internet (e.g. here and here).
This rule makes sense to me, so I try to follow it as much as I can.
So, for me it would be:
- 3 copies of data: 1 on AWS, 1 on a microSD card, 1 on a USB thumb drive
- 2 formats: I guess I make an exception here, since I only store AES256-encrypted TAR files
- 1 off-site: one copy is on AWS, so technically it's off-site relative to my laptop. Or I can say that my thumb drive or microSD card is off-site, whatever.
Before we talk about how I actually store backups, let’s talk about the encryption I use.
First Step: Encryption
Some time ago I wrote a bash script for encrypting a given directory. It's very simple and is based on GPG's symmetric mode with the AES256 algorithm. The script creates a TAR archive and pipes it to GPG, which prompts for a passphrase and performs the encryption.
You can find the source code here on GitHub.
This script produces `*.enc` files that I can upload or store anywhere without fear that somebody can get access to the content.
For me, it's the simplest and most dependable solution: GPG is a very reliable tool, and I understand 100% of what my script does because it's so small.
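To give an idea of what that looks like, here is a minimal sketch of the core pipeline (the directory and file names are placeholders, and the actual script in the repository may differ in details):

```bash
# Minimal sketch: pack a directory into a TAR archive and encrypt it with GPG
# in symmetric mode using AES256. GPG prompts for the passphrase interactively.
tar -czf - my-documents | gpg --symmetric --cipher-algo AES256 --output my-documents.tar.gz.enc
```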
Second Step: Storage in S3
We're not going to talk about my physical devices; copying to them is pretty much manual work. What I want to talk about is how I use the AWS CLI to sync the local directory where I store backups with a bucket folder in S3.
Obviously, you need an AWS account for everything that follows.
To prepare everything we need:
- Create a user in IAM for running AWS CLI
- Configure your AWS CLI
- Create an S3 bucket and a folder for backups
- Create a policy for the user, so it has write-only access to the bucket folder
And after the preparation we can just use another bash script.
Create a user
- Go to the IAM console and click `Add user`.
- Type in a desirable user name and check only `Programmatic access`. Click `Next: Permissions`.
- Don't add the user to any groups, just skip the step by clicking `Next: Tags`.
- No tags required either, just skip by clicking `Next: Review`.
- Review the user and submit with `Create user`. It's okay to see the warning `This user has no permissions`; we will address it later.
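As a side note: if you already have an administrative profile configured for the AWS CLI, the same user can be created from the terminal instead of the console (the user name below is just an example):

```bash
# Create the dedicated backup user (hypothetical name) using an admin profile.
aws iam create-user --user-name backup-uploader
```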
Configure your AWS CLI
- Install the AWS CLI following the official instructions, e.g. on Debian it's just `sudo apt install awscli`.
- Go to the IAM console.
- Click on the user you created.
- Go to the `Security credentials` tab.
- Click `Create access key` and you'll see a dialog with the `Access key ID` and `Secret access key`. Don't close this dialog yet.
- Go to your terminal and type `aws configure`.
- Follow the official instructions; you'll need the `Access key ID` and `Secret access key` from the previous step.
Now you should have a ready-to-work AWS CLI setup on your computer.
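For reference, the `aws configure` dialog looks roughly like this (the key values are masked and the region is just an example):

```
$ aws configure
AWS Access Key ID [None]: AKIA****************
AWS Secret Access Key [None]: ****************************************
Default region name [None]: eu-central-1
Default output format [None]: json
```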
Create S3 bucket
- Go to the S3 console and click `Create bucket`.
- Type in a desirable name, select a region and click `Next`.
- On the `Configure options` step check `Keep all versions of an object in the same bucket` and click `Next`.
- On the `Set permissions` step make sure it says `Block all public access` and click `Next`.
- After reviewing and creating the bucket, browse the bucket and create a folder for backups in it.
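If you prefer the terminal over the console, roughly the same result can be achieved with the AWS CLI from a profile that has admin rights (the bucket name and region below are placeholders):

```bash
# Create the bucket (hypothetical name and region).
aws s3api create-bucket --bucket my-backup-bucket --region eu-central-1 \
    --create-bucket-configuration LocationConstraint=eu-central-1

# Enable versioning, the CLI equivalent of "Keep all versions of an object in the same bucket".
aws s3api put-bucket-versioning --bucket my-backup-bucket \
    --versioning-configuration Status=Enabled

# Block all public access.
aws s3api put-public-access-block --bucket my-backup-bucket \
    --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
```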
Create a policy
Now it's time to allow our CLI user to upload new backups; for this we need to create a policy:
- Go to the IAM console and click on the user you created earlier
- Click `Add inline policy`.
- AWS has this visual editor by default but it’s easier to put this JSON example here instead of describing where to click:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListBackups",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<bucket-name>"
        },
        {
            "Sid": "UploadBackups",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<bucket-name>/<backup-folder>/*"
        }
    ]
}
Replace `<bucket-name>` and `<backup-folder>` with your values. The `s3:ListBucket` permission is required for the `aws s3 sync` command that we're going to use, so it's able to compare file sizes and modification dates.
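Once the policy is attached, you can sanity-check it from the terminal: listing should work, while reading an object back should be denied (the names below are the same placeholders as in the policy):

```bash
# Should succeed: s3:ListBucket is allowed on the bucket.
aws s3 ls s3://<bucket-name>/<backup-folder>/

# Should fail with AccessDenied: the user has no s3:GetObject permission.
aws s3 cp s3://<bucket-name>/<backup-folder>/some-backup.enc .
```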
Use the backup bash script
I wrote a simple bash script that finds all files with the `.enc` extension in the current directory and its sub-directories and uploads them to the given location on S3.
You can view and copy the script from my repository on GitHub.
Now that our S3 bucket exists and our CLI's IAM user has a policy allowing it to write to the backup folder in the bucket, we can finally run the script:
`backup s3://<bucket-name>/<backup-folder>`
This script uses the `aws s3 sync` command, which checks whether a local file matches the remote file by size and modification date; if they match, it does not upload your local file again.
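In essence, the script boils down to something like the sketch below (simplified; the actual script in the repository may differ):

```bash
#!/usr/bin/env bash
# Usage: backup s3://<bucket-name>/<backup-folder>
# Sync only *.enc files from the current directory (and sub-directories) to S3.
set -euo pipefail

destination="$1"
aws s3 sync . "$destination" --exclude "*" --include "*.enc"
```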
Fulfilled Requirements
As you might remember, I listed some very strict requirements at the beginning. How does this approach fulfill them? Let's have a look:
- The storage back-end must be highly available — it’s S3
- The storage back-end must have versioning — we turned it on when creating the bucket
- The client must not have permission to destroy or corrupt data — our policy makes sure the script is only allowed to put objects and can neither read nor delete them. The user can upload a file with the same name, but that won't corrupt data; it will just create a new version of it
- The client must be transparent and obvious in how it works — just small bash scripts that you can read and understand
- The client must be smart enough to synchronize backups — `aws s3 sync` uploads only changed or new files to the bucket
- The encryption of my backups must happen on my machine — the first short script also fulfills this requirement for me.
I admit, this setup is not for everyone and it's far from simple, but if you're already working with AWS, it might suit you.
I hope you learned something from this article. Take care.
P.S. Please plan your AWS S3 expenses in advance; I'm not responsible for any unexpected costs.