Backups: Difference between revisions
Strugglers (Created page with "All about BitFolk's free '''local backups service'''. (This article is a work in progress) ==Overview== BitFolk is happy to provide a 6-times-daily incremental '''[[Wikipedia:rs...")
Strugglers (No edit summary)
Revision as of 19:23, 26 December 2016
All about BitFolk's free local backups service. (This article is a work in progress)
Overview
BitFolk is happy to provide a 6-times-daily incremental rsync-based backup service, storing your data locally (in the same data centre as your VPS but on different hardware; there is no data transfer charge).
The service itself is free but you will need to dedicate some storage to it, which is charged at the normal rate. If you don't need the full 10GiB basic storage allocation then you can use some of that, or you can purchase extra storage for this purpose.
This is not the cheapest or most secure way to back up your data, but it may be useful in that it is fairly simple to enable and comes with built-in features like incremental data transfer, deduplication and alerting without you having to think about it.
Please note that no guarantees are made about the integrity or availability of backups; they are provided on a reasonable-effort basis.
Setup
There are a few simple things you need to set up on your VPS in order to start making use of BitFolk's backups.
Install rsync
BitFolk's backups use rsync for transferring data, so you need to make sure you have it installed.
Allow SSH access from BitFolk's backups hosts
The backups happen over SSH so please allow access to your SSH server from the following hosts:
- backup0-vip.bitfolk.com
- backup2-vip.bitfolk.com
- backup3-vip.bitfolk.com
- backup4-vip.bitfolk.com
If you have your SSH server on a non-standard port (i.e. not port 22), that is fine; just mention it when telling BitFolk about the paths to back up.
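If you run a host firewall, one way to allow these hosts is to resolve them and emit one rule per address. The following is a minimal sketch, assuming an iptables-based firewall (adapt the output for ufw or nftables); the script name and the idea of passing your SSH port on the command line are illustrative only. It resolves IPv4 addresses only, since backups are taken over IPv4, and it prints the rules rather than applying them.

```python
import socket
import sys

# Backup hosts as listed above.
BACKUP_HOSTS = (
    "backup0-vip.bitfolk.com",
    "backup2-vip.bitfolk.com",
    "backup3-vip.bitfolk.com",
    "backup4-vip.bitfolk.com",
)


def resolve_ipv4(host: str) -> list[str]:
    """Return the IPv4 addresses a hostname currently resolves to."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def iptables_rules(addresses: list[str], port: int = 22) -> list[str]:
    """Build one ACCEPT rule per source address for the given SSH port."""
    return [
        f"iptables -A INPUT -p tcp -s {addr} --dport {port} -j ACCEPT"
        for addr in addresses
    ]


# Hypothetical usage: python3 backup_fw.py 22   (pass your SSH port)
if __name__ == "__main__" and len(sys.argv) > 1:
    ssh_port = int(sys.argv[1])
    addresses = sorted({a for host in BACKUP_HOSTS for a in resolve_ipv4(host)})
    print("\n".join(iptables_rules(addresses, ssh_port)))
```

The rules use `-A` (append), so they only take effect if no earlier rule already drops the traffic; the addresses reflect DNS at the time you run it.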
Add the rsnapshot public key
BitFolk will authenticate using a public key, so you'll need to add the rsnapshot SSH public key to the root user's .ssh/authorized_keys file.
Please note that the key file is PGP-signed by key ID 2099B64CBF15490B, and the only line from it that you should use is the one that starts with 'ssh-rsa'.
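After checking the signature (for example with `gpg --verify` on the downloaded file), the one line you need can be pulled out mechanically. This is a minimal sketch; the script name in the usage comment is hypothetical, and it deliberately refuses to continue unless exactly one 'ssh-rsa' line is found.

```python
import sys


def extract_ssh_rsa_lines(text: str) -> list[str]:
    """Return only the lines that start with 'ssh-rsa', with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.startswith("ssh-rsa")]


# Hypothetical usage: python3 extract_key.py signed-key.txt >> /root/.ssh/authorized_keys
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        keys = extract_ssh_rsa_lines(fh.read())
    if len(keys) != 1:
        sys.exit(f"expected exactly one ssh-rsa line, found {len(keys)}")
    print(keys[0])
```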
This will give BitFolk's backup servers full root access to your VPS. If you'd prefer to restrict this key to only using the rsync command you can use a wrapper script, such as the one described here under "Restricting The Key".
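A common shape for such a wrapper (not necessarily the one in that guide) is an OpenSSH forced command: prefix the key line in authorized_keys with something like `command="/usr/local/bin/rsync-only",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding`. sshd then runs the wrapper instead of whatever the client asked for, placing the requested command in the `SSH_ORIGINAL_COMMAND` environment variable. The sketch below assumes that setup; the path `/usr/local/bin/rsync-only` is hypothetical. It only executes commands of the form `rsync --server ...` and never passes them through a shell.

```python
#!/usr/bin/env python3
import os
import shlex
import sys

# Characters that could chain or redirect commands if a shell were ever involved.
FORBIDDEN = set(";&|`$<>()\n")


def allowed(command: str) -> bool:
    """Accept only an rsync server invocation with no shell metacharacters."""
    if any(ch in FORBIDDEN for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return len(argv) >= 2 and argv[0] == "rsync" and argv[1] == "--server"


def main() -> None:
    command = os.environ["SSH_ORIGINAL_COMMAND"]
    if not allowed(command):
        sys.exit("rejected: only rsync --server is permitted")
    argv = shlex.split(command)
    os.execvp(argv[0], argv)  # replace this process with rsync, no shell


# Only act when sshd supplied a client command; otherwise exit doing nothing.
if __name__ == "__main__" and "SSH_ORIGINAL_COMMAND" in os.environ:
    main()
```

As noted under Limitations, this still lets the key read any file. Also requiring `--sender` in the arguments would additionally block writes, but that is an assumption worth confirming against a real backup run before relying on it.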
Contact support with your list of paths
Now is the time to contact Support with a list of paths that you want to have backed up. This could simply be / (the root), although a lot of what lives under there doesn't need backing up, so you would probably prefer to list a few top-level directories instead.
Excluding data from being backed up
You can exclude things inside your selected paths by using rsync filter syntax in a file called .bitfolk-rsync-filter in the directory that contains whatever you wish to exclude.
For example, if you have asked for /var/ to be backed up, but you wish to exclude /var/log/apache/, then you would create the file /var/log/.bitfolk-rsync-filter with the following content:
- apache/
Filters only apply to the directory that the .bitfolk-rsync-filter file is in.
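Because the filter file lives in the parent of whatever is excluded, working out where to put it is a small path calculation. A minimal sketch, assuming you are excluding a directory (the trailing slash in the rule makes rsync match directories only); the function name is illustrative.

```python
from pathlib import PurePosixPath

FILTER_NAME = ".bitfolk-rsync-filter"


def filter_for(exclude: str) -> tuple[str, str]:
    """Return (filter file path, rule line) that excludes one directory.

    The rule goes in the excluded directory's parent and names the
    directory relative to that parent, with a trailing slash.
    """
    path = PurePosixPath(exclude)
    return str(path.parent / FILTER_NAME), f"- {path.name}/"


print(filter_for("/var/log/apache/"))
```

For the example above this gives the file /var/log/.bitfolk-rsync-filter containing the rule "- apache/".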
Once BitFolk lets you know that this is set up, backups will then take place according to the schedule you've chosen. You will not be charged for the bandwidth this uses, although it will show up on your Cacti graphs.
Schedules
The backups run every 4 hours, so that's six times per day.
Also…
- …once per day the oldest four-hourly snapshot will become a daily snapshot, and…
- …once per week the oldest daily snapshot will become a weekly snapshot, and…
- …once per month the oldest weekly snapshot will become a monthly snapshot.
The timings of the schedules are fixed, but you can choose how many iterations of each level will be kept. The default schedule keeps:
- 6 four-hourly snapshots, and
- 7 daily snapshots, and
- 4 weekly snapshots, and
- 6 monthly snapshots.
So at the most granular you'll have access to a day of four-hourly changes and at the least granular there will be versions going back 6 months. We refer to this default schedule as 6-7-4-6.
If that level of retention is not to your liking then you can pick a schedule with whatever levels of retention you like, e.g. 3-3-2-18 would be:
- Every 4 hours, retain the last 3
- Seven times per week, retain the last 3
- Four times per month, retain the last 2
- Twelve times per year, retain the last 18
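A schedule string is just four retention counts in order, so it is easy to see how many snapshots one keeps once it has been running long enough to fill every level. A minimal sketch; the level names are this article's terms, and the count is of snapshots, not storage, since unchanged files are shared between snapshots.

```python
LEVELS = ("four-hourly", "daily", "weekly", "monthly")


def parse_schedule(schedule: str) -> dict[str, int]:
    """Parse a retention schedule such as '6-7-4-6' into per-level counts."""
    parts = schedule.split("-")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} numbers, got {schedule!r}")
    return dict(zip(LEVELS, (int(p) for p in parts)))


def total_snapshots(schedule: str) -> int:
    """Number of snapshots kept once every level is fully populated."""
    return sum(parse_schedule(schedule).values())


print(total_snapshots("6-7-4-6"), total_snapshots("3-3-2-18"))
```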
Obviously the higher levels of retention will lead to more data being stored which will increase the amount of storage you need to pay for. The majority of customers making use of the backup service just stick with the default 6-7-4-6 schedule.
Incremental backups
The backups are made incrementally: only changes relative to the most recent four-hourly snapshot will be transferred, and only files that have changed relative to that snapshot will be stored again. Files that never change will only be stored once.
Access to backups
If you need to restore files from your backups you can mount them using NFSv3 over TCP. The mount points will be shown in the Backups section of your Panel account, one for each snapshot.
A typical entry in /etc/fstab would look like this:
85.119.80.241:/data/backup/rsnapshot.6-7-4-6/hourly.0/85.119.82.75/ /mnt/backups/hourly.0 nfs ro,hard,intr,noauto,nfsvers=3,tcp 0 0
Your backups are only available read-only, so there is no way that anything your VPS does can corrupt them. They're also locked down to only be available from the main IPv4 address of your VPS. That also means that you can access them from the Rescue VM as that will also use your main IPv4 address.
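If you want an fstab entry per snapshot, the example line can be generated. This sketch assumes that every snapshot follows the same export path pattern as the example, with rsnapshot-style names (hourly.N for the four-hourly level, then daily.N, weekly.N and monthly.N, where .0 is newest); that naming beyond hourly.0 is an assumption, so check it against the mount points the Panel actually lists. The server and VPS addresses are whatever the Panel shows for you.

```python
PREFIXES = ("hourly", "daily", "weekly", "monthly")  # assumed rsnapshot-style names


def snapshot_names(schedule: str) -> list[str]:
    """Snapshot directory names implied by a schedule such as '6-7-4-6'."""
    counts = [int(n) for n in schedule.split("-")]
    return [f"{prefix}.{i}" for prefix, n in zip(PREFIXES, counts) for i in range(n)]


def fstab_line(server: str, schedule: str, snapshot: str, vps_ip: str,
               mount_root: str = "/mnt/backups") -> str:
    """One read-only NFSv3-over-TCP fstab entry in the same form as the example."""
    export = f"{server}:/data/backup/rsnapshot.{schedule}/{snapshot}/{vps_ip}/"
    options = "ro,hard,intr,noauto,nfsvers=3,tcp"
    return f"{export} {mount_root}/{snapshot} nfs {options} 0 0"


for name in snapshot_names("6-7-4-6")[:2]:
    print(fstab_line("85.119.80.241", "6-7-4-6", name, "85.119.82.75"))
```

Because of noauto nothing is mounted at boot; create each mount point first and mount it by hand (e.g. mount /mnt/backups/hourly.0) when you need to restore.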
Statistics
In the backups section of your Panel account you'll find some useful statistics. The most basic information includes the total on-media size of the data you have backed up, as well as the limit of what you're paying for. You'll also find a list of the paths that are being backed up.
Further down you will find the differential usage and per-snapshot usage figures.
Differential usage
This shows the volume of changed bytes between each snapshot. If no files change at all then the differential usage would be zero bytes, even though mounting the snapshot over NFS would show the full content of the files.
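The idea behind this figure can be sketched from two mounted snapshots: unchanged files are hard links to the same inode, so only files whose inode is absent from the older snapshot count. This is an approximation, not how the Panel computes it; it uses apparent file sizes, and it compares inode numbers alone (each snapshot is a separate NFS mount, so device numbers differ), assuming the server reports the same inode number for hard-linked files.

```python
import os
import stat


def regular_file_inodes(root: str) -> set[int]:
    """Inode numbers of every regular file under root (symlinks not followed)."""
    inodes = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                inodes.add(st.st_ino)
    return inodes


def differential_usage(older: str, newer: str) -> int:
    """Bytes held by files in `newer` that are not hard links into `older`."""
    shared = regular_file_inodes(older)
    counted = set()
    total = 0
    for dirpath, _dirs, files in os.walk(newer):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode) and st.st_ino not in shared and st.st_ino not in counted:
                counted.add(st.st_ino)  # count each new inode once, even if linked twice
                total += st.st_size
    return total
```

Hypothetical usage: differential_usage("/mnt/backups/hourly.1", "/mnt/backups/hourly.0").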
Per-snapshot usage
Limitations
There are always trade-offs when deciding on a backup strategy and the solution offered here by BitFolk may not be the most suitable for you. It is important that you realise what its limitations are and make a decision for yourself. With that in mind, here's a bit more detail about some of the limitations. If it's not going to work for you then maybe one of the alternatives would be more suitable.
Backups are stored locally
All of BitFolk's servers are in the same facility. Although your backup data is physically separate from your VPS's data, if there is a major outage in that locality then you may not have access to your data for some time. A fire, bomb or some other serious event could physically destroy both your VPS's storage and the storage that your backups are on.
Backups are not compressed
Backed-up data is stored bit-for-bit the same as it was read. As it is infrequently accessed, it could benefit from compression. Perhaps the Btrfs or ZFS filesystems could be used in future to provide compression.
If any level of the schedule has zero retention, all lower levels must also have zero retention
The schedules are tiered in that the oldest four-hourly snapshot becomes a daily, the oldest daily becomes a weekly, and so on. This means that if you don't have any retention at a given level, then you can't have any retention at the levels below it (the less frequent ones), because nothing would be promoted into them. So, 6-0-0-0 (keep just six four-hourly snapshots) is fine, but 6-0-4-0 is not valid. You'd at the very least need to do it as 6-1-4-0 (keep six four-hourly snapshots, one daily snapshot and four weekly snapshots).
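The rule can be checked before you ask for a schedule. A minimal sketch; the "minimal fix" helper is illustrative and simply raises any zero that sits before a kept level to 1, which turns 6-0-4-0 into the 6-1-4-0 suggested above.

```python
def is_valid_schedule(schedule: str) -> bool:
    """True if no level has retention after a level with zero retention."""
    try:
        counts = [int(n) for n in schedule.split("-")]
    except ValueError:
        return False
    if len(counts) != 4:
        return False
    seen_zero = False
    for n in counts:
        if n == 0:
            seen_zero = True
        elif seen_zero:  # a kept level with nothing promoting into it
            return False
    return True


def minimal_fix(schedule: str) -> str:
    """Raise zeros that precede a kept level to 1, leaving everything else alone."""
    counts = [int(n) for n in schedule.split("-")]
    last_kept = max((i for i, n in enumerate(counts) if n > 0), default=-1)
    fixed = [1 if n == 0 and i < last_kept else n for i, n in enumerate(counts)]
    return "-".join(str(n) for n in fixed)


print(is_valid_schedule("6-0-4-0"), minimal_fix("6-0-4-0"))
```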
Changes are stored on a per-file basis not a per-block basis
If a file changes then both copies of the file will be stored in their entirety. More sophisticated backup solutions would only store changed blocks between the two files.
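To see why this matters for large files with small changes, the sketch below compares the bytes a per-file scheme stores against a simple block scheme. Fixed 4096-byte blocks and SHA-256 hashes are assumptions for illustration; real block-level tools choose their own chunking, and some use content-defined chunks so that inserted bytes don't shift every later block.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size


def blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def stored_per_file(old: bytes, new: bytes) -> int:
    """Per-file scheme (as here): any change stores the whole new file again."""
    return 0 if old == new else len(new)


def stored_per_block(old: bytes, new: bytes, size: int = BLOCK_SIZE) -> int:
    """Block scheme: only blocks whose hash is not among the old blocks are stored."""
    known = {hashlib.sha256(b).digest() for b in blocks(old, size)}
    return sum(len(b) for b in blocks(new, size) if hashlib.sha256(b).digest() not in known)


old = b"a" * (BLOCK_SIZE * 10)
new = old[:BLOCK_SIZE * 5] + b"b" * BLOCK_SIZE + old[BLOCK_SIZE * 6:]
print(stored_per_file(old, new), stored_per_block(old, new))
```

Changing one 4 KiB block of a 40 KiB file stores all 40960 bytes per-file, but only 4096 bytes per-block.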
Files are only compared against the exact same file path in the most recent snapshot
Files that alternate between two states will keep being stored in their entirety, as the only deduplication being done is against the most recent copy. Also, moving a file to a new name (a common thing to do with log file rotation) will result in both the old and new copies being stored. This can result in a large amount of storage being used to keep renamed copies of things that are already backed up. Consider excluding some or all rotated logs.
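Log rotation shows the cost clearly. This sketch models a snapshot as a hypothetical mapping of path to file contents and counts the bytes that would need storing, first matching only the same path (as this service does) and then, for comparison, matching identical content under any name.

```python
def stored_bytes(previous: dict[str, bytes], current: dict[str, bytes], by_content: bool) -> int:
    """Bytes newly stored for `current` given the previous snapshot.

    by_content=False: reuse only if the same path had identical data.
    by_content=True:  reuse if any previous file had identical data.
    """
    previous_contents = set(previous.values())
    total = 0
    for path, data in current.items():
        reusable = data in previous_contents if by_content else previous.get(path) == data
        if not reusable:
            total += len(data)
    return total


before = {"syslog": b"A" * 1000, "syslog.1": b"B" * 800}
# After rotation: each log moves up one name and a fresh syslog starts.
after = {"syslog": b"C" * 200, "syslog.1": b"A" * 1000, "syslog.2": b"B" * 800}
print(stored_bytes(before, after, by_content=False), stored_bytes(before, after, by_content=True))
```

Matching on path stores 2000 bytes after the rotation, even though only the 200-byte new log is genuinely new.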
Any metadata change will cause a new copy to be stored
Deduplication in this system is provided by hard-linking the new file path to the old file path where the files are identical. As hard links to the same file all share identical metadata (owner, group, permissions, etc.), any change of metadata will force a new copy of the file to be stored.
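The decision can be sketched per file: link to the previous snapshot's copy only when both the content and the shared metadata match, otherwise store a fresh copy. This is a simplified illustration, not BitFolk's code; tools like rsnapshot typically get this effect from rsync's --link-dest option. Treating owner, group, permission bits and modification time as "the metadata" is an assumption, and shutil.copy2 preserves permissions and timestamps but not ownership.

```python
import filecmp
import os
import shutil
import stat


def metadata(st: os.stat_result) -> tuple[int, int, int, int]:
    """Attributes that hard links share: owner, group, permission bits, mtime."""
    return (st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode), st.st_mtime_ns)


def store_file(source: str, previous: str, target: str) -> bool:
    """Put `source` at `target`; return True if hard-linked, False if copied."""
    if (os.path.isfile(previous)
            and metadata(os.stat(source)) == metadata(os.stat(previous))
            and filecmp.cmp(source, previous, shallow=False)):
        os.link(previous, target)  # unchanged: share the previous snapshot's inode
        return True
    shutil.copy2(source, target)  # changed content or metadata: a whole new copy
    return False
```

A chmod alone on an otherwise unchanged file is enough to make the next snapshot store it again in full.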
Backups are only available over IPv4
At the moment the backups are only available for NFS mounting over IPv4 and backups are only taken over IPv4. This shouldn't matter too much, because at the moment every BitFolk VPS comes with one free IPv4 address, so it is only an ideological issue for those who wish to make every service available over IPv6 (or who wish to run without IPv4).
At some point BitFolk will have to solve this as it is theoretically possible that a VPS will come with no IPv4 addresses as standard.
Backups are not stored encrypted
You will need to trust BitFolk staff not to access your backed-up data. Also, given that this is a "pull" setup, should BitFolk's backup servers be compromised the attacker would have unrestricted root access to your VPS. You can restrict the SSH public key to running only the rsync command, but that would still allow arbitrary access to your data.
Alternative backup strategies
If the limitations of this service are too great, or if it just doesn't work how you'd like it to work, you should find some other way to do your backups. Here are some suggestions. Please feel free to add more.
Do it yourself on S3
If you use Amazon's S3 as a back end and create the backup logic yourself you can end up with a very flexible and very cheap solution.
Many backup solutions like Duplicity can use S3 as a back end and also store the backups encrypted.
rsync.net
Tarsnap
Tarsnap is encrypted, deduplicated and compressed. It uses S3 as a back end.