What are the differences between full, differential, and incremental backups? What are the pros and cons of each (or: why don’t we always just use full backups)?
In a separate article, I’ve explained the backup basics: “Backups 101 – explained”
Table Of Contents (T.O.C.):
1. Task – backup my stuff
2. Data changes & backup frequency
3. Backup types
3.1. Full backups
3.2. Differential backups
3.3. Incremental backups
4. The costs of backups
5. Combining backup types
6. Realistic scenarios
7. Incremental-forever backups
8. Synthetic full backups
1. Task – backup my stuff
OK, here’s a task I set for myself: keep my data backed up, with minimal waste of storage and resources. How do I do that?
First, let’s see what exactly it is that I wish to do (this is often a good question to ask yourself before doing something):
- How much data do I have to back up?
About 5 TB.
- How often do I need to back it up?
Well, it depends on the particular data.
The second point leads us to the next section:
2. Data changes & backup frequency
How often should I back up my data? Let’s see what kind of data I have first:
- YouTube videos.
I just store those. Adding new ones as I make them. Let’s call this:
- Holiday pictures.
I don’t go back in time, so the pictures from my last, and previous holidays remain the same. I just need to store those.
However, I might want to delete some crappy ones when I find the time to curate them. Let’s call this:
- Website articles.
I often edit and update my website articles. Let’s call this kind of changes:
I don’t like doing the same work twice. That’s why I like backing up my YouTube videos right after I’ve made them. The same goes for any holiday photos. When it comes to my website articles, I can accept losing an hour of work, but no more than that, provided hourly backups can be made in a convenient way.
The backup frequency formula is very simple:
How much data/work are you prepared to lose?
It is clear that I have three distinct types of data, each with its own pattern of updates and changes. Would it make sense to use the same backup method for each? Maybe. Maybe not. Let’s dive into that.
3. Backup types
Before I explain it in more detail, here’s a brief overview of the three basic backup types – full, differential, and incremental backups:
Now let’s see about the pros and cons of each:
3.1. Full backups
The simplest type of backup – I just back up all my files (videos, photos, database exports etc.) to my backup storage.
If I need to restore backups, I can just download the full backup to restore from it.
But what if I create just one more video? Would it make sense to upload the full backup again? Well… that leads us to the next section.
3.2. Differential backups
Differential backups contain only the changes from the last full backup.
Say: I’ve made a full backup, and after that:
- I create one more new video.
- I remove two crappy vacation photos.
- I edit one website article.
Then, I could create a new full backup and upload my 5 TB of data again. However, for such small changes, it would make sense to create a differential backup, which would contain:
- My new video.
- Info on the two vacation photos that need to be removed.
- The updated website database version.
The differential backup is created “upon” the last full backup.
This saves me a lot of bandwidth (for remote backup storage uploads) and time.
To restore data from a differential backup, I must also have the full backup it was created “upon” (to use that term). It cannot “function” independently.
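What ends up in a differential backup can be sketched as a comparison between two file listings. This is only an illustration, not how any particular backup tool works: the `diff_manifests` function, the file paths, and the short hash strings are all hypothetical, and each "manifest" is assumed to be a simple mapping of path to content hash.

```python
def diff_manifests(base, current):
    """Compare two {path: content_hash} manifests.

    Returns (changed_or_added, removed): the files a backup taken now would
    have to store, and the paths it would have to record as deleted.
    """
    changed_or_added = {
        path: digest
        for path, digest in current.items()
        if base.get(path) != digest  # new path, or same path with new content
    }
    removed = sorted(path for path in base if path not in current)
    return changed_or_added, removed


# State captured by the last full backup (hypothetical paths and hashes).
full = {
    "videos/ep1.mp4": "a1",
    "photos/beach1.jpg": "b1",
    "photos/beach2.jpg": "b2",
    "site/db.sql": "c1",
}

# State now: one new video, two photos deleted, the site database edited.
now = {
    "videos/ep1.mp4": "a1",
    "videos/ep2.mp4": "a2",
    "site/db.sql": "c2",
}

to_store, to_delete = diff_manifests(full, now)
print(to_store)   # {'videos/ep2.mp4': 'a2', 'site/db.sql': 'c2'}
print(to_delete)  # ['photos/beach1.jpg', 'photos/beach2.jpg']
```

The unchanged first video is not stored again, which is where the bandwidth saving comes from. Because the comparison is always against the full backup, the result is useless without that full backup.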
Now, what happens when I create some more website article updates? I could use a differential backup, but that would mean uploading the new video, again, along with the info on the other changes since the last full backup. Is there a more “compact” way of doing this? Yes there is, as I’ll explain in the next section.
3.3. Incremental backups
While a differential backup stores the changes since the last full backup, an incremental backup stores the changes since the last backup, regardless of whether it’s a full, or a differential, or another incremental backup.
In other words, I could make an incremental backup every hour, and it would only store the changes since the previous incremental backup (of course, the first incremental backup I make would store the changes since its previous full or differential backup).
While the incremental backups are the “lightest” to create and upload, they are a hassle when you need to restore the data. In order to restore the data from incremental backups, you need:
- the original full backup
- the last differential backup (if you created the incremental backups upon the last differential backup, i.e. if you made any differential backups)
- all the incremental backups since your last full backup or the last differential backup (if you built your incremental backups upon the differential backup)
If this sounds confusing, don’t worry, just read on.
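To see why every link in the chain matters, here is a minimal sketch of a restore that replays incremental backups on top of a full backup. The data layout is an assumption made for illustration: each backup is a dictionary with a `changed` mapping and a `removed` list, and the function names `apply_backup` and `restore` are hypothetical.

```python
def apply_backup(state, backup):
    """Return a new file state with one differential/incremental applied."""
    new_state = dict(state)
    new_state.update(backup["changed"])  # add new files, overwrite edited ones
    for path in backup["removed"]:
        new_state.pop(path, None)        # drop files recorded as deleted
    return new_state


def restore(full, chain):
    """Replay every backup in `chain`, oldest first, on top of the full backup."""
    state = dict(full)
    for backup in chain:
        state = apply_backup(state, backup)
    return state


full = {"site/db.sql": "v1", "photos/a.jpg": "p1"}
hourly = [
    {"changed": {"site/db.sql": "v2"}, "removed": []},
    {"changed": {"videos/new.mp4": "m1"}, "removed": ["photos/a.jpg"]},
    {"changed": {"site/db.sql": "v3"}, "removed": []},
]

print(restore(full, hourly))
# {'site/db.sql': 'v3', 'videos/new.mp4': 'm1'}

# Skipping the middle incremental silently produces the wrong state:
print(restore(full, [hourly[0], hourly[2]]))
# {'site/db.sql': 'v3', 'photos/a.jpg': 'p1'}
```

In the second call the new video is missing and the deleted photo is back, which is exactly the kind of problem a broken incremental chain causes.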
4. The costs of backups
Each type of backup comes with its pros and cons, with a certain cost so to speak. Let’s discuss those:
- Backup speed
– Incremental backups are the fastest to create, as they only record changes since the previous backup (even if the previous backup was another incremental backup).
– Full backups are the slowest to create.
– Differential backups sit somewhere in between the other two.
- Data restoration speed
Here, the situation is reversed: full backups are the simplest and fastest to restore from, while incremental backups are the slowest (and differential backups are in between the two).
- Storage space use
Incremental backups take the least amount of space, while full backups take a lot more space (with differential backups in between these two extremes).
- Reliability
A rather important thing when it comes to backups. Full backups are self-sufficient and the most reliable. Incremental backups are the riskiest, as data corruption in any single incremental backup might prevent data restoration (you need the original full backup and all the following incremental backups to restore your data).
5. Combining backup types
Here is a realistic, practical example that may help to understand the backup type differences and uses, and it could be a good solution to my “backup problem”:
- I make a full backup at the start of each month.
- I run daily incremental backups.
- And weekly differential backups.
- So, after a full backup, I’d just upload and store incremental backups until the first Sunday.
– At this point, to restore my data, I need the full backup, and all the incremental backups I’ve created.
- Then, on the first Sunday of the month, I’d make and upload the 1st differential weekly backup.
– At this point, to restore my data, I need the full backup and the now-created (1st) differential backup.
- After that, I’d start uploading daily incremental backups built upon that 1st differential backup.
– At this point, to restore my data, I need the full backup, the 1st differential backup, and all the incremental backups created since the 1st differential backup.
- On the second Sunday of the month, I’d make the 2nd differential backup (built upon the last full backup, not the 1st differential backup).
– At this point, to restore my data, I need the full backup and the 2nd differential backup. The 2nd differential backup contains all the changes since the full backup, so the 1st differential backup and the incremental backups made before the 2nd differential backup are no longer needed.
- Followed by daily incremental backups built upon that 2nd differential backup.
– At this point, to restore my data, I need the full backup, the 2nd differential backup, and all the incremental backups I’ve created since the 2nd differential backup.
This continues until the start of a new month, when I create a new full backup, and build any following differential and incremental backups upon that new full backup.
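The "what do I need to restore?" rule from the schedule above can be written as a short function. This is a sketch under the same assumptions as the example: differentials are built upon the most recent full backup, and incrementals upon whatever backup came right before them. The `restore_chain` name and the backup labels are hypothetical.

```python
def restore_chain(backups):
    """Pick the backups needed to restore the newest state.

    `backups` is a list of (label, kind) tuples in creation order, where kind
    is "full", "diff" or "incr".
    """
    needed = []
    diff_found = False  # True once the newest relevant differential is picked
    for label, kind in reversed(backups):
        if kind == "full":
            needed.append(label)
            return needed[::-1]  # oldest first, the order to restore in
        if diff_found:
            continue  # superseded by the differential already picked
        needed.append(label)
        if kind == "diff":
            diff_found = True
    raise ValueError("no full backup in the list")


month = [
    ("full-01", "full"),
    ("incr-02", "incr"),
    ("incr-03", "incr"),
    ("diff-sun1", "diff"),
    ("incr-a", "incr"),
    ("incr-b", "incr"),
    ("diff-sun2", "diff"),
    ("incr-c", "incr"),
]

print(restore_chain(month[:3]))  # ['full-01', 'incr-02', 'incr-03']
print(restore_chain(month[:4]))  # ['full-01', 'diff-sun1']
print(restore_chain(month[:6]))  # ['full-01', 'diff-sun1', 'incr-a', 'incr-b']
print(restore_chain(month))      # ['full-01', 'diff-sun2', 'incr-c']
```

Each differential resets the chain to at most two "base" backups plus the few incrementals made since, which is the whole point of mixing the three types.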
This kind of backup policy strikes a fine balance between saving storage space, bandwidth and other resources, the ease of restore, and the ability to create frequent up-to-date backups.
Let us now discuss some extremes, to see the alternative options:
I could run full backups daily. Depending on how much data I have, this could require a lot of resources, especially if I wish to keep versions for the last 30 or more days (in case I made a mistake two weeks ago and want a backup copy from just before that mistake so I can revert it). Storing 30 daily full copies that differ only slightly from one another is not very practical.
- In this scenario, if I have about 5 TB of data, my 30-day backup history would take 30 x 5 TB = 150 TB of storage.
Likewise, if I make a lot of changes, I could not rely on one full backup and daily differential backups, because each differential backup contains all the changes since the last full backup.
- In this scenario, if I make 1 GB of changes daily, my 30 differential backups would contain 1, 2, 3, 4, 5… 30 GB of data respectively = 465 GB in total (each differential backup being 1 GB larger than the previous one, because each differential backup stores all the changes since the last full backup).
If I relied on daily incremental backups only, the amount of storage needed would be the smallest (just 1 GB of data each day, for each incremental backup). However, to restore any data, I would need all the created incremental backups and the starting full backup. That can be complicated and require a lot of time to restore.
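The arithmetic for the three extremes is easy to check. The sketch below uses the article's figures (5 TB per full backup, 1 GB of changes per day, 30 days) and additionally assumes that each day's changes touch different data, so the differentials grow by exactly 1 GB a day. The differential and incremental totals do not include the one full backup they are built upon.

```python
DAYS = 30
FULL_TB = 5          # size of one full backup
DAILY_CHANGE_GB = 1  # amount of new or changed data per day

# 30 daily full backups
daily_fulls_tb = DAYS * FULL_TB

# 30 daily differentials: day N holds all changes since the full backup
daily_diffs_gb = sum(DAILY_CHANGE_GB * day for day in range(1, DAYS + 1))

# 30 daily incrementals: each holds only that day's changes
daily_incrs_gb = DAILY_CHANGE_GB * DAYS

print(daily_fulls_tb)  # 150
print(daily_diffs_gb)  # 465
print(daily_incrs_gb)  # 30
```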
That’s where the differential backups come into play. When used right, they can be a good step between the last full backup, and a series of incremental backups.
If I’ve explained all this remotely well, you can now understand why:
- Incremental backups are also called “differential incremental backups.”
- Differential backups are also called “cumulative incremental backups.”
Should I configure a set of monthly-full, weekly-differential and daily-incremental backups for my data and call it a day? Read on. 🙂
6. Realistic scenarios
Computers and storage systems have come a long way and today they are pretty fast and powerful. That is why differential backups don’t make nearly as much sense today as they did in the days of backing up to tape (though some use cases still warrant the use of tape backups).
For many use cases and computer systems, you can skip differential backups and just go with a combination of full, and incremental backups. Recovery could work almost as fast as when restoring full (or full & differential) backups, without the storage overhead of differential (or numerous full) backups.
7. Incremental-forever backups
With any of the above-explained types of backup, you start with a full backup, and then run differential or incremental (or both) backups “on top of it” (forming a sort of a backup chain).
With the speed of modern systems, you could just keep running incremental backups, and never create a differential, or a new full backup to incorporate all the subsequent data changes and updates.
However, this runs the risk of, say, the 50th day’s incremental backup being corrupt (for whatever reason). In that case, on the 100th day, you could face the unpleasant surprise of realizing you can only recover the system’s state as of the 49th day’s backup.
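The effect of a single bad link in an incremental-forever chain can be modelled in a few lines. This is a simplified sketch: day 0 stands for the full backup, days 1 and up for the daily incrementals, and the `last_restorable_day` function is a hypothetical name used only for illustration.

```python
def last_restorable_day(total_days, corrupt_days):
    """Return the latest day whose state can still be rebuilt.

    Day 0 is the full backup; days 1..total_days are incrementals, each
    depending on every backup before it. Returns None if the full backup
    itself is corrupt.
    """
    corrupt = set(corrupt_days)
    if 0 in corrupt:
        return None
    day = 0
    # Walk forward until the chain ends or hits a corrupt incremental.
    while day < total_days and (day + 1) not in corrupt:
        day += 1
    return day


print(last_restorable_day(100, []))        # 100
print(last_restorable_day(100, [50]))      # 49
print(last_restorable_day(100, [50, 20]))  # 19
```

Everything after the first corrupt incremental is lost, no matter how healthy the later backups are.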
Fortunately, modern systems have another ace up their sleeve:
8. Synthetic full backups
Now, imagine if your backup software were “smart” enough to incorporate incremental backup changes into your original full backup. You could run incremental backups only, indefinitely, but without the problems that come with restoring data from too many incremental backups in a chain.
That is what is called a synthetic full backup. It can keep running alongside an incremental-forever backup system, without most of that system’s drawbacks and risks.
A synthetic full backup practically acts as a normal full backup in terms of being a separate, self-sustained logical entity that can be cloned and copied to different locations (so you can make copies of your backups), but its creation (and updating) does not require all the files to be copied – only the changes are copied and incorporated.
Basically, the backup software “adds” incremental backups to the last synthetic backup, but only the incremental backups created since the last synthetic backup was created.
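Conceptually, building a synthetic full backup is a merge that happens on the backup storage itself. The sketch below is a toy model and not how Macrium Reflect or any other product implements it: backups are plain dictionaries, each incremental has a `changed` mapping and a `removed` list, and `synthesize_full` is a hypothetical name.

```python
def synthesize_full(previous_full, incrementals):
    """Fold incrementals (oldest first) into a copy of the previous full backup.

    Only data already sitting on the backup storage is read; nothing is
    re-uploaded from the source machine.
    """
    merged = dict(previous_full)
    for inc in incrementals:
        merged.update(inc["changed"])  # new and edited files
        for path in inc["removed"]:
            merged.pop(path, None)     # files deleted since the last backup
    return merged


last_synthetic = {"a.txt": "v1", "b.txt": "v1"}
since_then = [
    {"changed": {"a.txt": "v2"}, "removed": []},
    {"changed": {"c.txt": "v1"}, "removed": ["b.txt"]},
]

new_synthetic = synthesize_full(last_synthetic, since_then)
print(new_synthetic)   # {'a.txt': 'v2', 'c.txt': 'v1'}
print(last_synthetic)  # {'a.txt': 'v1', 'b.txt': 'v1'}
```

The result is self-contained, so a restore needs only the newest synthetic full plus any incrementals made after it. The older synthetic full is left untouched in this sketch, so it can be kept as an earlier restore point or deleted to reclaim space.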
In practice, synthetic full backups can be configured to run on a weekly, or even daily basis, incorporating all the incremental backups created over the last 24 hours, without the extra overhead of creating and uploading all the data every day (as would be the case if using the “ordinary” full backups).
A downside of synthetic full backups is they require fast storage with random seek and fast read and write, with a powerful CPU to boot.
Modern backup software like Macrium Reflect, combined with a decent-quality NAS, should be able to handle this – with automated uploads to a “cloud” storage, or another NAS at a different physical location.
That is the solution I am leaning towards, with the idea that automated up-to-date backups are important, that several backup copies are a necessity, but that storage space should not be wasted.
Relja OneIsNone Novović