We continue the discussion of how data deduplication has revolutionized the way PC backup software works. This article is the second in a series that explains the factors that determine the success of a PC backup. The first part covered target-based vs source-based deduplication.
Inline vs Post-process Deduplication
In target-based deduplication, the deduplication engine can process data for duplicates either in real time (i.e. as it is sent to the target) or after it has been stored in the target storage. The former is called inline deduplication; the latter is called post-process deduplication.
The obvious advantages of inline deduplication are:
– Higher overall efficiency, as data is passed and processed only once
– The processed data is immediately available for post-storage processes such as recovery and replication, which shortens the recovery point objective (RPO) and recovery time objective (RTO) windows.
The disadvantages are:
– Decrease in write throughput
– Lower extent of deduplication, since only the fixed-length block approach can be used
Inline deduplication only processes incoming raw blocks and has no knowledge of the files or the file structure. This forces it to use the fixed-length block approach (discussed below).
Post-process deduplication acts asynchronously on data that has already been stored. Its advantages and disadvantages are the exact opposite of those listed above for inline deduplication: write throughput is not reduced and the extent of deduplication is greater, but the data is handled twice and is not immediately available for recovery and replication.
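The difference in timing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the class names InlineTarget and PostProcessTarget, the in-memory dictionaries standing in for target storage, and the tiny 4-byte block size are all hypothetical. Both classes use fixed-length blocks so that the only difference between them is when the hashing happens.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small fixed block size, for illustration only


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-length blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class InlineTarget:
    """Hypothetical target that deduplicates each block as it arrives."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}  # digest -> unique block
        self.recipe: list[str] = []        # digests in arrival order

    def write(self, data: bytes) -> None:
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            # Hashing sits on the write path, so duplicates never reach
            # storage, but every write pays for the hash and the lookup.
            self.store.setdefault(digest, block)
            self.recipe.append(digest)


class PostProcessTarget:
    """Hypothetical target that stores raw data and deduplicates later."""

    def __init__(self) -> None:
        self.raw: list[bytes] = []         # landing area, may hold duplicates
        self.store: dict[str, bytes] = {}
        self.recipe: list[str] = []

    def write(self, data: bytes) -> None:
        # Fast path: nothing is hashed while the backup is running.
        self.raw.extend(split_blocks(data))

    def deduplicate(self) -> None:
        # Asynchronous pass over data that is already stored.
        for block in self.raw:
            digest = hashlib.sha256(block).hexdigest()
            self.store.setdefault(digest, block)
            self.recipe.append(digest)
        self.raw.clear()
```

Writing b"AAAABBBBAAAA" to an InlineTarget leaves two unique blocks in store straight away. A PostProcessTarget first holds all three raw blocks and only reaches the same state after deduplicate() has run, which is the second pass over the data that the inline approach avoids.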
File vs Sub-file Level Deduplication
The duplicate removal algorithm can be applied at the full-file or sub-file level. Full-file duplicates can be easily eliminated by calculating a single checksum of the complete file data and comparing it against the checksums of files that have already been backed up. It's simple and fast, but the extent of deduplication is limited, as it does not address the problem of duplicate content found inside different files or data sets (e.g. emails).
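As a rough sketch of the full-file approach, the snippet below keeps one copy of each distinct file, keyed by its checksum. The function names file_checksum and backup_files are made up for this example, SHA-256 is an assumed choice of checksum, and file contents are passed in as bytes to keep the example self-contained.

```python
import hashlib


def file_checksum(data: bytes) -> str:
    """Single checksum over the complete file contents."""
    return hashlib.sha256(data).hexdigest()


def backup_files(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Store each distinct file once.

    Returns (store, catalog): store maps checksum -> file data,
    catalog maps file name -> checksum of its contents.
    """
    store: dict[str, bytes] = {}
    catalog: dict[str, str] = {}
    for name, data in files.items():
        digest = file_checksum(data)
        if digest not in store:  # unseen content: keep one copy
            store[digest] = data
        catalog[name] = digest   # duplicates only add a catalog entry
    return store, catalog
```

If a.txt and copy.txt both contain b"hello world", only one copy is stored. A third file containing b"hello world!" differs by a single byte, gets a different checksum and is stored in full, which is exactly the limitation described above.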
The sub-file level deduplication technique breaks the file into smaller fixed-size or variable-size blocks, and then uses a standard hash-based algorithm to find duplicate blocks.
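To make the fixed vs variable distinction concrete, here is a minimal sketch of both chunking styles. Everything in it is an assumption made for illustration: the function names, the 8-byte fixed block, and the toy boundary rule for variable-size blocks (cut wherever the sum of the last 4 bytes leaves a remainder of 6 when divided by 7). Real implementations typically use a stronger rolling hash and much larger blocks.

```python
import hashlib


def chunk_fixed(data: bytes, size: int = 8) -> list[bytes]:
    """Fixed-size blocks: boundaries depend only on the byte offset."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_variable(data: bytes, window: int = 4, divisor: int = 7) -> list[bytes]:
    """Variable-size blocks: boundaries depend only on nearby content.

    A block ends after byte i when the sum of the last `window` bytes
    satisfies a simple condition (a toy stand-in for a rolling hash).
    """
    chunks: list[bytes] = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward
        if i >= window - 1 and rolling % divisor == divisor - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing bytes after the last cut
    return chunks


def shared_chunks(a: list[bytes], b: list[bytes]) -> int:
    """Count distinct chunks of `a` whose SHA-256 digest also occurs in `b`."""
    digests_a = {hashlib.sha256(c).digest() for c in a}
    digests_b = {hashlib.sha256(c).digest() for c in b}
    return len(digests_a & digests_b)
```

Take original = bytes(range(256)) and shifted = b"X" + original, i.e. the same data with one byte inserted at the front. With chunk_fixed every block boundary moves and no block of the original is found again. With chunk_variable the boundaries follow the content, so every chunk after the first one reappears unchanged and can be deduplicated.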
Among the better PC backup products that use both post-process and sub-file level data deduplication is Druva inSync. Get a free trial today at http://www.druva.com/insync/pc-backup
Further reading: PC Backup Software and Data Deduplication (Part I)