This is an archive version of the document. To get the most up-to-date information, see the current version.

Global Data Deduplication

    The goal of WAN acceleration is to send less data over the network. To reduce the amount of data going over WAN, Veeam Backup & Replication uses the global data deduplication mechanism.

    1. When you first run a remote job, Veeam Backup & Replication analyzes data blocks going over WAN.
    2. With every new cycle of a remote job, Veeam Backup & Replication uses the data redundancy algorithm to find duplicate data blocks in copied files. Veeam Backup & Replication analyzes data blocks in files on the source side and compares them with those that have been previously transferred over WAN. If an identical data block is found, Veeam Backup & Replication deduplicates it.
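
    The two steps above can be illustrated with a minimal sketch. It is not the product's actual algorithm: the fixed block size, the SHA-256 digest, and the names split_blocks, run_job_cycle and transferred are all assumptions made for illustration. The sketch only shows the general idea of remembering which blocks have already gone over WAN and skipping them in later job cycles.

```python
import hashlib

# Hypothetical block size, kept tiny so the example is easy to follow.
BLOCK_SIZE = 4


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size blocks (an assumption of this sketch)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_job_cycle(data: bytes, transferred: set[str]) -> list[bytes]:
    """Return the blocks that still have to go over WAN in this cycle.

    `transferred` holds digests of blocks sent in earlier cycles and is
    updated in place with the digests of the blocks sent now.
    """
    to_send = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest in transferred:
            continue  # identical block was sent before: deduplicate it
        transferred.add(digest)
        to_send.append(block)
    return to_send


transferred: set[str] = set()
first = run_job_cycle(b"AAAABBBBCCCC", transferred)   # first run: all blocks are new
second = run_job_cycle(b"AAAABBBBDDDD", transferred)  # next cycle: only DDDD is new
print(len(first), len(second))  # prints "3 1"
```

    In this sketch the first cycle sends three blocks, and the second cycle sends only the one block that has not been seen before.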

    Veeam Backup & Replication uses three sources for data deduplication:

    • VM disks. Veeam Backup & Replication analyzes data blocks within the same VM disk. If identical blocks are found, duplicates are eliminated.
      For example, in case of a virtualized Microsoft Exchange server, the same email is typically stored in the sender’s Outbox folder and the recipient’s Inbox folder, which results in duplicate data blocks. When a remote job runs, Veeam Backup & Replication detects such VM data blocks and performs deduplication.
    • Previous restore points for the processed VM on the target repository. Veeam Backup & Replication analyzes data in the restore point that is about to be copied and the restore point(s) that are already stored on the target side. If an identical block is found on the target side, Veeam Backup & Replication eliminates the redundant data block in the copied restore point.
    • Global cache. Veeam Backup & Replication creates a global cache holding data blocks that repeatedly go over WAN. In a new job session, Veeam Backup & Replication analyzes data blocks to be sent and compares them with data blocks stored in the global cache. If an identical data block is already available in the global cache, its duplicate on the source side is eliminated and not sent over WAN.
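
    The three sources listed above can be pictured as successive lookups for each data block. The sketch below is a simplified illustration under stated assumptions: the lookup order, the SHA-256 digest, and the names DedupSources and classify are hypothetical and are not taken from the product.

```python
import hashlib
from dataclasses import dataclass, field


def digest(block: bytes) -> str:
    """Identify a block by its SHA-256 digest (an assumption of this sketch)."""
    return hashlib.sha256(block).hexdigest()


@dataclass
class DedupSources:
    # Digests of blocks in restore points already stored on the target side.
    target_restore_points: set[str] = field(default_factory=set)
    # Digests of blocks held in the global cache.
    global_cache: set[str] = field(default_factory=set)


def classify(blocks: list[bytes], sources: DedupSources) -> list[str]:
    """Label each block with the source that deduplicates it, or "send"."""
    seen_in_disk: set[str] = set()  # blocks already met within this VM disk
    labels = []
    for block in blocks:
        d = digest(block)
        if d in seen_in_disk:
            labels.append("vm_disk")
        elif d in sources.target_restore_points:
            labels.append("restore_point")
        elif d in sources.global_cache:
            labels.append("global_cache")
        else:
            labels.append("send")  # unique block: must go over WAN
        seen_in_disk.add(d)
    return labels


sources = DedupSources(
    target_restore_points={digest(b"old")},
    global_cache={digest(b"os")},
)
labels = classify([b"mail", b"mail", b"old", b"os", b"new"], sources)
print(labels)
# prints ['send', 'vm_disk', 'restore_point', 'global_cache', 'send']
```

    Only the blocks labeled "send" would be transferred in this model. The second copy of the same email block is caught within the VM disk, which mirrors the Microsoft Exchange example above.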

    As a result, only unique data blocks go over WAN. Data blocks that have already been sent are not sent again. This way, Veeam Backup & Replication eliminates transfer of redundant data over WAN.

    Note:

    Veeam Backup & Replication deduplicates data blocks within one VM disk and in restore points for one VM only. Deduplication between VM disks and restore points of different VMs is performed indirectly, via the global cache. For more information, see WAN Global Cache.
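
    The indirect deduplication described in the note can be sketched as follows. This is a toy model, not the product's implementation: the WanLink class, its attributes, and the policy of adding every sent block to the global cache are assumptions chosen to keep the example short. Each VM is compared only with its own history, and blocks shared between different VMs are recognized through the shared cache alone.

```python
import hashlib


def digest(block: bytes) -> str:
    """Identify a block by its SHA-256 digest (an assumption of this sketch)."""
    return hashlib.sha256(block).hexdigest()


class WanLink:
    """Hypothetical model of per-VM deduplication plus one shared global cache."""

    def __init__(self) -> None:
        self.global_cache: set[str] = set()     # shared by all VMs
        self.per_vm: dict[str, set[str]] = {}   # VM name -> digests of its own blocks

    def transfer(self, vm: str, blocks: list[bytes]) -> int:
        """Return how many blocks actually had to be sent for this VM."""
        own = self.per_vm.setdefault(vm, set())
        sent = 0
        for block in blocks:
            d = digest(block)
            # A VM is never compared with another VM's set directly;
            # cross-VM matches can only come from the global cache.
            if d not in own and d not in self.global_cache:
                sent += 1
                self.global_cache.add(d)  # simplification: cache every sent block
            own.add(d)
        return sent


link = WanLink()
sent_a = link.transfer("vm-a", [b"os", b"app-a"])  # both blocks are new
sent_b = link.transfer("vm-b", [b"os", b"app-b"])  # "os" is found in the global cache
print(sent_a, sent_b)  # prints "2 1"
```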