Veeam Backup & Replication provides data compression and deduplication mechanisms. Both reduce the traffic sent over the network and the disk space required to store backup files and VM replicas.
Data compression decreases the size of created backups but affects the duration of the backup procedure. Veeam Backup & Replication allows you to select one of the following compression levels:
- None is recommended if you plan to store backup files and VM replica files on storage devices that support hardware compression and deduplication.
- Dedupe-friendly is an optimized compression level for very low CPU usage. You can select this compression level if you want to decrease the load on the backup proxy.
- Optimal is the recommended compression level. It provides the best ratio between size of the backup file and time of the backup procedure.
- High provides about 10% additional compression over the Optimal level at the cost of about 10x higher CPU usage.
- Extreme provides the smallest backup file size but reduces backup performance. If you intend to use the Extreme compression level, we recommend that you run backup proxies on computers with modern multi-core CPUs (6 cores recommended).
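The size-versus-time trade-off between compression levels can be observed with any general-purpose compressor. The sketch below uses Python's zlib purely as a stand-in: the zlib levels do not correspond to the Veeam levels listed above, and the sample data is a hypothetical substitute for VM disk blocks.

```python
import time
import zlib


def measure(data, level):
    """Return (compressed size in bytes, elapsed seconds) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed


# Hypothetical sample: moderately repetitive records standing in for disk data.
sample = b"".join(b"record-%06d;status=ok;" % (i % 5000) for i in range(50_000))

if __name__ == "__main__":
    # Level 0 stores data uncompressed; level 9 spends the most CPU time.
    for level in (0, 1, 6, 9):
        size, seconds = measure(sample, level)
        print(f"zlib level {level}: {size} bytes in {seconds:.4f} s")
```

Running the script on your own data gives a rough feel for how much extra CPU time a higher level costs relative to the space it saves.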
Changing Data Compression Settings
You can change data compression settings for existing backup jobs. New settings will not have any effect on previously created backup files in the backup chain. They will be applied to new backup files created after the settings were changed.
Compression settings are changed on the fly. You do not need to create a new full backup to use new settings — Veeam Backup & Replication will automatically apply the new compression level to newly created backup files.
However, if you use the reverse incremental backup method, the newly created backup files will contain a mixture of data blocks compressed at different levels. For example, suppose you have a backup job that uses the reverse incremental backup method and the Optimal compression level, and after several job sessions you change the compression level to High. In a reverse incremental backup chain, the full backup file is rebuilt with every job session to include new data blocks. As a result, the full backup file will contain both data blocks compressed at the Optimal level and data blocks compressed at the High level. The same behavior applies to synthetic full backups: synthetic full backups created after the compression level change will contain a mixture of data blocks compressed at different levels.
If you want the newly created backup file to contain data blocks compressed at one level, you can create an active full backup. Veeam Backup & Replication will retrieve data for the whole VM image from the production infrastructure and compress it at the new compression level. All subsequent backup files in the backup chain will also use the new compression level.
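The effect of a compression level change on a reverse incremental chain can be illustrated with a toy model. This is a sketch only: the dictionary layout and the function names `reverse_incremental_session` and `active_full` are hypothetical and do not reflect the actual backup file format.

```python
def active_full(block_count, level):
    """Model an active full backup: every block is read and compressed at `level`."""
    return {block: level for block in range(block_count)}


def reverse_incremental_session(full, changed_blocks, level):
    """Model one reverse incremental session: changed blocks are injected into
    the full backup file, compressed at the level in effect for that session."""
    rebuilt = dict(full)
    for block in changed_blocks:
        rebuilt[block] = level
    return rebuilt


# The job starts at the Optimal level.
full = active_full(8, "Optimal")
full = reverse_incremental_session(full, [1, 2], "Optimal")

# The level is changed to High; only blocks written afterwards use it.
full = reverse_incremental_session(full, [2, 5], "High")
mixed_levels = set(full.values())

# An active full backup rewrites every block at the new level.
uniform_levels = set(active_full(8, "High").values())
```

In this model `mixed_levels` contains both Optimal and High, while `uniform_levels` contains only High, which mirrors the reason for running an active full backup after the change.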
Data deduplication decreases the size of backup files. You can enable data deduplication if a backup or replication job includes several VMs that have a large amount of free space on their logical disks, or VMs that have similar data blocks (for example, VMs that were created from the same template). With data deduplication enabled, Veeam Backup & Replication does not store identical data blocks, or space that has been pre-allocated but not used, in the resulting backup file.
Veeam Backup & Replication uses Veeam Data Movers to deduplicate VM data on the source and target side.
- The source-side Veeam Data Mover deduplicates VM data at the level of VM disks. Before the source-side Veeam Data Mover starts processing a VM disk, it obtains digests for the previous restore point in the backup chain from the target-side Veeam Data Mover. The source-side Veeam Data Mover consolidates this information with CBT information from the hypervisor and filters VM disk data based on it. If a data block already exists in the previous restore point for this VM, the source-side Veeam Data Mover does not transport this data block to the target. In addition, for thin disks, the source-side Veeam Data Mover skips unallocated space.
- The target-side Veeam Data Mover deduplicates VM data at the level of the backup file. It processes data for all VM disks of all VMs in the job. The target-side Veeam Data Mover uses digests to detect identical data blocks in transported data, and stores only unique data blocks to the resulting backup file.
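The digest-based approach described for the target side can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 is used as the digest only for the example (the text does not say which digest Veeam Data Movers use), and the names `deduplicate` and `restore` are hypothetical.

```python
import hashlib


def deduplicate(stream, block_size):
    """Split `stream` into fixed-size blocks and keep one copy of each unique block.

    Returns (store, layout): `store` maps digest -> block data, and `layout`
    lists the digests in original order so the stream can be rebuilt.
    """
    store = {}
    layout = []
    for offset in range(0, len(stream), block_size):
        block = stream[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store the block only the first time
        layout.append(digest)
    return store, layout


def restore(store, layout):
    """Rebuild the original stream from unique blocks and the ordered digests."""
    return b"".join(store[digest] for digest in layout)


# Two hypothetical VMs created from the same template, each with one unique block.
template = b"AAAA" + b"BBBB" + b"CCCC" + b"DDDD"
vm_a = template + b"1111"
vm_b = template + b"2222"
stream = vm_a + vm_b

store, layout = deduplicate(stream, block_size=4)
```

Here the stream holds ten blocks but only six are stored, because the four template blocks shared by both VMs are kept once.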
Depending on the type of storage you select as a backup target, Veeam Backup & Replication uses data blocks of different sizes to process VMs, which optimizes the size of the backup file and job performance. You can choose one of the following storage optimization options:
- The Local target (large blocks) option is recommended for backup jobs that can produce very large full backup files — larger than 16 TB. With this option selected, Veeam Backup & Replication uses data block size of 4096 KB.
If you use small data blocks to deduplicate a large backup file, the backup file is cut into a great number of data blocks. As a result, Veeam Backup & Replication produces a very large deduplication metadata table that can exhaust the memory and CPU resources of your backup repository. The 4 MB data blocks used by the Local target (large blocks) option produce a smaller metadata table that requires less memory and CPU resources to process, which is why this option is recommended for backup files over 16 TB. Note, however, that this storage optimization option provides the lowest deduplication ratio and the largest size of incremental backup files.
If you upgrade to Veeam Backup & Replication 9.0 from a previous product version, this option will be displayed as Local target (legacy 8MB block size) in the list and will still use a block size of 8 MB. It is recommended that you switch to an option that uses a smaller block size and create an active full backup to apply the new setting.
- The Local target option is recommended for backup to SAN, DAS or local storage. With this option selected, Veeam Backup & Replication uses data block size of 1024 KB.
The SAN identifies larger blocks of data and therefore can process large amounts of data at a time. This option provides the fastest backup job performance but reduces the deduplication ratio, because identical blocks are less likely to be found when data blocks are larger.
- The LAN target option is recommended for backup to NAS and onsite backup. With this option selected, Veeam Backup & Replication uses data block size of 512 KB. This option provides a better deduplication ratio and reduces the size of a backup file because of reduced data block sizes.
- The WAN target option is recommended if you are planning to use WAN for offsite backup. With this option selected, Veeam Backup & Replication uses data block size of 256 KB. This results in the maximum deduplication ratio and the smallest size of backup files, allowing you to reduce the amount of traffic over WAN.
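To see why block size matters for the deduplication metadata table, it helps to count how many blocks a large backup file is cut into under each option. The sketch below assumes binary units (1 KB = 1024 bytes, 1 TB = 1024^4 bytes) and assumes one metadata entry per block; the function name `block_count` is hypothetical, and the actual size of a metadata entry is not given in the text.

```python
# Block sizes in KB for each storage optimization option, as listed above.
BLOCK_SIZES_KB = {
    "Local target (large blocks)": 4096,
    "Local target": 1024,
    "LAN target": 512,
    "WAN target": 256,
}

TB = 1024 ** 4  # assumption: binary terabyte


def block_count(file_size_bytes, block_size_kb):
    """Number of blocks needed to cover the file (ceiling division)."""
    block_size_bytes = block_size_kb * 1024
    return -(-file_size_bytes // block_size_bytes)


if __name__ == "__main__":
    for option, size_kb in BLOCK_SIZES_KB.items():
        print(f"{option}: {block_count(16 * TB, size_kb):,} blocks")
```

For a 16 TB file, the WAN target option yields 16 times as many blocks as the Local target (large blocks) option, so the table that tracks them grows by the same factor.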
Changing Data Deduplication Settings
You can change data deduplication settings for existing backup jobs. New settings will not have any effect on previously created backup files in the backup chain. They will be applied to new backup files created after the settings were changed.
To apply new deduplication settings in backup jobs, you must create an active full backup after you change deduplication settings. Veeam Backup & Replication will use the new block size for the active full backup and subsequent backup files in the backup chain.
Backup Copy Jobs
To change data block size for a backup copy job, you must perform the following actions:
- Change data block size in settings of the initial backup job.
- Create an active full backup with the initial backup job.
- Create an active full backup with the backup copy job.