This is the most impactful step from a performance perspective. I/O load increases significantly due to the extra read/write operations required to commit the snapshot blocks back to the original VMDK. The process ends with a VM “stun”, which is required to commit the final blocks of the snapshot. The stun is a short pause, typically a few seconds or less, during which the VM is unresponsive (it may “lose ping”) while the last blocks of the snapshot file are committed.
VMware uses a “rolling snapshot” method to minimize the impact and duration of the stun, as outlined below:
1. The ESX(i) host takes a second, “helper” snapshot to hold new writes.
2. The ESX(i) host reads the blocks from the original snapshot and commits them to the original VMDK file.
3. The ESX(i) host checks the size of the helper snapshot. If it is over the threshold size, the host repeats from step 1.
4. Once the helper snapshot is under the threshold size, the host stuns the VM and commits the last blocks of the snapshot.
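The loop in the steps above can be illustrated with a toy model. This is a hypothetical simulation, not VMware's implementation: it assumes a constant commit throughput, assumes every guest write lands in the helper snapshot as new data (no rewrites of the same block), and introduces an illustrative `threshold_mb` and a `max_passes` cap whose values are made up. It shows why the final stun is short when the guest writes slowly relative to the commit rate, and why heavy write load stretches the process out.

```python
from dataclasses import dataclass


@dataclass
class ConsolidationResult:
    passes: int            # number of helper snapshots taken
    stun_seconds: float    # estimated duration of the final stun
    converged: bool        # False if the helper never shrank below the threshold


def simulate_rolling_consolidation(
    snapshot_mb: float,
    commit_mb_per_s: float,
    guest_write_mb_per_s: float,
    threshold_mb: float,
    max_passes: int = 10,
) -> ConsolidationResult:
    """Toy model of rolling snapshot consolidation.

    Hypothetical assumptions (not VMware's actual algorithm or defaults):
    constant commit throughput, and every guest write adds new data to the
    current helper snapshot. threshold_mb and max_passes are illustrative.
    """
    if commit_mb_per_s <= 0:
        raise ValueError("commit throughput must be positive")
    to_commit = snapshot_mb
    for passes in range(1, max_passes + 1):
        # Steps 1-2: a helper snapshot absorbs guest writes while the
        # current snapshot's blocks are committed to the base VMDK.
        commit_seconds = to_commit / commit_mb_per_s
        helper_mb = guest_write_mb_per_s * commit_seconds
        # Step 4: helper is small enough, so stun and commit the remainder.
        if helper_mb <= threshold_mb:
            return ConsolidationResult(passes, helper_mb / commit_mb_per_s, True)
        # Step 3: helper is still too large; it becomes the next snapshot
        # to commit and another pass begins.
        to_commit = helper_mb
    # Writes keep pace with commits: the model gives up after max_passes and
    # reports the stun needed to commit whatever is left.
    return ConsolidationResult(max_passes, to_commit / commit_mb_per_s, False)
```

For example, `simulate_rolling_consolidation(10240, 200, 20, 50)` (a 10 GB snapshot, a 200 MB/s commit rate, and 20 MB/s of guest writes) converges after three passes with an estimated stun of about 0.05 seconds, whereas a guest writing as fast as the host can commit never shrinks the helper in this model.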
The stun period can be less than 1 second for small VMs with light load, or several seconds for larger VMs with significant load. To external clients, this brief stun simply looks like the server is busy, which may delay a response by a few seconds. However, applications that are very sensitive to latency may experience issues during this short period of unresponsiveness.
Please refer to VMware Knowledge Base article 1002836 for an explanation of snapshot removal issues.