During the site partition three things were expected to happen:
- VMs that were impacted by the APD should be killed by vSphere HA component protection
- If HA component protection does not work, vSphere should kill the VMs when the partition is lifted
- VMs should be restarted by vSphere HA
The problems faced were twofold. The VMs were restarted by vSphere HA, however:
- vSphere HA component protection did not kill the VMs
- When the partition was lifted, vSphere did not kill the VMs that had lost the lock to the datastore either
It took a while before they found out what was going on, at least for one of the problems. Let’s start with the second problem: why are the VMs not killed when the partition is lifted? vSphere should do this automatically. Well, vSphere does do this automatically, but only when there is a guest operating system installed and an I/O is issued. As soon as the VM issues an I/O, vSphere will notice that the lock on the disk has been lost and acquired by another host, and it will kill the VM. If one has an empty VM then this won’t happen, as there will not be any I/O to the disk.
Now back to the first problem. Why vSphere HA component protection does not kick in is still being debated, but one can think of a specific reason for it. vSphere HA component protection is a feature that kills VMs on a host so they can be restarted when an APD or a PDL scenario has occurred. However, it will only do this when:
- It is certain the VM can be restarted on the other side
- There are healthy hosts in the other partition
The first one is clear, but what does the second one mean? Basically there are three options:
- Possibility of healthy host: Yes >> Terminate
- Possibility of healthy host: No >> Don’t Terminate
- Possibility of healthy host: Unknown >> Terminate
So even when VMCP is set to “aggressively” fail over VMs, it will only do so when it knows hosts are accessible in the other site or when it does not know the state of the hosts in the other site. If for whatever reason the hosts are deemed unhealthy, the answer to the question of whether healthy hosts are available will be “No”, and as such the VMs will not be killed by VMCP. The question remains why these hosts are reported as unhealthy in this partition scenario, which is something they are now trying to find out. It could be caused by misconfigured heartbeat datastores, but this is still to be confirmed.
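The three options above can be written down as a tiny decision function. This is only a sketch of the table as described here, not vSphere code: the names `HealthyHostElsewhere` and `should_terminate_aggressive` are made up for illustration and do not correspond to any VMware API.

```python
from enum import Enum


class HealthyHostElsewhere(Enum):
    """Answer to: is there a healthy host in the other partition? (hypothetical type)"""
    YES = "Yes"
    NO = "No"
    UNKNOWN = "Unknown"


def should_terminate_aggressive(answer: HealthyHostElsewhere) -> bool:
    """Mirror the three-option table for the aggressive VMCP setting.

    Hypothetical helper for illustration only; not part of any vSphere SDK.
    """
    # Only a definite "No" blocks termination; "Unknown" still terminates.
    return answer is not HealthyHostElsewhere.NO


if __name__ == "__main__":
    for answer in HealthyHostElsewhere:
        action = "Terminate" if should_terminate_aggressive(answer) else "Don't terminate"
        print(f"Possibility of healthy host: {answer.value} >> {action}")
```

The point the sketch makes visible is that “Unknown” and “No” are treated very differently: hosts that are positively marked unhealthy prevent VMCP from killing the VMs, while hosts whose state simply cannot be determined do not.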
Heartbeat datastores need to be accessible on both sites for vSphere HA to identify this scenario correctly. If there are no heartbeat datastores present on both sites, it could happen that no hosts are marked as healthy, which means that VMCP will not instantly kill those VMs when the APD has occurred.
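As a rough way to reason about this requirement, one could list which sites can reach each heartbeat datastore and flag any site that reaches none. The sketch below assumes that reading of “accessible on both sites”. The function name, the datastore names and the site names are all hypothetical, and the inventory is typed in by hand rather than queried from vCenter.

```python
from typing import Dict, Set


def sites_without_heartbeat(
    heartbeat_access: Dict[str, Set[str]], sites: Set[str]
) -> Set[str]:
    """Return the sites that cannot reach any heartbeat datastore.

    heartbeat_access maps a heartbeat datastore name to the set of sites
    that can still access it during the partition (hand-built, hypothetical).
    """
    covered: Set[str] = set()
    for reachable_sites in heartbeat_access.values():
        covered |= reachable_sites
    return sites - covered


if __name__ == "__main__":
    # Hypothetical layout: both heartbeat datastores live in site-a only.
    access = {"hb-ds-01": {"site-a"}, "hb-ds-02": {"site-a"}}
    missing = sites_without_heartbeat(access, {"site-a", "site-b"})
    print("Sites with no heartbeat datastore:", sorted(missing))
```

In a layout like the one in the example, site-b would have no heartbeat datastore during the partition, which is the kind of misconfiguration suspected above.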