Fix HA VM Migration Race Condition #490
This pull request resolves a race condition that occurs when migrating High Availability (HA) virtual machines.
The Problem
When a Terraform plan modifies the `target_node` of a `proxmox_vm_qemu` resource with HA enabled, the provider initiates a migration. However, it would then immediately attempt to apply further configuration updates to the VM on the new node. Due to cluster synchronization delays, the VM's configuration file might not be immediately available on the destination node, or the VM might still be locked by the migration process. This resulted in intermittent errors, such as:
- `500 Configuration file 'nodes/...' does not exist`
- `500 VM is locked (migrate)`

This pull request addresses issue #1343.
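
For illustration only, the sketch below shows the shape of the pre-fix sequence. `proxmoxClient`, `MigrateVM`, and `UpdateVMConfig` are hypothetical names standing in for the provider's real client calls, not its actual API.

```go
package example

import (
	"context"
	"fmt"
)

// proxmoxClient is a hypothetical, minimal interface used only for this
// illustration; the real provider talks to the Proxmox API through its own
// client types.
type proxmoxClient interface {
	MigrateVM(ctx context.Context, vmID int, targetNode string) error
	UpdateVMConfig(ctx context.Context, vmID int, targetNode string, cfg map[string]string) error
}

// updateWithMigration sketches the pre-fix flow: the migration is started and
// a configuration update is issued immediately afterwards. Because the cluster
// may not yet have synced the VM's config file to targetNode, and the VM may
// still hold the "migrate" lock, the second call can intermittently fail with
// a 500 error.
func updateWithMigration(ctx context.Context, c proxmoxClient, vmID int, targetNode string, cfg map[string]string) error {
	if err := c.MigrateVM(ctx, vmID, targetNode); err != nil {
		return fmt.Errorf("migrating VM %d to %s: %w", vmID, targetNode, err)
	}
	// No wait between the two calls -- this is the race.
	return c.UpdateVMConfig(ctx, vmID, targetNode, cfg)
}
```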
The Solution
To ensure the provider waits until the migration is fully complete, this change introduces a polling mechanism. After initiating a migration, the provider now polls the VM's status on the target node until the migration lock (`lock: migrate`) is released. This ensures that the provider only proceeds with subsequent configuration updates after the Proxmox cluster has fully finalized the migration and the VM is ready for new commands. A 10-minute timeout accommodates large or slow migrations.
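
A minimal Go sketch of such a wait loop is shown below, assuming a hypothetical `statusClient` interface with a `VMStatus` method and a 5-second poll interval; neither is the provider's actual API, and the real change works against the provider's own Proxmox client.

```go
package example

import (
	"context"
	"fmt"
	"time"
)

// statusClient is a hypothetical, minimal interface for illustration; the
// actual change reads the VM's status through the provider's Proxmox client.
type statusClient interface {
	// VMStatus returns the VM's current status on the given node, including a
	// "lock" entry while the VM is locked (e.g. "migrate").
	VMStatus(ctx context.Context, node string, vmID int) (map[string]interface{}, error)
}

// waitForMigration polls the VM's status on the target node until the
// migration lock is released, giving up after the supplied timeout (the change
// described above uses a 10-minute limit).
func waitForMigration(ctx context.Context, c statusClient, node string, vmID int, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(5 * time.Second) // poll interval is an assumption
	defer ticker.Stop()

	for {
		status, err := c.VMStatus(ctx, node, vmID)
		if err == nil {
			// An empty "lock" field means the migration lock has been released
			// and the VM is ready for further configuration updates.
			if lock, _ := status["lock"].(string); lock == "" {
				return nil
			}
		}
		// A status error (e.g. the config file has not yet appeared on the
		// target node) is treated as "not ready yet" and simply retried.

		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out waiting for VM %d migration to finish: %w", vmID, ctx.Err())
		case <-ticker.C:
			// poll again on the next tick
		}
	}
}
```

In the provider, this kind of wait would run immediately after the migration is initiated and before any further VM configuration update, so subsequent calls are only issued once the lock is gone.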