# automation-api
tall-father-27808:
Hey everyone. I'm wondering if someone from the Pulumi team can help us discover the right pattern for accomplishing something in our application. We use the Automation API along with some dynamic providers written against our virtualization API to deliver virtual machines and disk images. With regard to the virtual machine specifically, we have some problems around provider `delete` methods. I'll describe a concrete instance. We chose to break our VirtualMachine out into a component resource with several sub-resources. The state in the remote API looks roughly as follows:
```
- VirtualMachine
    - VirDomain: Controls properties of the base VM like CPU and memory
    - BlockDevices
    - NetworkDevices
    - BootOrder
    - RunningState
```
So the VirtualMachine is a component resource, and each of the resources in that component is of type `pulumi.Resource`. When we're creating or updating things, everything is fine: each of those resources depends on the resource before it using the `depends_on` resource option. However, when a delete happens, the constraint that the deleted resource placed upon the execution of subsequent resources disappears. So if we delete a block device, RunningState and BootOrder will execute the usual `configure`, `check`, `diff` and ultimately `update` (if necessary) right away. That's a problem because our `delete` has a side effect: it needs to invoke the delete methods on the remote API and wait for the task to complete, and that task is SHUTTING DOWN THE VM. So really we need to: shut down the VM -> delete the block device -> place the VM into the expected state again (usually 'RUNNING'). The shutdown part is handled in the BlockDeviceProvider's `delete` implementation right now, but nothing waits for anything in a `delete` method. It seems like the implementation of the providers (at least the dynamic providers) means that delete operations are fire-and-forget. So if `RunningState` (which reconciles the intended power state of the VM) was depending on a block device that was deleted, it will just run, since the deleted resource no longer constrains its behavior.

What's the right pattern for something like this? Right now, we're considering:
• Use a `before_update` hook on the VirtualMachine component resource to shut down the VM pre-emptively if we're doing deletes that require the VM to be off.
• Perform the delete.
• Implement RunningState such that it polls until the VM is in its intended state before attempting to place the VM in the intended power state (for this concrete example, turning the VM back on); see the sketch below.

It strikes me that this is probably an issue that people run into everywhere, but I can't find any documented patterns for how to address this particular hiccup.
I started a Discussion on GitHub too: https://github.com/pulumi/pulumi/discussions/20500
future-hairdresser-70637:
Hey @tall-father-27808! I read through this and the discussion a bit... I'm not aware of any pattern or best practice that exactly fits this "controller resource" concept. Would you mind sharing where that was suggested? It does feel similar to what some resources do, e.g. aws.acm.CertificateValidation, which waits for ACM certificate validation to occur.
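For reference, that pattern in the AWS provider looks roughly like this (a generic DNS-validation sketch; the domain and resource names are illustrative):
```python
import pulumi
import pulumi_aws as aws

cert = aws.acm.Certificate(
    "cert",
    domain_name="app.example.com",
    validation_method="DNS",
)

zone = aws.route53.get_zone_output(name="example.com")

# Publish the DNS record that ACM asks for (single-domain certificate assumed).
validation_record = aws.route53.Record(
    "cert-validation-record",
    zone_id=zone.zone_id,
    name=cert.domain_validation_options[0].resource_record_name,
    type=cert.domain_validation_options[0].resource_record_type,
    records=[cert.domain_validation_options[0].resource_record_value],
    ttl=60,
)

# CertificateValidation only completes once ACM reports the certificate as
# issued, so anything that depends on it (or consumes
# validation.certificate_arn) waits for that asynchronous process to finish.
validation = aws.acm.CertificateValidation(
    "cert-validation",
    certificate_arn=cert.arn,
    validation_record_fqdns=[validation_record.fqdn],
)
```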
tall-father-27808:
It wasn't so much suggested, I guess, as we came up with it in our own way. I'll give you an example implementation.
```python
import pulumi

class BlockDevices(pulumi.ComponentResource):
    def __init__(
        self,
        name: str,
        inputs: BlockDevicesInputs,
        opts: pulumi.ResourceOptions | None = None,
    ):
        super().__init__(
            "kraken:worker:VirtualMachineBlockDevices",
            name,
            {
                "block_devices": [sd.model_dump(mode="json") for sd in inputs.storage_devices],
                "vsd_map": inputs.vsd_map,
                "vm_uuid": inputs.vm_uuid,
            },
            opts,
        )
        bootable_devices: dict[int, pulumi.Output[str]] = {}
        block_devices: dict[str, BlockDevice] = {}
        for storage_device in inputs.storage_devices:
            bdev = BlockDevice(
                name=f"{name}-{storage_device.name}",
                args=BlockDeviceInputs(
                    spec=storage_device,
                    vm_uuid=inputs.vm_uuid,
                    vsd_uuid=inputs.vsd_map.get(storage_device.source) if storage_device.source else None,
                ),
                opts=pulumi.ResourceOptions(parent=self, depends_on=[*block_devices.values()]),
            )
            block_devices[storage_device.name] = bdev
            if storage_device.boot:
                bootable_devices[storage_device.boot] = bdev.id

        controller = BlockDeviceController(
            name=f"{name}-bdev-controller",
            args=BlockDeviceControllerInputs(
                vm_uuid=inputs.vm_uuid, block_device_uuids=[bdev.id for bdev in block_devices.values()]
            ),
            opts=pulumi.ResourceOptions(parent=self, depends_on=[*block_devices.values()]),
        )

        self.id = inputs.vm_uuid
        self.block_devices = block_devices
        self.block_device_controller = controller
        self.bootable_devices = bootable_devices
```
In the above example, @future-hairdresser-70637, we utilize the controller as a way to perform deletes (which have asynchronous network dependencies) by delegating them to an `update` lifecycle method in the controller, which causes downstream resources to wait for it. This doesn't seem possible with `delete` lifecycle methods.
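Roughly, the controller's provider has this shape. This is a simplified sketch rather than our exact code, and `virt_api` plus its methods are stand-ins for our virtualization client:
```python
from pulumi.dynamic import CreateResult, DiffResult, Resource, ResourceProvider, UpdateResult


class BlockDeviceControllerProvider(ResourceProvider):
    def diff(self, _id, olds, news):
        # The controller changes whenever the set of block device UUIDs changes.
        changed = set(olds.get("block_device_uuids", [])) != set(news["block_device_uuids"])
        return DiffResult(changes=changed)

    def create(self, props):
        return CreateResult(id_=props["vm_uuid"], outs=props)

    def update(self, _id, olds, news):
        removed = set(olds.get("block_device_uuids", [])) - set(news["block_device_uuids"])
        if removed:
            # The remote API requires the VM to be off for device removal, so the
            # whole sequence happens here, synchronously, where resources that
            # depend on the controller still have to wait for us.
            virt_api.shutdown_vm(news["vm_uuid"])                  # stand-in call
            virt_api.wait_for_power_state(news["vm_uuid"], "OFF")  # stand-in call
            for uuid in removed:
                task = virt_api.delete_block_device(uuid)          # stand-in call
                virt_api.wait_for_task(task)                       # stand-in call
        # Powering the VM back on is left to the downstream RunningState resource.
        return UpdateResult(outs=news)


class BlockDeviceController(Resource):
    def __init__(self, name, args, opts=None):
        super().__init__(
            BlockDeviceControllerProvider(),
            name,
            {"vm_uuid": args.vm_uuid, "block_device_uuids": args.block_device_uuids},
            opts,
        )
```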