# general
s
We keep hitting issues with AKS and Pulumi. Namely, when we try to update properties on existing AKS clusters (managed by Pulumi), we often get into error cases where we end up having to manually edit our stack. Here are some examples:
- We are unable to update the VM size. `pulumi up` fails and we have to manually delete the old AKS cluster, edit the stack, and then run `pulumi up` to create the new cluster (the rough CLI steps are sketched after the stack excerpt below).
- Updating maxCount on spot instance agent pools fails. Here's the error we're seeing:
```
azure-native:containerservice:AgentPool (dg-gpuspot-agentpool-cpu-staging-dev):
error: Code="PropertyChangeNotAllowed" Message="Changing property 'properties.nodeTaints' is not allowed." Target="properties.nodeTaints"
```
We weren’t trying to update the node taints. It seems like Pulumi knows it was a change to maxCount:
```
~  azure-native:containerservice:AgentPool dg-gpuspot-agentpool-cpu-staging-dev updating [diff: ~maxCount]; error: Code="PropertyChangeNotAllowed" Message="Changing property 'properties.nodeTaints' is not allowed." Target="properties.nodeTaints"

~  azure-native:containerservice:AgentPool dg-gpuspot-agentpool-cpu-staging-dev updating failed [diff: ~maxCount]; error: Code="PropertyChangeNotAllowed" Message="Changing property 'properties.nodeTaints' is not allowed." Target="properties.nodeTaints"
```
Taking a look at the stack output for this resource, I think the issue is that Pulumi is trying to remove the taints that Azure automatically adds to spot node pools. In this case we haven’t been able to get this to work even after editing the stack.
```
{
                "urn": "urn:pulumi:spark-staging::spark-deployment::azure-native:containerservice:AgentPool::dg-gpuspot-agentpool-cpu-staging-dev",
                "custom": true,
                "id": "/subscriptions/a8c3fdb1-94c2-4db4-bc18-470696fa4bd4/resourcegroups/cp-staging/providers/Microsoft.ContainerService/managedClusters/dg-spark-kubernetescluster-cpu-staging-dev/agentPools/gpuspot",
                "type": "azure-native:containerservice:AgentPool",
                "inputs": {
                    "agentPoolName": "gpuspot",
                    "count": 1,
                    "enableAutoScaling": true,
                    "maxCount": 6,
                    "minCount": 0,
                    "mode": "User",
                    "nodeTaints": [
                        "sku=gpu:NoSchedule"
                    ],
                    "resourceGroupName": "cp-staging",
                    "resourceName": "dg-spark-kubernetescluster-cpu-staging-dev",
                    "scaleSetPriority": "Spot",
                    "spotMaxPrice": -1,
                    "type": "VirtualMachineScaleSets",
                    "vmSize": "Standard_NC8as_T4_v3",
                    "vnetSubnetID": "/subscriptions/a8c3fdb1-94c2-4db4-bc18-470696fa4bd4/resourceGroups/cp-staging/providers/Microsoft.Network/virtualNetworks/dg-spark-vnet-cpu-staging-dev/subnets/default"
                },
                "outputs": {
                    "__inputs": {
                        "4dabf18193072939515e22adb298388d": "1b47061264138c4ac30d75fd1eb44270",
                        "ciphertext": "redacted"
                    },
                    "count": 1,
                    "enableAutoScaling": true,
                    "enableFIPS": false,
                    "id": "/subscriptions/a8c3fdb1-94c2-4db4-bc18-470696fa4bd4/resourcegroups/cp-staging/providers/Microsoft.ContainerService/managedClusters/dg-spark-kubernetescluster-cpu-staging-dev/agentPools/gpuspot",
                    "kubeletDiskType": "OS",
                    "maxCount": 6,
                    "maxPods": 110,
                    "minCount": 0,
                    "mode": "User",
                    "name": "gpuspot",
                    "nodeImageVersion": "AKSUbuntu-1804gen2containerd-2021.06.02",
                    "nodeLabels": {
                        "<http://kubernetes.azure.com/scalesetpriority|kubernetes.azure.com/scalesetpriority>": "spot"
                    },
                    "nodeTaints": [
                        "sku=gpu:NoSchedule",
                        "<http://kubernetes.azure.com/scalesetpriority=spot:NoSchedule|kubernetes.azure.com/scalesetpriority=spot:NoSchedule>"
                    ],
                    "orchestratorVersion": "1.19.11",
                    "osDiskSizeGB": 128,
                    "osDiskType": "Ephemeral",
                    "osSKU": "Ubuntu",
                    "osType": "Linux",
                    "powerState": {
                        "code": "Running"
                    },
                    "provisioningState": "Succeeded",
                    "scaleSetEvictionPolicy": "Delete",
                    "scaleSetPriority": "Spot",
                    "spotMaxPrice": -1,
                    "type": "VirtualMachineScaleSets",
                    "vmSize": "Standard_NC8as_T4_v3",
                    "vnetSubnetID": "/subscriptions/a8c3fdb1-94c2-4db4-bc18-470696fa4bd4/resourceGroups/cp-staging/providers/Microsoft.Network/virtualNetworks/dg-spark-vnet-cpu-staging-dev/subnets/default"
b
Sorry you’re hitting this. I think this warrants a bug report; would you mind filing an issue?
g
As jaxxstorm said, this sounds like a bug. As a workaround, can you try adding
ignoreChanges: ["properties.nodeTaints"]
(or the correct resource property name) to see if Pulumi will ignore those and allow the update to proceed?
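Roughly, that would look like the following on the agent pool resource (a minimal TypeScript sketch assuming the azure-native Node.js SDK; the inputs just mirror the stack excerpt above, and your actual program will differ):
```typescript
import * as azure_native from "@pulumi/azure-native";

// Sketch only: resource name and inputs are taken from the stack excerpt above.
const gpuSpotPool = new azure_native.containerservice.AgentPool("dg-gpuspot-agentpool-cpu-staging-dev", {
    agentPoolName: "gpuspot",
    resourceGroupName: "cp-staging",
    resourceName: "dg-spark-kubernetescluster-cpu-staging-dev",
    mode: "User",
    type: "VirtualMachineScaleSets",
    vmSize: "Standard_NC8as_T4_v3",
    enableAutoScaling: true,
    minCount: 0,
    maxCount: 6,
    count: 1,
    scaleSetPriority: "Spot",
    spotMaxPrice: -1,
    nodeTaints: ["sku=gpu:NoSchedule"],
    // ...other inputs (vnetSubnetID, etc.) elided...
}, {
    // Ask Pulumi to ignore diffs on the taints, since Azure appends its own
    // spot taint to the pool and the API refuses to change taints in place.
    ignoreChanges: ["nodeTaints"],
});
```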
d
Adding ignoreChanges: [ "nodeTaints" ] didn't work, but adding "count" (which is what was causing the stack to currently think it needed to update the resource) did prevent Pulumi from trying to update the agent pool and failing.
But if we ever want to change the maxCount/etc. we will hit this again unless we also ignore changes on those properties..
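Concretely, what unstuck us is roughly the following (TypeScript option names shown; presumably the autoscaler changes the node count out of band, which is why `count` keeps showing a diff):
```typescript
import * as pulumi from "@pulumi/pulumi";

// "count" drifts as the autoscaler scales the pool, so ignoring it hides the
// spurious diff that was triggering the failed update. Any intentional change
// (e.g. bumping maxCount) would still hit the nodeTaints error, though.
const agentPoolOpts: pulumi.CustomResourceOptions = {
    ignoreChanges: ["count"],
};
```
This gets passed as the options (third) argument to the AgentPool constructor, as in the sketch above.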
g
Understood. That's meant as a workaround for now to hopefully "unstick" you.
Could you open an issue at https://github.com/pulumi/pulumi-azure-native with the code and steps to reproduce this?
s
Pulumi update issues with AKS clusters · Issue #959 · pulumi/pulumi-azure-native (github.com)
I put both issues in the same bug. You can split them if needed.
g
Thank you. I will chat with some people internally.