# azure
a
Has anyone run into this error when adding a nodepool to an existing AKS cluster before? There's no reason why this should be an issue, and replacing the entire cluster isn't feasible
Copy code
azure-native:containerservice:ManagedCluster (ml-main-prod):
    error: Code="BadRequest" Message="A new agent pool was introduced. Adding agent pools to an existing cluster is not allowed through managed cluster operations. For agent pool specific change, please use per agent pool operations: https://aka.ms/agent-pool-rest-api" Target="agentPoolProfiles"
b
it seems you’re defining the nodepools inline. Don’t do that; add each one as a distinct resource using https://www.pulumi.com/registry/packages/azure-native/api-docs/containerservice/agentpool/
this is a limitation of the Azure API
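something along these lines - rough sketch only, I'm guessing at your variable and pool names here, so adjust the settings to match your actual pool:
Copy code
from pulumi_azure_native import containerservice

# A user pool attached to the existing cluster as its own resource,
# instead of another entry in the cluster's agent_pool_profiles.
gpu_nodepool = containerservice.AgentPool(
    "gpu-nodepool",
    resource_name_=k8s_cluster.name,  # name of the managed cluster to attach to
    resource_group_name=resource_group.name,
    agent_pool_name="gpunodepool",
    mode="User",
    count=1,
    vm_size="standard_nc6s_v3",
    vnet_subnet_id=subnet1.id,
    type="VirtualMachineScaleSets",
)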
a
gotcha, thanks!
@billowy-army-68599 how do you generally handle the system pool? do you pass it inline, or add it as a separate resource as well? atm I am running into a few issues mixing the two, as pulumi tries to replace the whole cluster
b
the system pool must be defined inline
all other pools need to be distinct resources
generally, I create an AKS cluster with a single system pool and then don’t define any other pools inline
a
makes sense, I guess I'm just blocked on our current deployment because recreating the nodepools is forcing replacement of the entire cluster - any way around it?
b
you mean the inline nodepools?
do you have a before and after diff of your code?
a
Sure, this is the before:
Copy code
# Create a Kubernetes cluster
    k8s_cluster = containerservice.ManagedCluster(
        f"ml-main-{stack_name}",
        location=resource_group.location,
        resource_group_name=resource_group.name,
        agent_pool_profiles=[
            # System Node Pool
            containerservice.ManagedClusterAgentPoolProfileArgs(
                name="systempool",
                mode="System",
                os_disk_size_gb=30,
                count=1,
                os_type="Linux",
                vm_size="standard_b2pls_v2",
                vnet_subnet_id=subnet1.id,
                type="VirtualMachineScaleSets",
            ),
            containerservice.ManagedClusterAgentPoolProfileArgs(
                name="gpunodepool",
                mode="User",
                os_type="Ubuntu",
                scale_set_priority="Regular",
                vm_size="standard_nc6s_v3",  # GPU enabled VM
                node_labels={"gpu": "true"},
                vnet_subnet_id=subnet1.id,
                type="VirtualMachineScaleSets",
                node_taints=["gpu=true:NoSchedule"],
                **stack_gpu_autoscaler_settings[stack_name],
            ),
        ],
        dns_prefix=f"ml-main-{stack_name}",
        enable_rbac=True,
        linux_profile={
            "admin_username": "someAdmin",
            "ssh": {
                "publicKeys": [
                    {
                        "keyData": AKS_SSH_PUBKEY,
                    }
                ]
            },
        },
        service_principal_profile=containerservice.ManagedClusterServicePrincipalProfileArgs(
            client_id=app.application_id,
            secret=sp_password.value,
        ),
        network_profile=containerservice.ContainerServiceNetworkProfileArgs(
            network_plugin="azure",
            network_policy="azure",
            service_cidr="10.96.0.0/16",
            dns_service_ip="10.96.0.10",
        ),
    )
And after:
Copy code
# Create a Kubernetes cluster
    k8s_cluster = containerservice.ManagedCluster(
        f"ml-main-{stack_name}",
        location=resource_group.location,
        resource_group_name=resource_group.name,
        agent_pool_profiles=[
            # System Node Pool
            containerservice.ManagedClusterAgentPoolProfileArgs(
                name="systempool",
                mode="System",
                os_disk_size_gb=30,
                count=1,
                os_type="Linux",
                vm_size="standard_b2pls_v2",
                vnet_subnet_id=subnet1.id,
                type="VirtualMachineScaleSets",
            ),
        ],
        dns_prefix=f"ml-main-{stack_name}",
        enable_rbac=True,
        linux_profile={
            "admin_username": "someAdmin",
            "ssh": {
                "publicKeys": [
                    {
                        "keyData": AKS_SSH_PUBKEY,
                    }
                ]
            },
        },
        service_principal_profile=containerservice.ManagedClusterServicePrincipalProfileArgs(
            client_id=app.application_id,
            secret=sp_password.value,
        ),
        network_profile=containerservice.ContainerServiceNetworkProfileArgs(
            network_plugin="azure",
            network_policy="azure",
            service_cidr="10.96.0.0/16",
            dns_service_ip="10.96.0.10",
        ),
    )

gpu_nodepool = containerservice.AgentPool(
    "gpu_nodepool",
    resource_name_=k8s_cluster.name,
    resource_group_name=resource_group.name,
    agent_pool_name="gpunodepool",
    mode="User",
    os_type="Ubuntu",
    scale_set_priority="Regular",
    vm_size="standard_nc6s_v3",
    node_labels={"cpu": "true"},
    vnet_subnet_id=subnet1.id,
    type="VirtualMachineScaleSets",
    node_taints=["cpu=true:NoSchedule"],
    **stack_gpu_autoscaler_settings[stack_name]
)
just moved the GPU nodepool out, and it's replacing the whole cluster
b
Yep, you can’t modify those inline nodepools at all.
Even removing them will force a recreate, sadly. https://github.com/pulumi/pulumi-azure-native/issues/579
a
That's pretty frustrating, imo the docs should be updated to reflect this
Also, now that I have tried this, completely reverting the change is still forcing a recreate, even after deleting the nodepool manually and running a refresh
Any way I can find the exact reason why a recreate is being triggered?
b
again, this is a limitation on the azure side. I agree it’s frustrating, but there’s not a whole lot we can do. The azure API doesn’t even publish docs on this
a
Totally understand. Atm I'm just reverting everything to the way it was (no new nodepools, pulumi config is inline as before, azure infra matches the config) - just trying to figure out why pulumi refresh still prompts a recreate
seems like it's just a nodepool ordering thing from the API
I think I managed to find a workaround:
• Add opts=pulumi.ResourceOptions(ignore_changes=["agent_pool_profiles"]) to the cluster resource, and remove the inline user agent pools (see the sketch below)
• Create separate AgentPool resources
• Try pulumi up - this will inevitably fail because of "existing resources"
• Run pulumi import azure-native:containerservice:AgentPool <nodepoolResourceName> /subscriptions/<sid>/resourceGroups/<rg_name>/providers/Microsoft.ContainerService/managedClusters/<clusterName>/agentPools/<existingNodepoolName> for each existing nodepool
• Refresh, and you are good to go
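for reference, the cluster definition ends up looking roughly like this - just a sketch that reuses the variables from the snippets above and drops the profiles that didn't change, not the exact code:
Copy code
import pulumi
from pulumi_azure_native import containerservice

# Same cluster as before, but only the system pool stays inline, and pulumi is
# told to ignore agent_pool_profiles so the pools that now live as separate
# AgentPool resources don't show up as a diff that forces a replace.
k8s_cluster = containerservice.ManagedCluster(
    f"ml-main-{stack_name}",
    location=resource_group.location,
    resource_group_name=resource_group.name,
    dns_prefix=f"ml-main-{stack_name}",
    agent_pool_profiles=[
        containerservice.ManagedClusterAgentPoolProfileArgs(
            name="systempool",
            mode="System",
            count=1,
            os_disk_size_gb=30,
            os_type="Linux",
            vm_size="standard_b2pls_v2",
            vnet_subnet_id=subnet1.id,
            type="VirtualMachineScaleSets",
        ),
    ],
    opts=pulumi.ResourceOptions(ignore_changes=["agent_pool_profiles"]),
)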
b
that looks like a good approach!
a
fingers crossed it doesn't break anything that'll come back to bite me haha, but for now it looks okay
bumping this because I just noticed something funky wrt networking with the new AgentPool format. The nodepools are created successfully and they're in the same vnet/subnet, but the healthcheck for the new cpu nodepool doesn't work and the Ingress throws 502s for any services on the new nodepool. Port-forwarding to the services works locally, so I know they're running correctly. Do you do any networking config beyond setting the same vnet + subnet for the agentpools as the cluster?