# aws
b
can you share your code?
p
We make this call several times with a few different sets of settings. There's more code, but I'm not sure how much I can share.
```python
eks.ManagedNodeGroup(
    ng_name,
    node_role_arn=ec2_role.arn,
    cluster=cluster.core,
    ami_type='CUSTOM' if ami_id else None,
    # Instance type will be specified in the launch template
    # instance_types=[node_type],
    scaling_config=aws.eks.NodeGroupScalingConfigArgs(
        desired_size=max(node['min_node_count'], 1),
        min_size=node['min_node_count'],
        max_size=node['max_node_count']),
    subnet_ids=ng_subnet_ids,
    # The node version determines the AMI id; if AMI id is already specified there is no need for node version
    version=node_version if not ami_id else None,
    tags=ng_tags,
    labels=node_labels,
    launch_template=aws.eks.NodeGroupLaunchTemplateArgs(
        id=template.id,
        version=template.latest_version,
    ),
    capacity_type=node.get('capacity_type', 'ON_DEMAND'),
    taints=[
        aws.eks.NodeGroupTaintArgs(effect=taint.get('effect'), key=taint.get('key'),
                                   value=taint.get('value'))
        for taint in node.get('taints', [])
    ],
    opts=ResourceOptions(ignore_changes=["scalingConfig.desiredSize"]),
)
```
b
these two lines look suspicious:
```python
capacity_type=node.get('capacity_type', 'ON_DEMAND'),
taints=[
    aws.eks.NodeGroupTaintArgs(effect=taint.get('effect'), key=taint.get('key'),
                               value=taint.get('value'))
    for taint in node.get('taints', [])
],
```
have they ever worked? if you comment them out does it work?
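(For reference, one way to do the bisection b suggests without repeatedly editing the call is to build the two suspect arguments as a separate kwargs dict behind a flag. This is only a sketch: `INCLUDE_SUSPECT_ARGS` and the sample `node` dict are hypothetical, and the rest mirrors the names in the snippet above.)
```python
import pulumi_aws as aws

# Hypothetical toggle for bisecting the failure; `node` mirrors the dict
# used in the original snippet.
INCLUDE_SUSPECT_ARGS = True
node = {'capacity_type': 'SPOT',
        'taints': [{'key': 'dedicated', 'value': 'gpu', 'effect': 'NO_SCHEDULE'}]}

suspect_kwargs = {}
if INCLUDE_SUSPECT_ARGS:
    suspect_kwargs['capacity_type'] = node.get('capacity_type', 'ON_DEMAND')
    suspect_kwargs['taints'] = [
        aws.eks.NodeGroupTaintArgs(effect=t.get('effect'), key=t.get('key'),
                                   value=t.get('value'))
        for t in node.get('taints', [])
    ]

# Then pass them through as
#   eks.ManagedNodeGroup(ng_name, ..., **suspect_kwargs)
# and flip the toggle to see whether the gRPC error goes away.
```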
p
Those lines have worked in the past. Is there some way to debug this?
b
not super easily, something in your code is sending incorrect gRPC messages to the engine
there’s a property that’s undefined, basically. you’ll have to comment out or print properties, or use a debugger
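(A minimal sketch of the print-style debugging mentioned here, using only the standard Pulumi Python SDK; `debug_output` is a made-up helper name, not an SDK function.)
```python
import pulumi

def debug_output(name, value):
    # Logs the resolved value of an Output (or plain value) while the
    # program runs, so you can see what the engine will actually receive.
    pulumi.Output.from_input(value).apply(
        lambda v: pulumi.log.info(f"{name} = {v!r}")
    )

# e.g. next to the ManagedNodeGroup call above:
# debug_output("ng_tags", ng_tags)
# debug_output("labels", node_labels)
```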
p
Is there a way to find the underlying proto to see what "map" is?
b
what do you mean?
p
Is it possible to figure out which field might be set incorrectly here? As for debug logging, I'm not sure how to enable it using the Python SDK.
b
we are having an internal discussion about ways to improve this, but the underlying component is https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/nodegroup.ts. anywhere there’s a call to `map` may be the issue, so maybe look at the `tags` input? https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/nodegroup.ts#L1313 so likely the value of `ng_tags`
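(If the `tags` input is the suspect, one generic way to see the value the engine receives is to export it as a stack output. A sketch, assuming the `ng_tags` variable from the code below; the output name is arbitrary.)
```python
import pulumi

# Surfaces the resolved tags dict in `pulumi up` / `pulumi stack output`,
# which helps spot an unexpected missing or undefined entry.
pulumi.export("debug_ng_tags", ng_tags)
```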
p
I can share a bit more code. I don't think tags are the issue; several other instance groups get set up without a problem.
```python
ng_zones_list = list(map(lambda z: [z], node.get('node_zones', [])))
ng_multizones_lists = node.get('multizone_nodegroup_zones', [])
is_multizone = bool(ng_multizones_lists)
# if specified, use multizone_nodegroup_zones, else use node_zones (with 1 zone per nodegroup)
for node_zone_list in (ng_multizones_lists if is_multizone else ng_zones_list):
    # include first subnet name in nodegroup name (even if multizone)
    zone = node_zone_list[0]
    sn_name = subnet_names[zone]
    ng_name = apply_name_overrides(f"{node['group_name']}{'-mz' if is_multizone else ''}-{sn_name}")
    ng_subnet_ids = list(map(lambda z: subnet_ids[z], node_zone_list))
    ng_tags = cluster.eks_cluster.name.apply(lambda cname: {
        "k8s.io/cluster-autoscaler/enabled": "true",
        f"k8s.io/cluster-autoscaler/{cname}": "true",
        **tags,
        **({'Name': ng_name} if tags else {})
    })

    default_capacity_type = 'spot' if node.get('capacity_type') == 'SPOT' else 'ondemand'
    node_labels[CAPACITY_TYPE_LABEL] = node_labels.get(CAPACITY_TYPE_LABEL, default_capacity_type)

    eks.ManagedNodeGroup(
        ng_name,
        node_role_arn=ec2_role.arn,
        cluster=cluster.core,
        ami_type='CUSTOM' if ami_id else None,
        # Instance type will be specified in the launch template
        # instance_types=[node_type],
        scaling_config=aws.eks.NodeGroupScalingConfigArgs(
            desired_size=max(node['min_node_count'], 1),
            min_size=node['min_node_count'],
            max_size=node['max_node_count']),
        subnet_ids=ng_subnet_ids,
        # The node version determines the AMI id; if AMI id is already specified there is no need for node version
        version=node_version if not ami_id else None,
        tags=ng_tags,
        labels=node_labels,
        launch_template=aws.eks.NodeGroupLaunchTemplateArgs(
            id=template.id,
            version=template.latest_version,
        ),
        capacity_type=node.get('capacity_type', 'ON_DEMAND'),
        taints=[
            aws.eks.NodeGroupTaintArgs(effect=taint.get('effect'), key=taint.get('key'),
                                       value=taint.get('value'))
            for taint in node.get('taints', [])
        ],
        opts=ResourceOptions(ignore_changes=["scalingConfig.desiredSize"]),
    )
```
I'm not able to run a debugger super easily, but I can do print debugging.
Some additional context: That nodegroup got created and is healthy in AWS. I think this is a Pulumi bug.
b
More than likely. If you have a reliable repro, filing an issue would be great.
p
Are there any possible workarounds you could recommend?
I checked and this appears to be the relevant call. https://www.npmjs.com/package/@pulumi/eks?activeTab=code
```typescript
// Check that the nodegroup role has been set on the cluster to
// ensure that the aws-auth configmap was properly formed.
const nodegroupRole = pulumi.all([core.instanceRoles, roleArn]).apply(([roles, rArn]) => {
    // Map out the ARNs of all of the instanceRoles.
    const roleArns = roles.map((role) => {
        return role.arn;
    });
    // Try finding the nodeRole in the ARNs array.
    return pulumi.all([roleArns, rArn]).apply(([arns, arn]) => {
        return arns.find((a) => a === arn);
    });
});
```
Some more context: running `pulumi refresh` reproduces this.
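(A rough Python rendering of what that TypeScript check does, purely to make the shape of the `map` call concrete; `instance_roles` is assumed to be a list of IAM role resources, mirroring `core.instanceRoles`.)
```python
import pulumi

def find_node_role(instance_roles, role_arn):
    # Collect the ARN of every instance role, then look for the node role's
    # ARN among them, mirroring the roles.map(...) / arns.find(...) logic
    # in the TypeScript snippet above.
    role_arns = pulumi.Output.all(*[role.arn for role in instance_roles])
    return pulumi.Output.all(role_arns, role_arn).apply(
        lambda args: next((arn for arn in args[0] if arn == args[1]), None)
    )
```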