# general
e
Latest code change was 17 hours ago and had 2 consecutive test successes over 6 hours, then failures started about 2 hours ago.
g
Can you elaborate on what your tests are doing?
e
- `pulumi stack select` to choose our dedicated CD testing stack
- `pulumi up` to create an EKS cluster
- run various tests against the new cluster
- `pulumi destroy` to destroy all created resources in the stack
It's a test that runs every 3 hours to validate our EKS cluster creation. It was passing consistently until the bug in the eks and/or kubernetes packages from 1-2 days ago. After resolving those it started passing again until a few hours ago. No code changes on our end between the successful runs and the failures.
We had about 9 hours of success (3 passing runs) and then they started failing
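The test cycle described above could be sketched roughly like this. It is only an illustration: the `runCycle` helper, the `cd-testing` stack name, and the `npm test` command are made-up placeholders, and running `pulumi destroy` in a `finally` block is an assumption (the thread does not say whether teardown happens after a failed run). The command runner is injected so the flow can be dry-run without touching a real stack.

```typescript
// Whatever executes a shell command; injected so the flow can be dry-run.
type Runner = (command: string) => void;

// Hypothetical helper: select the stack, create the cluster, test it, tear it down.
function runCycle(run: Runner, stack: string, testCommand: string): void {
    run(`pulumi stack select ${stack}`);
    try {
        run("pulumi up --yes");
        run(testCommand);
    } finally {
        // Assumption: always tear down, even when `up` or the tests fail.
        run("pulumi destroy --yes");
    }
}

// Dry run: record the commands instead of executing them.
const recorded: string[] = [];
runCycle(cmd => recorded.push(cmd), "cd-testing", "npm test");
console.log(recorded.join("\n"));
```

For a real run, the runner could be something like `cmd => execSync(cmd, { stdio: "inherit" })` using Node's `child_process` module.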
g
do you have your pulumi dependencies pinned to specific versions?
e
yes
cat package.json
{
  "name": "eks-cluster",
  "devDependencies": {
    "@types/node": "10.12.19"
  },
  "dependencies": {
    "@pulumi/aws": "0.16.7",
    "@pulumi/eks": "0.16.6",
    "@pulumi/kubernetes": "0.19.0",
    "@pulumi/pulumi": "0.16.12"
  }
}
similarly with node/npm
w
Investigating this now. I can’t see any clear reason you’d be able to trigger an error here. You mention it errored twice - have you had successes as well since then (that is, is it consistently failing now, or just sometimes)?
e
It's consistently failing
Just had more repros over the weekend and this morning
w
ok - thanks for confirming - looking into it...
e
I'm trying double quotes on all the config values to see if that fixes it. online yaml validator says it's fine, though
w
I opened https://github.com/pulumi/pulumi-eks/issues/58 to track. Unfortunately I have not been able to repro myself - so I can't debug it myself easily.
The issue is occurring here: https://github.com/pulumi/pulumi-eks/blob/e092b68400c5590ee9127e4288d6f271339b8ec3/nodejs/eks/cluster.ts#L216
You could try adding a `console.log` of `mappings` and `instanceMapping` right before the return in the copy of that code in your `node_modules` to find out what values are failing to serialize.
Also - just to check - what `roleMappings` (if any) are you passing to your `eks.Cluster`?
e
I'll try logging it out
here's the role mappings hard-coded:
roleMappings      : [
    // This role mapping provides full administrator cluster access to the k8s cluster
    {
      groups    : ["system:masters"],
      roleArn   : clusterAdminRoleArn,
      username  : "system-admin",
    },
    // This role mapping provides automation access to the k8s cluster, e.g. gitlab CI
    {
      groups    : ["system:masters"],
      roleArn   : clusterAutomationAccessRoleArn,
      username  : "svc-automation",
    },
    // This role mapping provides devs that have assumed into the mustang cluster role
    // access to the k8s cluster
    {
      groups    : ["system:masters"],
      roleArn   : clusterAccessRoleArn,
      username  : "dev",
    },
  ],
@white-balloon-205 the console.log output shows this:
mappings: [{"groups":["system:masters"],"username":"system-admin"},{"groups":["system:masters"],"roleArn":"arn:aws:iam::009348887430:role/mustang-sandbox-automation-role-f39b756","username":"svc-automation"},{"groups":["system:masters"],"roleArn":"arn:aws:iam::009348887430:role/mustang-sandbox-k8s-cluster-role-30d65f2","username":"dev"}]
    instanceMapping: {"roleArn":"arn:aws:iam::009348887430:role/eks-cluster-cd-test-eks-cluster-instanceRole-role-45e8a8d","username":"system:node:{{EC2PrivateDNSName}}","groups":["system:bootstrappers","system:nodes"]}
w
And then after that `console.log` you see the failure?
e
yeah
I tried this:
const roleMappings = pulumi.all([pulumi.output(args.roleMappings || []), instanceRoleMapping])
    .apply(([mappings, instanceMapping]) => {
        console.log("\n");
        console.log("mappings: " + JSON.stringify(mappings));
        console.log("\n");
        console.log("instanceMapping: " + JSON.stringify(instanceMapping));
        console.log("\n");
        let temp = jsyaml.safeDump([...mappings, instanceMapping].map(m => ({
            rolearn: m.roleArn,
            username: m.username,
            groups: m.groups,
        })));
        console.log("\nTEST-A\n");
        return temp;
    });
Here's the output
never hits "TEST-A"
mappings: [{"groups":["system:masters"],"username":"system-admin"},{"groups":["system:masters"],"roleArn":"arn:aws:iam::009348887430:role/mustang-sandbox-automation-role-f39b756","username":"svc-automation"},{"groups":["system:masters"],"roleArn":"arn:aws:iam::009348887430:role/mustang-sandbox-k8s-cluster-role-30d65f2","username":"dev"}]
    instanceMapping: {"roleArn":"arn:aws:iam::009348887430:role/eks-cluster-cd-test-eks-cluster-instanceRole-role-45e8a8d","username":"system:node:{{EC2PrivateDNSName}}","groups":["system:bootstrappers","system:nodes"]}
I see multiple versions of js-yaml specified in `@pulumi/eks`
w
(sorry for trying to remote debug this - I've tried to repro myself in a few ways but cannot) Could you try rewriting this:
return jsyaml.safeDump([...mappings, instanceMapping].map(m => ({
    rolearn: m.roleArn,
    username: m.username,
    groups: m.groups,
})));
To this:
let arr = [...mappings, instanceMapping].map(m => ({
    rolearn: m.roleArn,
    username: m.username,
    groups: m.groups,
}));
console.log(arr);
return jsyaml.safeDump(arr);
Leaving off the `JSON.stringify` is intentional - `JSON.stringify` omits properties whose value is `undefined`, so it would hide where the `undefined`s are.
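A minimal sketch of that behaviour, using made-up sample data shaped like the first mapping in this thread (the missing ARN is simulated here): `JSON.stringify` drops the `roleArn` key entirely, which is why the earlier stringified log lines showed the entry without any hint that the value was `undefined`.

```typescript
// Sample mapping modelled on the first entry above; the undefined ARN is simulated.
const mapping: { groups: string[]; roleArn: string | undefined; username: string } = {
    groups: ["system:masters"],
    roleArn: undefined,
    username: "system-admin",
};

// JSON.stringify omits object properties whose value is undefined.
const serialized = JSON.stringify(mapping);
console.log(serialized);

// The key still exists on the object; only the serialized form hides it.
const hasKey = "roleArn" in mapping;
console.log(hasKey, mapping.roleArn);
```

Logging the object directly, as suggested above, keeps the `undefined` visible.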
e
output:
[ { rolearn: undefined,
        username: 'system-admin',
        groups: [ 'system:masters' ] },
      { rolearn:
         'arn:aws:iam::009348887430:role/mustang-sandbox-automation-role-f39b756',
        username: 'svc-automation',
        groups: [ 'system:masters' ] },
      { rolearn:
         'arn:aws:iam::009348887430:role/mustang-sandbox-k8s-cluster-role-30d65f2',
        username: 'dev',
        groups: [ 'system:masters' ] },
      { rolearn:
         'arn:aws:iam::009348887430:role/eks-cluster-cd-test-eks-cluster-instanceRole-role-45e8a8d',
        username: 'system:node:{{EC2PrivateDNSName}}',
        groups: [ 'system:bootstrappers', 'system:nodes' ] } ]
oh, look at that!
rolearn is undefined
w
Right - so that’s where the problem is. Where is that arn coming from?
e
It's a stack reference. Someone had changed the stack to rename the export 😞
I've refreshed and re-`up`'ed the reference
it resolved the issue locally, but I'll check that the automation picks it up too
I think this may be a good spot to add validation for `undefined` values and add logging on them
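One way such a check might look, as a rough sketch rather than anything that exists in `@pulumi/eks`: the `RoleMapping` shape mirrors the snippets in this thread, while `findUndefinedFields` and `assertRoleMappingsDefined` are hypothetical helper names. The idea is to log each missing field and fail with a readable message before the mappings reach the YAML serializer.

```typescript
// Shape of a role mapping as used in the snippets above (fields may be unresolved).
interface RoleMapping {
    groups: string[] | undefined;
    roleArn: string | undefined;
    username: string | undefined;
}

// Hypothetical helper: returns one message per undefined field.
function findUndefinedFields(mappings: RoleMapping[]): string[] {
    const problems: string[] = [];
    mappings.forEach((m, index) => {
        for (const key of ["roleArn", "username", "groups"] as const) {
            if (m[key] === undefined) {
                problems.push(`roleMappings[${index}].${key} is undefined`);
            }
        }
    });
    return problems;
}

// Hypothetical helper: log every problem, then throw before serialization.
function assertRoleMappingsDefined(mappings: RoleMapping[]): void {
    const problems = findUndefinedFields(mappings);
    if (problems.length > 0) {
        problems.forEach(p => console.error(p));
        throw new Error(`Invalid roleMappings: ${problems.join("; ")}`);
    }
}
```

Called on the array built just before the serializer call, this would have pointed at the first entry's `roleArn` directly instead of failing inside the serializer.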
w
Great to hear this identified the issue. There’s a couple things I think we can do to help out more here in general.
e
Would be good to notify or throw errors when stack references are undefined, or exports don't exist with a particular name, too
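As a sketch of that idea, assuming the other stack's exports are available as a plain object (this stand-in is not the Pulumi stack reference API, and `requireExport` is a hypothetical name): looking up a renamed or missing export fails immediately and lists the names that do exist.

```typescript
// Hypothetical helper: look up a named export and fail loudly if it is absent.
function requireExport<T>(
    stackExports: Record<string, T | undefined>,
    stackName: string,
    exportName: string,
): T {
    const value = stackExports[exportName];
    if (value === undefined) {
        const available = Object.keys(stackExports).join(", ") || "(none)";
        throw new Error(
            `Stack "${stackName}" has no export "${exportName}". Available exports: ${available}`,
        );
    }
    return value;
}
```

With a check like this, a renamed export would surface as an error naming the stack and the export, rather than as an `undefined` deep inside the cluster code.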
FYI - just validated our E2E test on this is resolved.
So I think that was the whole issue - an export was renamed in a different stack and then tried to get pulled in as a reference. Thanks for taking the time to help debug it!
Hopefully you got some good ideas from the exercise
w
Definitely!