https://pulumi.com logo
#automation-api
Title
# automation-api
c

clever-cartoon-41433

04/19/2021, 6:46 PM
We have a game-breaking bug with pulumi automation where every once in a while the apply will hang with a few "unhandled rejection" errors. Normally we would just retry in these cases but the hanging makes it completely block further use of our application. Not quite sure if this is my issue or yours.
r

red-match-15116

04/19/2021, 6:51 PM
What version are you on? I know @lemon-agent-27707 recently made some fixes in this area.
l

lemon-agent-27707

04/19/2021, 6:52 PM
And please provide any detail on the error, related code, etc, that you can.
c

clever-cartoon-41433

04/19/2021, 7:01 PM
@lemon-agent-27707 Error always looks like this:
Copy code
unhandled rejection: CONTEXT(127): resource:svg-xss-44dfd983-b768-4c01-ab49-4315ec88a39f[kubernetes:apps/v1:Deployment]
STACK_TRACE:
Error
    at Object.debuggablePromise (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/runtime/debuggable.js:69:75)
    at Object.registerResource (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/runtime/resource.js:219:18)
    at new Resource (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/resource.js:215:24)
    at new CustomResource (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/resource.js:307:9)
    at new Deployment (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/apps/v1/deployment.ts:109:9)
    at new Deployment (/app/node_modules/@pulumi/kubernetesx/kx.js:166:9)
    at /app/index.ts:157:41
    at Object.exam (/app/index.ts:368:6)
    at Stack.<anonymous> (/app/node_modules/@pulumi/pulumi/runtime/stack.js:86:43)
    at Generator.next (<anonymous>)
It's outputted to stderr and the apply will continue until it hangs. If I restart pulumi, it will usually work.
Using pulumi 2.25.
l

lemon-agent-27707

04/19/2021, 7:03 PM
1. What version of pulumi CLI and
@pulumi/pulumi
are you using? 2. Can you share some of your code? That error points to this line:
at /app/index.ts:157:41
c

clever-cartoon-41433

04/19/2021, 7:03 PM
Here's the specific resource:
Copy code
const svg_xss = new k8x.Deployment("svg-xss-" + id, {
    metadata: {namespace: kubernetesWorkspace.metadata.name},
    spec: new k8x.PodBuilder({
        automountServiceAccountToken: false,
        imagePullSecrets: [{
            name: dockercfg.metadata.name,
        }],
        containers: [
            {
                env: {
                    PORT: svg_xss_port.toString(),
                    FLAG: task.flag,
                    SESSION_KEY: svg_xss_session_key.result,
                },
                name: "svg-xss-server-" + id,
                image: toContainerURL("svg_xss_server"),
                ports: {
                    http: svg_xss_port,
                },
            },
        ]
    }).asDeploymentSpec()
})
Copy code
function toContainerURL(containerName: string): pulumi.Output<string> {
    return registryEndpoint.apply(host => {
        return host + "/" + toContainerPath(containerName) + ":latest"
    })
}
Right out of our package.json:
Copy code
"dependencies": {
    "@jsier/retrier": "^1.2.4",
    "@pulumi/digitalocean": "^3.8.0",
    "@pulumi/dnsimple": "^2.7.0",
    "@pulumi/kubernetes": "^2.9.1",
    "@pulumi/kubernetesx": "^0.1.4",
    "@pulumi/pulumi": "^2.25.0",
    "@pulumi/random": "^3.2.0",
    "axios": "^0.21.1",
    "express": "^4.17.1",
    "reattempt": "^0.1.1",
    "redis": "^3.1.1",
    "rwlock": "^5.0.0"
}
l

lemon-agent-27707

04/19/2021, 7:06 PM
Can you share the full error output?
c

clever-cartoon-41433

04/19/2021, 7:07 PM
That is literally the entire error. It's repeated twice:
Copy code
unhandled rejection: CONTEXT(127): resource:svg-xss-44dfd983-b768-4c01-ab49-4315ec88a39f[kubernetes:apps/v1:Deployment]
STACK_TRACE:
Error
    at Object.debuggablePromise (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/runtime/debuggable.js:69:75)
    at Object.registerResource (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/runtime/resource.js:219:18)
    at new Resource (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/resource.js:215:24)
    at new CustomResource (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/resource.js:307:9)
    at new Deployment (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/apps/v1/deployment.ts:109:9)
    at new Deployment (/app/node_modules/@pulumi/kubernetesx/kx.js:166:9)
    at /app/index.ts:157:41
    at Object.exam (/app/index.ts:368:6)
    at Stack.<anonymous> (/app/node_modules/@pulumi/pulumi/runtime/stack.js:86:43)
    at Generator.next (<anonymous>)
unhandled rejection: CONTEXT(127): resource:svg-xss-44dfd983-b768-4c01-ab49-4315ec88a39f[kubernetes:apps/v1:Deployment]
STACK_TRACE:
Error
    at Object.debuggablePromise (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/runtime/debuggable.js:69:75)
    at Object.registerResource (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/runtime/resource.js:219:18)
    at new Resource (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/resource.js:215:24)
    at new CustomResource (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/pulumi/resource.js:307:9)
    at new Deployment (/app/node_modules/@pulumi/kubernetesx/node_modules/@pulumi/apps/v1/deployment.ts:109:9)
    at new Deployment (/app/node_modules/@pulumi/kubernetesx/kx.js:166:9)
    at /app/index.ts:157:41
    at Object.exam (/app/index.ts:368:6)
    at Stack.<anonymous> (/app/node_modules/@pulumi/pulumi/runtime/stack.js:86:43)
    at Generator.next (<anonymous>)
But there's no "full error". Like I said, it hangs.
The apply keeps on running for a bit whenever this happens, and then never finshes.
l

lemon-agent-27707

04/19/2021, 7:07 PM
c

clever-cartoon-41433

04/19/2021, 7:08 PM
eh 1s
Copy code
while (true) {
    try {
        await stack.setConfig("input", {
            value: queuedInput,
            secret: false,
        })
        console.log("New state: " + queuedInput)
        console.log("Performing apply...")
        const applyOutput = await stack.up({
            onOutput: function (output) {
                console.log("[PULUMI]: " + output)
            },
        })
        console.log("Finished apply")
        appliedOutput = applyOutput.outputs
        appliedConfigTime = recvTime
        release()
        return
    } catch (err) {
        console.log("Error encountered during apply: " + err)
        try {await stack.cancel()} catch {} //This can err, in case no apply needed to be canceled
    }
}
The "Error encountered during apply" bit never executes, so I assume it's not throwing an exception.
If it helps, as far as I can remember it's always been a Deployment that the unhandled rejection error points to specifically.
l

lemon-agent-27707

04/19/2021, 7:12 PM
This is a consistent error, or only happens sometimes?
c

clever-cartoon-41433

04/19/2021, 7:13 PM
Only happens sometimes. Usually one or two applies after the first apply. I don't think I've ever seen it happen on startup.
l

lemon-agent-27707

04/19/2021, 7:16 PM
When did this start happening? Does this happen with earlier versions as well, or only after a recent upgrade.
c

clever-cartoon-41433

04/19/2021, 7:17 PM
We upgraded in an attempt to fix this
l

lemon-agent-27707

04/19/2021, 7:17 PM
I see, so this has always happened?
c

clever-cartoon-41433

04/19/2021, 7:18 PM
Yes
l

lemon-agent-27707

04/19/2021, 7:23 PM
The apply keeps on running for a bit whenever this happens, and then never finshes.
What do you mean by this. How did you observe this?
c

clever-cartoon-41433

04/19/2021, 7:24 PM
Well, it neither throws an exceptions or hits the release()/return block.
or outputs console.log("Finished apply")
So I assumed that meant the up() call was still running
l

lemon-agent-27707

04/19/2021, 7:29 PM
What version of node are you running, and can you share these two lines of code (line numbers labeled if possible):
Copy code
at /app/index.ts:157:41
    at Object.exam (/app/index.ts:368:6)
By any chance are you using our kubernetes helm support in your program?
c

clever-cartoon-41433

04/19/2021, 7:44 PM
I am not using kubernetes helm. 368 is just
}()
, the end of hte pulumi program definition. 157 is the starting line of the Deployment I showed you earlier:
const svg_xss = new k8x.Deployment("svg-xss-" + id, {
l

lemon-agent-27707

04/19/2021, 7:52 PM
c

clever-cartoon-41433

04/19/2021, 8:03 PM
sure.
eh 1s actually
this goes through CI so
I don't actually have the ability to do that
Let me figure this out
l

lemon-agent-27707

04/19/2021, 8:07 PM
Before we go any deeper there, would you mind opening an issue that summarizes the problem? github.com/pulumi/pulumi/issues/new
c

clever-cartoon-41433

04/19/2021, 8:27 PM
l

lemon-agent-27707

04/19/2021, 9:17 PM
I notice that your stack trace includes nested version of
@pulumi/pulumi
one at the top level, and one nested under pulumi/kubernetesx. Do you have a package.lock file or yarn.lock file so that we could see what versions are being resolved?
Might also be worth doing a run with excessive debug output set to
true
unfortunately this requires modifying js source of that variable: https://github.com/pulumi/pulumi/blob/036344679a0c2e24af3c9e4a0d7ea34adff5136b/sdk/nodejs/runtime/settings.ts#L35
Does pulumi still think the update is running at this point? Or has the update completed?
c

clever-cartoon-41433

04/19/2021, 10:38 PM
Sorry; internet went out
Here it is
@lemon-agent-27707 Pulumi continues to think the update is running and the pod has to cancel it upon restart. It hangs for at least 24 hours if no one manualy `kubectl rollout restart`'s
l

lemon-agent-27707

04/19/2021, 10:45 PM
Ok, looks like both versions of pulumi/pulumi are 2.25. Would be helpful to see the excessive debug output when you get a moment. Also out of curiosity, what version of node are you running?
c

clever-cartoon-41433

04/19/2021, 10:59 PM
I'm gonna modify the dockerfile so it uses a custom node_modules dir rn
1s
@lemon-agent-27707 Hey, so I'm finally getting around to testing it like you said... We upgraded to pulumi 3.0, and I removed the process.on() block you mentioned. The issue seems to be gone.
woohoo 1
I can't tell whether or not it's because we upgraded, or because of the code change I made
r

red-match-15116

04/20/2021, 9:21 PM
Great news! Curious to see what you’d encounter without the code change to figure out whether it’s the upgrade or the code change, but my bet is on the code change because I don’t believe we changed anything in 3.0 that would help here.
l

lemon-agent-27707

04/20/2021, 11:03 PM
Ah, interesting.
Can you let me know if you still see this issue without modifying the uncaught handler? If that is definitively the issue I will put together a patch that removes it so you don't have to maintain a fork.
c

clever-cartoon-41433

04/23/2021, 11:23 PM
@lemon-agent-27707 It's definitively the patch we made.
l

lemon-agent-27707

04/24/2021, 1:12 AM
Can you try to precisely describe the describe the difference in behavior before and after the patch? Without the patch your program fails and with it, it completes successfully?
Or is it that without the patch your catch executes?
c

clever-cartoon-41433

04/24/2021, 1:18 AM
Without the patch, every so often, pulumi applies hang without erroring and must be canceled and restarted. With the patch, it doesn't, and resources are created successfully.
l

lemon-agent-27707

04/24/2021, 2:46 AM
It sounds like now in some cases your
catch
block is actually executing? Is that true, or did the failures go away altogether?
c

clever-cartoon-41433

04/24/2021, 10:34 PM
Failures went away altogether, I believe. But I'm not sure. I can check next time I see the logs.
I take that back. It's finally propagating back up, and it turned out to be a bug in our code - but one that we can fix, because we got the real error message this time.
🎉 1
l

lemon-agent-27707

04/26/2021, 3:24 PM
Ah, glad to hear. Thanks for the report!
2 Views