
jolly-agent-91665

03/01/2023, 4:11 PM
We're having a weird issue where our Pulumi stack previews and deploys fine on a local machine, but when running in GitLab CI both pulumi preview and pulumi up hang indefinitely, and the Pulumi Service shows that the update failed but with no error message. It seems like some sort of disconnection happens near the end of both preview and up. Any tips or help?
[Screenshot attached: Screenshot 2023-03-01 at 16.08.44.png]
This is only affecting one of our stacks (environments) within the same project too, which is even stranger!

billowy-army-68599

03/01/2023, 4:17 PM
this is often due to a stale AWS session token, how do you auth to AWS?

jolly-agent-91665

03/01/2023, 4:18 PM
Thanks for the reply @billowy-army-68599! We auth by injecting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as CI variables (which are fixed IAM user access credentials). I've auth'd in the same way locally on macOS and can't replicate it.
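For context, a minimal sketch of what a CI script step typically looks like with that setup, assuming AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and a PULUMI_ACCESS_TOKEN are injected as masked CI/CD variables (the exact step layout here is an assumption, not the actual pipeline):
# minimal sketch of the CI script step; variable names are the standard AWS SDK / Pulumi ones
set -euo pipefail
pulumi login                      # non-interactive, uses PULUMI_ACCESS_TOKEN from the environment
pulumi stack select production    # the affected stack from this thread
pulumi preview                    # the AWS provider picks the credentials up from the environment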

billowy-army-68599

03/01/2023, 4:26 PM
where are you executing pulumi?

jolly-agent-91665

03/01/2023, 4:27 PM
In CI, using a GitLab runner; locally, using the macOS terminal.
Weirdly, in CI it completes all of the updates in AWS successfully but then just seems to hang until the Pulumi Service marks it as failed.

billowy-army-68599

03/01/2023, 4:30 PM
how many resources in the stack?

jolly-agent-91665

03/01/2023, 4:30 PM
680 resources

billowy-army-68599

03/01/2023, 4:35 PM

jolly-agent-91665

03/01/2023, 5:22 PM
Struggling to get a performance trace from the CI - I think it's because the Pulumi command never finishes.
I do see this error, which appears to come from the Pulumi Service:
I0301 17:10:00.972234     232 log.go:71] error renewing lease: [403] The provided update token has expired.
That's with debug mode turned on.
These are the last 2 lines of the log:
I0301 17:03:21.512658     232 log.go:71] Marshaling property for RPC[ResourceMonitor.RegisterResource(aws:autoscaling/notification:Notification,production-ecs-1-asg-notifications)]: topicArn={arn:aws:sns:eu-west-1:476250223542:production-ecs-terminations-b11fa72}
I0301 17:10:00.972234     232 log.go:71] error renewing lease: [403] The provided update token has expired.
You can see the 7-minute delay since the last AWS action. The command I'm running is:
timeout 1800 pulumi up --skip-preview --tracing=file:./up.trace --logtostderr --logflow -v=9 2> ./out.txt
I added the timeout as otherwise the pulumi up command never exits.
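If the trace file does get written before the job is killed, it can be inspected locally; a sketch, assuming the file is first copied off the runner:
# view the captured trace in a local web UI (recent Pulumi CLIs ship a view-trace subcommand)
pulumi view-trace ./up.trace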

billowy-army-68599

03/01/2023, 5:25 PM
could you send a support issue to support@pulumi.com - ignore the automated response

jolly-agent-91665

03/01/2023, 5:40 PM
Will do now!
Done! #2457
We're at the point where we're considering an Enterprise plan because of the number of resources we have. This is bad timing as we should have done it before this happened - hindsight!

billowy-army-68599

03/01/2023, 5:46 PM
I see you’re using an individual account, what stage of the process are you in?

jolly-agent-91665

03/01/2023, 5:46 PM
We've not kicked it off yet

billowy-army-68599

03/01/2023, 5:46 PM
good to know, i’ve taken the ticket and will take a look behind the scenes

jolly-agent-91665

03/01/2023, 5:47 PM
Thank you - at the end of this we should go through a pricing discussion

billowy-army-68599

03/01/2023, 5:49 PM
could you send a separate email to lbriggs[at]pulumi.com for that?
we’ll get this sorted before we have that chat

jolly-agent-91665

03/01/2023, 5:50 PM
Have done & thanks.

billowy-army-68599

03/01/2023, 6:18 PM
@jolly-agent-91665 quick q: is this an ongoing problem or did it start recently?

jolly-agent-91665

03/01/2023, 6:19 PM
This started when we created a new set of stacks for our environments. It is only affecting our production stack though, and only on GitLab.
It runs fine on macOS, so perhaps it's a resource thing? But it seems strange given that it only affects this stack, and the resources are (almost) in parity between staging, sandbox and production.

billowy-army-68599

03/01/2023, 6:22 PM
are the number of resources in staging/production and sandbox the same?

jolly-agent-91665

03/01/2023, 6:23 PM
30 fewer resources on both of those, as they omit an SSH bastion.
Otherwise exactly the same.

billowy-army-68599

03/01/2023, 6:25 PM
can you do a deployment with an SSH bastion, just to eliminate a theory I have

jolly-agent-91665

03/01/2023, 6:26 PM
Onto the staging stack?

billowy-army-68599

03/01/2023, 6:26 PM
or sandbox, whichever is preferable

jolly-agent-91665

03/01/2023, 6:26 PM
Will do now
I'm running it now - I don't know if this is helpful, but I have an active preview running against the production stack (https://app.pulumi.com/will/infrastructure/production/previews/c89c78d5-37e2-4d68-a893-5ceaa7537a35) and that's hanging in the same way for >30 minutes, both on the Pulumi Service and in the pulumi preview command.

billowy-army-68599

03/01/2023, 6:36 PM
i have an engineer investigating
okay, I think this is likely the same as https://github.com/pulumi/pulumi/issues/7094. Can you try setting these environment variables:
export PULUMI_EXPERIMENTAL=1
export PULUMI_SKIP_CHECKPOINTS=1
export PULUMI_OPTIMIZED_CHECKPOINT_PATCH=1
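A sketch of how those toggles would combine with the CI command quoted earlier, assuming they are exported in the same shell that runs pulumi up:
# checkpoint-related toggles from the linked issue, set before the existing command
export PULUMI_EXPERIMENTAL=1
export PULUMI_SKIP_CHECKPOINTS=1
export PULUMI_OPTIMIZED_CHECKPOINT_PATCH=1
timeout 1800 pulumi up --skip-preview --logtostderr -v=9 2> ./out.txt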

jolly-agent-91665

03/01/2023, 6:43 PM
Will try that now
That didn't fix it (for the preview at least).
What we have done is double the size of the GitLab instances that run the CI job, and that fixed it.

billowy-army-68599

03/01/2023, 7:30 PM
Okay, that's good information. We are having a lovely discussion behind the scenes on this; if we have any concrete fixes we'll let you know.
j

jolly-agent-91665

03/01/2023, 7:30 PM
I had a quick scan of the GitHub issue and it does seem like it could be related given that we have a comparable number of resources.
Thank you - I'm intrigued to say the least.

billowy-army-68599

03/01/2023, 8:20 PM
@jolly-agent-91665 could you run an update locally, and capture:
• a performance trace
• verbose logging
• a profile of CPU/mem, which can be captured using --profiling
you can use the support ticket to send them, and DM me if there are any issues sending this over securely
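For reference, a single local invocation that captures all three might look like this (a sketch; the pulumi-prof prefix and up-verbose.log filename are just placeholder names):
# one local run capturing a performance trace, verbose logs, and CPU/memory profiles;
# --profiling writes profile and execution-trace files using the given prefix
pulumi up --skip-preview --tracing=file:./up.trace --logtostderr --logflow -v=9 --profiling=pulumi-prof 2> ./up-verbose.log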

jolly-agent-91665

03/01/2023, 8:28 PM
Yeah I'll grab these tomorrow (it's 8pm here!)

billowy-army-68599

03/01/2023, 8:28 PM
appreciate it! thanks!