# general
j
We're having a weird issue where our Pulumi stack previews and deploys fine on a local machine, but when running in GitLab CI both
pulumi preview
and
pulumi up
hang indefinitely and the Pulumi Service shows that the update failed but with no error message. It seems like some sort of disconnection happens near the end of both preview and up. Any tips or help?
Screenshot 2023-03-01 at 16.08.44.png
This is only affecting one of our stacks (environments) within the same project too which is even stranger!
b
this is often due to a stale AWS session token, how do you auth to AWS?
j
Thanks for the reply @billowy-army-68599! We auth by injecting
AWS_ACCESS_KEY_ID
and
AWS_SECRET_ACCESS_KEY
as CI variables (which are fixed IAM user access credentials). I've auth'd in the same way locally on macOS and can't replicate it.
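For reference, a quick sanity check of those credentials inside the runner (assuming the aws CLI is available in the job image) would be something like:
# confirm the injected keys are valid and show which IAM user they resolve to
# a stale or revoked credential would fail fast here, before pulumi ever runs
aws sts get-caller-identity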
b
where are you executing pulumi?
j
In CI - using a GitLab runner. Locally - using the macOS terminal.
Weirdly in CI it completes all of the updates in AWS successfully but then just seems to hang until the Pulumi service marks it as failed.
b
how many resources in the stack?
j
680 resources
b
j
Struggling to get a performance trace from the CI - I think because the Pulumi command never finishes
I do see this error, which appears to come from the Pulumi Service:
I0301 17:10:00.972234     232 log.go:71] error renewing lease: [403] The provided update token has expired.
That's with debug logging turned on
These are the last 2 lines of the log:
I0301 17:03:21.512658     232 log.go:71] Marshaling property for RPC[ResourceMonitor.RegisterResource(aws:autoscaling/notification:Notification,production-ecs-1-asg-notifications)]: topicArn={arn:aws:sns:eu-west-1:476250223542:production-ecs-terminations-b11fa72}
I0301 17:10:00.972234     232 log.go:71] error renewing lease: [403] The provided update token has expired.
You can see the 7-minute gap since the last AWS action. The command I'm running is:
timeout 1800 pulumi up --skip-preview --tracing=file:./up.trace --logtostderr --logflow -v=9 2> ./out.txt
I added the timeout as otherwise the
pulumi up
command never exits.
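If the command ever does exit cleanly, my understanding is that the resulting trace file can be inspected locally with something along the lines of:
# serve the captured trace in a local web viewer for inspection
pulumi view-trace ./up.trace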
b
could you send a support issue to support@pulumi.com - ignore the automated response
j
Will do now!
Done! #2457
We're at the point where we're considering an Enterprise plan because of the number of resources we have. This is bad timing as we should have done it before this happened - hindsight!
b
I see you’re using an individual account, what stage of the process are you in?
j
We've not kicked it off yet
b
good to know, i’ve taken the ticket and will take a look behind the scenes
j
Thank you - at the end of this we should go through a pricing discussion
b
could you send a separate email to lbriggs[at]pulumi.com for that?
we’ll get this sorted before we have that chat
j
Have done & thanks.
b
@jolly-agent-91665 quick q: is this an ongoing problem or did it start recently?
j
This started after we created a new set of stacks for our environments. It is only affecting our production stack though, and only on GitLab
It runs fine on macOS, so perhaps it's a resource thing? But it seems strange that it only affects this stack when the resources are (almost) in parity between staging, sandbox and production.
b
are the number of resources in staging/production and sandbox the same?
j
30 fewer resources on both of those, as they omit an SSH bastion.
Otherwise the exact same
b
can you do a deployment with an SSH bastion, just to eliminate a theory I have
j
Onto the staging stack?
b
or sandbox, whichever is preferable
j
Will do now
I'm running it now - I don't know if this is helpful, but I have an active preview running against the production stack (https://app.pulumi.com/will/infrastructure/production/previews/c89c78d5-37e2-4d68-a893-5ceaa7537a35) that's been hanging in the same way for >30 minutes, and this time it's hanging on both the Pulumi Service and the
pulumi preview
command.
b
i have an engineer investigating
okay, I think this is likely the same as https://github.com/pulumi/pulumi/issues/7094 - can you try setting these environment variables:
export PULUMI_EXPERIMENTAL=1
export PULUMI_SKIP_CHECKPOINTS=1
export PULUMI_OPTIMIZED_CHECKPOINT_PATCH=1
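in the CI job that would just mean exporting them before the update runs, roughly like this (reusing the command you posted above):
# enable the experimental checkpoint behaviour, then run the same update as before
export PULUMI_EXPERIMENTAL=1
export PULUMI_SKIP_CHECKPOINTS=1
export PULUMI_OPTIMIZED_CHECKPOINT_PATCH=1
timeout 1800 pulumi up --skip-preview --logtostderr --logflow -v=9 2> ./out.txt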
j
Will try that now
That didn't fix it (for the preview at least).
What we have done is double the size of the GitLab instance that runs the CI job, and that fixed it
b
Okay that’s good information. We are having a lovely discussion behind the scenes on this; if we have any concrete fixes we’ll let you know
j
I had a quick scan of the GitHub issue and it does seem like it could be related given that we have a comparable number of resources.
Thank you - I'm intrigued to say the least
b
@jolly-agent-91665 could you run an update locally, and capture:
• a performance trace
• verbose logging
• a profile of cpu/mem, which can be captured using
--profiling
you can use the support ticket to send them, DM me if there’s any issues sending this over securely
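a single local run that captures all three could look something like this (file names are just placeholders):
# capture a performance trace, verbose engine logs, and cpu/mem profiles in one run
pulumi up \
  --tracing=file:./up.trace \
  --profiling=pulumi-prof \
  --logtostderr --logflow -v=9 2> ./up-verbose.log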
j
Yeah I'll grab these tomorrow (it's 8pm here!)
b
appreciate it! thanks!