# general
s
Since yesterday (31 March 2020), we have been seeing persistent 429 rate limit errors with the GCS state backend for pulumi. We see this error irrespective of the size of the pulumi stacks (even on newly created stacks), and even when only a single object in the stack is being created or modified. We also tried
pulumi up --parallel 1
for single threaded execution, but still see this error.
Diagnostics:
  gcp:kms:CryptoKeyIAMBinding (xxx-permissions):
    error: post-step event returned an error: failed to save snapshot: An IO error occurred during the current operation: blob (key ".pulumi/stacks/<stack-name>.json") (code=Unknown): googleapi: Error 429: The rate of change requests to the object <gcs-bucket-name>/.pulumi/stacks/<stack-name>.json exceeds the rate limit. Please reduce the rate of create, update, and delete requests., rateLimitExceeded

  pulumi:pulumi:Stack (<pulumi-project-name>-<stack-name>):
    error: update failed
We may have to give up on using GCS buckets entirely for storing pulumi state. Does anybody know what could be causing this issue, or any workarounds? Thanks. Have created a GitHub issue for this too: https://github.com/pulumi/pulumi/issues/4258
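For context on the 429 itself: GCS documents a limit of roughly one write per second to any single object name, and the state file (`.pulumi/stacks/<stack-name>.json`) is rewritten after every step of an update, so bursts of writes can trip that limit. A minimal sketch of the usual mitigation, exponential backoff, in shell — `retry_backoff` is a hypothetical helper, not part of pulumi or gsutil:

```shell
# Hypothetical helper: re-run a command, doubling the wait after each failure.
retry_backoff() {
  local max_attempts=$1; shift
  local delay=${RETRY_BASE_DELAY:-1}   # seconds; overridable for testing
  local attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))               # 1s, 2s, 4s, ...
    attempt=$((attempt + 1))
  done
  echo "succeeded on attempt ${attempt}"
}

# Example use: wrap a flaky state upload, e.g.
# retry_backoff 5 gsutil cp state.json "gs://<bucket>/.pulumi/stacks/<stack>.json"
```

This only papers over the symptom, of course; if the CLI itself writes the state object more than once per second, the fix has to come from pulumi's side.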
👍 1
a
i am also using a GCS bucket for storing pulumi state. i haven’t seen this error.
my state is not tiny but also not large.
s
Maybe the issue is region dependent? Our bucket is in
asia-east1 (Taiwan)
Could you let us know which region your bucket is in?
a
eu
multi region
s
Thanks, I'll take a look at replicating that
q
I posted this too
s
Ah, my bad. Didn't spot that before. I'm going to try a multi-region bucket in Asia and report back if it helped. 🤞
q
I'll check my bucket config now too
I think mine is single region, europe-west2
a
mmh, which version of pulumi are you guys using?
s
pulumi version
v1.13.1
a
i am still on
v1.13.0
maybe that is the difference?
s
Hmm, could be.
I'm now trying a multi-region bucket. It worked once, but will only know for sure if this fixes the issue after running for a while.
👍 1
Also trying
pulumi up --parallel 1
right now, not sure if that would help
q
I've tried 1.12, 1.13, and 1.13.1
😭 1
Parallel flag doesn't help
@steep-caravan-65104 are you using CircleCI by chance?
s
Nope, Google Cloud Build
q
Ah. I get the error when using CircleCI. Doesn't replicate locally
s
Was seeing it locally too
q
Damn
s
Still getting 429 errors with the multi-region GCS bucket too
Forgot to mention one thing earlier: sometimes the state file even gets deleted when the 429 errors occur, which is weird. We have to restore from the backup state files on the bucket in that case.
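On restoring: the filestate backend keeps timestamped backups alongside the live state, so a deleted state file can usually be copied back. A hedged sketch, assuming the usual backup layout — verify the exact paths in your own bucket before copying anything:

```shell
# Sketch only: BUCKET and STACK are placeholders (assumptions), and the
# backup layout (.pulumi/backups/<stack>/<stack>.<timestamp>.json) should
# be checked against your own bucket first.
BUCKET=my-pulumi-state-bucket
STACK=my-stack

# List available backups (timestamped filenames, newest last):
gsutil ls "gs://${BUCKET}/.pulumi/backups/${STACK}/"

# Restore a chosen backup over the live state file:
gsutil cp "gs://${BUCKET}/.pulumi/backups/${STACK}/${STACK}.<timestamp>.json" \
          "gs://${BUCKET}/.pulumi/stacks/${STACK}.json"
```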
q
Sorry to tag you, @white-balloon-205, but can you let us know if anything has changed with regard to the state file update process?
w
Nothing intentionally changed here - but it’s possible we picked up a new version of the gcloud sdk? I see https://github.com/pulumi/pulumi/issues/4258 is open now - we’ll look into that today.
👍 4
q
Thank you
c
Same here on
pulumi v1.14.0
and
Google Cloud SDK 287.0.0
w
cc @billowy-army-68599 who has been looking into this.
s
I've been running pulumi v1.13.0 for a day (5-6 builds so far) and haven't seen this issue again. So it seems this issue was indeed introduced in v1.13.1 after all?
👍 1
b
Would you be able to send me verbose logs from both v1.13.0 and v1.13.1? I’m unable to repro it @steep-caravan-65104
@quiet-wolf-18467 I saw elsewhere you could repro this on v1.13.0 - is that still the case?
s
@billowy-army-68599 I've sent you part of the verbose logs from v1.13.1 and v1.13.0 pulumi up runs, hope it helps.
q
I've been on vacation the past few days; I'll get our CI logs across on Monday
g
I just hit this as well. GCS state, same rate limit error. My stack uses lots of helm charts and i’ve been doing tons of back-to-back up/destroys today. After waiting a little bit I could use pulumi again without the rate limit error.
It seems to happen primarily when using helm charts with tons of k8s resources in them. i’m working around the issue now by using local filestate:
gsutil cp -r <gs://bucket/.pulumi/stacks> ./.pulumi
and then
pulumi login file://.
and then using gsutil to re-upload after i’m done.
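Putting those steps together, the workaround might look like this as one script — a sketch only: the bucket name is a placeholder, and gsutil must already be authenticated:

```shell
# Sketch of the local-filestate workaround; BUCKET is a placeholder (assumption).
BUCKET=my-pulumi-state-bucket

# 1. Copy the stack state out of GCS into a local .pulumi directory.
gsutil cp -r "gs://${BUCKET}/.pulumi/stacks" ./.pulumi

# 2. Point pulumi at the local copy instead of the bucket.
pulumi login file://.

# 3. ...run pulumi up / pulumi destroy as needed...

# 4. Re-upload the updated state once finished.
gsutil cp -r ./.pulumi/stacks "gs://${BUCKET}/.pulumi"
```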
s
@great-byte-67992 as mentioned above, you could try pulumi v1.13.0, with which I haven't seen this issue yet. I passed logs on to @billowy-army-68599, so hopefully we'll have a resolution soon.
g
yeah, i saw that on the github issue. I’ll use 1.13 😄
Thought i’d check in again. using
@pulumi/pulumi 1.13.0
didn’t resolve the issue. Did you mean to use pulumi CLI 1.13.0?
b
I've yet to confirm that 1.13.0 is not affected; some people say it helps, but there's no reason I can see in the code that would confirm that
are you seeing the issue on
1.13.0
as well?
s
I've been using 1.13.0 with both the docker image and the cli for several days now, and haven't seen the issue again in either case across hundreds of builds. With 1.13.1, I was seeing the issue very frequently. Perhaps the issue is just hit more often on 1.13.0 in other people's use cases.
g
I was trying 1.14.0 of the cli with 1.13.0 of the npm package. I’ll try with 1.13.0 for both next week and report back. My bucket is in australia-southeast so that might be why as well.