# general
o
This is all an XY Problem though. The root problem is this: I have a Helm chart (ugh) that sometimes hangs (double ugh!) when I try to deploy it with Pulumi, and when it hangs, the state and checkpoints never get serialized back. That puts me in a really bad state, because I end up with a bunch of dangling resources that I can't import correctly. How do I handle this?
r
I think the Recovering from an Interrupted Update + Manually Editing Your Deployment sections might be helpful here: https://www.pulumi.com/docs/troubleshooting/#interrupted-update-recovery
o
I think that ends up being a huge amount of work, when what I really want is behavior like: "Oh, this resource is already deployed with the right namespace and name? Okay, I'll import it automatically!"
I can't stress this enough: when the Helm chart hangs in our CI system, the state is completely absent.
I cannot edit my deployment because the CI system kills the process and no checkpointed state is persisted anywhere
(we have a 1 hour timeout on our CI jobs)
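One possible mitigation for the CI hard kill, sketched under assumptions: have the job stop Pulumi itself shortly before the CI limit, using SIGINT instead of a kill. As I understand it, the Pulumi CLI treats a first interrupt as a request to cancel gracefully, which gives it a chance to write its checkpoint; a killed process gets no such chance. This assumes GNU coreutils `timeout` is available on the CI runner. The helper name `run_with_soft_timeout` and the durations are illustrative, not anything Pulumi provides.

```shell
# Hypothetical helper: run a command with a soft deadline.
# Sends SIGINT when the deadline passes, then SIGKILL if the command is
# still running after the grace period. Requires GNU coreutils `timeout`.
run_with_soft_timeout() {
  local deadline="$1" grace="$2"
  shift 2
  # Exit status is 124 if the deadline was hit (137 if SIGKILL was needed);
  # otherwise it is the command's own exit status.
  timeout --signal=INT --kill-after="$grace" "$deadline" "$@"
}
```

For example, `run_with_soft_timeout 55m 4m pulumi up --yes` leaves about a minute of headroom under a 1-hour CI limit. This only helps if Pulumi can still reach its backend; if the writes to the state bucket are themselves stalled, no signal will make the checkpoint appear.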
r
I hear ya. I think folks have hit this same issue but perhaps want a different outcome behavior? https://pulumi-community.slack.com/archives/C84L4E3N1/p1587207659355600 https://github.com/pulumi/pulumi/issues/4265 Feel free to upvote if that's in line with what you'd like to see, or open up an issue if it's not.
w
@orange-policeman-59119 just to check: are you using the Pulumi service backend or a cloud storage (blob) backend?
> it hangs and doesn’t serialize the checkpoint
This should never happen. I’d love to understand exactly what is causing this for you.
o
Cloud storage (blob) backend
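I switched to basically doing:
```
gsutil -m rsync -r "$bucket" "$localdir"   # $bucket is the gs:// state bucket URL
pulumi login "file://$localdir"            # $localdir must be an absolute path
# do operation
pulumi logout
gsutil -m rsync -r "$localdir" "$bucket"
```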
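The same workaround can be wrapped so that the state is pushed back to the bucket even when the Pulumi operation fails, which keeps whatever checkpoint Pulumi managed to write locally. This is a sketch: `with_local_state` is a hypothetical helper name, `$bucket` is assumed to be a `gs://` URL synced with `gsutil rsync`, and `$localdir` is assumed to be an absolute path. It cannot help if the CI system kills the whole job outright.

```shell
# Hypothetical helper: run a Pulumi command against a local copy of a
# GCS-hosted state directory, and always push the state back afterwards.
# Usage: with_local_state gs://my-bucket /abs/path/state pulumi up --yes
with_local_state() {
  local bucket="$1" localdir="$2"
  shift 2
  mkdir -p "$localdir" || return
  # Pull the current state down; bail out if that fails.
  gsutil -m rsync -r "$bucket" "$localdir" || return
  pulumi login "file://$localdir" || return
  # Run the operation, remembering its status instead of exiting early.
  local rc=0
  "$@" || rc=$?
  pulumi logout
  # Push the (possibly partial) checkpoint back even if the operation failed.
  if ! gsutil -m rsync -r "$localdir" "$bucket"; then
    echo "warning: failed to push state back to $bucket" >&2
    [ "$rc" -ne 0 ] || rc=1
  fi
  return "$rc"
}
```

Recording the status in `rc` rather than relying on `set -e` is what lets the push-back step run after a failed `pulumi up`.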
I was deploying the `prometheus-operator` Helm chart to a cluster where a CRD was already defined; that threw errors, and then the deployment would hang.
I also think some Google Cloud Storage rate limiting was causing a hang: when I tried running directly against the GCS bucket backend, I saw a lot of `429 rateLimitExceeded` errors in my console.
w
Yeah, there have been many reports of this recently. It appears something changed on the GCS side. We're discussing how to address that for Pulumi at https://github.com/pulumi/pulumi/issues/4258. I would guess that is the root of the problems here.