
alert-zebra-27114

05/16/2023, 5:55 PM
We occasionally get the infamous error "another operation (install/upgrade/rollback) is in progress" in our deployments, and I would like to hear if there is any way to figure out who is actually holding the "lease" (as described in https://www.pulumi.com/docs/support/troubleshooting/#conflict).

A little background: we do our deployment of applications inside our AWS EKS cluster using Pulumi. This works fine for us, as it allows us to set up any secondary AWS infrastructure for an application along with the Helm deployment; things like queues, Cognito, IAM roles, API Gateway and similar.

Lately, we have begun to collect the applications that belong together in a "release", and we then deploy that as a whole, by deploying each of the applications in the release one by one. But... we get a lot of conflict errors. When we look, we cannot find any other running Pulumi script that might be the cause of the error, so we have begun to suspect that the "lease" is not properly released by one application deployment before the next application deployment starts. The deployments are in different stacks, but in the same project.

So... is it possible to find the "other" conflicting Pulumi script in this case? If yes, then we can find the problem, or alternatively wait until the "lease" is properly released before proceeding.
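For context, a minimal sketch of what one such per-application stack could look like (all names, chart paths, and values here are hypothetical, not the poster's actual setup), combining a Helm release with its secondary AWS infrastructure in one Pulumi program:

```typescript
// Sketch of one application's Pulumi program (hypothetical names/values).
// Each application is its own stack in the same project, deployed one by one.
import * as aws from "@pulumi/aws";
import * as k8s from "@pulumi/kubernetes";

// Secondary AWS infrastructure for the application, e.g. a queue.
const queue = new aws.sqs.Queue("orders-queue");

// The Helm deployment for the application itself. Helm takes a per-release
// lock during install/upgrade/rollback, which is where the
// "another operation ... is in progress" message originates if a previous
// operation did not finish cleanly.
const release = new k8s.helm.v3.Release("orders-service", {
    chart: "./charts/orders-service", // hypothetical chart path
    namespace: "apps",
    values: {
        queueUrl: queue.url,
    },
});

export const queueUrl = queue.url;
```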

polite-scientist-28869

05/17/2023, 5:44 AM
[Full disclaimer - I'm not an official rep for Pulumi] I had a similar issue recently and resolved it by limiting the concurrency of the deployment through GitHub Actions (so the images are built in parallel and deployments are left for last + done synchronously). https://github.com/pulumi/pulumi/issues/2073 If you're already doing things synchronously and are sure that nothing else could be running, I guess you could use `pulumi cancel` - but this feels like a rather ugly hack. I've also seen some cases where the pipeline was cancelled during the `pulumi up`, which results in a "broken" state for a period of time. The only solution I've found so far is `pulumi cancel` for this too. The solutions I listed above are definitely not ideal, so I'm also interested in any (other) potential solution to this.
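For reference, a rough sketch of what that cancel-then-retry approach could look like with Pulumi's Node Automation API (stack name and working directory are hypothetical); `stack.cancel()` is the programmatic equivalent of running `pulumi cancel`, and it should only be used if you are certain no other update is genuinely running:

```typescript
// Sketch: clear a stale Pulumi stack lock before updating (hypothetical names).
import { LocalWorkspace } from "@pulumi/pulumi/automation";

async function forceUpdate() {
    const stack = await LocalWorkspace.selectStack({
        stackName: "orders-service-prod",        // hypothetical
        workDir: "./deployments/orders-service", // hypothetical
    });

    // Equivalent of `pulumi cancel`: releases the stack's pending-operation lock.
    await stack.cancel();

    // Re-run the update now that the lock is released.
    await stack.up({ onOutput: (msg) => process.stdout.write(msg) });
}

forceUpdate().catch((err) => {
    console.error(err);
    process.exit(1);
});
```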

alert-zebra-27114

05/17/2023, 6:18 AM
Thanks. We are already using GitHub concurrency control to limit the number of jobs that can deploy to an environment at any one time. I'll try out the cancel logic, just to see if it will make any difference. Though it does seem like the wrong axe to use here.
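As an alternative to cancelling, the "wait until the lease is released" option from the original question could also be sketched with the Automation API, assuming the `ConcurrentUpdateError` type it exports is what gets thrown when another update holds the lock (names, retry counts, and delays below are hypothetical):

```typescript
// Sketch: retry an update while another operation holds the stack lock.
import { LocalWorkspace, ConcurrentUpdateError } from "@pulumi/pulumi/automation";

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function upWithRetry(workDir: string, stackName: string, attempts = 5) {
    const stack = await LocalWorkspace.selectStack({ stackName, workDir });

    for (let i = 1; i <= attempts; i++) {
        try {
            return await stack.up({ onOutput: (msg) => process.stdout.write(msg) });
        } catch (err) {
            // Another update already holds the stack's lock: back off and retry.
            if (err instanceof ConcurrentUpdateError && i < attempts) {
                console.log(`Stack is locked, retrying in ${30 * i}s (attempt ${i}/${attempts})`);
                await sleep(30_000 * i);
                continue;
            }
            throw err;
        }
    }
}

upWithRetry("./deployments/orders-service", "orders-service-prod").catch((err) => {
    console.error(err);
    process.exit(1);
});
```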

polite-scientist-28869

05/17/2023, 6:47 AM
Are you storing the state on your side (in a bucket or something similar) or are you relying on the pulumi service? I'm asking because if it's the former, maybe there's a potential issue with reading/writing to the state "at the wrong time".

alert-zebra-27114

05/17/2023, 7:01 AM
We currently use the Pulumi service for this...