# general
s
This message was deleted.
c
This probably isn't the answer you want. I have a dev/prod setup, and I actually introduced a lab environment just for this problem. If I break lab, it's not a big deal: destroy it, roll back the code, and bring it back up to match dev. Sometimes I edit the state file directly if it's an easy enough fix. Sometimes setting `delete_before_replace` is necessary, but that doesn't always work. You could also introduce a variable that can be changed easily to force certain portions of your infrastructure to be re-created. Sometimes I reach for libraries that work with the infrastructure directly, for example using the python-kubernetes package to fix things in the same pulumi run. All hacks.
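(A minimal sketch of the two tricks mentioned above - the replacement-forcing variable and `delete_before_replace` - assuming the Python SDK and a hypothetical Kubernetes ConfigMap; the `rebuildToken` config key is invented for illustration.)

```python
import pulumi
import pulumi_kubernetes as k8s

config = pulumi.Config()
# Changing this config value changes the resource name, which forces Pulumi
# to replace the ConfigMap on the next `pulumi up`.
rebuild_token = config.get("rebuildToken") or "v1"

app_config = k8s.core.v1.ConfigMap(
    "app-config",
    metadata={"name": f"app-config-{rebuild_token}"},
    data={"LOG_LEVEL": "info"},
    # Tear down the old resource before creating its replacement, instead of
    # the default create-replacement-then-delete ordering.
    opts=pulumi.ResourceOptions(delete_before_replace=True),
)
```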
g
In general, we do not expect this to be a common experience. If this is happening often, please open an issue at https://github.com/pulumi/pulumi/issues so that we can look into it. When this does happen, it's typically because of an interruption during a `pulumi up` (network disruption, `CTRL+C`, or otherwise) that leaves resources in an unknown or not fully known state. In this scenario, you can export -> correct the problem -> import to recover. Depending on the status of the resources that failed, you can `pulumi stack export | pulumi stack import` so you don't have to edit the state manually. As Dillon said, there are other options that let you refactor your resources and state with `aliases` and other resource options such as `delete_before_replace`, etc.
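(A sketch of the `aliases` option, assuming the Python SDK and a hypothetical rename of a namespace resource: the alias tells Pulumi that the renamed resource is the same one it already manages, so the refactor does not become a delete-and-recreate.)

```python
import pulumi
import pulumi_kubernetes as k8s

namespace = k8s.core.v1.Namespace(
    "team-namespace",  # new logical name after the refactor
    metadata={"name": "team"},
    opts=pulumi.ResourceOptions(
        # "old-namespace" is the hypothetical previous logical name.
        aliases=[pulumi.Alias(name="old-namespace")],
    ),
)
```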
b
to add some personal experience to this - I generally advise users not to create stacks with too many resources - this is a common thing across most IaC tools. It's generally good practice to keep things small to avoid "blast radius" issues. That said, we don't expect this to be a common occurrence, but the more resources you add to your stack, the more mathematically probable it becomes that you're going to run into a bug.
s
for me it has been quite common, honestly. I never interrupt a running deployment - I have learnt that leads to disaster. However, many times I do not even know the root cause; it can be that some pods never reach the ready state and then a chain reaction of errors leaves the stack in a broken state. One thing I discovered yesterday is that if you are not careful with dependencies you can easily break the stack. For example, I had some resources that depended on a cluster, but the dependencies were wrong (I can elaborate more on this later), so all the resources tried to deploy in parallel; since the cluster was not ready, errors started to accumulate during the deploy and I almost always ended up with a broken stack…
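(A sketch of the dependency wiring being described, assuming purely for illustration an EKS cluster from `pulumi_eks` and the Python SDK: routing Kubernetes resources through a provider built from the cluster's kubeconfig gives them an edge to the cluster, so they are not created until it is ready.)

```python
import json

import pulumi
import pulumi_eks as eks          # hypothetical choice of managed cluster
import pulumi_kubernetes as k8s

cluster = eks.Cluster("cluster")

# The provider consumes the cluster's kubeconfig output, so every resource
# created through it implicitly depends on the cluster being up. (Assumes
# cluster.kubeconfig is a JSON-serializable kubeconfig object.)
k8s_provider = k8s.Provider("k8s", kubeconfig=cluster.kubeconfig.apply(json.dumps))

app_namespace = k8s.core.v1.Namespace(
    "app",
    metadata={"name": "app"},
    opts=pulumi.ResourceOptions(provider=k8s_provider),
)

app_config = k8s.core.v1.ConfigMap(
    "app-config",
    metadata={"namespace": "app"},
    data={"FEATURE_FLAG": "on"},
    # The namespace is referenced by a plain string, so no output flows from
    # app_namespace to this resource; depends_on makes the ordering explicit.
    opts=pulumi.ResourceOptions(provider=k8s_provider, depends_on=[app_namespace]),
)
```

Without those edges, Pulumi has no reason not to create everything in parallel, which is the failure mode described above.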
b
I can't comment on that particular bug, but as I said before, I'd highly recommend not putting all your resources in one big stack.
s
do you have some documentation that explains how to split a stack into smaller ones? basically my stack is one kubernetes cluster with several services running on it.
b
we don't, and it's on my backlog to write it
but this repo shows how it can be done: https://github.com/jaxxstorm/iac-in-go
s
thanks!
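(A rough sketch of that split, assuming hypothetical stack names and that the cluster stack exports its kubeconfig as a string output named `kubeconfig`: the cluster lives in its own project/stack, and each group of services consumes its outputs through a `StackReference`.)

```python
import pulumi
import pulumi_kubernetes as k8s

# In the cluster stack, the matching export would be something like:
#   pulumi.export("kubeconfig", kubeconfig_string)

# In a service stack: read the outputs the cluster stack exported.
# "my-org/cluster/prod" is a hypothetical <org>/<project>/<stack> name.
cluster_stack = pulumi.StackReference("my-org/cluster/prod")
kubeconfig = cluster_stack.get_output("kubeconfig")

k8s_provider = k8s.Provider("k8s", kubeconfig=kubeconfig)

service_namespace = k8s.core.v1.Namespace(
    "service",
    opts=pulumi.ResourceOptions(provider=k8s_provider),
)
```

This also keeps the blast radius of a bad `pulumi up` limited to a single service stack.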