# general
w
There are a couple of old stacks in our environment that I can no longer find the code for, and certainly the majority of the infra they reference (AWS resources) doesn't appear to be in our account. I'd love to just delete these stacks, but I don't want to manually go through the 100+ items in each one and double-check that I'm not in fact tearing down real resources if I kill all of this. I tried to run a refresh on one, to see if it found any matches in the real infra, but got `getting snapshot: snapshot integrity failure; refusing to use it`. Is there any other way to check that I can safely kill this stack without manually going through each resource?
l
The refresh option is the only one that comes to mind right now. It might be worth spending a little time getting it to work. Can you tell what in the snapshot is causing that error message? If it's a Pulumi resource that's associated with a non-existent cloud resource (e.g. an EC2 instance or an RDS instance), then you could try deleting it from state and re-refreshing?
If there's only a few resource types, you could also build a jq script from the exported stack that generates `pulumi import` statements and run that in a new dummy project. But that's unlikely to be a small amount of work 😞
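Roughly something like this, as a sketch - it assumes the state was exported to a stack.json file and that each resource's `id` is a valid import ID for its type, which isn't true for every AWS resource, and the names pulled from the URN may need cleaning up:
```
# Rough sketch: turn custom AWS resources from an exported stack into
# `pulumi import` commands. Names come from the last URN segment.
pulumi stack export --file stack.json

jq -r '
  .deployment.resources[]
  | select(.custom == true and (.type | startswith("aws:")) and .id != null)
  | "pulumi import \(.type) \(.urn | split("::") | last) \(.id)"
' stack.json > import-commands.sh
```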
m
@witty-battery-42692 have you tried `--skip-pending-creates` or related flags mentioned here: https://github.com/pulumi/pulumi/pull/10394 They might get you to a working refresh.
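i.e. something along the lines of the below (check `pulumi refresh --help` on your CLI version for the exact set of related flags from that PR):
```
# Attempt a refresh that skips operations stuck in a pending-create state.
pulumi refresh --skip-pending-creates
```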
w
Ooh, I hadn't seen those flags, but unfortunately they don't actually change the outcome, nor does a state repair. I can start deleting things from state, but there's a chance that turns into the manual item-by-item approach as well - maybe I'll export it, back it up, and give it a shot with the first couple of blockers; maybe the rest will magically clear after that. Thanks!
m
oh, so if you export your state file, there is no `pending_operations`? I was searching in github and found `snapshot integrity failure; refusing to use it` was related to pending operations (at least at times). `pulumi stack export --disable-integrity-checking` might be needed to export it.
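Something like this would show them, assuming the export lands in a stack.json file (key name per the `pending_operations` field above):
```
# Export the state even if the integrity check complains, then look
# for operations that were left pending.
pulumi stack export --disable-integrity-checking --file stack.json

jq '.deployment.pending_operations' stack.json
```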
e
`snapshot integrity failure` is normally due to missing dependencies or parents in the resources section. If you post the full integrity error here we can help point to what needs repairing.
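If you want to hunt for them yourself, a rough sketch (assumes the export is in stack.json and only checks `parent`/`dependencies` references against the URNs actually present):
```
# List URNs referenced as a parent or dependency that don't exist in
# the resources section themselves - likely integrity-failure culprits.
jq -r '
  .deployment.resources as $rs
  | [$rs[].urn] as $urns
  | $rs[]
  | [(.parent // empty), (.dependencies // [])[]]
  | .[]
  | select(. as $u | $urns | index($u) | not)
' stack.json | sort -u
```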
w
There are a bunch of pending operations in the state file, yes, but I guess the flag alone wasn't enough to get past the other errors? There are definitely a lot of dependencies in here on an EKS cluster that was defined here originally and no longer appears to exist. I'd definitely still like to see what all else was created that does still exist (for example, a node security group was created in here that I do still see in AWS), but with the main cluster gone and it being a dependency or parent for so much of this it's gonna be interesting to sort out. Seems like there could be two possible types of stuff in here - one: all of the bits around/from the cluster, some of which are orphaned and I'd like to clean up, and two: potentially stuff that's not part of this cluster itself per se but happened to be in the same stack, which means I'd need to dig deeper and see if those are still used anywhere. Ugh, what a mess
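For the "what still exists" part, something like this against the export should at least give me a list to spot-check against the account (assuming the stack.json export again):
```
# Tab-separated type/ID pairs for every custom AWS resource in the
# state, to cross-check which ones are actually still in the account.
jq -r '
  .deployment.resources[]
  | select(.custom == true and (.type | startswith("aws:")))
  | [.type, (.id // "<no id>")]
  | @tsv
' stack.json
```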
e
Does `pulumi state repair` fix it? If it's just a resource that's gone missing but still listed as a dependency, I think the repair command should be able to just strip it
w
It didn't, no - got `Pulumi is unable to automatically repair the snapshot. This can happen if the snapshot contains cycles or corrupted or unparseable data.` Hmmm - it's saying that things are referring to an unknown provider, and referencing a kubernetes provider
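Trying to narrow down which resources point at it with something like this (a sketch - it assumes the `provider` field is the provider's URN plus `::<id>`, and that stack.json is the export from earlier):
```
# Resources whose provider reference points at a provider URN that is
# no longer present in the state (e.g. the dead kubernetes provider).
jq -r '
  .deployment.resources as $rs
  | [$rs[].urn] as $urns
  | $rs[]
  | select(.provider != null and .provider != "")
  | (.provider | split("::") | .[:-1] | join("::")) as $purn
  | select(($urns | index($purn)) | not)
  | "\(.urn)\t\($purn)"
' stack.json
```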
e
Is the ID different?
w
Well the cluster is completely gone, so yeah I can see how that provider wouldn't work at all. Stripping out all of those resources to start, I suppose (basically all cluster operations like creating role bindings, storage classes, etc)
Ah, okay, it does successfully run a refresh without those now, anyway - it's a start. Thanks!
👍 1
m
I'm curious what the fix was. Was it `pulumi state repair` and then the refresh worked?
w
I had to export the state, manually remove everything that was trying to use the no-longer-existent provider (so all of the kubernetes operations, basically), and import that state - then I was able to run the repair and refresh
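For anyone who hits this later, the rough shape of it (same stack.json assumption as above; the filter here matches by the `kubernetes:` type prefix plus the provider resource itself - adjust it to whatever your state actually contains):
```
# 1. Export the broken state, bypassing the integrity check.
pulumi stack export --disable-integrity-checking --file stack.json

# 2. Drop the dead kubernetes provider and everything it managed.
jq '
  .deployment.resources |= map(select(
    ((.type | startswith("kubernetes:")) or .type == "pulumi:providers:kubernetes")
    | not
  ))
' stack.json > stack.cleaned.json

# 3. Import the cleaned state, then repair and refresh.
pulumi stack import --file stack.cleaned.json
pulumi state repair
pulumi refresh
```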
m
Nice!