Ok, what the h**l. Suddenly `pulumi up` fails for ...
# general
g
Ok, what the h**l. Suddenly
pulumi up
fails for me because Pulumi insists it has to replace all Kubernetes namespaces, but when it tries to do that it fails, since it apparently tries to create a new namespace with the same name first (I have forced explicit names for a good reason), and that naturally won't work. I dug into the diff and it shows this:
[provider: urn:pulumi:dev::my-project::pulumi:providers:kubernetes::k8s-provider::efaab349-6f4f-42ff-a607-ee6ab55cbb61 => urn:pulumi:dev::my-project::pulumi:providers:kubernetes::default_3_1_2::7b6466de-7864-49f9-8c33-0ece6217519a]
Which makes absolutely no sense to me. That is the only thing it is trying to change. This is 100% Pulumi internal stuff, apparently for some reason the provider has changed?
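To make the diff line above easier to read, here is a minimal sketch that splits a provider reference into its parts. It assumes the reference has the shape `urn:pulumi:<stack>::<project>::<type>::<name>::<id>`, and the helper `parseProviderRef` is a hypothetical name, not part of Pulumi. It shows that only the provider name and ID differ, while the stack, project, and provider type stay the same.

```typescript
// Parts of a provider reference as it appears in the diff.
// Assumed shape: urn:pulumi:<stack>::<project>::<type>::<name>::<id>
interface ProviderRef {
    stack: string;
    project: string;
    type: string;
    name: string;
    id: string;
}

// Hypothetical helper: split a provider reference on "::" and name the pieces.
function parseProviderRef(ref: string): ProviderRef {
    const parts = ref.split("::");
    const [head = "", project = "", providerType = "", name = "", id = ""] = parts;
    if (parts.length !== 5 || !head.startsWith("urn:pulumi:")) {
        throw new Error(`unexpected provider reference: ${ref}`);
    }
    return { stack: head.slice("urn:pulumi:".length), project, type: providerType, name, id };
}

// The two references copied from the diff.
const before = parseProviderRef(
    "urn:pulumi:dev::my-project::pulumi:providers:kubernetes::k8s-provider::efaab349-6f4f-42ff-a607-ee6ab55cbb61"
);
const after = parseProviderRef(
    "urn:pulumi:dev::my-project::pulumi:providers:kubernetes::default_3_1_2::7b6466de-7864-49f9-8c33-0ece6217519a"
);

console.log(`type unchanged: ${before.type === after.type}`);
console.log(`name: ${before.name} => ${after.name}`);
```

Read this way, the resources are being moved from the explicitly named provider `k8s-provider` to a different provider instance called `default_3_1_2`, which looks like a default provider rather than one declared in the program.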
Fortunately I had enforced a strict name for each namespace and had not allowed Pulumi to generate the suffix. Otherwise this would have ended up recreating everything in my k8s cluster, since all the namespaces would have been recreated from scratch
Any tips how to fix this?
pulumi refresh
doesn't seem to help
b
you didn't change any code?
g
Not regarding those namespaces
Added some stuff to completely different namespaces
b
what did you add/change?
it looks like the version maybe changed? did you update any dependencies?
g
a ConfigMap which is referring to a
random.RandomUuid
in a typescript namespace where it is being referred in one of the namespaces related
I did not touch any dependencies
Git agrees with me, package.json has not changed
This is kinda bad...
It seems like it has changed to completely different provider
k8s-provider
to
default_3_1_2
whatever that is
b
can you share some code and the diff?
g
Hmm. Wait a minute... That does make sense. I think I must have forgotten to provide opts
{ providers: myK8sProvider }
somewhere
Maybe it ripples through somehow
since my provider is named
k8sProvider
which would match, although in kebab-case
Let me check this first
There's a lot of code, and unfortunately I cannot share it, at least not without obfuscation 😞
It is actually updating provider into pretty much all the resources
I wonder wtf might have happened
Any idea what could cause it to think the provider has changed? Except the obvious one with dependencies
b
it'll be hard to debug without seeing some code, but if you're using an implicit provider you might have changed context, or modified the provider name etc
if you didn't change any code, you can fix it by editing the state and putting it back, though it's a little painful
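For illustration, the state edit mentioned above could look roughly like the sketch below. It assumes the state was exported with `pulumi stack export --file state.json`, that each entry under `deployment.resources` carries its provider reference in a `provider` field, and that the patched file is put back with `pulumi stack import --file state.json`. The helper `repointProvider`, the sample resource URNs, and the trimmed-down interfaces are hypothetical; keep a backup of the export before importing anything.

```typescript
// Minimal shape of an exported stack state: only the fields this sketch touches.
// All other fields of the real file are carried over untouched.
interface StateResource {
    urn: string;
    provider?: string;
    [key: string]: unknown;
}
interface ExportedState {
    deployment: { resources?: StateResource[]; [key: string]: unknown };
    [key: string]: unknown;
}

// Hypothetical helper: return a copy of the state in which every resource that
// points at `wrongRef` points at `rightRef` instead. The input is not mutated.
function repointProvider(state: ExportedState, wrongRef: string, rightRef: string): ExportedState {
    const resources = (state.deployment.resources ?? []).map(r =>
        r.provider === wrongRef ? { ...r, provider: rightRef } : r
    );
    return { ...state, deployment: { ...state.deployment, resources } };
}

// Provider references copied from the diff earlier in the thread.
const RIGHT_REF =
    "urn:pulumi:dev::my-project::pulumi:providers:kubernetes::k8s-provider::efaab349-6f4f-42ff-a607-ee6ab55cbb61";
const WRONG_REF =
    "urn:pulumi:dev::my-project::pulumi:providers:kubernetes::default_3_1_2::7b6466de-7864-49f9-8c33-0ece6217519a";

// Tiny made-up state standing in for the parsed contents of state.json.
const demo: ExportedState = {
    version: 3,
    deployment: {
        resources: [
            { urn: "urn:pulumi:dev::my-project::kubernetes:core/v1:Namespace::example-a", provider: WRONG_REF },
            { urn: "urn:pulumi:dev::my-project::kubernetes:core/v1:Namespace::example-b", provider: RIGHT_REF },
        ],
    },
};

const fixed = repointProvider(demo, WRONG_REF, RIGHT_REF);
console.log((fixed.deployment.resources ?? []).map(r => r.provider === RIGHT_REF));
```

This only rewrites the references. If the state also contains an entry for the unwanted default provider itself, that entry is left in place and would need separate handling.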
g
a little 😅
any way to recreate the state from scratch?
it looks like the state has gone bonkers somehow
Checked, no changes to k8sProvider
in code
Which has now somehow changed in state
Or more like it is correct in remote state but locally it insists on changing it which doesn't work
b
you can't recreate from scratch, you'd have to destroy 😞
g
No, wait. It's actually bonkers in remote state. WTF
Shiiit
Good thing this isn't production
So if I have to destroy it's not like a critical issue but just a pain in the butt
No, I was wrong. It is intact in remote state phew
The state JSON is just so huge that I thought I was looking at the correct place
Correction: it's fucked. But .bak is correct. Time for backups!
OK. Now remote state should be fine but it's now applying the changes locally to providers. Need to debug this further... Fun little problem. Will share if I find out what happened
OK, found the code that causes it. It just doesn't make ANY sense
const grafanaIni = oauth.grafanaClientSecret.result.apply(secret => {
    return `
[azure]
managed_identity_enabled = true

[security]
disable_gravatar = true

[auth]
oauth_auto_login = true
disable_login_form = true
signout_redirect_url = https://${GRAFANA_DOMAIN}

[auth.generic_oauth]
name = Keycloak
enabled = true
allow_sign_up = false
client_id = grafana
client_secret = ${secret}
scopes = user:email
empty_scopes = false
email_attribute_name = email:primary
`
});
If I replace that with
const grafanaIni = "";
all the provider changes go away
oauth.grafanaClientSecret
is
new random.RandomUuid("grafana-client-secret");
b
definitely seems like a bug, could you open an issue?
g
I could but I think I would have to find out the smallest reproducible example
I do not understand wtf is even going on there right now
I'll try to find the minimum repro case tomorrow
And for now somehow hack around this
(like hardcode the client secret)
@billowy-army-68599 I think I found the reason. I would say it was a bug in my code, but the way Pulumi behaved in this scenario could be better, although I have to say I do not have quick suggestions for how to improve it. The root cause of my issue was a circular import chain: namespace A was dependent on namespace B, which was dependent on namespace A. Usually not a good situation to be in, and in this case it triggered a nasty bug. Because of the cycle, some code was executed while the
k8s.Provider
that the code was referring to was still
null
. This led the opts object to be effectively
{ provider: null }
which Pulumi considers to be "you have not set this, I'll just use the
~/.kube/config
value". Which in this case pointed to that exact same Kubernetes cluster since I was actively operating with it. This caused Pulumi to create the default Provider which had a different name, thus the
default_3_1_2
we saw in the state. The one thing I can think of that would prevent this issue is a configuration parameter somewhere that forces the user to always set the provider, so that failing to do so is an error. The more I think about this, the less sense it makes that it is even legal to omit the provider, since that makes the code non-idempotent: the end result can be very different based on external configuration not controlled by Pulumi in any way. To me it would make sense for Pulumi to always require an explicit provider, which one would have to set up first. I know this change would be backwards incompatible, but it just might be worth it, introduced via a deprecation warning. You may disagree and that's fine, but I think a situation where one can corrupt the remote state just by placing one innocent import in the wrong place is not exactly great.
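Until something like that exists, one possible workaround on the program side is a small guard that refuses to build resource options without a provider. The sketch below is only an illustration under stated assumptions: `FakeProvider` is a stand-in for `k8s.Provider` so the snippet runs without Pulumi installed, and `withProvider` is a hypothetical helper, not a Pulumi API. The unassigned `k8sProvider` variable imitates what the circular import produced: a binding that exists but has no value yet when the code runs.

```typescript
// Hypothetical stand-in for k8s.Provider so the sketch runs without Pulumi.
class FakeProvider {
    constructor(public readonly name: string) {}
}

// Hypothetical helper: build resource options, but fail fast when the provider
// is null or undefined instead of passing { provider: null } along silently.
function withProvider<P extends object>(provider: P | null | undefined, what: string): { provider: P } {
    if (provider == null) {
        throw new Error(`${what}: explicit provider is missing (circular import?)`);
    }
    return { provider };
}

// Imitates the circular-import situation: declared, but not assigned yet.
let k8sProvider: FakeProvider | undefined;

let message = "";
try {
    // In the real program this would be the opts argument of a k8s resource.
    withProvider(k8sProvider, "namespace-a");
} catch (e) {
    message = e instanceof Error ? e.message : String(e);
}
console.log(message);

// Once the provider module has finished loading, the same call succeeds.
k8sProvider = new FakeProvider("k8s-provider");
const opts = withProvider(k8sProvider, "namespace-a");
console.log(opts.provider.name);
```

With a guard like this the program would have stopped with an error at the first resource that saw the missing provider, rather than falling back to whatever `~/.kube/config` points at.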
b
Thank you for the detailed write up, and I agree we can improve here. Could you file an issue?
g
Sure, into Github?
Guess this would be the correct repo
1,1k issues... Guys. You need to work on these 😄
I wonder if this is a bug report or a feature request 🤔
Maybe latter since technically this is working correctly
And it's not directly Pulumi's fault
b
g
Right! Of course 🤦‍♂️