# kubernetes
TL;DR: DigitalOcean rotates the kubeconfig, and every time it does, Pulumi is unable to connect to the cluster because the server asks for credentials. I tried doing a refresh, which brought in the new kubeconfig content, but for some reason it doesn't seem to be used by the provider at all. Still getting:
configured Kubernetes cluster is unreachable: unable to load schema information from the API server: the server has asked for the client to provide credentials
You should be able to revert to any previous checkpoint state with the following:
pulumi stack export --version=<previous-version-number> > out
followed by
pulumi stack import --file=out
I did revert, but I'm unsure how to update the kubeconfig now
I did a
pulumi refresh --target urn:cluster
and it got the new kubeconfig, but as soon as I tried doing a
pulumi refresh
it warns that the cluster is unreachable, and if I hit confirm by mistake, it deletes all the resources.
I imagine this is the part that is supposed to keep the kubeconfig updated when using DigitalOcean clusters, but for some reason it is not working at all. It doesn't seem to pick up the updated kubeconfig, and putting a log statement in the apply callback doesn't seem to fire either. I'm assuming this is never getting called after the first provisioning? Really confused right now and unable to properly debug it.
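For context, a minimal sketch of how the kubeconfig is typically wired from a DO cluster into the Kubernetes provider (resource names, region, and version slug here are placeholders, not taken from this thread):

```typescript
import * as digitalocean from "@pulumi/digitalocean";
import * as k8s from "@pulumi/kubernetes";

// Hypothetical cluster definition; all names/sizes are placeholders.
const cluster = new digitalocean.KubernetesCluster("my-cluster", {
    region: "nyc1",
    version: "1.29.1-do.0", // placeholder version slug
    nodePool: { name: "default", size: "s-2vcpu-2gb", nodeCount: 2 },
});

// The rotated kubeconfig lives on the cluster's outputs. Wiring it
// through apply() means the provider should see a fresh value whenever
// the program runs, but (as discussed above) refresh may not re-execute
// this code path the way an update does.
const kubeconfig = cluster.kubeConfigs.apply(cfgs => cfgs[0].rawConfig);
const provider = new k8s.Provider("do-k8s", { kubeconfig });
```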
Hmm… I haven't used DO k8s much, but you should be able to temporarily set the kubeconfig as a stack config to get yourself unstuck:
pulumi config set kubeconfig ...
I recall somebody saying that you can get longer-lived kubeconfigs for automation scenarios to help prevent this in the future, but I'm not 100% sure about that.
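A minimal sketch of that stack-config workaround, assuming the config key is `kubeconfig` (the provider name and key are assumptions, not confirmed in the thread):

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

// After e.g. `pulumi config set --secret kubeconfig "$(doctl kubernetes
// cluster kubeconfig show <cluster>)"`, the provider reads the pasted
// kubeconfig from stack config instead of the possibly stale cluster
// output stored in the checkpoint.
const config = new pulumi.Config();
const kubeconfig = config.requireSecret("kubeconfig");
const provider = new k8s.Provider("do-k8s", { kubeconfig });
```

Storing it with `--secret` keeps the credentials encrypted in the stack config rather than in plain text.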
I'm spawning a new cluster with k3d locally to see if I can reproduce the problem
I don't know how pulumi handles resource relationship input/output resolution (pull or push), so I don't really know whether it's the DO resource or the k8s one that's failing to update the information in the state.
I think I saw the workaround you mentioned, but I'm unable to find and test it at this point.
yeah, just found it too. I'll test if this approach works
It's the second time this has happened; last time I had to delete the whole cluster because I couldn't recover the state it was in before.
I think I'm better off splitting the infrastructure from the application. This is causing a lot of problems, and I won't be able to publish this to production since it's managing enterprise client clusters 😐
I'd love to track this down and make a reproducible demo, but I'm unsure how to properly debug what is going on (it doesn't log, doesn't trace, process.exit seems to be ignored, throw doesn't do a thing, and enabling --debug generates a 25 MB output).
Only thing I can imagine is that refresh differs from up in how it executes the code somehow? But I have no idea how it's supposed to refresh stuff without executing the code, at least for its dependencies.
Sorry you’re having a bad experience here. I would recommend trying to keep cluster infra in a separate stack from k8s applications, as it helps limit the “blast radius” when something changes. We do have logic in the provider to handle unreachable clusters, but it seems like that’s not working right in your case. Refresh/diff is a separate code path from update, but I’m not sure if that’s causing the behavior you’re reporting.
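A sketch of the stack-separation suggestion above, assuming the infra stack exports its kubeconfig as a stack output (the stack name `org/cluster-infra/prod` and the output name are placeholders):

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

// In the application stack, consume the kubeconfig exported by a
// separate infrastructure stack (placeholder stack name below). The
// infra stack would contain something like:
//   export const kubeconfig = cluster.kubeConfigs[0].rawConfig;
const infra = new pulumi.StackReference("org/cluster-infra/prod");
const kubeconfig = infra.getOutput("kubeconfig");

const provider = new k8s.Provider("do-k8s", { kubeconfig });

// App resources target the cluster via this provider, so a bad refresh
// in the app stack can't delete the cluster itself.
const ns = new k8s.core.v1.Namespace("app", {}, { provider });
```

This limits the blast radius: if the app stack's state goes bad, the cluster in the infra stack is untouched.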
It's fine, I'm a pro at finding edge cases hahaha, I'll keep digging later until I get this working.
Hopefully I can find what is causing it