# general
g
Is there any way to debug Pulumi when the plan phase gets stuck forever? I've tried tracing and verbose logging, but nothing in the output points to anything sensible
Commenting out parts of my code does resolve it, so the hang is related to my code, but I haven't found the culprit yet. I'm looking for debugging tools or anything that could help
Oh, and Pulumi manages to list creates/updates/deletes in the CLI just fine, those get populated. But it still gets stuck and the plan phase never finishes
CPU usage of the node process (I'm using TypeScript) is 0%, so it's just waiting for something
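To be specific, by tracing and verbose logging I mean the standard CLI switches, roughly:
```sh
# Verbose engine logs redirected to a file
pulumi up --logtostderr -v=9 2> verbose.log
# Trace output written to a file for later inspection
pulumi up --tracing=file:./up.trace
```
Neither output pointed anywhere useful.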
```typescript
import * as k8s from "@pulumi/kubernetes";

export const certManager = new k8s.helm.v3.Chart("cert-manager", {
    chart: "cert-manager",
    version: "v1.3.1",
    fetchOpts: {
        repo: "https://charts.jetstack.io"
    },
    namespace: nginxIngressNamespace.metadata.name,
    values: {
        installCRDs: true,
        nodeSelector: rootDefs.systemNodeLabels,
        webhook: {
            nodeSelector: rootDefs.systemNodeLabels
        },
        cainjector: {
            nodeSelector: rootDefs.systemNodeLabels
        },
        securityContext: {
            fsGroup: 1001,
            runAsUser: 1001
        }
    }
}, { provider: cluster.k8sProvider });
```
This causes Pulumi to hang. If I comment it out, it doesn't hang
I have no idea why, though
Other Helm charts work fine, and I do have that specific repo added locally
And the weirdest thing is, this used to work
OK. I don't know how to sort this out except by recreating that Helm chart as plain Pulumi resources
Which is quite a lot of work
I guess this is some kind of bug in Pulumi's Helm implementation; it might have something to do with hooks
Ugh. Took a deeper look at the cert-manager Helm chart. That is going to be a pain to implement in Pulumi, and all the recommended installation methods use either that Helm chart or a static installation YAML that is 1.2 MB in size...
OK. Tried this:
```typescript
import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

export const certManager = new k8s.yaml.ConfigFile("cert-manager", {
    file: "cert-manager/cert-manager.yaml",
    transformations: [
        // Force all deployments to system nodes. The node selector lives on
        // the pod template spec, not directly on the Deployment spec.
        (obj: any, opts: pulumi.CustomResourceOptions) => {
            if (obj.kind === "Deployment") {
                obj.spec.template.spec.nodeSelector = rootDefs.systemNodeLabels;
            }
        },

        // Set the pod security context for all deployments to non-root.
        (obj: any, opts: pulumi.CustomResourceOptions) => {
            if (obj.kind === "Deployment") {
                obj.spec.template.spec.securityContext = {
                    fsGroup: 1001,
                    runAsUser: 1001
                };
            }
        },
    ]
}, { provider: cluster.k8sProvider });
```
The end result is the same: Pulumi just hangs
I've waited almost 10 minutes for the plan to finish now
I'm leaning towards a Pulumi bug introduced at some point during the past month or so,
since we ran this very same configuration without issues about a month ago
Only very minor changes were made after that, none affecting cert-manager
I've already downgraded the providers etc. to the versions they were at back then, without luck. So the only thing left is Pulumi itself
So much for that theory. Downgraded all the way back to 3.1.0, but still the same issue
I wonder if all of this has something to do with CRDs...
Found the reason! And it's definitely a bug of sorts. I think this is what is happening here:
1. Create a managed AKS cluster.
2. Deploy a Helm chart to that cluster.
3. Do both of these at the same time, with the Helm chart referring to a Kubernetes provider created from that cluster.

When doing it like this, the plan never finishes. However, if one comments out the Helm chart code and executes only the cluster initialization, that passes. BUT if one then uncomments the chart code and runs `pulumi up`, it freezes again. The way to get past this is to run `az aks get-credentials...` to get valid credentials for that cluster into your local `.kube/config` as the active context. If one runs `pulumi up` after this, it doesn't freeze and passes. To me this sounds like the Helm chart support in Pulumi does not respect the provider configuration, which sounds like a very bad bug waiting to destroy things in a horrible way. The reason it hung was that my `.kube/config` was pointing to a non-existent cluster (one I had just wiped out).
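For context, this is roughly the shape of the provider wiring in question. A minimal sketch, assuming the classic `@pulumi/azure` provider, where `KubernetesCluster` exposes its kubeconfig as the `kubeConfigRaw` output; the resource group and node pool values here are hypothetical:
```typescript
import * as azure from "@pulumi/azure";
import * as k8s from "@pulumi/kubernetes";

// Hypothetical resource group for the sketch.
const resourceGroup = new azure.core.ResourceGroup("aks-rg");

const cluster = new azure.containerservice.KubernetesCluster("aks", {
    resourceGroupName: resourceGroup.name,
    dnsPrefix: "aks",
    defaultNodePool: {
        name: "default",
        nodeCount: 1,
        vmSize: "Standard_D2_v2",
    },
    identity: { type: "SystemAssigned" },
});

// The provider is built from the cluster's own kubeconfig output, so in
// theory anything created with it should talk to this cluster and never
// consult the active context in the local ~/.kube/config.
export const k8sProvider = new k8s.Provider("aks-k8s", {
    kubeconfig: cluster.kubeConfigRaw,
});
```
The point being: the chart was already pinned to this provider, so the plan should never have depended on my local kubeconfig at all.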
b
@gorgeous-country-43026 could you open an issue for this? It does sound like a bug to me
Thanks for the hard work debugging it
g
Is it OK if I just report my findings and don't try to create a minimal reproduction case?
I really don't have time to do that, since debugging this put me behind schedule and I'm in a bit of a hurry right now
(finding this thing took me almost 3 days)
Writing an issue description is something I can do, though
b
that would be a good start, thanks
g
Put it there since I really didn't know which repo would be the correct one