# general
b
Hello everyone. We run pulumi via the pulumi-kubernetes-operator. After some time the operator stops updating stacks. We were able to track the issue down to zombie processes left behind by pulumi, which pile up until the OS hits its maximum PID limit. From what we can see, the sequence looks something like this: the pulumi operator “up”s a stack by calling the pulumi binary, which in turn runs the binaries for the different providers.
ps axu --forest
looks like this
azureus+ 11707 11.1  0.2 1082492 80860 ?       Ssl  21:03   2:24  \_ /usr/local/bin/pulumi-kubernetes-operator --zap-level=error --zap-time-encoding=iso8601
azureus+ 32037  9.3  0.1 753864 63388 ?        Sl   21:24   0:00      \_ pulumi up --yes --skip-preview --exec-kind=auto.local --exec-agent=pulumi-kubernetes-operator/v1.6.0-a8c9e89 --stack REDACTED --non-interactive
azureus+ 32063  0.5  0.0 717528 20020 ?        Sl   21:24   0:00      |   \_ /usr/bin/pulumi-language-go -root=/tmp/pulumi_auto2129071416/hack/pulumi/REDACTED 127.0.0.1:41789
azureus+ 32175  0.0  0.0 1527900 22928 ?       Sl   21:24   0:00      |   |   \_ /usr/local/go/bin/go run /tmp/pulumi_auto2129071416/hack/pulumi/azure/postgresdb
azureus+ 32088  124  1.2 1099896 399404 ?      Sl   21:24   0:02      |   \_ /home/pulumi-kubernetes-operator/.pulumi/plugins/resource-azure-native-v1.64.1/pulumi-resource-azure-native 127.0.0.1:41789
azureus+ 32153  0.0  0.0 726880 29356 ?        Sl   21:24   0:00      |   \_ /home/pulumi-kubernetes-operator/.pulumi/plugins/resource-random-v4.8.0/pulumi-resource-random 127.0.0.1:41789
azureus+ 32163  0.0  0.1 759624 59440 ?        Sl   21:24   0:00      |   \_ /home/pulumi-kubernetes-operator/.pulumi/plugins/resource-kubernetes-v3.19.2/pulumi-resource-kubernetes 127.0.0.1:41789
azureus+ 32049  3.5  0.0      0     0 ?        Z    21:24   0:00      \_ [pulumi-resource] <defunct>
azureus+ 32143 38.0  0.2 753928 70156 ?        Sl   21:24   0:00      \_ pulumi stack history --json --show-secrets --page-size 1 --page 1 --stack REDACTED --non-interactive
After the completion of the stack “up” (which is successful) we see:
azureus+ 11707 11.1  0.1 1082492 63332 ?       Ssl  21:03   2:24  \_ /usr/local/bin/pulumi-kubernetes-operator --zap-level=error --zap-time-encoding=iso8601
azureus+ 32063  0.1  0.0      0     0 ?        Z    21:24   0:00      \_ [pulumi-language] <defunct>
azureus+ 32088 28.2  0.0      0     0 ?        Z    21:24   0:02      \_ [pulumi-resource] <defunct>
azureus+ 32153  0.7  0.0      0     0 ?        Z    21:24   0:00      \_ [pulumi-resource] <defunct>
azureus+ 32163  3.0  0.0      0     0 ?        Z    21:24   0:00      \_ [pulumi-resource] <defunct>
Before long we see many thousands of zombie processes. Has anyone seen anything similar?
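To watch how quickly they pile up, something like the following rough sketch could count the defunct entries in `ps axu --forest` output. The helper name `count_zombies` and the `SAMPLE` constant are made up for illustration; the only assumption about the input is the standard `ps axu` column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND), with a STAT value starting with `Z` marking a zombie.

```python
from collections import Counter

# Sample taken from the post-"up" listing above (hypothetical constant name).
SAMPLE = r"""
azureus+ 11707 11.1  0.1 1082492 63332 ?       Ssl  21:03   2:24  \_ /usr/local/bin/pulumi-kubernetes-operator --zap-level=error --zap-time-encoding=iso8601
azureus+ 32063  0.1  0.0      0     0 ?        Z    21:24   0:00      \_ [pulumi-language] <defunct>
azureus+ 32088 28.2  0.0      0     0 ?        Z    21:24   0:02      \_ [pulumi-resource] <defunct>
azureus+ 32153  0.7  0.0      0     0 ?        Z    21:24   0:00      \_ [pulumi-resource] <defunct>
azureus+ 32163  3.0  0.0      0     0 ?        Z    21:24   0:00      \_ [pulumi-resource] <defunct>
"""


def count_zombies(ps_output: str) -> Counter:
    """Count zombie processes per command name in `ps axu`-style output."""
    zombies = Counter()
    for line in ps_output.splitlines():
        # Split into the 10 fixed columns plus the free-form COMMAND column.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # blank or malformed line
        stat, command = fields[7], fields[10]
        if stat.startswith("Z"):
            # Drop the tree decoration that --forest puts in front of the command.
            name = command.replace("\\_", "").replace("|", "").strip()
            zombies[name] += 1
    return zombies


if __name__ == "__main__":
    for name, count in count_zombies(SAMPLE).most_common():
        print(f"{count:5d}  {name}")
```

Running it periodically against live `ps` output (for example captured with `subprocess.run(["ps", "axu", "--forest"], capture_output=True, text=True).stdout`) would show whether the count grows with every stack update.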
b
would be great to have an issue to track this