hundreds-battery-67030
03/16/2021, 12:49 AM
Is there a relationship between the memory usage of the `pulumi up` process and the number of objects in the K8s cluster? I have a situation where `pulumi up` was crawling to a halt on an 8GB worker in CircleCI, and when I tried it locally I saw it taking up 20+GB of memory. Any tips on troubleshooting this further?

colossal-australia-65039
03/16/2021, 12:59 AM

hundreds-battery-67030
03/16/2021, 4:57 AM
The net result was that the `pulumi up` process timed out in CI and we had to manually repair the state file.

colossal-australia-65039
03/16/2021, 4:43 PM

hundreds-battery-67030
03/16/2021, 6:18 PM
> The net result was that the `pulumi up` process timed out in CI and we had to manually repair the state file.
This is still the symptom that I’m trying to understand better. Previously I thought it was the high memory usage that would grind the process to a halt, especially in a CI environment. But when I mimicked the CI job locally, I noticed that the memory usage is reasonably small (~2GB) and fairly steady. I still see that the `pulumi up` process gets stuck a few minutes after launching. By “stuck” I mean it does not progress in the terminal logs, and I do not see any further updates in the cluster state via `kubectl`.
error: 2 errors occurred:
* the Kubernetes API server reported that "<redacted replicaset name>" failed to fully initialize or become live: Resource operation was cancelled for "<redacted replicaset name>"
* Attempted to roll forward to new ReplicaSet, but minimum number of Pods did not become live
…whereas via `kubectl` I could see the ReplicaSet had achieved minimum availability, i.e. all `Pod`s were ready.
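
For reference, those readiness errors come from the pulumi-kubernetes provider's await logic, which waits for a `Deployment`/`ReplicaSet` rollout to be marked available before the update proceeds. A documented escape hatch is the `pulumi.com/skipAwait` annotation, which disables that check per resource. A minimal TypeScript sketch, with a hypothetical Deployment standing in for the redacted one (names, labels, and image are placeholders):

```typescript
import * as k8s from "@pulumi/kubernetes";

// Hypothetical stand-in for the redacted Deployment; only the annotation matters here.
const proxy = new k8s.apps.v1.Deployment("proxy", {
    metadata: {
        annotations: {
            // Tell the pulumi-kubernetes provider not to wait for rollout readiness.
            "pulumi.com/skipAwait": "true",
        },
    },
    spec: {
        replicas: 5,
        selector: { matchLabels: { app: "proxy" } },
        template: {
            metadata: { labels: { app: "proxy" } },
            spec: { containers: [{ name: "proxy", image: "nginx:1.19" }] },
        },
    },
});
```

Skipping the await hides the symptom rather than explaining it, but it can help confirm whether the hang is in the readiness check rather than elsewhere in the engine.
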
colossal-australia-65039
03/16/2021, 7:41 PM

hundreds-battery-67030
03/16/2021, 7:58 PM
I restarted `pulumi up` after I ctrl-C’d it earlier, and it might be stuck again (I’ll know in a few more minutes). This time I’m running it with the `--profiling` flag to see if it sheds any light. I could also DM you the outputs if you’d like.

colossal-australia-65039
03/16/2021, 8:01 PM

hundreds-battery-67030
03/16/2021, 8:03 PM
`pulumi up` output:
...
[1/2] Waiting for app ReplicaSet be marked available
[1/2] Waiting for app ReplicaSet be marked available (0/5 Pods available)
warning: [Pod proxy-0nrzpfgd-7d5d8c469d-drfd7]: containers with unready status: [proxy]
✨ updating...⠐
In a different terminal:
kubectl get po proxy-0nrzpfgd-7d5d8c469d-drfd7
NAME                              READY   STATUS    RESTARTS   AGE
proxy-0nrzpfgd-7d5d8c469d-drfd7   1/1     Running   0          12m

I’m running `pulumi up` with `-p 1`, so I presume it’s processing one resource at a time.

[1/2] Waiting for app ReplicaSet be marked available
[1/2] Waiting for app ReplicaSet be marked available (0/5 Pods available)
warning: [Pod proxy-0nrzpfgd-7d5d8c469d-drfd7]: containers with unready status: [proxy]
error: 2 errors occurred:
* the Kubernetes API server reported that "default/proxy-0nrzpfgd" failed to fully initialize or become live: Resource operation was cancelled for "proxy-0nrzpfgd"
* Minimum number of Pods to consider the application live was not attained
…whereas `kubectl` shows the following:
$ kubectl get deploy proxy-0nrzpfgd -n default
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
proxy-0nrzpfgd   10/10   10           10          249d
This is why I am inclined to believe `pulumi` gets stuck checking the status, but I can’t tell why or where just yet.
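
If the provider really is stuck in its status check, the wait can at least be bounded with the generic `customTimeouts` resource option so the update fails fast instead of hanging indefinitely. A sketch applying it to every `Deployment` via a stack transformation; the five-minute value and the transformation itself are illustrative, not something from this thread:

```typescript
import * as pulumi from "@pulumi/pulumi";

// Illustrative: cap how long the engine may wait on any Deployment rollout.
pulumi.runtime.registerStackTransformation(args => {
    if (args.type === "kubernetes:apps/v1:Deployment") {
        return {
            props: args.props,
            opts: pulumi.mergeOptions(args.opts, {
                customTimeouts: { create: "5m", update: "5m" },
            }),
        };
    }
    return undefined;
});
```
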
colossal-australia-65039
03/16/2021, 9:40 PM
`pulumi refresh`?

hundreds-battery-67030
03/16/2021, 10:30 PM
On each `pulumi up` invocation, it successfully updates one `Deployment` but gets stuck on the next `Deployment`. Other resource types (`Secret`s, `ConfigMap`s, etc.) do not exhibit this behavior.

colossal-australia-65039
03/16/2021, 11:26 PM

hundreds-battery-67030
03/22/2021, 6:51 PM
… `pending_operations`).
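
On the manual state repair mentioned earlier: the usual route is `pulumi stack export`, deleting the leftover `pending_operations` entries, then `pulumi stack import`. A rough TypeScript/Node helper for that edit, assuming the exported checkpoint keeps them under `deployment.pending_operations`:

```typescript
import * as fs from "fs";

// Rough helper: strip leftover pending operations from a checkpoint exported with
// `pulumi stack export --file state.json`, before re-importing it with
// `pulumi stack import --file state.json`.
const path = process.argv[2] ?? "state.json";
const checkpoint = JSON.parse(fs.readFileSync(path, "utf8"));

const pending = checkpoint.deployment?.pending_operations ?? [];
if (pending.length > 0) {
    console.log(`Removing ${pending.length} pending operation(s)`);
    delete checkpoint.deployment.pending_operations;
    fs.writeFileSync(path, JSON.stringify(checkpoint, null, 4) + "\n");
} else {
    console.log("No pending operations found");
}
```
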
colossal-australia-65039
03/24/2021, 12:45 AM