hundreds-battery-67030 (03/16/2021, 12:49 AM)
process and the number of objects in the K8s cluster? I have a situation where
was crawling to a halt on an 8GB worker in CircleCI, and when I tried it locally I saw it taking up 20+GB of memory. Any tips on troubleshooting this further?
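A minimal sketch for measuring this locally, assuming a POSIX shell; `<pid>` is a placeholder for the deploy process's PID, not something from the thread:

```sh
# Sample the resident set size (RSS) of the process every 5 seconds.
# <pid> is a placeholder; substitute the real PID of the process under test.
while true; do
  ps -o rss= -p <pid> | awk '{printf "RSS: %.1f MiB\n", $1 / 1024}'
  sleep 5
done
```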
colossal-australia-65039 (03/16/2021, 12:59 AM)
hundreds-battery-67030 (03/16/2021, 4:57 AM)
process timed out in CI and we had to manually repair the state file.
colossal-australia-65039 (03/16/2021, 4:43 PM)
hundreds-battery-67030 (03/16/2021, 6:18 PM)
The net result was that the process timed out in CI and we had to manually repair the state file. This is still the symptom that I’m trying to understand better. Previously I thought it was the high memory usage that would grind the process to a halt, especially in a CI environment. But when I mimicked the CI job locally, I noticed that the memory usage is reasonably small (~2GB) and fairly steady. I still see that the
process gets stuck a few minutes after launching. By “stuck” I mean it does not progress in the terminal logs, and I do not see any further updates in the cluster state via `kubectl`.
```
error: 2 errors occurred:
    * the Kubernetes API server reported that "<redacted replicaset name>" failed to fully initialize or become live: Resource operation was cancelled for "<redacted replicaset name>"
    * Attempted to roll forward to new ReplicaSet, but minimum number of Pods did not become live
```
I could see the ReplicaSet had achieved minimum availability, i.e. all `Pod`s were ready.
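A sketch of how to cross-check what the API server reports while the update hangs; the namespace and names below are placeholders, not taken from the thread:

```sh
# Compare desired vs current vs ready replica counts for the ReplicaSet.
kubectl get rs <replicaset-name> -n default
# Conditions and recent scaling events on the ReplicaSet.
kubectl describe rs <replicaset-name> -n default
# Cluster events, most recent last, to spot scheduling or probe failures.
kubectl get events -n default --sort-by=.lastTimestamp | tail -n 20
```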
colossal-australia-65039 (03/16/2021, 7:41 PM)
hundreds-battery-67030 (03/16/2021, 7:58 PM)
after I ctrl-C’d it earlier, and it might be stuck again (I’ll know in a few more minutes). This time I’m running it with the
flag to see if it sheds any light. I could also DM you the outputs if you’d like.
colossal-australia-65039 (03/16/2021, 8:01 PM)
hundreds-battery-67030 (03/16/2021, 8:03 PM)
```
...
[1/2] Waiting for app ReplicaSet be marked available
[1/2] Waiting for app ReplicaSet be marked available (0/5 Pods available)
warning: [Pod proxy-0nrzpfgd-7d5d8c469d-drfd7]: containers with unready status: [proxy]
✨ updating...⠐
```
In a different terminal:
```
$ kubectl get po proxy-0nrzpfgd-7d5d8c469d-drfd7
NAME                              READY   STATUS    RESTARTS   AGE
proxy-0nrzpfgd-7d5d8c469d-drfd7   1/1     Running   0          12m
```
so I presume it’s processing one resource at a time
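One hedged way to reconcile the `containers with unready status` warning with the `1/1 Running` output above is to check when the Pod's `Ready` condition last changed, in case the status check is acting on a stale event; this command is a sketch, not from the thread:

```sh
# Print each Pod status condition with its last transition time.
kubectl get pod proxy-0nrzpfgd-7d5d8c469d-drfd7 -n default \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.lastTransitionTime}){"\n"}{end}'
```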
```
[1/2] Waiting for app ReplicaSet be marked available
[1/2] Waiting for app ReplicaSet be marked available (0/5 Pods available)
warning: [Pod proxy-0nrzpfgd-7d5d8c469d-drfd7]: containers with unready status: [proxy]
error: 2 errors occurred:
    * the Kubernetes API server reported that "default/proxy-0nrzpfgd" failed to fully initialize or become live: Resource operation was cancelled for "proxy-0nrzpfgd"
    * Minimum number of Pods to consider the application live was not attained
```
shows the following:
```
$ kubectl get deploy proxy-0nrzpfgd -n default
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
proxy-0nrzpfgd   10/10   10           10          249d
```
This is why I am inclined to believe the process gets stuck checking the status, but I can’t tell why or where just yet.
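One plausible thing to check at this point, though it is an assumption rather than something confirmed in the thread, is whether the Deployment controller has observed the latest spec, since status-based readiness checks commonly compare the two generation fields:

```sh
# If generation and observedGeneration differ, the controller has not yet
# processed the latest spec, and a status-based readiness check will keep waiting.
kubectl get deploy proxy-0nrzpfgd -n default \
  -o jsonpath='generation={.metadata.generation} observed={.status.observedGeneration}{"\n"}'
```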
colossal-australia-65039 (03/16/2021, 9:40 PM)
hundreds-battery-67030 (03/16/2021, 10:30 PM)
invocation, it successfully updates one `Deployment` but gets stuck on the next. Other resource types (`Secret`s, `ConfigMap`s, etc.) do not exhibit this behavior.
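To narrow down which resource a sequential update is blocked on, each `Deployment`'s rollout can be watched independently; the names below are placeholders:

```sh
# --timeout makes a stuck rollout return an error instead of hanging forever.
kubectl rollout status deploy/<first-deployment> -n default --timeout=5m
kubectl rollout status deploy/<second-deployment> -n default --timeout=5m
```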
colossal-australia-65039 (03/16/2021, 11:26 PM)
hundreds-battery-67030 (03/22/2021, 6:51 PM)
colossal-australia-65039 (03/24/2021, 12:45 AM)