# general
g
hi folks, i’m trying to stand up an EKS cluster per some of the ‘getting started’ documentation for crosswalk. it seems some of the out-of-the-box k8s services are not starting due to no nodes being ready. my nodes are reporting ‘not ready’ due to the network not being initialized:
$ kubectl describe nodes | grep 'KubeletNotReady' | head -n1
   Ready            False   Fri, 21 Jun 2019 13:57:30 -0700   Fri, 21 Jun 2019 13:50:29 -0700   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
to my eye, it seems to be same/similar issue as: https://github.com/pulumi/pulumi-kubernetes/issues/578 https://github.com/pulumi/pulumi-eks/issues/174 is there a known workaround?
the relevant bits of my index.ts are as follows:
const zone = pulumi.output(aws.route53.getZone({name: config.require("apex-domain")}));

const vpc = new awsx.ec2.Vpc("vpc", {
    cidrBlock: "10.1.0.0/16",
    numberOfNatGateways: 2,
    numberOfAvailabilityZones: "all",
    tags: {"Name": "vpc"}
});

const allSubnetIds = vpc.privateSubnetIds.concat(vpc.publicSubnetIds);

// Create an EKS cluster with the default configuration.
const cluster = new eks.Cluster("k8s", {
    vpcId: vpc.id,
    subnetIds: allSubnetIds,
    nodeAssociatePublicIpAddress: false,
});
pulumi library versions:
@pulumi/aws: 0.18.13
@pulumi/awsx: 0.18.6
@pulumi/docker: 0.17.0
@pulumi/eks: 0.18.8
@pulumi/kubernetes: 0.24.0
@pulumi/pulumi: 0.17.18
@pulumi/query: 0.3.0
l
@creamy-potato-29402 @breezy-hamburger-69619 can you help out here? Thanks!
g
thanks @lemon-spoon-91807!
i’m running an `npm update` now after the `awsx` update announcement and going to try another `pulumi up` to see if i get different behavior.
seems same so far.
1 Pods failed to schedule because: [Unschedulable] 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
and the `kubectl describe nodes` output is similar to above.
b
The nodes not being available and the nodes not being able to connect due to CNI issues are separate problems. The former means the nodes carry taints that the Pods don't have tolerations for. The latter is usually due to version mismatches between the control plane & workers, or the roles not being set up right for the nodegroup (assuming there is one).
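For illustration, the toleration lives on the Pod side; in @pulumi/kubernetes a sketch would look roughly like this (the "dedicated=infra" taint here is made up, and a default eks.Cluster shouldn't need any of this):
import * as k8s from "@pulumi/kubernetes";

const appLabels = { app: "example" };
// Illustrative Deployment whose Pods tolerate a hypothetical
// "dedicated=infra:NoSchedule" taint on the nodes.
const tolerantApp = new k8s.apps.v1.Deployment("example", {
    spec: {
        selector: { matchLabels: appLabels },
        replicas: 1,
        template: {
            metadata: { labels: appLabels },
            spec: {
                tolerations: [{
                    key: "dedicated",
                    operator: "Equal",
                    value: "infra",
                    effect: "NoSchedule",
                }],
                containers: [{ name: "example", image: "nginx" }],
            },
        },
    },
});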
Which error are you currently experiencing?
g
both 🙂 my understanding was that the nodes are not ready because of the CNI issue, so can’t be scheduled. but i’m happy to be corrected
the `pulumi up` is about to fail after sufficient retries, i’ll grab the output ASAP.
i found the CNI message when trying to dig in on a possible cause why the nodes weren’t available.
b
Hmm, I’m not tracking. By default, nodes are not tainted unless you supply taints to the NodeGroup specifically
g
i haven’t done so. this is from a fresh stack
b
Is there a full repro start to finish you can share?
g
sure, let me open up my github repo — it’s quite simple, slightly modified from the ‘getting started’ guides
here’s the `Diagnostics` section from the `pulumi up` that just failed:
Diagnostics:
  kubernetes:apps:Deployment (kube-system/kubernetes-dashboard):
    error: Plan apply failed: 3 errors occurred:
    	* Timeout occurred for 'kubernetes-dashboard'
    	* Minimum number of live Pods was not attained
    	* 1 Pods failed to schedule because: [Unschedulable] 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.

  kubernetes:extensions:Deployment (kube-system/monitoring-influxdb):
    error: Plan apply failed: 3 errors occurred:
    	* Timeout occurred for 'monitoring-influxdb'
    	* Minimum number of Pods to consider the application live was not attained
    	* 1 Pods failed to schedule because: [Unschedulable] 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.

  kubernetes:core:Service (kube-system/heapster):
    error: Plan apply failed: 2 errors occurred:
    	* Timeout occurred for 'heapster'
    	* Service does not target any Pods. Selected Pods may not be ready, or field '.spec.selector' may not match labels on any Pods

  kubernetes:core:Service (kube-system/monitoring-influxdb):
    error: Plan apply failed: 2 errors occurred:
    	* Timeout occurred for 'monitoring-influxdb'
    	* Service does not target any Pods. Selected Pods may not be ready, or field '.spec.selector' may not match labels on any Pods

  kubernetes:core:Service (kube-system/kubernetes-dashboard):
    error: Plan apply failed: 2 errors occurred:
    	* Timeout occurred for 'kubernetes-dashboard'
    	* Service does not target any Pods. Selected Pods may not be ready, or field '.spec.selector' may not match labels on any Pods

  kubernetes:extensions:Deployment (kube-system/heapster):
    error: Plan apply failed: 3 errors occurred:
    	* Timeout occurred for 'heapster'
    	* Minimum number of Pods to consider the application live was not attained
    	* 1 Pods failed to schedule because: [Unschedulable] 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.

  pulumi:pulumi:Stack (megadoomer-io-infra):
    error: update failed
let me know if there’s more output that would be helpful from the most recent run. figured this would be the most helpful.
b
Thanks for that. For context, how is this repro connected to the linked issues (IAM role not supplied, or kube-dashboard not working) - or is this separate? I don’t see a reference to IAM or kube-dash in the repro
g
it’s possible that it’s not, and if that’s the case i apologize. my interpretation was that it was a fresh stack where everything comes up except for these deployment/service resources, and they still cannot come up after repeated `up` attempts
b
correction: kube-dash is expected since it’s being deployed by default - duh, my mistake
what does `kubectl get nodes` return?
g
$ kubectl get nodes
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-1-129-147.us-west-2.compute.internal   NotReady   <none>   3d    v1.13.7-eks-c57ff8
ip-10-1-192-106.us-west-2.compute.internal   NotReady   <none>   3d    v1.13.7-eks-c57ff8
b
now, `kubectl version`
g
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-20T04:49:16Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.6-eks-d69f1b", GitCommit:"d69f1bf3669bf00b7f4a758e978e0e7a1e3a68f7", GitTreeState:"clean", BuildDate:"2019-02-28T20:26:10Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
b
there’s our problem
you’re on a 1.12 control plane, with nodes on 1.13
you can’t have kubelets ahead of the control plane, that’s what causes the CNI issue
and in turn bubbles up to no nodes available
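If it helps while you dig: on pulumi/eks releases that expose the `version` option you can pin the control plane explicitly so it matches the workers. Whether your release has that option is an assumption on my part; upgrading to v0.18.8, which matches worker AMIs to the server automatically, is the real fix. A rough sketch reusing your cluster args:
// Sketch only: assumes your pulumi/eks release supports ClusterOptions.version.
const cluster = new eks.Cluster("k8s", {
    vpcId: vpc.id,
    subnetIds: allSubnetIds,
    nodeAssociatePublicIpAddress: false,
    version: "1.13",   // keep the control plane and kubelets on the same minor version
});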
g
hm i see — i’m not sure how things got this way. is there something i can/should do within my pulumi stack?
FWIW the first time that i experienced this, i did a `pulumi destroy` and then another `pulumi up` to see if it was some kind of transient issue
b
but this doesn’t make sense given your repro and how you would end up here, given that we just patched this when 1.13 came out last week to match the workers against the server - see https://github.com/pulumi/pulumi-eks/pull/175
did you always have pulumi/eks v0.18.8 (which includes this PR ^)?
g
i wish i could give a sure answer. i’ve been doing a lot of playing with this, `destroy`ing and `up`ing a few times. i’m happy to `destroy` it again, sync + double check version numbers, then `up` again if that’s an expected path forward
b
All good! What I meant was, were you always using v0.18.8 of the `pulumi/eks` package per `npm install`, or did you update this from an older version?
g
i see — yes, it’s quite possible, as i did do an `npm update` at one point (and all my pulumi packages are set to `latest`) and got new versions. i didn’t pay much attention to what the exact version changes were
b
I wonder if you hit this on an older version of `pulumi/eks` than v0.18.8. i’ll try re-creating your repro on v0.18.8 to see what that does
g
ah, i did find a commit where i updated from 0.18.7 -> 0.18.8
b
that would do it
this was broken on 0.18.7
g
i see 🙂 sounds like we’re getting to the bottom of things
b
I’m trying this repro on my end on a clean stack using `pulumi/eks` v0.18.8 and will report back
I suspect it’ll be fine
g
ok, great. let me know if you need any more info from me. (ping me probably, in case i’m doing ‘real work’ in another slack)
👍🏼 1
and, FWIW, this is all playground for me; no data that needs retaining or stakeholders — so if the end move is to destroy/recreate that’s all fine with me
b
👍🏼
heads up, per [1] and [2], heapster is broken out of the box, and we’re looking to deprecate the dashboard from the cluster definition and instead document how to roll the helm chart [2], as kube-dashboard is just not getting enough support in the community to keep up w/ the updates. I personally always deploy clusters with `deployDashboard: false`
1 - https://github.com/pulumi/pulumi-kubernetes/issues/600 2 - https://github.com/pulumi/pulumi-eks/issues/155#issuecomment-504096740
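Rolling it yourself would look something like this with the Helm support in @pulumi/kubernetes (a sketch only; the stable/kubernetes-dashboard chart and repo are assumptions, pick whatever chart you actually want):
import * as k8s from "@pulumi/kubernetes";

// Deploy the dashboard chart against the eks.Cluster's own provider instead
// of relying on deployDashboard. Chart/repo names here are assumptions.
const dashboard = new k8s.helm.v2.Chart("kube-dashboard", {
    repo: "stable",
    chart: "kubernetes-dashboard",
    namespace: "kube-system",
}, { providers: { kubernetes: cluster.provider } });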
g
ah, i see. i’ll add that option
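for my own notes, that’d just be one more flag on the cluster args from my index.ts above:
const cluster = new eks.Cluster("k8s", {
    vpcId: vpc.id,
    subnetIds: allSubnetIds,
    nodeAssociatePublicIpAddress: false,
    deployDashboard: false,   // skip kube-dashboard + heapster/influxdb
});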
b
@glamorous-thailand-23651 i was able to run your repro error-free on v0.18.8 - all of my nodes & pods come up:
metral@argon:~/megadoomer-io/stacks/infra$ kubectl get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-1-132-209.us-east-2.compute.internal   Ready    <none>   11m   v1.12.7
ip-10-1-169-248.us-east-2.compute.internal   Ready    <none>   11m   v1.12.7



metral@argon:~/megadoomer-io/stacks/infra$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
kube-system   aws-node-2f2z8                          1/1     Running   0          11m
kube-system   aws-node-ztlld                          1/1     Running   0          11m
kube-system   coredns-65f768bbc8-5d4fx                1/1     Running   0          14m
kube-system   coredns-65f768bbc8-7jqq6                1/1     Running   0          14m
kube-system   heapster-684777c4cb-wvskb               1/1     Running   0          10m
kube-system   kube-proxy-kggzd                        1/1     Running   0          11m
kube-system   kube-proxy-nxnhr                        1/1     Running   0          11m
kube-system   kubernetes-dashboard-67d4c89764-rzd6m   1/1     Running   0          10m
kube-system   monitoring-influxdb-5c5bf4949d-sqxsd    1/1     Running   0          10m


metral@argon:~/megadoomer-io/stacks/infra$ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.6", GitCommit:"ab91afd7062d4240e95e51ac00a18bd58fddd365", GitTreeState:"clean", BuildDate:"2019-02-26T12:59:46Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.6-eks-d69f1b", GitCommit:"d69f1bf3669bf00b7f4a758e978e0e7a1e3a68f7", GitTreeState:"clean", BuildDate:"2019-02-28T20:26:10Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Please try on a clean stack using v0.18.8 - you should not hit it
g
ok great - i probably would have ended up doing that at some point anyway, but glad to know the EKS destroy/create time won’t go to waste 🙂
b
🙂
g
@breezy-hamburger-69619 bringing up the new stack now. just want to confirm that with `deployDashboard: false`, no pods are deployed by default? in which case i should confirm success with `kubectl version ; kubectl get nodes`, and if everything looks good i should have no problem deploying other pods (via helm charts, etc) right?
and if that’s all correct i will gladly say thanks and wish you a good day 😄
b
Only the kube dashboard and heapster pods won't come up with that flag set to false. You'll still see the required pods installed by EKS: core-dns, aws-node, kube-proxy
That said in my repro I did not set that flag and the dash and heapster do come up and run
Per the issues, heapster doesn't show up in the dash graphs, and we're considering deprecating the dash altogether due to lack of community support
g
ok - i’ll try it with that flag set to false for the moment, see what comes up after `aws:eks:Cluster k8s-eksCluster creating...` is done (i don’t currently see core-dns, aws-node, kube-proxy listed in the output yet) — and then give the dash+heapster a try in another iteration.
aye
b
You will only see the pods in a `kubectl get pods --all-namespaces`, not in the Pulumi output
As we don't install them, EKS does
g
though, i don’t currently have a need for dash+heapster, just going with the defaults in most places unless i find a reason not to
roger that. new to me 🙂 we use hashicorp’s nomad at work
b
All good. Let us know if you need anything else
g
👍
tyvm for taking the time!
🎉 1