# general
b
Any reason why the EKS security group that manages the communication between the k8s API server and the node instances only allows ports 1025-65535? https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/securitygroup.ts#L48
w
This was originally derived from the EKS documentation and NodeGroup CloudFormation templates: https://github.com/awslabs/amazon-eks-ami/blob/master/amazon-eks-nodegroup.yaml#L228. It does actually look like that upstream template added port `443` as well, which `@pulumi/eks` does not currently enable. Does the service you are interested in really only work over port `80`, or could it be exposed on port `443` if we fixed this to match the EKS recommended ingress/egress? I am not sure of the underlying reason for constraining access to other lower port numbers from/to the control plane. cc @breezy-hamburger-69619 in case he has thoughts on this?
b
Well, that monitoring stack is completely managed by rancher, so I don't think that port can be easily changed.
But I really don't see any reason for not being able to, e.g., `kubectl proxy` to services with ports lower than 1025... 🤷
I mean, I know that ports <= 1024 are "privileged", but still... 🙂
b
We enable all ports that are recommended in the EKS docs: https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html, 1025-65535 being a range listed. Not enabling ports <= 1024 is more about enforcing least privilege where possible. To take a step back though, the control plane subnet and the worker subnet should only be used for k8s cluster communications. IIUC your rancher monitoring needs, it should instead be running in-cluster as a DaemonSet or similar and using the internal cluster networking. If you still want to go ahead and get around these limits, you can always pass in your own `nodeSecurityGroup` and `eksClusterIngressRule` into the `NodeGroup`: https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/nodegroup.ts#L239
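For concreteness, a minimal sketch of that shape (resource names and sizing here are placeholders, and the exact `NodeGroup` option names should be checked against the linked nodegroup.ts for your version):

```typescript
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

// Create the cluster without its default node group, so we can attach our own.
const cluster = new eks.Cluster("my-cluster", {
    skipDefaultNodeGroup: true,
});

// A custom node security group that also admits privileged ports (<= 1024)
// from the control plane, instead of only 1025-65535.
// NB: a real node SG also needs the kubernetes.io/cluster/<name> = "owned" tag.
const customNodeSg = new aws.ec2.SecurityGroup("custom-node-sg", {
    vpcId: cluster.core.vpcId,
    ingress: [{
        description: "Allow the control plane to reach workers on any TCP port",
        fromPort: 0,
        toPort: 65535,
        protocol: "tcp",
        securityGroups: [cluster.clusterSecurityGroup.id],
    }],
    egress: [{
        description: "Allow all outbound traffic",
        fromPort: 0,
        toPort: 0,
        protocol: "-1",
        cidrBlocks: ["0.0.0.0/0"],
    }],
});

// Attach a node group that uses the custom security group.
const nodeGroup = new eks.NodeGroup("custom-nodes", {
    cluster: cluster,
    nodeSecurityGroup: customNodeSg,
    instanceType: "t2.medium",
    desiredCapacity: 2,
    minSize: 1,
    maxSize: 3,
});
```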
b
@breezy-hamburger-69619 Thanks for the explanation!
> To take a step back though, the control plane subnet and the worker subnet should only be used for k8s cluster communications

As I said above, `kubectl proxy` (and the k8s proxy API subsystem, in general) needs connections from the control plane to a pod to be allowed. Just for the argument's sake, I really don't see why you won't allow proxying to ports lower than 1025.
c
I actually agree with @breezy-hamburger-69619 here. We want the EKS package to produce clusters that are “prod-first”, with good security defaults. Generally I think it’s a good idea to add a bit of friction to things like exposing ports <= 1024, because it makes people work intentionally. Especially if the work-around is as simple as supplying your own security group.
w
> Especially if it is as simple as supplying your own security group.
Even better, in the release coming out today, it is possible to just add an additional ingress rule to the existing security group to allow this specific access pattern.
c
ah that’s great too.
b
@creamy-potato-29402 I (partly) agree. But being used to GKE and bare metal clusters, it wasn't immediately apparent for me what the issue is, and I "wasted" 1/2h trying to find it. 🙂
c
mmm
b
@white-balloon-205 we moved to separate security group rules, but haven't yet added the ability to provide user rules and merge them in. That's being tracked in https://github.com/pulumi/pulumi-eks/issues/97, which @busy-pizza-73563 opened.
c
I do think that’s a usability problem.
but I’m not sure how to do better.
b
Well, except for allowing everything TCP, I don't have a good suggestion either.
I still think the k8s baseline is to allow proxying to everything.
c
what are you trying to proxy, now?
w
@breezy-hamburger-69619 the resulting security group is exposed to the user, so they can just add additional ingress rules, right?
c
I understand there’s something with rancher, but I’m not super familiar.
b
That’s correct @white-balloon-205. @busy-pizza-73563 you have the `nodeSecurityGroup` available to you, so you can create separate secgroup rules to open what you need using its id.
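For example, something along these lines (a sketch only; the rule name is made up, and port 80 matches the grafana case discussed here):

```typescript
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

const cluster = new eks.Cluster("my-cluster");

// Extra rule on the cluster's existing node security group: let the control
// plane reach pods on port 80, so the API server can proxy to the grafana svc.
const grafanaProxyIngress = new aws.ec2.SecurityGroupRule("grafana-proxy-ingress", {
    type: "ingress",
    fromPort: 80,
    toPort: 80,
    protocol: "tcp",
    securityGroupId: cluster.nodeSecurityGroup.id,
    sourceSecurityGroupId: cluster.clusterSecurityGroup.id,
});
```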
b
@creamy-potato-29402 When using rancher's integrated monitoring stack, grafana should be accessible at `https://rancher.url/k8s/clusters/c-12345/api/v1/namespaces/cattle-prometheus/services/http:access-grafana:80/proxy/`, which in turn proxies to `https://k8s.url:port/api/v1/namespaces/cattle-prometheus/services/http:access-grafana:80/proxy/`.
And the `access-grafana` service points to `:80` inside the corresponding pod(s).
c
and you can’t run this in-cluster?
b
It doesn't really matter, as long as "vanilla" k8s proxying isn't working.
c
I see. And rancher really does not let you change the port??
b
The issue is that the control plane tries to connect to the grafana pod's `IP:80`.
No, but I don't really see why they would.
c
Sorry — isn’t that in the kube overlay network though?
I’m super confused.
If you’re running inside the cluster, you should be able to access those ports, I think? Am I missing something?
b
The control plane is EKS managed, and communication between it and the node instances honors the `${name}-nodeSecurityGroup`, see https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/securitygroup.ts#L57.

> I’m super confused.

I was the same during that 1/2h looking for the issue. 🙂
c
sorry, still confused — the SG does not disallow ports in the kube overlay network, though, right?
b
The SG only allows control plane -> node instance/pod `IP:1025-65535`.
But that service points to pod `IP:80`.
c
right, but what I’m asking is: that port is not the “real” port allocated by the overlay network, is it?
b
Well, afaik how k8s proxy works is it looks at the service's endpoints and connects (randomly?) to one of them.
c
I forget this part of the kube networking. I thought the port was mapped to some other port. could be wrong.
b
So if svc `access-grafana:80` has endpoint `grafanaPodIP:80`, when proxying to the svc the API will try to connect to the pod.
c
@breezy-hamburger-69619 is that true? Or is the IP address the only thing that’s faked by the overlay network?
b
In-cluster each service has a cluster IP, true. But that's not how proxy works, afaik. But I might be wrong. 🙂
b
I don't remember kubectl proxy internals, but the grafana Pod IP is in the overlay networking space, so these secgroup rules should not apply… unless there is some overarching reproxying to a cluster port, but this does not ring a bell
c
that’s what I think as well.
again — could be wrong
b
Well, then how would you explain that if I change 1025 to 80 everything starts working? 🙂
b
in fact, in other distros they limit 1025-65535 even further, to only the absolutely necessary ports between control plane and worker. I'm actually surprised AWS suggests opening the secgroup this widely
c
is it internal or external?
b
Oh, another thing, it was in the rancher issue. When trying to access `/api/v1/.../services/http:access-grafana:80/proxy/` I got

```
Error: 'dial tcp a.b.c.d:80: connect: connection timed out'
Trying to reach: 'http://a.b.c.d:80/'
```

where `a.b.c.d` is the grafana pod IP.
@breezy-hamburger-69619 Which other distros? I didn't get that with either GKE or `kubeadm` bare metal clusters.
b
CoreOS Tectonic, which was open source k8s and which I worked on back in the day. I haven't looked into what GKE is doing, that's prob a good next step
c
I still think it’s the right move to lock down the SG right now — I’m just trying to understand why this doesn’t work.
If it’s cluster-internal, I believe this should “just work”
b
Exactly! 😄
b
I have no issue with locking things down, but if this deviates from the baseline, it should be (somehow) documented.
c
100% agree. But my question is: does this work from inside the cluster?
b
Does what work? Proxying only makes sense from outside the cluster, right?
c
you can proxy in the cluster, and you would be using the kube overlay network, where the SGs should have no effect.
That’s what I think anyway.
b
It still goes through the API server, so I see no reason why it would work.
Ok, found this: https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
```
*Note*

To allow proxy functionality on privileged ports or to run the CNCF conformance tests yourself, you must edit the security groups for your control plane and the worker nodes. The security group on the worker nodes' side needs to allow inbound access for ports 0-65535 from the control plane, and the control plane side needs to allow outbound access to the worker nodes on ports 0-65535.
```
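In `@pulumi/eks` terms, that note would translate to roughly this pair of rules (a hedged sketch; the rule names are placeholders):

```typescript
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

const cluster = new eks.Cluster("my-cluster");

// Worker side: allow inbound 0-65535 from the control plane.
const proxyIngress = new aws.ec2.SecurityGroupRule("proxy-ingress", {
    type: "ingress",
    fromPort: 0,
    toPort: 65535,
    protocol: "tcp",
    securityGroupId: cluster.nodeSecurityGroup.id,
    sourceSecurityGroupId: cluster.clusterSecurityGroup.id,
});

// Control plane side: allow outbound 0-65535 to the workers.
const proxyEgress = new aws.ec2.SecurityGroupRule("proxy-egress", {
    type: "egress",
    fromPort: 0,
    toPort: 65535,
    protocol: "tcp",
    securityGroupId: cluster.clusterSecurityGroup.id,
    sourceSecurityGroupId: cluster.nodeSecurityGroup.id,
});
```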
c
ah
I see.
you are correct, then.
b
Well, seems like an upstream "issue"/consideration, then. 🙂
b
Your workarounds are:
1. Provide your own `nodeSecurityGroup` for the `NodePool` that you can configure yourself entirely.
2. Get the id of the `nodeSecurityGroup` to build new secgroup rules, but this will be a step that occurs post-secgroup and cluster creation.
3. Take a stab at https://github.com/pulumi/pulumi-eks/issues/97 and we can review and guide you through it if needed. Given that the secgroups and secgroup rules are now separated [1], this should be a bit more straight-forward to implement.

1 - https://github.com/pulumi/pulumi-eks/pull/109
c
indeed
b
@breezy-hamburger-69619 For now the workaround was to edit that rule from the AWS console and change that 1025 to 80. 🙂
b
Fair enough, but note this will create a mismatch of state between AWS and pulumi 🙂
b
Yeah, I know, but I documented it. 🙂
I'll try to take a stab at #97 when I find a bit of time.
(actually I opened that because of a completely different use case - opening 22/tcp from the internet to the node instances, which is the other SG 🙂 )
Anyway, thanks all for your time!