# aws
c
Trying to create an EKS cluster and I get a weird error when adding InternetGateway routes to private subnet route tables for IPv6
error: eks:index:Cluster resource 'tests' has a problem: Invalid net address:
    error: Error: Invalid net address:
        at new Netmask (/snapshot/eks/node_modules/netmask/lib/netmask.js:150:15)
        at Netmask.contains (/snapshot/eks/node_modules/netmask/lib/netmask.js:166:14)
        at isPrivateCIDRBlock (/snapshot/eks/bin/nodegroup.js:923:22)
        at /snapshot/eks/bin/nodegroup.js:876:93
        at Array.find (<anonymous>)
        at /snapshot/eks/bin/nodegroup.js:876:63
        at Generator.next (<anonymous>)
        at fulfilled (/snapshot/eks/bin/nodegroup.js:18:58)
        at processTicksAndRejections (node:internal/process/task_queues:95:5)
• First screenshot works
• Second screenshot errors
Where should I file this?
m
As I said in the other thread, the best way forward is to create a minimal reproducible example that others can run. If it turns out to be a genuine error, file an issue at https://github.com/pulumi/pulumi-eks/issues
There's no way anyone can know how you pass the VPC information into your cluster resource just from the error message and screenshots 🙂
c
Done. It's one VPC stack and one EKS stack. If it's not enough I can maybe create a Gist next week.
• The vpc-works doesn't error on preview or up
• The vpc-ko errors on preview
Yes, there's lots of hackery going on, but awsx does not work with IPv6, and I cannot access or set as input some of the IPv6 CIDR ranges because they get automatically assigned by AWS or are not exposed in the outputs of their resources
@modern-zebra-45309 I found the code related to this cf https://github.com/pulumi/pulumi-eks/blob/9c128c548765c9829b2d2f1cddcd7d3e74685e7a/nodejs/eks/nodegroup.ts#L1632-L1634
const hasInternetGatewayRoute =
            routeTable.routes.find((r) => !!r.gatewayId && !isPrivateCIDRBlock(r.cidrBlock)) !==
            undefined;
The function definition is here https://github.com/pulumi/pulumi-eks/blob/9c128c548765c9829b2d2f1cddcd7d3e74685e7a/nodejs/eks/nodegroup.ts#L1671-L1683
function isPrivateCIDRBlock(cidrBlock: string): boolean {
    const privateA = new netmask.Netmask("10.0.0.0/8");
    const privateB = new netmask.Netmask("172.16.0.0/12");
    const privateC = new netmask.Netmask("192.168.0.0/16");


    return (
        privateA.contains(cidrBlock) || privateB.contains(cidrBlock) || privateC.contains(cidrBlock)
    );
}
As you can see, the netmasks are hardcoded for IPv4 private ranges. The netmask package only mentions IPv4 support, cf.
> The Netmask class parses and understands IPv4 CIDR blocks so they can be explored and compared.
Which brings us back to what I shared in the opening message. I'm spinning up an IPv6 cluster which should be supported according to the docs.
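To make the failure mode concrete, here is a minimal sketch of a private-range check that returns false for IPv6 or empty input instead of throwing. It is not the pulumi-eks implementation: the names parseIPv4 and isPrivateIPv4CidrBlock are hypothetical, and it uses plain arithmetic rather than the netmask package.

```typescript
// Hypothetical helper: parse a dotted-quad IPv4 address into a number.
// Returns undefined for anything else, including IPv6 and empty strings.
function parseIPv4(address: string): number | undefined {
    const parts = address.split(".");
    if (parts.length !== 4) {
        return undefined;
    }
    let value = 0;
    for (const part of parts) {
        if (!/^\d{1,3}$/.test(part) || Number(part) > 255) {
            return undefined;
        }
        value = value * 256 + Number(part);
    }
    return value;
}

// The same three private IPv4 ranges that the original function hardcodes.
const PRIVATE_IPV4_RANGES: Array<[string, number]> = [
    ["10.0.0.0", 8],
    ["172.16.0.0", 12],
    ["192.168.0.0", 16],
];

// Hypothetical IPv6-tolerant variant of isPrivateCIDRBlock.
// True only when cidrBlock is an IPv4 CIDR lying fully inside a private range.
function isPrivateIPv4CidrBlock(cidrBlock?: string): boolean {
    if (!cidrBlock) {
        return false; // missing or empty destination: nothing to compare
    }
    const pieces = cidrBlock.split("/");
    const ip = parseIPv4(pieces[0] ?? "");
    const prefixText = pieces[1] ?? "32"; // a bare address counts as a /32
    if (ip === undefined || pieces.length > 2 || !/^\d{1,2}$/.test(prefixText)) {
        return false; // not IPv4 (for example "::/0")
    }
    const prefix = Number(prefixText);
    if (prefix > 32) {
        return false;
    }
    return PRIVATE_IPV4_RANGES.some(([base, basePrefix]) => {
        const baseIp = parseIPv4(base);
        if (baseIp === undefined || prefix < basePrefix) {
            return false; // a wider block cannot fit inside the range
        }
        const blockSize = 2 ** (32 - basePrefix);
        return Math.floor(ip / blockSize) === Math.floor(baseIp / blockSize);
    });
}
```

With this shape, a "::/0" destination simply counts as "not a private IPv4 block" rather than raising "Invalid net address". Whether that is the right behaviour for pulumi-eks is a separate question.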
m
Something is starting to ring a bell for me. Private IPv6 addresses are a new thing, right? https://aws.amazon.com/about-aws/whats-new/2024/08/aws-private-ipv6-addressing-vpcs-subnets/
c
They are, but in this case I don't think it matters whether they're private or public IPv6. The problem with IPv6 addresses is that they can't be routed by the NAT Gateway, which forces you to map the egress route "::/0" to the Internet Gateway.
So hasInternetGatewayRoute errors because the route:
• does mention the Internet Gateway
• has an IPv6 destination, which breaks the netmask contains function call as it only supports IPv4
m
Yes, this is what I think I'm starting to recall. IPv6 subnets can have egress-only internet gateways because they are globally unique.
So it looks like the "isPrivateCIDRBlock" check cannot handle IPv6 addresses because netmask will always throw an error?
Based on what you've shared, I also don't see how this could possibly work given the current implementation at https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/nodegroup.ts#L1633
c
That's right. I guess I can file a bug report with all this?
m
Yes, I think that's a bug. Have you tried explicitly passing the subnet IDs so that the function doesn't get called? It only appears to be invoked if it's not known which subnets are public and private: https://github.com/pulumi/pulumi-eks/blob/9c128c548765c9829b2d2f1cddcd7d3e74685e7a/nodejs/eks/nodegroup.ts#L1033-L1042
This would make sure that the problem is limited to the private subnet auto-discovery.
c
Interesting. I actually do pass them ->
"privateSubnetIds": "${vpc.outputs[\"privateSubnetIds\"]}",
                "publicSubnetIds": "${vpc.outputs[\"publicSubnetIds\"]}",
Current stack outputs (10):
    OUTPUT               VALUE
    internetGatewayId    igw-066ebe921971f06b5
    natGatewayIds        ["nat-092809d7c789b8b93","nat-080329705491b8233","nat-08444fa08a95e18f4"]
    privateSubnetIds     ["subnet-093feb5ce24bd0d0f","subnet-084ab3f095c254830","subnet-016a9548afe19f30a"]
    publicSubnetIds      ["subnet-03c19c7ba6cdc830a","subnet-0dc6479a9415c2778","subnet-0773c4b402181cdf9"]
m
Maybe I was in the wrong codepath? But I think I found all invocations of computeWorkerSubnets and it always looks like this:
• https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/nodegroup.ts#L1042 (createNodeGroupInternal)
• https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/nodegroup.ts#L1537 (createNodeGroupV2Internal)
These are the only two places where the function is invoked and they are identical
c
Yep. I guess I need to figure out why it doesn't just pick up the privateSubnetIds
m
Good luck! From looking at the code, only if neither nodeSubnetIds nor privateSubnetIds nor publicSubnetIds is set should the worker subnets be auto-selected by trying to find the private subnets from subnetIds.
(But I'd say it's still a bug that it doesn't work for IPv6)
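The rule described above can be sketched as a small decision function. This is only an illustration of the rule as stated in the thread, not the pulumi-eks code: the SubnetInputs shape and chooseWorkerSubnets are hypothetical, and the precedence among the three explicit options is an assumption.

```typescript
// Hypothetical input shape; the field names follow the thread.
interface SubnetInputs {
    nodeSubnetIds?: string[];
    privateSubnetIds?: string[];
    publicSubnetIds?: string[];
    subnetIds?: string[];
}

type WorkerSubnetChoice =
    | { kind: "explicit"; subnetIds: string[] }
    | { kind: "auto-discover"; candidates: string[] };

// Hypothetical sketch: auto-discovery (and with it the route table
// inspection) only happens when none of the explicit options is set.
function chooseWorkerSubnets(inputs: SubnetInputs): WorkerSubnetChoice {
    // Assumption: nodeSubnetIds wins over privateSubnetIds, which wins
    // over publicSubnetIds. The thread does not state the order.
    const explicit =
        inputs.nodeSubnetIds ?? inputs.privateSubnetIds ?? inputs.publicSubnetIds;
    if (explicit !== undefined) {
        return { kind: "explicit", subnetIds: explicit };
    }
    return { kind: "auto-discover", candidates: inputs.subnetIds ?? [] };
}
```

Under this reading, an object that carries only subnetIds always lands in the auto-discovery branch, which is the branch where the IPv4-only check runs.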
c
Interestingly, if I set the subnetIds I get the expected error message.
Diagnostics:
  pulumi:pulumi:Stack (mdft-tests-eks):
    error: eks:index:Cluster resource 'tests' has a problem: subnetIds, and the use of publicSubnetIds and/or privateSubnetIds are mutually exclusive. Choose a single approach.
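For reference, a validation that produces this message could look roughly like the sketch below. The function name and argument shape are hypothetical. Only the message text is taken from the output above.

```typescript
// Hypothetical argument shape; the field names follow the thread.
interface ClusterSubnetArgs {
    subnetIds?: string[];
    publicSubnetIds?: string[];
    privateSubnetIds?: string[];
}

// Hypothetical sketch of the mutual-exclusion check between the single
// subnetIds list and the public/private split.
function validateSubnetArgs(args: ClusterSubnetArgs): void {
    const usesSplit =
        args.publicSubnetIds !== undefined || args.privateSubnetIds !== undefined;
    if (args.subnetIds !== undefined && usesSplit) {
        throw new Error(
            "subnetIds, and the use of publicSubnetIds and/or privateSubnetIds " +
                "are mutually exclusive. Choose a single approach.",
        );
    }
}
```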
I'm starting to think there's an issue mapping the inputs of the Provider to whatever it does underneath
For e.g. I set ipFamily but it becomes kubernetesNetworkConfig.IpFamily in the state
The subnets became ->
"vpcConfig": {
    "__defaults": [],
    "subnetIds": [
        "subnet-03c19c7ba6cdc830a",
        "subnet-0dc6479a9415c2778",
        "subnet-0773c4b402181cdf9",
        "subnet-093feb5ce24bd0d0f",
        "subnet-084ab3f095c254830",
        "subnet-016a9548afe19f30a"
    ]
}
m
Can you verify that core as returned in https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/nodegroup.ts#L653 has the appropriate subnetId outputs?
If you pass them as inputs, you should see them as outputs, but who knows what's happening
> For e.g. I set ipFamily but it becomes kubernetesNetworkConfig.IpFamily in the state
This happens here: https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/cluster.ts#L579-L589
Looks entirely unrelated to the subnetIds. I don't see a location in the code where there could be interaction
c
Yeah. I noticed it just creates an aws:eks:Cluster resource underneath. Here's what I get:
+ privateSubnetIds     : [
      +     [0]: "subnet-093feb5ce24bd0d0f"
      +     [1]: "subnet-084ab3f095c254830"
      +     [2]: "subnet-016a9548afe19f30a"
        ]
      + provider             : {}
      + publicSubnetIds      : [
      +     [0]: "subnet-03c19c7ba6cdc830a"
      +     [1]: "subnet-0dc6479a9415c2778"
      +     [2]: "subnet-0773c4b402181cdf9"
        ]
      + storageClasses       : {}
      + subnetIds            : [
      +     [0]: "subnet-03c19c7ba6cdc830a"
      +     [1]: "subnet-0dc6479a9415c2778"
      +     [2]: "subnet-0773c4b402181cdf9"
      +     [3]: "subnet-093feb5ce24bd0d0f"
      +     [4]: "subnet-084ab3f095c254830"
      +     [5]: "subnet-016a9548afe19f30a"
Trying to wrap my head around this bit of code
if (args.publicSubnetIds !== undefined || args.privateSubnetIds !== undefined) {
    clusterSubnetIds = pulumi
        .all([args.publicSubnetIds || [], args.privateSubnetIds || []])
        .apply(([publicIds, privateIds]) => {
            return [...publicIds, ...privateIds];
        });
}
m
It makes sure that clusterSubnetIds is equal to the combination of privateSubnetIds and publicSubnetIds
Which is how cluster.subnetIds is populated here: https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/cluster.ts#L601
That matches your output above, where subnetIds is the combination of the private and public subnet IDs
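Stripped of the pulumi.all/apply wrapping, the merge is a plain concatenation. The sketch below uses ordinary arrays and the subnet IDs from the stack outputs earlier in the thread. mergeClusterSubnetIds is a hypothetical name, not a function from pulumi-eks.

```typescript
// Hypothetical plain-array equivalent of the snippet above: public IDs
// first, then private IDs, with a missing list treated as empty.
function mergeClusterSubnetIds(
    publicIds: string[] = [],
    privateIds: string[] = [],
): string[] {
    return [...publicIds, ...privateIds];
}

// Subnet IDs copied from the stack outputs in this thread.
const publicSubnetIds = [
    "subnet-03c19c7ba6cdc830a",
    "subnet-0dc6479a9415c2778",
    "subnet-0773c4b402181cdf9",
];
const privateSubnetIds = [
    "subnet-093feb5ce24bd0d0f",
    "subnet-084ab3f095c254830",
    "subnet-016a9548afe19f30a",
];

// Six IDs, public first, which is the same order the preview shows for subnetIds.
const clusterSubnetIds = mergeClusterSubnetIds(publicSubnetIds, privateSubnetIds);
```

Note that the merged list no longer records which IDs were public and which were private.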
This should not matter for your problem, because subnetIds is not involved in the checks that should prevent the auto-discovery from being triggered
But I think I see what's going on: aws.eks.Cluster only has subnetIds
So if core (https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/nodegroup.ts#L653) is an instance of aws.eks.Cluster rather than an instance of CoreDataArgs, you'll end up with the check that's failing for you, because there's no aws.eks.Cluster.privateSubnetIds
What exactly are you passing as args.cluster?
I'll have to drop now. I hope you can figure this out 🤞
c
Thanks!
I just don't understand that part. Since I'm using Pulumi YAML, I'm not actually sending anything as args.cluster
I just realized. The other thing that is weird is that I'm actually skipping the creation of the default nodegroup.
"skipDefaultNodeGroup": true,
And I also have fargate set to true. This is so weird
Issue filed with stacks (minus the S3 bucket) https://github.com/pulumi/pulumi-eks/issues/1520
m
> And I also have fargate set to true. This is so weird
That's actually getting us closer: https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/cluster.ts#L1040-L1042
Setting up the FargateProfile takes the joint clusterSubnetIds and tries to divide them again
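To illustrate why dividing the joint list again runs into the route check, here is a hedged sketch of such a split. It is not the pulumi-eks code. The Route shape is a simplified assumption, splitSubnets and hasIPv4InternetGatewayRoute are hypothetical names, and the private-range predicate is passed in so that even a throwing, netmask-like check is never called for a route without an IPv4 destination.

```typescript
// Simplified, assumed route shape: an IPv6 route carries ipv6CidrBlock
// and has no (or an empty) IPv4 cidrBlock.
interface Route {
    gatewayId?: string;
    cidrBlock?: string;
    ipv6CidrBlock?: string;
}

// A predicate such as isPrivateCIDRBlock; it may throw on non-IPv4 input.
type PrivateCheck = (ipv4Cidr: string) => boolean;

// Hypothetical variant of hasInternetGatewayRoute that skips routes
// without an IPv4 destination before calling the predicate.
function hasIPv4InternetGatewayRoute(routes: Route[], isPrivate: PrivateCheck): boolean {
    return routes.some(
        (r) => !!r.gatewayId && !!r.cidrBlock && !isPrivate(r.cidrBlock),
    );
}

// Hypothetical re-split of a joint subnet list using each subnet's routes.
function splitSubnets(
    routesBySubnet: Record<string, Route[]>,
    isPrivate: PrivateCheck,
): { publicIds: string[]; privateIds: string[] } {
    const publicIds: string[] = [];
    const privateIds: string[] = [];
    for (const subnetId of Object.keys(routesBySubnet)) {
        const routes = routesBySubnet[subnetId] ?? [];
        if (hasIPv4InternetGatewayRoute(routes, isPrivate)) {
            publicIds.push(subnetId);
        } else {
            privateIds.push(subnetId);
        }
    }
    return { publicIds, privateIds };
}
```

In this sketch a subnet whose only Internet Gateway route is "::/0" ends up in privateIds, because only IPv4 destinations are considered. Whether that classification is what the provider should do for IPv6 clusters is a design question for the GitHub issue.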
I left a comment on the GitHub issue.