# aws
s
Hi there! I activated EKS Auto Mode using Pulumi EKS v3 (crosswalk). Is there a formal method to disable it? After I removed the

autoMode: {
    enabled: true,
},

block, pulumi up failed with the following errors:
Diagnostics:
  pulumi:pulumi:Stack (brainfish-universe-eks-au):
    error: eks:index:Cluster resource 'brainfish-au' has a problem: grpc: the client connection is closing

  aws:eks:Cluster (brainfish-au-eksCluster):
    error:   sdk-v2/provider2.go:515: sdk.helper_schema: compute_config.enabled, kubernetes_networking_config.elastic_load_balancing.enabled, and storage_config.block_storage.enabled must all be set to either true or false: provider=aws@6.66.1
    error: diffing urn:pulumi:au::brainfish-universe-eks::eks:index:Cluster$aws:eks/cluster:Cluster::brainfish-au-eksCluster: 1 error occurred:
        * compute_config.enabled, kubernetes_networking_config.elastic_load_balancing.enabled, and storage_config.block_storage.enabled must all be set to either true or false
Additionally, I'd love to learn more about the design thinking behind the @pulumi/eks NodeGroup. It utilizes an autoscaling group in the background and requires minimum and maximum node counts. I'm curious about when it actually scales up, as I haven't noticed any changes in the machine nodes within the autoscaling group created by pulumi/eks.
q
Hey @stale-tomato-37875, removing the autoMode block should do the trick. Alternatively, you could try setting enabled to false. I opened an issue for this and will start looking into it: https://github.com/pulumi/pulumi-eks/issues/1585
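For reference, a rough sketch of the second option (assuming a TypeScript program using @pulumi/eks v3; the cluster name and any other args are placeholders):

import * as eks from "@pulumi/eks";

// Instead of removing the autoMode block entirely, explicitly set enabled to false.
const cluster = new eks.Cluster("brainfish-au", {
    // ...your existing cluster configuration...
    autoMode: {
        enabled: false,
    },
});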
Regarding your question about the ASGs of the node groups: generally, those are not scaled automatically in AWS. You can control the scaling behavior using the aws.autoscaling.Policy resource. But what's way better is hooking the ASGs directly into the Kubernetes lifecycle and driving scaling decisions based on the resource requests in your cluster.
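As a rough sketch of the first option (the resource name, ASG name, and target value are placeholders; I'm assuming a target-tracking policy on CPU here):

import * as aws from "@pulumi/aws";

// Placeholder: the name of the autoscaling group that pulumi/eks created for the node group.
const nodeGroupAsgName = "my-node-group-asg";

// Target-tracking policy that scales the ASG to keep average CPU utilization around 60%.
const scalingPolicy = new aws.autoscaling.Policy("node-group-cpu-tracking", {
    autoscalingGroupName: nodeGroupAsgName,
    policyType: "TargetTrackingScaling",
    targetTrackingConfiguration: {
        predefinedMetricSpecification: {
            predefinedMetricType: "ASGAverageCPUUtilization",
        },
        targetValue: 60,
    },
});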
s
Thank you for providing these references! They really help clear things up.
q
Another option that I really like is using Karpenter. It's way more dynamic than using Node Groups and has some very nifty capacity rebalancing features that can save you a ton of money. E.g. Slack was able to achieve 12% compute cost savings with it: https://aws.amazon.com/blogs/containers/how-slack-adopted-karpenter-to-increase-operational-and-cost-efficiency/
s
Yeah, Karpenter looks promising. I feel EKS Auto Mode is essentially managed Karpenter. I'd like to give it a further spin. I'm working on a migration path from the existing Pulumi crosswalk self-managed node group to auto-managed nodes. It feels a bit challenging since I need to minimise production disruption 😅 Meanwhile, if EKS Auto becomes mainstream, I see less need for the @pulumi/eks package 😵‍💫
q
Yeah, it's basically managed Karpenter + managed networking addons + managed Load Balancer integration. So operationally speaking it really takes away a lot of ops burden! At the same time you pay for that (quite literally 😄) with higher instance costs. I spot checked a few and generally saw a ~10% surcharge for instances managed with auto mode. It's all tradeoffs in the end 🙂
My general recommendation now would be to start out with Auto Mode and then move to either Managed Node Groups or Karpenter if there are feature gaps or the additional cost is too high at scale.
s
Thanks for sharing. I remember Managed Node Groups are less feature-rich than NodeGroupV2, right? For example, I can define extraSecurityGroup with NodeGroupV2 but not with a Managed Node Group?
q
You can do that with managed node groups, but you'll have to set it in the Launch Template and pass that in, so it's a bit more involved. The design challenge with ManagedNodeGroup is deciding which underlying options to expose directly without making the API too convoluted. If there are certain gaps that currently stop you from using it, please open a feature request on GitHub! This helps us prioritize and make decisions 🙂
But generally yes, there are certain advanced settings that AWS EKS just does not expose for the managed node groups.
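Roughly, the launch template approach looks like this (just a sketch: the security group ID and sizes are placeholders, and I'm leaving out other required bits like the node IAM role):

import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

const cluster = new eks.Cluster("example"); // placeholder cluster

// Put the extra security group(s) on a launch template. You'll typically also need
// to include the cluster's node security group so nodes can still reach the control plane.
const launchTemplate = new aws.ec2.LaunchTemplate("managed-ng-lt", {
    vpcSecurityGroupIds: ["sg-0123456789abcdef0"], // placeholder: your extra security group ID(s)
});

// Pass the launch template to the managed node group.
const managedNodeGroup = new eks.ManagedNodeGroup("managed-ng", {
    cluster: cluster,
    launchTemplate: {
        id: launchTemplate.id,
        version: launchTemplate.latestVersion.apply(v => `${v}`),
    },
    scalingConfig: {
        minSize: 1,
        desiredSize: 2,
        maxSize: 4,
    },
});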
s
I saw some relevant issues around diskSize, and I can understand the implications. That’s a hard tech decision to make. I’ll think twice and raise an issue if needed!
q
Thanks for bringing this issue up. This is actually resolved already; our automation just failed to close it. I've closed it now. There are diskSize and gpu inputs now!
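So something like this should work now (just a sketch with placeholder values):

import * as eks from "@pulumi/eks";

const cluster = new eks.Cluster("example"); // placeholder cluster

const gpuNodeGroup = new eks.ManagedNodeGroup("gpu-ng", {
    cluster: cluster,
    diskSize: 100,                 // root volume size in GiB
    gpu: true,                     // the new gpu input mentioned above
    instanceTypes: ["g5.xlarge"],  // placeholder instance type
    scalingConfig: { minSize: 1, desiredSize: 1, maxSize: 2 },
});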
b
One thing to think about regarding EKS Auto Mode is that it does not (yet) support prefix delegation via the VPC CNI for allocating prefixes of IP addresses from the subnets for pods. Thus the number of pods that can run on an instance is restricted to the number of ENIs and the number of IP addresses per ENI. We were considering converting some clusters to Auto Mode, but since it does not support prefixes, that was a non-starter.
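To put rough numbers on that limit (illustration only, using commonly cited values for an m5.large: 3 ENIs with 10 IPv4 addresses each):

// Rough illustration of the per-instance pod limit without prefix delegation.
const enis = 3;       // example: m5.large
const ipsPerEni = 10; // example: m5.large

// One address per ENI is used by the ENI itself; the +2 in the usual EKS formula
// accounts for the host-networking pods (aws-node and kube-proxy).
const maxPods = enis * (ipsPerEni - 1) + 2; // 29 pods

console.log(`max pods without prefix delegation: ${maxPods}`);
// With prefix delegation each address slot can hold a /28 prefix (16 addresses),
// which is why the practical cap jumps to 110 pods on most instance sizes.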