# aws
g
Hi everyone, I'm trying to upgrade an EKS NodeGroup from AL2 to AL2023, based on the documentation:
```
Update the node group in place. Pulumi does this by first creating the new replacement nodes and then shutting down the old ones which will move pods to the new nodes forcibly. This is the default behavior when node groups are updated.
```
That doesn't seem to happen for me. The preview shows the strategy is `replace` instead of `update`, i.e. the entire node group will be deleted first and a new node group created, instead of updating node by node.
```
pulumi:pulumi:Stack: (same)
    [urn=urn:pulumi:stack::eks::pulumi:pulumi:Stack::stack]
    --aws:eks/nodeGroup:NodeGroup: (delete-replaced)
        [id=stack:test-upgrade]
        [urn=urn:pulumi:stack::eks::aws:eks/nodeGroup:NodeGroup::test-upgrade]
    +-aws:eks/nodeGroup:NodeGroup: (replace)
        [id=stack:test-upgrade]
        [urn=urn:pulumi:stack::eks::aws:eks/nodeGroup:NodeGroup::test-upgrade]
      ~ amiType: "AL2_x86_64" => "AL2023_x86_64_STANDARD"
    ++aws:eks/nodeGroup:NodeGroup: (create-replacement)
        [id=stack:test-upgrade]
        [urn=urn:pulumi:stack::eks::aws:eks/nodeGroup:NodeGroup::test-upgrade]
      ~ amiType: "AL2_x86_64" => "AL2023_x86_64_STANDARD"
```
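For reference, the only change is the `amiType` on an `aws.eks.NodeGroup`; a trimmed-down sketch of the definition (the cluster, role, and subnet values below are placeholders, not my real config):
```typescript
import * as aws from "@pulumi/aws";

// Only amiType changes in this update; all other inputs are placeholders
// standing in for the real cluster, node role and subnets.
const nodeGroup = new aws.eks.NodeGroup("test-upgrade", {
    clusterName: "my-cluster",
    nodeRoleArn: "arn:aws:iam::123456789012:role/eks-node-role",
    subnetIds: ["subnet-aaa", "subnet-bbb"],
    scalingConfig: { desiredSize: 2, minSize: 2, maxSize: 4 },
    amiType: "AL2023_x86_64_STANDARD", // was "AL2_x86_64"
});
```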
Am I missing something?
q
Hey, you're referencing the part for self-managed node groups, but it seems like you're using managed node groups. While the AWS EKS service gracefully handles regular updates for managed node groups, it doesn't allow updating the AMI type in place and requires a replacement. I took a note of that and will update our migration guide to clearly flag it.

In this case, the node group shouldn't get deleted first before recreation; instead the new node group would get created, then the old one would get deleted. EKS drains managed node groups when deleting them, but the behavior is not configurable: it drains all pods in the node group at the same time. There's an AWS feature request to improve this behavior: https://github.com/aws/containers-roadmap/issues/1636

If draining all the nodes at the same time is not desirable for you, you could follow the guide for gracefully replacing node groups that's mentioned further down in the guide: essentially creating the new node groups side by side and then draining the old ones (either with `kubectl drain` or `eksctl delete nodegroup`).
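If it helps, a rough sketch of that side-by-side approach with `aws.eks.NodeGroup` (all concrete values are placeholders for your existing configuration):
```typescript
import * as aws from "@pulumi/aws";

// Placeholders: wire these up to your actual cluster, role and subnets.
const clusterName = "my-cluster";
const nodeRoleArn = "arn:aws:iam::123456789012:role/eks-node-role";
const subnetIds = ["subnet-aaa", "subnet-bbb"];

// The old AL2 node group stays untouched while the new one comes up.
const oldNodes = new aws.eks.NodeGroup("ng-al2", {
    clusterName,
    nodeRoleArn,
    subnetIds,
    amiType: "AL2_x86_64",
    scalingConfig: { desiredSize: 2, minSize: 2, maxSize: 4 },
});

// The new AL2023 node group is created side by side under its own name.
const newNodes = new aws.eks.NodeGroup("ng-al2023", {
    clusterName,
    nodeRoleArn,
    subnetIds,
    amiType: "AL2023_x86_64_STANDARD",
    scalingConfig: { desiredSize: 2, minSize: 2, maxSize: 4 },
});

// Once the new nodes are Ready, drain the old ones (kubectl drain) and
// then remove the old NodeGroup from the program in a follow-up update.
```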
s
I ran into the same issue; just like Florian mentioned, the wording in the document needs some adjustments. I really appreciate the guides provided further down in the article!
g
@quick-house-41860, thanks for your detailed reply. You're right, I was confused between Pulumi AWS Classic and Pulumi EKS. I'm using `aws.eks.NodeGroup` from Pulumi AWS Classic and thought my node group was self-managed, but that's wrong.
```
In this case, the node group shouldn't get deleted first before recreation; instead the new node group would get created, then the old one would get deleted.
```
-> It actually deleted the old node group first, which is not desirable for me:
```
--  └─ aws:eks:NodeGroup  test-upgrade  deleting original (3s)..
++  └─ aws:eks:NodeGroup  test-upgrade  creating replacement (24s)     [diff: ~amiType]
```
Draining the old node group is fine for me, as long as all the pods get scheduled onto the new node group. My goal is zero downtime during the upgrade process.
q
I've already updated the doc to provide more details; it should land on the website today/tomorrow. Stay tuned!

Ah, given you're using AWS Classic directly, there might be some other settings affecting this. Do you by any chance have either the `deleteBeforeReplace` resource option turned on or the `name` input property set? In that case the provider will delete the node group before recreating it.

Please be careful with that doc if you're using AWS Classic. Many of the same principles apply there as well (pulumi-eks uses pulumi-aws under the hood), but it's an opinionated abstraction, so you could've configured things differently than the doc assumes.
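For illustration, either of these forces the delete-first ordering you saw (a sketch; the inputs are placeholders):
```typescript
import * as aws from "@pulumi/aws";

// Placeholder inputs for illustration.
const baseArgs: aws.eks.NodeGroupArgs = {
    clusterName: "my-cluster",
    nodeRoleArn: "arn:aws:iam::123456789012:role/eks-node-role",
    subnetIds: ["subnet-aaa", "subnet-bbb"],
    scalingConfig: { desiredSize: 2, minSize: 2, maxSize: 4 },
    amiType: "AL2023_x86_64_STANDARD",
};

// 1) A static physical name: node group names must be unique in AWS, so
//    the provider has to delete the old group before creating the new one.
const named = new aws.eks.NodeGroup("test-upgrade", {
    ...baseArgs,
    nodeGroupName: "test-upgrade",
});

// 2) The resource option requests the same delete-first ordering explicitly.
const forced = new aws.eks.NodeGroup("test-upgrade-2", baseArgs, {
    deleteBeforeReplace: true,
});
```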
g
1. `deleteBeforeReplace`: I believe not, if it's not the default option.
2. What do you mean by `name`?
```
new NodeGroup(name: string, args: NodeGroupArgs, opts?: CustomResourceOptions);
```
q
It's the `nodeGroupName` input property in this case.
g
Yes, I did set the nodeGroupName. I'm not sure it makes sense that setting the nodeGroupName leads to this behavior, though. Btw, at least now we can confirm that changing the AMI type combined with a fixed nodeGroupName will delete the node group first during an upgrade.
q
Yes, setting the name forces a delete-before-replace for most resources. The reason is that there cannot be two resources with the same name in AWS. Unless you absolutely need a static name, I'd recommend letting the provider auto-name the resource.
To do a safe upgrade in this case, you'll need to add a new node group side by side. This part should explain how to do that and drain the old nodes: https://github.com/pulumi/pulumi-eks/blob/master/docs/eks-v3-migration.md#graceful-upgrade
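For example (a sketch; the inputs are placeholders), just dropping `nodeGroupName` lets Pulumi generate a unique physical name, so a future replacement can happen create-before-delete:
```typescript
import * as aws from "@pulumi/aws";

// No nodeGroupName set: Pulumi auto-generates a unique physical name
// (something like "test-upgrade-1a2b3c4"), so a replacement node group
// can be created before the old one is deleted.
const nodes = new aws.eks.NodeGroup("test-upgrade", {
    clusterName: "my-cluster",
    nodeRoleArn: "arn:aws:iam::123456789012:role/eks-node-role",
    subnetIds: ["subnet-aaa", "subnet-bbb"],
    scalingConfig: { desiredSize: 2, minSize: 2, maxSize: 4 },
    amiType: "AL2023_x86_64_STANDARD",
});
```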
g
@quick-house-41860 Many thanks, I'll check that 👏