r
Hello everyone, is there a pattern to create and update an infrastructure? I have a Pulumi.yaml file which describes my infra (security group, nodepool, cluster); it's working fine and allows me to create the infra seamlessly. But upgrading the cluster is more complex, as manual intervention is needed:
• creation of a new nodepool
• draining the nodes of the previous nodepool
• deleting the previous nodepool
Playing with YAML only (Pulumi.yaml and Pulumi.STACK.yaml) does not seem to be possible for that purpose. Any best practice to follow? I guess using one of the languages supported by Pulumi is the best approach to automate the upgrade, right?
m
Why do you need a manual intervention here? The process of draining and deleting a nodepool should be handled by your cluster management. In the case of managed Kubernetes (which it sounds like you're using, based on the keywords you mention), you don't have to worry about this at all: if you delete a nodepool, the workloads will be shifted to other nodes. What you describe is the typical default behavior when a property changes that requires re-creation of a resource: the new resource is created, and the old resource is subsequently deleted. It's possible to change this through the deleteBeforeReplace resource option.
There is no fundamental difference between defining a Pulumi program in YAML or a programming language. In both cases, it's the Pulumi engine that handles communication with the providers. Note that a Pulumi program does not contain instructions for creating or modifying infrastructure ("make cluster", "modify security group") but describes the desired infrastructure state ("I want to have a cluster"). It's the job of the engine to instruct the providers to drive the infrastructure into this state.
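For illustration, here is how that option is set in a Pulumi YAML program. This is only a minimal sketch; the resource type is a placeholder, since I don't know yet which provider you're using:

resources:
  nodepool:
    type: my-cloud:NodePool       # placeholder type, substitute your provider's nodepool resource
    properties:
      size: 3                     # example property
    options:
      deleteBeforeReplace: true   # delete the old resource before creating its replacement (not the default)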
r
Yes, I get this, but I was not sure about the way to specify this in the Pulumi.yaml / Pulumi.STACK.yaml files. I mean, if I need to upgrade the cluster, creating a new nodepool and removing the old one (for that stack only), how should this be done?
m
You edit your program so that it includes the new nodepool and doesn't contain the old one, and then run pulumi up. If you just want to change something about the nodepool (e.g., machine type) you simply change this particular argument and Pulumi will know whether this requires recreation of the nodepool (in which case it will, by default, make the new one, then delete the old one) or can be done by changing the existing nodepool.
If you want a proper new nodepool, you change the name of the Pulumi resource. In this case, the deletion and creation will happen in parallel, because the old nodepool is no longer needed and the new nodepool is an unrelated new resource. If you're running a Kubernetes cluster, you probably want to upgrade the existing nodepool rather than deleting one and creating another, so that your pods can move over.
(If you can let me know which kind of Kubernetes service you're using, I can be more specific with my examples.)
r
Thanks, that makes a lot of sense. I could then specify the name of the nodepool in the config of my Pulumi.STACK.yaml, right ? I’m using the Exoscale cloud provider
m
The logical names of Pulumi resources should not depend on configuration or outputs. They should be fixed or generated within your program. Otherwise, you risk replacement or loss of resources (and data!) if the dynamic value changes.
The name of the nodepool as it shows up in your cloud is something you can change and assign dynamically, or let Pulumi generate a name based on the logical resource name, which will also help with avoiding name collisions.
See https://www.pulumi.com/docs/concepts/resources/names/ for more details on how the two are different.
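To make the difference concrete, here is a minimal sketch in Pulumi YAML (the exoscale:SksNodepool type token and the surrounding properties are assumptions on my part, check the registry docs):

resources:
  workers:                        # logical Pulumi name; renaming this replaces the resource
    type: exoscale:SksNodepool    # assumed type token
    properties:
      name: dev-nodepool          # physical name shown in the Exoscale console; omit it to let Pulumi auto-name
      clusterId: ${cluster.id}    # assumes a cluster resource defined elsewhere in the program
      zone: ch-gva-2              # example zone
      size: 3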
r
Sorry, yes I was talking about the name of the nodepool as it appears in the cloud provider, not the logical name 👍 I will test this
Thanks a lot for your help, things are becoming clearer now :)
m
Per https://www.pulumi.com/registry/packages/exoscale/api-docs/sksnodepool/#inputs you can change the name of a nodepool without replacing it. There's a small symbol next to the inputs that will trigger a replacement, which are the clusterId and the zone.
r
Hum, so this is not really what I need. When upgrading the cluster, the id and zone will not change. I guess I should create a new nodepool and delete the old one in a more manual way though
m
Why do you want to delete the nodepool, though? 🤔 Isn't it a good thing that you can just keep and update your nodepool, rather than waiting for it to be recreated? If you really want to replace the nodepool instead of doing an in-place update you can change its logical name in Pulumi.
There is also the replaceOnChanges resource option that you could use to force a replacement but I really think it is unnecessary to replace the nodepool unless there are changes to it that cannot be made otherwise.
r
Hi @modern-zebra-45309, in fact currently (working with the cloud provider CLI) I do not replace the nodepool when upgrading the cluster. Instead I scale the nodepool up, which brings in nodes with the new version. For instance, once scaled up, I can have 3 nodes with version 1.29.7 and 3 nodes with version 1.30.3, all in the same nodepool. Then I drain the 1.29.7 nodes and delete them. From what I understand, this cannot easily be done using Pulumi, which is why I was wondering if creating a new nodepool and removing the old one in an automated way could be done instead.
Regarding replaceOnChanges, is that something I could define in the Pulumi.yaml / Pulumi.STACK.yaml? I guess this could be a great workaround to force the nodepool to be recreated if I change the name (the one seen in the cloud provider, not the logical one).
The last thing I see is that changing the logical name in Pulumi.yaml would impact all the stacks, not just the one related to, let's say, my dev env.
(Well, I'm still new to Pulumi, maybe all I have in mind is nonsense 😉 )
m
Resource options are defined in the Pulumi program, not as part of the stack
For instance, once scaled up, I can have 3 nodes with version 1.29.7 and 3 nodes with version 1.30.3, all in the same nodepool. Then I drain the 1.29.7 nodes and delete them.
It doesn't look like the SksNodepool resource exposes the Kubernetes version. How do you control it right now? Is it just using the latest version when creating the nodepool?
r
the version is set in the SksCluster. Once this version is upgraded, each new node (when scaling the nodepool) will get that version
this is why the same nodepool can have nodes with different versions after it is scaled
m
I think you want to set it up in a way that if you update the cluster's version through Pulumi, the nodepools are also updated
If you want to do it via the name, you can set name=my-nodepool-${cluster.version} or something like this, and replace when the name changes
r
yes, when I change the version config in the stack, I'd like to trigger the upgrade of the control plane (this is working) and also trigger the upgrade of the worker nodes.
But the name does not trigger the replacement of the nodepool, right? Just the zone and clusterId seem to do that
m
Yes, but if you use replaceOnChanges, it will 🙂
r
Can I set this in my Pulumi.yaml file ? 🙂
(or in my stack file ?)
m
In the Pulumi.yaml file. Let me go find the proper syntax for you; the example in the docs does not work for YAML, but only because the CRD they use does not work
r
cool, thanks
m
resources:
  my-resource:
    type: does:not/exist
    properties:
      name: this-is-the-name-with-${cluster.version}
    options:
      replaceOnChanges:
        - name
Not 100% sure that the syntax is right but that's how it should look. The resource options go under the "options" key in the YAML format, and "replaceOnChanges" takes a list of "properties" keys that should trigger a replacement
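Applied to your case, it could look roughly like this. The exoscale:SksCluster / exoscale:SksNodepool type tokens, the version input, and the other properties are assumptions on my part, so double-check them against the registry docs:

resources:
  cluster:
    type: exoscale:SksCluster             # assumed type token; other required inputs omitted
    properties:
      zone: ch-gva-2
      version: "1.30.3"                   # bumping this drives the nodepool replacement below
  nodepool:
    type: exoscale:SksNodepool            # assumed type token
    properties:
      clusterId: ${cluster.id}
      zone: ch-gva-2
      name: nodepool-${cluster.version}   # physical name derived from the cluster version
      size: 3
    options:
      replaceOnChanges:
        - name                            # replace the nodepool whenever its name changes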
r
Thanks a lot, I’ll try it
AFK right now but thinking about it, just replacing the nodepool will not be clean as the usual process is to drain the old node first then make sure the workload are correctly rescheduled on the new nodes. The draining of the old nodes is a critical step, I guess this could be automated in Pulumi program but not using YAML 🤔
m
This will happen automatically. When you replace a Pulumi resource and do not set deleteBeforeReplace to True, the new nodepool will be created first and then the old nodepool will be deleted. This will allow your workloads to shift; it's a very common pattern with Kubernetes clusters. You can trust Kubernetes to handle this re-scheduling for you.
It doesn't matter whether you write your Pulumi program in YAML or another language. In both cases, you declare what infrastructure you want to see and hand it over to the Pulumi engine. A Python or TypeScript Pulumi program is executed entirely and completes before any infrastructure is created; it's not like a deployment script. There are some features that are not available in the YAML version I think (e.g., get functions for resources) but the control over replacements and deletions is exactly the same.
It is possible to integrate "waits" via outputs (like this: https://gist.github.com/metral/48a576680208d1c9961c37c5b1f0025e), but in your case I don't see a reason why this would be necessary.
r
I have tested your approach and it’s working fine.
resources:
  my-resource:
    type: does:not/exist
    properties:
      name: this-is-the-name-with-${cluster.version}
    options:
      replaceOnChanges:
        - name
A new nodepool is created and the previous one is deleted. Also, the workload is migrated to the new nodepool, as it's Kubernetes' job to do that part. But (sorry, there is a "but" 🙂), the old nodepool is deleted just after the new nodepool is created. Then Kubernetes takes a few tens of seconds before deciding to move the workloads. So just after the old nodepool is deleted, the workload is not running anymore (interruption of service); we have to wait for Kubernetes to detect that the nodes are gone and then reschedule the workloads on the new nodes. The proper way to do that would be to have both nodepools in parallel, then drain the old nodes, then make sure everything is correctly rescheduled, then delete the old nodepool.
m
I think this is something that you should be able to configure on the Kubernetes level, see https://kubernetes.io/docs/concepts/cluster-administration/node-shutdown/
When you're running a cluster autoscaler, pods on underutilized nodes are regularly shifted to other nodes before these nodes are taken out of the cluster. It's definitely possible to set this up in a way that does not cause interruptions.
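One Kubernetes-side knob that helps here is a PodDisruptionBudget, which keeps a voluntary drain or scale-down from taking all replicas of a workload offline at once. A minimal sketch (the labels and counts are just examples):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # never voluntarily evict below 2 running replicas
  selector:
    matchLabels:
      app: my-app          # adjust to your workload's labels

Note that a PodDisruptionBudget only constrains voluntary disruptions such as drains and evictions; it cannot protect you if the old nodes are deleted outright without being drained first.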
r
I think I need to find a way (without relying on the configuration of the cluster, as some params might not be available on all cloud providers) to wait for the workloads to be scheduled on the new nodes before removing the old one 🤔
I've just tested your approach another time, this time upgrading the cluster version. This worked fine, but the interruption of service (due to the old nodepool being deleted right away) can be a couple of minutes 😞
m
I've not worked with Kubernetes on Exoscale before but I think it's worth digging into what you can do on a Kubernetes level. You don't want to have service interruptions just because a node is terminated, which can happen anytime. Trying to solve this from the outside seems like a hack/anti-pattern to me. If you switch to a non-YAML flavor of Pulumi, you can use the pattern I linked above (https://gist.github.com/metral/48a576680208d1c9961c37c5b1f0025e or https://gist.github.com/lukehoban/fd0355ed5b82386bd89c0ffe2a3c916a) to wait for Kubernetes resources to become available.
r
in fact the issue is not because a node is terminated, you're right, this can happen all the time. It's more that all the nodes (previous nodepool) are terminated at the same time, thus leaving the workload as is. Kubernetes will do its job correctly, but it will take much more time than if the drain was done correctly first 🤔
m
I see, so you'd need a way to implement a delay or health check, similar to a CloudFormation CreationPolicy to only start removing the old nodepool once the new nodepool is fully available. I don't think this exists in Pulumi.
r
In fact I would need a way to remove the old nodepool once the drain of its nodes is correctly done, so we are sure the workloads are now running on the new nodepool.
I saw an example in the Pulumi repo which does this for an EKS cluster, but it requires some manual steps though. https://www.pulumi.com/registry/packages/kubernetes/how-to-guides/eks-migrate-nodegroups/
I remember dealing with upgrades of GKE clusters using Terraform some time ago. The terraform apply takes care of the drain (if I remember correctly)
m
I think the EKS example uses the approach I mentioned initially: Add a second logical nodegroup in your Pulumi program, run pulumi up, migrate the workloads, remove the original logical nodegroup from your Pulumi program, run pulumi up again.
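The intermediate state of the program would then contain both nodepools side by side; a rough sketch, reusing the assumed type token from above:

resources:
  nodepool-1-29:                  # existing nodepool, removed again in the second step
    type: exoscale:SksNodepool    # assumed type token
    properties:
      clusterId: ${cluster.id}
      zone: ch-gva-2
      size: 3
  nodepool-1-30:                  # new nodepool added for the upgrade
    type: exoscale:SksNodepool
    properties:
      clusterId: ${cluster.id}
      zone: ch-gva-2
      size: 3

Once the workloads have been drained onto the new nodes, you remove the nodepool-1-29 block and run pulumi up again.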
r
I thought about that, but what if I have several stacks (dev, qa, prod) and only want to upgrade the dev cluster? If I change the program, it will apply to every stack, right?
m
Only if you actually deploy the stack with "pulumi up." You can evolve the program underlying each stack separately.
So you could do the upgrade procedure one stack at a time. But I agree that it's not pretty.
r
In fact I’d like to have a program (or YAML) and only manage the config of each stack so they use the same base program
m
Yes, which is how it should be, but you're now mixing operational deployment/maintenance concerns into your infrastructure code, which should be handled by the provider
Ideally, your new nodepool would only show up as "up and ready" once it is actually available for scheduling workloads on it, and then you would gracefully terminate the old nodepool. But it sounds like the deletion of the old nodepool starts long before the new nodepool is actually ready from an application perspective.
r
In fact, as I saw this upgrade working fine in Terraform for EKS, I thought this could be easily replicated with Pulumi for Exoscale. But there might be an automated upgrade (taking the drain into account) which is not implemented on the cloud provider side, I guess. Do you think each provider maintains its own Pulumi library?
m
I have not seen the problem you face with EKS. I'm pretty sure (although I have not checked) that node group replacements work smoothly there, at least when everything is appropriately configured.
Do you think each provider maintains its own Pulumi library?
I don't think so. A "provider" is a component in Pulumi (and Terraform) that handles the communication with the platform. So there's an AWS provider (several, actually), a Kubernetes provider...
Someone actually filed an issue describing the problem you're facing (or at least a similar one) over a year ago: https://github.com/pulumiverse/pulumi-exoscale/issues/109
The provider is based on the Terraform provider maintained by Exoscale: https://github.com/exoscale/terraform-provider-exoscale You could reach out for help there.
r
Thanks, you're right, I will check that on the Exoscale side 👍 BTW, thanks a lot for your help, I understood many things in my Pulumi journey 😀
m
You're welcome, I'm happy if I can help, it's usually a great learning opportunity for me as well 🙂
r
Hey @modern-zebra-45309, I think I will go with one of your recommendations, using a single project per environment. Thus my dev cluster will have its own project, so I can easily create an additional nodepool and drain the old one when it's there (without impacting the other environments / stacks). I wanted to kind of use the same program definition (YAML in my case) for each stack, but that would be too complicated for the upgrade process
m
Note that if you switched from YAML to a programming language like Python, you could reuse parts of your code between programs. You could have a joint cluster setup that you import into your environment-specific programs.