r
Hello everyone, is there a pattern to create and update an infrastructure? I have a Pulumi.yaml file which describes my infra (security group, nodepool, cluster); it's working fine and allows me to create the infra seamlessly. But upgrading the cluster is more complex, as manual intervention is needed:
• creation of a new nodepool
• draining the nodes of the previous nodepool
• deleting the previous nodepool
Playing with YAML only (Pulumi.yaml and Pulumi.STACK.yaml) does not seem to be possible for that purpose. Any best practice to follow? I guess using one of the languages supported by Pulumi is the best approach to automate the upgrade, right?
m
Why do you need a manual intervention here? The process of draining and deleting a nodepool should be handled by your cluster management. In the case of managed Kubernetes (which it sounds like you're using, based on the keywords you mention), you don't have to worry about this at all: if you delete a nodepool, the workloads will be shifted to other nodes. What you describe is the typical default behavior when a property changes that requires re-creation of a resource: the new resource is created, and the old resource is subsequently deleted. It's possible to change this through the deleteBeforeReplace resource option.
There is no fundamental difference between defining a Pulumi program in YAML or a programming language. In both cases, it's the Pulumi engine that handles communication with the providers. Note that a Pulumi program does not contain instructions for creating or modifying infrastructure ("make cluster", "modify security group") but describes the desired infrastructure state ("I want to have a cluster"). It's the job of the engine to instruct the providers to drive the infrastructure into this state.
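For illustration, here is how that option is set in a Pulumi YAML program. This is only a minimal sketch; the resource type is a placeholder, since I don't know yet which provider you're using:

resources:
  nodepool:
    type: my-cloud:NodePool       # placeholder type, substitute your provider's nodepool resource
    properties:
      size: 3                     # example property
    options:
      deleteBeforeReplace: true   # delete the old resource before creating its replacement (not the default)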
r
Yes, I get this, but I was not sure about the way to specify this in the Pulumi.yaml / Pulumi.STACK.yaml files. I mean, if I need to upgrade the cluster, creating a new nodepool and removing the old one (for that stack only), how should this be done?
m
You edit your program so that it includes the new nodepool and doesn't contain the old one, and then run pulumi up. If you just want to change something about the nodepool (e.g., machine type) you simply change this particular argument and Pulumi will know whether this requires recreation of the nodepool (in which case it will, by default, make the new one, then delete the old one) or can be done by changing the existing nodepool.
If you want a proper new nodepool, you change the name of the Pulumi resource. In this case, the deletion and creation will happen in parallel, because the old nodepool is no longer needed and the new nodepool is an unrelated new resource. If you're running a Kubernetes cluster, you probably want to upgrade the existing nodepool rather than deleting one and creating another, so that your pods can move over.
(If you can let me know which kind of Kubernetes service you're using, I can be more specific with my examples.)
r
Thanks, that makes a lot of sense. I could then specify the name of the nodepool in the config of my Pulumi.STACK.yaml, right ? I’m using the Exoscale cloud provider
m
The logical names of Pulumi resources should not depend on configuration or outputs. They should be fixed or generated within your program. Otherwise, you risk replacement or loss of resources (and data!) if the dynamic value changes.
The name of the nodepool as it shows up in your cloud is something you can change and assign dynamically, or let Pulumi generate a name based on the logical resource name, which will also help with avoiding name collisions.
See https://www.pulumi.com/docs/concepts/resources/names/ for more details on how the two are different.
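To make the difference concrete, here is a minimal sketch in Pulumi YAML (the exoscale:SksNodepool type token and the surrounding properties are assumptions on my part, check the registry docs):

resources:
  workers:                        # logical Pulumi name; renaming this replaces the resource
    type: exoscale:SksNodepool    # assumed type token
    properties:
      name: dev-nodepool          # physical name shown in the Exoscale console; omit it to let Pulumi auto-name
      clusterId: ${cluster.id}    # assumes a cluster resource defined elsewhere in the program
      zone: ch-gva-2              # example zone
      size: 3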
r
Sorry, yes I was talking about the name of the nodepool as it appears in the cloud provider, not the logical name 👍 I will test this
Thanks a lot for your help, things are becoming clearer now :)
m
Per https://www.pulumi.com/registry/packages/exoscale/api-docs/sksnodepool/#inputs you can change the name of a nodepool without replacing it. There's a small symbol next to the inputs that will trigger a replacement, which are the clusterId and the zone.
r
Hum, so this is not really what I need. When upgrading the cluster, the id and zone will not change. I guess I should create a new nodepool and delete the old one in a more manual way though
m
Why do you want to delete the nodepool, though? 🤔 Isn't it a good thing that you can just keep and update your nodepool, rather than waiting for it to be recreated? If you really want to replace the nodepool instead of doing an in-place update you can change its logical name in Pulumi.
There is also the replaceOnChanges resource option that you could use to force a replacement but I really think it is unnecessary to replace the nodepool unless there are changes to it that cannot be made otherwise.
r
Hi @modern-zebra-45309, in fact currently (working with the cloud provider CLI) I do not replace the nodepool when upgrading the cluster. Instead I scale the nodepool up, which brings in nodes with the new version. For instance, once scaled up, I can have 3 nodes with version 1.29.7 and 3 nodes with version 1.30.3, all in the same nodepool. Then I drain the 1.29.7 nodes and delete them. From what I understand, this cannot easily be done using Pulumi, which is why I was wondering if creating a new nodepool and removing the old one in an automated way could be done instead.
Regarding replaceOnChanges, is that something I could define in the Pulumi.yaml / Pulumi.STACK.yaml? I guess this could be a great workaround to force the nodepool to be recreated if I change the name (the one seen in the cloud provider, not the logical one).
The last thing I see is that changing the logical name in Pulumi.yaml would impact all the stacks, not just the one related to, let's say, my dev env.
(Well, I'm still new to Pulumi, maybe all I have in mind is nonsense 😉 )
m
Resource options are defined in the Pulumi program, not as part of the stack
For instance, once scaled up, I can have 3 nodes with version 1.29.7 and 3 nodes with version 1.30.3, all in the same nodepool. Then I drain the 1.29.7 nodes and delete them.
It doesn't look like the SksNodepool resource exposes the Kubernetes version. How do you control it right now? Is it just using the latest version when creating the nodepool?
r
the version is set in the SksCluster. Once this version is upgraded, each new node (when scaling the nodepool) will get that version
this is why the same nodepool can have nodes with different versions after it is scaled
m
I think you want to set it up in a way that if you update the cluster's version through Pulumi, the nodepools are also updated
If you want to do it via the name, you can set name=my-nodepool-${cluster.version} or something like this, and replace when the name changes
r
yes, when I change the version config in the stack, I'd like to trigger the upgrade of the control plane (this is working) and also trigger the upgrade of the worker nodes.
But the name does not trigger the replacement of the nodepool, right? Just the zone and clusterId seem to do that
m
Yes, but if you use replaceOnChanges, it will 🙂
r
Can I set this in my Pulumi.yaml file ? 🙂
(or in my stack file ?)
m
In the Pulumi.yaml file. Let me go find the proper syntax for you; the example in the docs does not work for YAML, but only because the CRD they use does not work
r
cool, thanks
m
resources:
  my-resource:
    type: does:not/exist
    properties:
      name: this-is-the-name-with-${cluster.version}
    options:
      replaceOnChanges:
        - name
Not 100% sure that the syntax is right but that's how it should look. The resource options go under the "options" key in the YAML format, and "replaceOnChanges" takes a list of "properties" keys that should trigger a replacement
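Applied to your case, it could look roughly like this. The exoscale:SksCluster / exoscale:SksNodepool type tokens, the version input, and the other properties are assumptions on my part, so double-check them against the registry docs:

resources:
  cluster:
    type: exoscale:SksCluster             # assumed type token; other required inputs omitted
    properties:
      zone: ch-gva-2
      version: "1.30.3"                   # bumping this drives the nodepool replacement below
  nodepool:
    type: exoscale:SksNodepool            # assumed type token
    properties:
      clusterId: ${cluster.id}
      zone: ch-gva-2
      name: nodepool-${cluster.version}   # physical name derived from the cluster version
      size: 3
    options:
      replaceOnChanges:
        - name                            # replace the nodepool whenever its name changes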
r
Thanks a lot, I’ll try it
AFK right now but thinking about it, just replacing the nodepool will not be clean as the usual process is to drain the old node first then make sure the workload are correctly rescheduled on the new nodes. The draining of the old nodes is a critical step, I guess this could be automated in Pulumi program but not using YAML 🤔
m
This will happen automatically. When you replace a Pulumi resource and do not set deleteBeforeReplace to True, the new nodepool will be created first and then the old nodepool will be deleted. This will allow your workloads to shift; it's a very common pattern with Kubernetes clusters. You can trust Kubernetes to handle this re-scheduling for you.
It doesn't matter whether you write your Pulumi program in YAML or another language. In both cases, you declare what infrastructure you want to see and hand it over to the Pulumi engine. A Python or TypeScript Pulumi program is executed entirely and completes before any infrastructure is created; it's not like a deployment script. There are some features that are not available in the YAML version I think (e.g., get functions for resources) but the control over replacements and deletions is exactly the same.
It is possible to integrate "waits" via outputs (like this: https://gist.github.com/metral/48a576680208d1c9961c37c5b1f0025e), but in your case I don't see a reason why this would be necessary.
r
I have tested your approach and it’s working fine.
resources:
  my-resource:
    type: does:not/exist
    properties:
      name: this-is-the-name-with-${cluster.version}
    options:
      replaceOnChanges:
        - name
A new nodepool is created and the previous one is deleted. Also, the workload is migrated to the new nodepool, as it's Kubernetes' job to do that part. But (sorry, there is a "but" 🙂), the old nodepool is deleted just after the new nodepool is created. Then Kubernetes takes a few tens of seconds before deciding to move the workloads. So just after the old nodepool is deleted, the workload is not running anymore (interruption of service); we have to wait for Kubernetes to detect that the nodes are gone and then reschedule the workloads on the new nodes. The proper way to do that would be to have both nodepools in parallel, then drain the old nodes, then make sure everything is correctly rescheduled, then delete the old nodepool.
m
I think this is something that you should be able to configure on the Kubernetes level, see https://kubernetes.io/docs/concepts/cluster-administration/node-shutdown/
When you're running a cluster autoscaler, pods on underutilized nodes are regularly shifted to other nodes before these nodes are taken out of the cluster. It's definitely possible to set this up in a way that does not cause interruptions.
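One Kubernetes-side knob that helps here is a PodDisruptionBudget, which keeps a voluntary drain or scale-down from taking all replicas of a workload offline at once. A minimal sketch (the labels and counts are just examples):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # never voluntarily evict below 2 running replicas
  selector:
    matchLabels:
      app: my-app          # adjust to your workload's labels

Note that a PodDisruptionBudget only constrains voluntary disruptions such as drains and evictions; it cannot protect you if the old nodes are deleted outright without being drained first.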
r
I think I need to find a way (without relying on the configuration of the cluster, as some params might not be available on all cloud providers) to wait for the workloads to be scheduled on the new nodes before removing the old one 🤔
I've just tested your approach another time, this time upgrading the cluster version. This worked fine, but the interruption of service (due to the old nodepool being deleted right away) can be a couple of minutes 😞
m
I've not worked with Kubernetes on Exoscale before but I think it's worth digging into what you can do on a Kubernetes level. You don't want to have service interruptions just because a node is terminated, which can happen anytime. Trying to solve this from the outside seems like a hack/anti-pattern to me. If you switch to a non-YAML flavor of Pulumi, you can use the pattern I linked above (https://gist.github.com/metral/48a576680208d1c9961c37c5b1f0025e or https://gist.github.com/lukehoban/fd0355ed5b82386bd89c0ffe2a3c916a) to wait for Kubernetes resources to become available.
r
in fact the issue is not because a node is terminated, you're right, this can happen all the time. It's more that all the nodes (previous nodepool) are terminated at the same time, thus leaving the workload as is. Kubernetes will do its job correctly, but it will take much more time than if the drain was done correctly first 🤔
m
I see, so you'd need a way to implement a delay or health check, similar to a CloudFormation CreationPolicy to only start removing the old nodepool once the new nodepool is fully available. I don't think this exists in Pulumi.
r
In fact I would need a way to remove the old nodepool once the drain of its nodes is correctly done, so we are sure the workloads are now running on the new nodepool.
I saw an example in the Pulumi repo which does this for an EKS cluster, but it requires some manual steps though. https://www.pulumi.com/registry/packages/kubernetes/how-to-guides/eks-migrate-nodegroups/
I remember dealing with upgrades of GKE clusters using Terraform some time ago. The terraform apply takes care of the drain (if I remember correctly)
m
I think the EKS example uses the approach I mentioned initially: Add a second logical nodegroup in your Pulumi program, run pulumi up, migrate the workloads, remove the original logical nodegroup from your Pulumi program, run pulumi up again.
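The intermediate state of the program would then contain both nodepools side by side; a rough sketch, reusing the assumed type token from above:

resources:
  nodepool-1-29:                  # existing nodepool, removed again in the second step
    type: exoscale:SksNodepool    # assumed type token
    properties:
      clusterId: ${cluster.id}
      zone: ch-gva-2
      size: 3
  nodepool-1-30:                  # new nodepool added for the upgrade
    type: exoscale:SksNodepool
    properties:
      clusterId: ${cluster.id}
      zone: ch-gva-2
      size: 3

Once the workloads have been drained onto the new nodes, you remove the nodepool-1-29 block and run pulumi up again.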
r
I thought about that, but what if I have several stacks (dev, qa, prod) and only want to upgrade the dev cluster? If I change the program, it will apply to every stack, right?
m
Only if you actually deploy the stack with "pulumi up." You can evolve the program underlying each stack separately.
So you could do the upgrade procedure one stack at a time. But I agree that it's not pretty.
r
In fact I’d like to have a program (or YAML) and only manage the config of each stack so they use the same base program
m
Yes, which is how it should be, but you're now mixing operational deployment/maintenance concerns into your infrastructure code, which should be handled by the provider
Ideally, your new nodepool would only show up as "up and ready" once it is actually available for scheduling workloads on it, and then you would gracefully terminate the old nodepool. But it sounds like the deletion of the old nodepool starts long before the new nodepool is actually ready from an application perspective.
r
In fact, as I saw this upgrade working fine in Terraform for EKS, I thought this could be easily replicated with Pulumi for Exoscale. But there might be an automated upgrade (taking the drain into account) which is not implemented on the cloud provider side, I guess. Do you think each provider maintains its own Pulumi library?
m
I have not seen the problem you face with EKS. I'm pretty sure (although I have not checked) that node group replacements work smoothly there, at least when everything is appropriately configured.
Do you think each provider maintains its own Pulumi library?
I don't think so. A "provider" is a component in Pulumi (and Terraform) that handles the communication with the platform. So there's an AWS provider (several, actually), a Kubernetes provider...
Someone actually filed an issue describing the problem you're facing (or at least a similar one) over a year ago: https://github.com/pulumiverse/pulumi-exoscale/issues/109
The provider is based on the Terraform provider maintained by Exoscale: https://github.com/exoscale/terraform-provider-exoscale You could reach out for help there.
r
Thanks, you're right, I will check that on the Exoscale side 👍 BTW, thanks a lot for your help, I understood many things in my Pulumi journey 😀
m
You're welcome, I'm happy if I can help, it's usually a great learning opportunity for me as well 🙂
r
Hey @modern-zebra-45309, I think I will go with one of your recommendations, using a single project per environment. Thus my dev cluster will have its own project, so I can easily create an additional nodepool and drain the old one when it's there (without impacting the other environments / stacks). I wanted to kind of use the same program definition (YAML in my case) for each stack, but that would be too complicated for the upgrade process
m
Note that if you switched from YAML to a programming language like Python, you could reuse parts of your code between programs. You could have a joint cluster setup that you import into your environment-specific programs.