# general
f
@white-balloon-205 It seems like the `gcp` `Cluster` resource is unable to scale up an already existing cluster by changing its `nodeCount`, or maybe I’m missing something?
Type                                 Name                                       Status                   Info
     pulumi:pulumi:Stack                  REDACTED
     └─ REDACTED:Cluster  REDACTED-cluster
 +-     └─ gcp:container:Cluster          REDACTED                   **replacing failed**    [diff: ~initialNodeCount]; 1 error

Diagnostics:
  gcp:container:Cluster (REDACTED):
    error: Plan apply failed: googleapi: Error 409: Already exists: projects/REDACTED/zones/REDACTED/clusters/REDACTED-cluster., alreadyExists
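For reference, a minimal sketch (in TypeScript with the `@pulumi/gcp` package; the resource name, zone and machine type are hypothetical) of the kind of single-resource setup that produces this error, where the node count is driven by the cluster’s `initialNodeCount` — which, per the diff above, the provider can only change by replacing the whole cluster:
```typescript
import * as gcp from "@pulumi/gcp";

// Everything lives on the Cluster resource; the node count is controlled by
// `initialNodeCount`, and changing it shows up as a replacement in the preview.
const cluster = new gcp.container.Cluster("my-cluster", {
    zone: "europe-west1-b",        // hypothetical zone
    initialNodeCount: 2,           // bumping this triggers a replace, not an in-place scale
    nodeConfig: {
        machineType: "n1-standard-1",
        oauthScopes: ["https://www.googleapis.com/auth/cloud-platform"],
    },
});
```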
a
is the node count the only thing that's changed? I remember there being a lot of limitations on what can be changed in a cluster via the api/terraform, which I think is one of the reasons I always create the cluster and any node pools separately these days
f
@abundant-airplane-93796 Yes, I only wanted to add a node to the cluster.
How are you doing it, if not with the `gcp` package wrapper of the terraform provider?
a
in truth I haven't done it via pulumi yet, but I did run into similar issues back when using terraform itself, and even the google deployment manager. might be unrelated, but what I was hitting was some limitation in how google's api allows changes to a cluster
f
sounds related to me, since it’s only a wrapper of the terraform provider under the hood. That’s kind of a problem if we are not able to scale a cluster’s nodes up/down even with Google’s native API. I’ll need to dig into this one to be sure I won’t hit a wall later on.
a
are you creating the cluster + the node pool in a single resource, or creating a cluster resource and attaching nodepool resources to it?
with terraform at least I'm able to adjust scaling in the latter scenario
f
right now yes, I’m provisioning a `Cluster` resource (in pulumi terms) with all the related cluster and nodepool configuration in it. Honestly I’m still new to this, so I’m not sure if what I’m doing is the right way to do it. From what you are saying, I understand I could create an “empty” cluster and attach nodepools to it later on? That could maybe be more flexible and avoid the error I’m encountering right now?
a
yup, exactly that. similar to the "Usage with an empty default pool" example given in the terraform docs here https://www.terraform.io/docs/providers/google/r/container_node_pool.html
I also like that splitting out the nodepools makes it a little easier (IMHO) to add additional/replacement node pools with different config down the line
f
indeed, I was not aware of this capability, that will be very helpful. I will try to figure out how to configure this with the pulumi provider
thanks for the tip!
👍 1
o
This is what we are doing in our code: no default node pool, node pools built via a separate function.
g
@faint-motherboard-95438 we usually create clusters with https://www.terraform.io/docs/providers/google/r/container_cluster.html#remove_default_node_pool set to true (or the pulumi equivalent) and then create separate pools.
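A minimal sketch of that pattern in pulumi TypeScript, assuming the `@pulumi/gcp` package and the older-style `zone` argument seen in the error output above; names, zone and machine type are hypothetical:
```typescript
import * as gcp from "@pulumi/gcp";

// Cluster with no usable default pool: GKE still requires at least one initial
// node, so create it and have the provider remove the default pool right away.
const cluster = new gcp.container.Cluster("my-cluster", {
    zone: "europe-west1-b",          // hypothetical zone
    initialNodeCount: 1,
    removeDefaultNodePool: true,
});

// Separate NodePool attached to the cluster; scaling and node config changes
// happen on this resource instead of on the Cluster itself.
const pool = new gcp.container.NodePool("primary-pool", {
    cluster: cluster.name,
    zone: "europe-west1-b",          // same hypothetical zone as the cluster
    nodeCount: 3,
    nodeConfig: {
        machineType: "n1-standard-1",
        oauthScopes: ["https://www.googleapis.com/auth/cloud-platform"],
    },
});
```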
f
thanks @orange-tailor-85423 and @glamorous-printer-66548, I found out about this option afterward and supposed that was the way to go, but it’s good to have your confirmation on this 🙂 Indeed it works better this way.
Still one question though: how are you selecting which node pool resources get deployed on?
Hmm, after a few tries, even using a separate `NodePool` component from the main `Cluster` one does not work. If I try to add/change/remove a label, which is the simplest thing we can do to a node I guess, I get a 409 from the google apis.
error: Plan apply failed: error creating NodePool: googleapi: Error 409: Already exists: projects/REDACTED/zones/REDACTED/clusters/REDACTED/nodePools/test-pool., alreadyExists
Why is it trying to recreate it when it only has to change its labels?
That is a big issue for me if I can’t update my nodes’ properties or scale them up or down at will
g
the GKE apis don’t allow changing labels of an existing nodepool, so either way (UI / terraform / pulumi) changing a label means delete + recreate. The problem you’re running into is probably https://github.com/pulumi/pulumi/issues/1620 . Basically, currently for most resources, when pulumi determines it has to recreate a resource, it will first create the new resource before deleting the old one (that is done to reduce downtime). Since you probably specified a fixed `name` for your nodepool, this strategy is running into a conflict. The suggested resolution for now is not to specify a `name` in the nodepool resource, but instead let pulumi auto-generate a name based on the pulumi resource id (for nodepools pulumi will generate a name using `<resource_id>-<some_random_suffix>`). This way pulumi is able to create a new nodepool before deleting the old one in the case of changes which require a recreation (e.g. changing nodepool labels).
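Roughly, continuing the earlier sketch, a `NodePool` that relies on pulumi’s auto-naming (the label, zone and machine type are again hypothetical):
```typescript
import * as gcp from "@pulumi/gcp";

// `cluster` stands in for the Cluster resource from the earlier sketch.
declare const cluster: gcp.container.Cluster;

// No explicit `name` argument: pulumi derives one from the resource id plus a
// random suffix, so a replacement pool can be created before the old one is deleted.
const pool = new gcp.container.NodePool("worker-pool", {
    cluster: cluster.name,
    zone: "europe-west1-b",               // hypothetical zone
    nodeCount: 2,
    nodeConfig: {
        labels: { workload: "general" },  // changing a label still means replace,
        machineType: "n1-standard-1",     // but auto-naming avoids the 409 conflict
    },
});
```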
f
@glamorous-printer-66548 oh ok, gotcha, thanks. Makes sense. So, sorry if that’s a dumb question, but if the provider deletes the node in order to make the changes, what happens to the services/pods on this node? Will they be preserved/recreated along the way without any downtime, or do I have to take care of this myself, and if so, what is the right way to do it?
g
well, when GKE deletes a nodepool it first ‘cordons’ / ‘drains’ its nodes by applying a taint to them, which prevents scheduling of further pods onto those nodes. After being cordoned for a minute or so, the node gets deleted altogether. In the meantime the scheduler might reschedule / move the pods onto the new nodepool. I honestly can’t tell you with 100% certainty whether there will be any downtime in between, as the time it takes to reschedule your pods depends on a couple of factors, e.g. how big your images are (the new node will have to redownload the images), but in most cases if there’s any downtime it should be fairly short (i.e. a couple of seconds). If no downtime at all, not even a few seconds, is acceptable to you, you should probably test out the behaviour in detail with your apps.
small extra info: when deleting a nodepool, GKE definitely cordons the pool, but I’m not sure whether it also drains it.
f
Sounds crystal clear, thanks a lot for such a thorough explanation and for the link. I still need to learn a lot about all of this, but I’m missing time. I’ll try that right away and see how it goes
@glamorous-printer-66548 sorry to bother you with this again, but it seems I still have an issue related to my cluster and nodes provisioning. I did as you suggested: I create an empty cluster, then attach some `NodePool` resources to it without a `name` property, so pulumi should generate one by itself. When I try to change something in a pool (the `nodeCount` for instance), I now get this:
gcp:container:Cluster (REDACTED):
    error: Plan apply failed: 1 error occurred:

    * updating urn:pulumi:REDACTED::REDACTED::REDACTED:Cluster$gcp:container/cluster:Cluster::REDACTED: googleapi: Error 400: Node_pool_id must be specified., badRequest
I don’t know what is wrong, it should have its `Node_pool_id` in the state somewhere from the previous `pulumi up`, shouldn’t it?
g
yeah it should. that’s weird - I’ve never seen this before, even though I’ve done lots of updates like changing node count. Can you share your code, especially the cluster config? The error you’re getting looks like it’s on the cluster resource and not on the pool resource, which is kinda weird too.
f
I did a refresh (which updated a lot of things, which is weird since I didn’t change anything in between) and after a new update the problem was gone. Really weird. I’ll try with a completely fresh install to see if I can reproduce
@glamorous-printer-66548 after having destroyed and re-created everything I don’t have this issue anymore. Not sure what I did wrong or fixed in the meantime, but it’s gone. Thanks!