# general
f
@white-balloon-205 It seems like the `gcp` `Cluster` resource is unable to scale up an already existing cluster by changing its `nodeCount`, or maybe I’m missing something?
Type                                 Name                                       Status                   Info
     pulumi:pulumi:Stack                  REDACTED
     └─ REDACTED:Cluster  REDACTED-cluster
 +-     └─ gcp:container:Cluster          REDACTED                   **replacing failed**    [diff: ~initialNodeCount]; 1 error

Diagnostics:
  gcp:container:Cluster (REDACTED):
    error: Plan apply failed: googleapi: Error 409: Already exists: projects/REDACTED/zones/REDACTED/clusters/REDACTED-cluster., alreadyExists
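For reference, a minimal sketch (in TypeScript with the `@pulumi/gcp` package; the resource name, zone and machine type are hypothetical) of the kind of single-resource setup that produces this error, where the node count is driven by the cluster’s `initialNodeCount` — which, per the diff above, the provider can only change by replacing the whole cluster:
```typescript
import * as gcp from "@pulumi/gcp";

// Everything lives on the Cluster resource; the node count is controlled by
// `initialNodeCount`, and changing it shows up as a replacement in the preview.
const cluster = new gcp.container.Cluster("my-cluster", {
    zone: "europe-west1-b",        // hypothetical zone
    initialNodeCount: 2,           // bumping this triggers a replace, not an in-place scale
    nodeConfig: {
        machineType: "n1-standard-1",
        oauthScopes: ["https://www.googleapis.com/auth/cloud-platform"],
    },
});
```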
a
is the node count the only thing that's changed? I remember there being a lot of limitations on what can be changed in a cluster via the api/terraform, which I think is one of the reasons I always create the cluster and any node pools separately these days
f
@abundant-airplane-93796 Yes, I only wanted to add a node to the cluster.
How are you doing it, if not with the `gcp` package wrapper of the terraform provider?
a
in truth I haven't done it via pulumi yet, but I did run into similar issues back when using terraform itself, and even the google deployment manager. might be unrelated, but what I was hitting was some limitation in how google's api allows changes to a cluster
f
sounds related to me, since it’s only a wrapper of the terraform provider under the hood. That’s kind of a problem if we are not able to scale a cluster’s nodes up/down even with Google’s native API. I’ll need to dig into this one to be sure I won’t hit a wall later on.
a
are you creating the cluster + the node pool in a single resource, or creating a cluster resource and attaching nodepool resources to it?
with terraform at least I'm able to adjust scaling in the latter scenario
f
right now yes, I’m provisioning a `Cluster` resource (in pulumi terms) with all the related cluster and nodepool configuration in it. Honestly I’m still new to this, so I’m not sure if what I’m doing is the right way to do it. From what you are saying, I understand I could create an “empty” cluster and attach nodepools to it later on? That could maybe be more flexible and avoid the error I’m encountering right now?
a
yup, exactly that. similar to the "Usage with an empty default pool" example given in the terraform docs here https://www.terraform.io/docs/providers/google/r/container_node_pool.html
I also like that splitting out the nodepools makes it a little easier (IMHO) to add additional/replacement node pools with different config down the line
f
indeed, I was not aware of this capability, that will be very helpful. I will try to figure out how to configure this with the pulumi provider
thanks for the tip!
👍 1
o
This is what we are doing in our code: no default node pool, node pools built via a separate function.
g
@faint-motherboard-95438 we usually create clusters with https://www.terraform.io/docs/providers/google/r/container_cluster.html#remove_default_node_pool set to true (or the pulumi equivalent) and then create separate pools.
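A minimal sketch of that pattern in pulumi TypeScript, assuming the `@pulumi/gcp` package and the older-style `zone` argument seen in the error output above; names, zone and machine type are hypothetical:
```typescript
import * as gcp from "@pulumi/gcp";

// Cluster with no usable default pool: GKE still requires at least one initial
// node, so create it and have the provider remove the default pool right away.
const cluster = new gcp.container.Cluster("my-cluster", {
    zone: "europe-west1-b",          // hypothetical zone
    initialNodeCount: 1,
    removeDefaultNodePool: true,
});

// Separate NodePool attached to the cluster; scaling and node config changes
// happen on this resource instead of on the Cluster itself.
const pool = new gcp.container.NodePool("primary-pool", {
    cluster: cluster.name,
    zone: "europe-west1-b",          // same hypothetical zone as the cluster
    nodeCount: 3,
    nodeConfig: {
        machineType: "n1-standard-1",
        oauthScopes: ["https://www.googleapis.com/auth/cloud-platform"],
    },
});
```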
f
thanks @orange-tailor-85423 and @glamorous-printer-66548, I found out about this option afterward and supposed that was the way to go, but it’s good to have your confirmation on this 🙂 Indeed it works better this way.
Still one question though: how are you selecting which node pool resources get deployed on?
Hmm, after a few tries, even using a separate `NodePool` component from the main `Cluster` one does not work. If I try to add/change/remove a label, which is the simplest thing we can do to a node I guess, I get a 409 from the google apis.
error: Plan apply failed: error creating NodePool: googleapi: Error 409: Already exists: projects/REDACTED/zones/REDACTED/clusters/REDACTED/nodePools/test-pool., alreadyExists
Why is it trying to recreate it when it only has to change its labels?
That is a big issue for me if I can’t update my nodes’ properties or scale them up or down at will
g
the GKE apis don’t allow changing labels of an existing nodepool, so either way (UI / terraform / pulumi) changing a label means delete + recreate. The problem you’re running into is probably https://github.com/pulumi/pulumi/issues/1620 . Basically, currently for most resources, when pulumi determines it has to recreate a resource, it will first create the new resource before deleting the old one (that is done to reduce downtime). Since you probably specified a fixed `name` for your nodepool, this strategy is running into a conflict. The suggested resolution for now is not to specify a `name` in the nodepool resource, but instead let pulumi auto-generate a name based on the pulumi resource id (for nodepools pulumi will generate a name using `<resource_id>-<some_random_suffix>`). This way pulumi is able to create a new nodepool before deleting the old one in the case of changes which require a recreation (e.g. changing nodepool labels).
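Roughly, continuing the earlier sketch, a `NodePool` that relies on pulumi’s auto-naming (the label, zone and machine type are again hypothetical):
```typescript
import * as gcp from "@pulumi/gcp";

// `cluster` stands in for the Cluster resource from the earlier sketch.
declare const cluster: gcp.container.Cluster;

// No explicit `name` argument: pulumi derives one from the resource id plus a
// random suffix, so a replacement pool can be created before the old one is deleted.
const pool = new gcp.container.NodePool("worker-pool", {
    cluster: cluster.name,
    zone: "europe-west1-b",               // hypothetical zone
    nodeCount: 2,
    nodeConfig: {
        labels: { workload: "general" },  // changing a label still means replace,
        machineType: "n1-standard-1",     // but auto-naming avoids the 409 conflict
    },
});
```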
f
@glamorous-printer-66548 oh ok, gotcha, thanks. Makes sense. So, sorry if that’s a dumb question, but if the provider deletes the node in order to make the changes, what happens to the services/pods on this node? Will they be preserved/recreated along the way without any downtime, or do I have to take care of this myself, and if so, what is the right way to do it?
g
well, when GKE deletes a nodepool it first ‘cordons’ / ‘drains’ its nodes by applying a taint to them, which prevents scheduling of further pods onto those nodes. After being cordoned for a minute or so, the node gets deleted altogether. In the meantime the scheduler might reschedule / move the pods onto the new nodepool. I honestly can’t tell you with 100% certainty whether there will be any downtime in between, as the time it takes to reschedule your pods depends on a couple of factors, e.g. how big your images are (the new node will have to redownload the images), but in most cases if there’s any downtime it should be fairly short (i.e. a couple of seconds). If no downtime at all, not even a few seconds, is acceptable to you, you should probably test out the behaviour in detail with your apps.
small extra info: when deleting a nodepool, GKE definitely cordons the pool, but I’m not sure whether it also drains it.
f
Sounds crystal clear, thanks a lot for such a thorough explanation and for the link. I still need to learn a lot about all of this, but I’m missing time. I’ll try that right away and see how it goes
@glamorous-printer-66548 sorry to bother you with this again, but it seems I still have an issue related to my cluster and nodes provisioning. I did as you suggested: I create an empty cluster, then attach some `NodePool` resources to it without a `name` property, so pulumi should generate one by itself. When I try to change something in a pool (the `nodeCount` for instance), I now get this:
gcp:container:Cluster (REDACTED):
    error: Plan apply failed: 1 error occurred:

    * updating urn:pulumi:REDACTED::REDACTED::REDACTED:Cluster$gcp:container/cluster:Cluster::REDACTED: googleapi: Error 400: Node_pool_id must be specified., badRequest
I don’t know what is wrong, it should have its `Node_pool_id` in the state somewhere from the previous `pulumi up`, shouldn’t it?
g
yeah it should. that’s weird - I’ve never seen this before, even though I’ve done lots of updates like changing node count. Can you share your code, especially the cluster config? The error you’re getting looks like it’s on the cluster resource and not on the pool resource, which is kinda weird too.
f
I did a refresh (which updated a lot of things, which is weird since I didn’t change anything in between) and after a new update the problem was gone. Really weird. I’ll try with a completely fresh install to see if I can reproduce
@glamorous-printer-66548 after having destroyed and re-created everything I don’t have this issue anymore. Not sure what I did wrong or fixed in the meantime, but it’s gone. Thanks!