# general
g
hey folks, have a little issue here. We use GKE (GCP + k8s). Inspired by https://github.com/pulumi/examples/blob/master/gcp-ts-gke/cluster.ts#L27 I’ve written a utility function with this signature: `const getK8sProviderByClusterName: (clusterName: string) => k8s.Provider`. It creates a k8s provider just from the name of an existing cluster (it will not magically create a new one), with the project id and zone taken from the gcp config. The cluster itself is typically not defined in the stacks that use this utility function, so to get the cluster I’m using `Cluster.get(...)`. The idea was to avoid making my team members “correctly” configure their kubeconfig, set the right context etc., and instead make the deployment target explicit in code via nothing but the `clusterName`. The problem I’m suddenly facing is that whenever anything non-substantial in the cluster changes, pulumi also wants to recreate (replace) the k8s provider and ALL the resources previously created through that provider. A non-substantial change in my case is, for example, one of the node pools scaling down by 1 instance. Even though I don’t use the node pool count at all to generate the k8s provider, pulumi wants to recreate it from scratch. Any advice on how to avoid this kind of problem? My understanding is that Pulumi tracks the dependency between the cluster and the k8s provider, and because the k8s provider depends on the cluster, any cluster change recreates the k8s provider. Is there any way to disable this auto-magic dependency tracking in this instance?
here’s the full source code of my gke utilities fyi:
```typescript
import * as gcp from '@pulumi/gcp';
import * as pulumi from '@pulumi/pulumi';
import * as k8s from '@pulumi/kubernetes';
import { Cluster } from '@pulumi/gcp/container';
import { defaultZone, defaultProject } from './gcp-constants';
import { memoize } from 'ramda';
import { packageConfig } from './misc';

const gcpConfig = new pulumi.Config('gcp');

// most of the functions in this module are memoized. This is to improve lookup speed in bigger pulumi programs like solvvy-apis and
// to avoid resource id conflicts in pulumi when importing the same cluster / resource multiple times.

export const lookupClusterByName = memoize(
  (clusterName: string): Cluster => {
    return Cluster.get(clusterName, clusterName, {
      name: clusterName,
      zone: gcpConfig.get('zone') || defaultZone,
      project: gcpConfig.get('project') || defaultProject
    });
  }
);

export function kubectlConfigFromGkeCluster(cluster: Cluster) {
  return pulumi.all([cluster.name, cluster.endpoint, cluster.masterAuth]).apply(([name, endpoint, auth]) => {
    const context = `${gcp.config.project}_${gcp.config.zone}_${name}`;
    return `apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${auth.clusterCaCertificate}
    server: https://${endpoint}
  name: ${context}
contexts:
- context:
    cluster: ${context}
    user: ${context}
  name: ${context}
current-context: ${context}
kind: Config
preferences: {}
users:
- name: ${context}
  user:
    auth-provider:
      config:
        cmd-args: config config-helper --format=json
        cmd-path: gcloud
        expiry-key: '{.credential.token_expiry}'
        token-key: '{.credential.access_token}'
      name: gcp
`;
  });
}

export const getK8sProviderByClusterName: (clusterName: string) => k8s.Provider = memoize((clusterName: string) => {
  return new k8s.Provider(clusterName, {
    kubeconfig: kubectlConfigFromGkeCluster(lookupClusterByName(clusterName))
  });
});

export const getK8sProviderByCluster = memoize((providerInstanceName: string, cluster: Cluster) => {
  return new k8s.Provider(providerInstanceName, {
    kubeconfig: kubectlConfigFromGkeCluster(cluster)
  });
});

export const getK8sProviderFromInferredCluster = memoize(() => {
  const clusterName = packageConfig.require('cluster');
  return new k8s.Provider(clusterName, {
    kubeconfig: kubectlConfigFromGkeCluster(lookupClusterByName(clusterName))
  });
});
```
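For reference, the kubeconfig template above only reads three cluster values: the name, the endpoint and `masterAuth.clusterCaCertificate` (project and zone come from config). A minimal sketch, assuming hypothetical plain-data types `ClusterConnectionInfo` and `ClusterSnapshot` that are not part of any Pulumi API, which builds the same kubeconfig from plain strings and checks that a snapshot differing only in node count yields an identical string. It models the provider's input only; it does not by itself change what the Pulumi engine decides to replace.

```typescript
// Hypothetical plain-data shape: only the cluster fields the kubeconfig reads.
interface ClusterConnectionInfo {
  name: string;
  endpoint: string;
  clusterCaCertificate: string;
}

// Hypothetical full snapshot; nodeCount stands in for every field the kubeconfig ignores.
interface ClusterSnapshot extends ClusterConnectionInfo {
  nodeCount: number;
}

// Keep only the fields the kubeconfig depends on.
function toConnectionInfo(s: ClusterSnapshot): ClusterConnectionInfo {
  return { name: s.name, endpoint: s.endpoint, clusterCaCertificate: s.clusterCaCertificate };
}

// Same template as kubectlConfigFromGkeCluster, but over plain strings instead of Outputs.
function kubeconfigFromConnectionInfo(info: ClusterConnectionInfo, project: string, zone: string): string {
  const context = `${project}_${zone}_${info.name}`;
  return `apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${info.clusterCaCertificate}
    server: https://${info.endpoint}
  name: ${context}
contexts:
- context:
    cluster: ${context}
    user: ${context}
  name: ${context}
current-context: ${context}
kind: Config
preferences: {}
users:
- name: ${context}
  user:
    auth-provider:
      config:
        cmd-args: config config-helper --format=json
        cmd-path: gcloud
        expiry-key: '{.credential.token_expiry}'
        token-key: '{.credential.access_token}'
      name: gcp
`;
}

// Usage: a node pool scaling from 3 to 2 nodes leaves the kubeconfig string unchanged.
const before: ClusterSnapshot = { name: 'my-cluster', endpoint: '10.0.0.1', clusterCaCertificate: 'Q0E=', nodeCount: 3 };
const after: ClusterSnapshot = { ...before, nodeCount: 2 };
const kubeconfigBefore = kubeconfigFromConnectionInfo(toConnectionInfo(before), 'my-project', 'us-central1-a');
const kubeconfigAfter = kubeconfigFromConnectionInfo(toConnectionInfo(after), 'my-project', 'us-central1-a');
console.log(kubeconfigBefore === kubeconfigAfter); // true
```

Passing the projected `ClusterConnectionInfo` rather than the whole cluster makes the provider's real input surface explicit: only an endpoint, CA certificate or name change can alter the kubeconfig.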
@creamy-potato-29402 ?
c
ooo sorry I haven’t read all this yet. one sec
@glamorous-printer-66548 oh wow that sucks. I believe this is a problem with the TF underbelly of the GCP implementation. @white-balloon-205 @microscopic-florist-22719, what do you think?
The semantics of this part of the stack are a bit of a mystery to me…
m
Cluster changes shouldn’t require replacing the k8s provider; this is actually pretty surprising. However, replacing the provider and everything created through it is exactly what would happen if for some reason we decided the provider itself needed to be replaced. I’ll be on a flight for a while, so @white-balloon-205 or @incalculable-sundown-82514, can one of you take a look at this?
w
Yeah - I'm looking now. Going to open an issue to track this and we can move investigation there.
g
are you sure this is a GCP-provider-specific problem? I thought it might be more of a general problem with how resource changes / dependencies are tracked and propagated and how resource imports work. Generally speaking, what I’m trying to do is just use the existing GKE cluster as a TF-like data source and use a small subset of its values to create a k8s provider. I know I could technically use `getCluster` instead of `Cluster.get(...)` to get the values I need, but I opted for `Cluster.get` because its API is nicer. One big difference between `Cluster.get` and `getCluster` seems to be that with the former pulumi keeps the output of the previous `.get` in the state, whereas with `getCluster` it does not. So I think if I had used `getCluster` I wouldn’t have seen this surprising behaviour. But I’m still wondering why there are two potentially overlapping functions, `Cluster.get` and `getCluster`, and why they must behave so differently.
c
@glamorous-printer-66548 I was proposing it’s a problem with the underlying TF provider for GCP. I could also be wrong, my believability on this is not nearly as high as the people I’ve tagged in.
I do not believe this is a problem with GCP itself.
As for the other stuff you mention, it’s very interesting context, thanks.
w
This is related to the way we handle first-class providers (`new k8s.Provider`) in the Pulumi engine. We believe that in general you shouldn't see this behaviour, but we'll need to reproduce it to understand exactly what is triggering it in this specific case.
g
So in addition to my concrete GKE / k8s-specific problem, I want to raise the question again: what are the main differences between `get<Resource>(...)` and `Resource.get(...)`, and why do these two overlapping methods with different underlying behaviour exist? What was the idea / philosophy behind this distinction? We discussed this to some extent here https://pulumi-community.slack.com/archives/C84L4E3N1/p1538172290000200, but having now hit the GKE / k8s problem I’m still confused about the intention.
w
The `getXYZ` functions are derived from a Terraform concept of "data sources". They are 1:1 with a Terraform data source. They do not return full-fledged Pulumi resources; they are just code that runs to make certain API calls against the target cloud. The `Resource.get()` methods are a Pulumi concept that lets you read in the state of a resource and persist it to the checkpoint. They return true Pulumi resources that can be used like any other. We expect these to become a key path for adopting existing resources into a Pulumi program. For now, the former is generally more "battle tested" for the cases where it exists.
g
I see. I’m wondering, though, why the state of a resource that is read via `Resource.get(...)` is persisted to the checkpoint? The resource is anyways not under management of pulumi, so what’s the point?
i
> The resource is anyways not under management of pulumi, so what’s the point?
Pulumi keeps track of the full resource dependency graph generated by your program. Since you can use output properties on a resource that you got via `Resource.get`, Pulumi must record the dependency between the `.get` resource and a non-`.get` resource created using a `.get` resource’s outputs. To do so, the `.get` resource must be in the checkpoint.
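To illustrate why the `.get` resource has to be part of the recorded graph, here is a toy model (not the Pulumi engine's actual implementation; the class and resource names are hypothetical) that records "created from the outputs of" edges and walks everything downstream of a node. Roughly the same reasoning explains why a replaced provider takes every resource created through it along, as in the node-pool scale-down case at the top of the thread.

```typescript
// Toy model: for each resource, the resources that were created from its outputs.
class DependencyGraph {
  private readonly dependents = new Map<string, Set<string>>();

  // Record that `dependent` was created using outputs of `dependency`.
  addDependency(dependent: string, dependency: string): void {
    let set = this.dependents.get(dependency);
    if (set === undefined) {
      set = new Set<string>();
      this.dependents.set(dependency, set);
    }
    set.add(dependent);
  }

  // All resources that transitively depend on `root`, found breadth-first, returned sorted.
  transitiveDependents(root: string): string[] {
    const seen = new Set<string>();
    const queue: string[] = [root];
    while (queue.length > 0) {
      const current = queue.shift() as string;
      const direct = this.dependents.get(current);
      if (direct === undefined) continue;
      direct.forEach((next) => {
        if (!seen.has(next)) {
          seen.add(next);
          queue.push(next);
        }
      });
    }
    return Array.from(seen).sort();
  }
}

// Usage with hypothetical resource names from this thread.
const graph = new DependencyGraph();
graph.addDependency('k8s-provider', 'gke-cluster'); // kubeconfig built from the .get cluster's outputs
graph.addDependency('deployment', 'k8s-provider');
graph.addDependency('service', 'k8s-provider');
console.log(graph.transitiveDependents('gke-cluster')); // [ 'deployment', 'k8s-provider', 'service' ]
```

Without the `gke-cluster` node in the graph, there would be nothing to attach the provider's edge to, which is the point being made about the checkpoint.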
g
Yeah, I guess what I’m questioning is whether a `.get` resource that is not under management should be dependency-tracked in the same full-fledged fashion as a regular resource at all. In my case the GKE cluster is actually under management in another project A, and I just need the `clusterCaCertificate` and `endpoint` properties of that cluster in project B. So I’d like to establish a dependency on those two properties only, not on the whole cluster resource as such.
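The property-level dependency asked for here can be contrasted with the coarser resource-level tracking described above in a small toy model. This is a hypothetical sketch, not a Pulumi feature: the tracker class and the `nodePools` property name are illustrative, and it assumes each consumer records which properties of which resource it read. Under property-level tracking a node-pool change leaves the provider untouched, while resource-level tracking flags it regardless.

```typescript
// Hypothetical: which property of which resource a consumer read.
interface PropertyRef {
  resource: string;
  property: string;
}

// Toy tracker contrasting resource-level and property-level dependencies.
class PropertyDependencyTracker {
  private readonly reads = new Map<string, PropertyRef[]>();

  // Record that `consumer` read `ref.property` of `ref.resource`.
  recordRead(consumer: string, ref: PropertyRef): void {
    const existing = this.reads.get(consumer) || [];
    existing.push(ref);
    this.reads.set(consumer, existing);
  }

  // Resource-level: any change to `resource` affects every consumer that read any of its properties.
  affectedByResourceChange(resource: string): string[] {
    const affected: string[] = [];
    this.reads.forEach((refs, consumer) => {
      if (refs.some((r) => r.resource === resource)) affected.push(consumer);
    });
    return affected.sort();
  }

  // Property-level: only consumers that read one of the changed properties are affected.
  affectedByPropertyChange(resource: string, changedProperties: string[]): string[] {
    const changed = new Set(changedProperties);
    const affected: string[] = [];
    this.reads.forEach((refs, consumer) => {
      if (refs.some((r) => r.resource === resource && changed.has(r.property))) affected.push(consumer);
    });
    return affected.sort();
  }
}

// Usage: the provider reads only endpoint and clusterCaCertificate from the cluster.
const tracker = new PropertyDependencyTracker();
tracker.recordRead('k8s-provider', { resource: 'gke-cluster', property: 'endpoint' });
tracker.recordRead('k8s-provider', { resource: 'gke-cluster', property: 'clusterCaCertificate' });

// A node pool scale-down changes a property the provider never read.
console.log(tracker.affectedByResourceChange('gke-cluster')); // [ 'k8s-provider' ]
console.log(tracker.affectedByPropertyChange('gke-cluster', ['nodePools'])); // []
```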