This message was deleted Pulumi Community #kubernetes

# kubernetes

sparse-intern-71089

02/15/2023, 1:52 PM

This message was deleted.

billowy-army-68599

02/15/2023, 2:07 PM

This is a network issue really, has anything changed on your network?

dazzling-oxygen-84405

02/15/2023, 2:12 PM

No, this has always been running in GitHub Actions, and it has never had access to the cluster. In hindsight, I'm somewhat surprised it worked in the past, but it definitely did. My main suspicion is that something is causing Pulumi to try to refresh resources that previously just used the saved state.

steep-toddler-94095

02/15/2023, 5:06 PM

This sounds like a network issue to me too. You can confirm it if you just run a plain kubectl command against the cluster in a github action.

steep-toddler-94095

02/15/2023, 5:10 PM

nc -v 172.18.3.244 6443

172.18.3.244 is in a private IP range (172.16.0.0/12), so your runners either need to be in the same VPC or reach it through something like a VPN. If you don't have that type of setup, it won't work and wouldn't have ever worked. I would double check whether anything changed about your network, agents, or cluster recently, including things like whether the cluster used to have a publicly accessible API server.
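For anyone who would rather script the check above than run `nc` by hand, here is a minimal Python sketch of the same idea. The helper names `is_private_address` and `can_connect` are made up for illustration, and the sketch only tests TCP reachability, not whether the Kubernetes API accepts the credentials.

```python
import ipaddress
import socket


def is_private_address(host: str) -> bool:
    """Return True if host is a literal IP address in a private range."""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # Not a literal IP (for example a DNS name); this sketch does not resolve it.
        return False


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a plain TCP connection, roughly what `nc -v host port` does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False
```

Calling `can_connect("172.18.3.244", 6443)` from a CI step would show whether the runner can open a socket to the control plane at all, independent of Pulumi.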

dazzling-oxygen-84405

02/15/2023, 7:17 PM

If you don't have that type of setup, it won't work and wouldn't have ever worked.

Exactly, I'm 99% certain that GitHub Actions never had access to the Kubernetes API. The fact remains that this did work, for several months, until a week or two ago. It's a private repo, so I can't share a link to the action run/PR, but it absolutely produced previews of changes to this cluster. The only possible explanation is that `pulumi preview` was previously not contacting the API server at all, and just diffing the saved state with what running the stack gives (which, unexpectedly, seems to have been possible without contacting the API server). And recently, something changed which caused a preview to require the API server. The long-term solution is to set up a VPN for our runners, and I'm working on this. I was just curious if anyone was aware of something that changed on the Pulumi side that now requires access to the API server.

gorgeous-minister-41131

02/15/2023, 7:21 PM

Did `enableServerSideApply` become default in the k8s provider? I'm wondering if that has anything to do with it. Also, do you have something defined in the `.kubeconfig` somewhere?

gorgeous-minister-41131

02/15/2023, 7:23 PM

In your CI job, when you run `kubectl config current-context`, is there a context defined? If so, drill into that and see if it's defining a server that matches that host.

steep-toddler-94095

02/15/2023, 7:23 PM

and/or are you possibly using the --refresh flag?

dazzling-oxygen-84405

02/15/2023, 7:25 PM

Hm, the kubeconfig hasn't changed - it's generated as an output of another stack, and passed explicitly to the provider via the `kubeconfig` option. I'm not setting the `--refresh` flag in CI. I tried setting it to `false` locally (looks like `true` is the default) and running it while disconnected from the VPN, but this shows the same error.

gorgeous-minister-41131

02/15/2023, 7:27 PM

Well, generally speaking, and considering serverSideApply is going to become the provider default in the future, not having access to a k8s cluster, even a mock one, is going to make the provider preview relatively useless going forward... I'm not sure what your specific setup looks like, but Pulumi is trying to reach out to a cluster to diff the resource changes.

gorgeous-minister-41131

02/15/2023, 7:28 PM

are you using an explicit provider with ResourceOptions?

gorgeous-minister-41131

02/15/2023, 7:30 PM

Did someone else operate on this state at one point and now the cluster was persisted to it? E.g. someone ran a `pulumi update` with live cluster access, and the CI is trying to use the same state?

dazzling-oxygen-84405

02/15/2023, 7:30 PM

considering serverSideApply is going to become the provider default in the future, not having access to a k8s cluster, even a mock one, is going to make the provider preview relatively useless going forward

Fair enough. I'm really looking forward to the patch functionality - that'll make some parts of our stack a little tidier. I'll get working on the VPN setup. That's definitely the way forward, I was just hoping for a flag I could toggle to work around this for a day or two. But no worries if not.

are you using an explicit provider with ResourceOptions?

Yes, I'm only setting `kubeconfig` on the provider.

gorgeous-minister-41131

02/15/2023, 7:31 PM

Something smells like someone ran a pulumi update against this stack/state while connected to this "VPN", and it got stored in the state as a refreshable resource, but it's not accessible from CI

gorgeous-minister-41131

02/15/2023, 7:31 PM

perhaps the context names are the same, but the endpoints differ

dazzling-oxygen-84405

02/15/2023, 7:33 PM

someone ran a `pulumi update` with live cluster access, and the CI is trying to use the same state?

Yes, our normal workflow is:
1. Run `pulumi preview` from CI (this is the thing that's no longer working)
2. If the diff looks good, merge the PR
3. Run `pulumi update` from a machine that has access to the VPN

This has been working for some time, nothing about this has changed recently.

gorgeous-minister-41131

02/15/2023, 7:33 PM

No, but perhaps Pulumi is now storing some additional metadata about this and future runs are detecting it.

gorgeous-minister-41131

02/15/2023, 7:34 PM

I can only assume that your actual cluster is located at `172.18.3.244`

✅ 1

dazzling-oxygen-84405

02/15/2023, 7:34 PM

Yep, that is the correct IP address for the control plane.

dazzling-oxygen-84405

02/15/2023, 7:36 PM

Anyway, since server-side-apply is the way forward, and there's nothing obvious that can be done to work around in the short term, I'll proceed with enabling the VPN for our CI runners.

steep-toddler-94095

02/15/2023, 7:42 PM

Not sure if this is an option for you, but you could also consider creating runners in the same network space as your K8s cluster (you could even create them inside the k8s cluster with actions-runner-controller)

👀 1

gorgeous-minister-41131

02/15/2023, 7:43 PM

we're using actions-runner-controller^

gorgeous-minister-41131

02/15/2023, 7:50 PM

Are you using any Helm Chart()/Release() resources? wonder if somehow Helm is invoking the live connection to check for something

gorgeous-minister-41131

02/15/2023, 7:52 PM

I'm curious what the contents of the `kubeconfig` param you're passing in look like, or if it is too sensitive to share

gorgeous-minister-41131

02/15/2023, 7:53 PM

https://github.com/pulumi/pulumi-kubernetes/pull/1886

gorgeous-minister-41131

02/15/2023, 7:53 PM

smells a bit like this

gorgeous-minister-41131

02/15/2023, 7:54 PM

https://github.com/pulumi/pulumi-kubernetes/issues/1794

gorgeous-minister-41131

02/15/2023, 7:54 PM

this too

gorgeous-minister-41131

02/15/2023, 7:55 PM

Despite dry run being enabled, the `Release()` resource still tries to reach the cluster, live, and all Release does, AFAIK, is leverage the built-in Helm v3 bindings. Perhaps these have changed their behavior as well.

dazzling-oxygen-84405

02/15/2023, 7:56 PM

I'm using Helm `Release` resources, but they haven't changed recently. The `kubeconfig` is below (secrets snipped) - afaik nothing unusual about it.

{
  "apiVersion": "v1",
  "kind": "Config",
  "clusters": [
    {
      "name": "on-prem",
      "cluster": {
        "certificate-authority-data": "...secret...",
        "server": "https://172.18.3.244:6443"
      }
    }
  ],
  "users": [
    {
      "name": "pulumi",
      "user": {
        "client-certificate-data": "...secret...",
        "client-key-data": "...secret..."
      }
    }
  ],
  "contexts": [
    {
      "name": "pulumi@on-prem",
      "context": {
        "cluster": "on-prem",
        "user": "pulumi"
      }
    }
  ],
  "current-context": "pulumi@on-prem"
}
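A quick way to confirm which endpoint a kubeconfig like the one above will send the provider to is to resolve its `current-context` by hand. The Python sketch below does that for a JSON-formatted kubeconfig such as this one (a YAML kubeconfig would need a YAML parser instead). The function names `current_server` and `host_and_port` are hypothetical helpers, not part of Pulumi or kubectl.

```python
import json
from urllib.parse import urlsplit


def current_server(kubeconfig_text: str) -> str:
    """Return the API server URL selected by the kubeconfig's current-context."""
    config = json.loads(kubeconfig_text)
    # Index contexts and clusters by name so the lookup mirrors how the file is linked together.
    contexts = {c["name"]: c["context"] for c in config.get("contexts", [])}
    clusters = {c["name"]: c["cluster"] for c in config.get("clusters", [])}
    cluster_name = contexts[config["current-context"]]["cluster"]
    return clusters[cluster_name]["server"]


def host_and_port(server_url: str):
    """Split a server URL into (host, port), assuming 443 when no port is given."""
    parts = urlsplit(server_url)
    return parts.hostname, parts.port or 443
```

For the kubeconfig in this thread, the context `pulumi@on-prem` points at the cluster `on-prem`, so the resolved server is the `172.18.3.244:6443` endpoint that appears in the error.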

dazzling-oxygen-84405

02/15/2023, 7:56 PM

We haven't set up `actions-runner-controller` yet, but I'll add this to the list of advantages it has 🙂

gorgeous-minister-41131

02/15/2023, 7:57 PM

So the server is being defined in the config; it would make sense that it's going to try and compare the Release metadata against it. Not sure what process you use to generate that kubeconfig payload, but perhaps server was null or an empty string before.

gorgeous-minister-41131

02/15/2023, 8:01 PM

https://github.com/pulumi/pulumi-kubernetes/issues/2311

gorgeous-minister-41131

02/15/2023, 8:02 PM

https://github.com/pulumi/pulumi-kubernetes/blob/17105eac6ed820780c8f8fb2ddde8c60ba004198/provider/pkg/provider/provider.go#L1908

steep-toddler-94095

02/15/2023, 8:03 PM

ahh don't use that env var for this situation. it will remove the resources it can't reach from state, meaning all of your k8s resources
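One way to make sure that variable never sneaks into a preview job is a small guard that runs before Pulumi does. This is only a sketch: it assumes the variable in question is `PULUMI_K8S_DELETE_UNREACHABLE` (the one named in the provider's error message), and it conservatively treats any non-empty value as "set", which is not necessarily how the provider itself parses it. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumed to be the variable discussed here; taken from the provider's error text.
GUARDED_VAR = "PULUMI_K8S_DELETE_UNREACHABLE"


def delete_unreachable_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the guarded variable has any non-blank value."""
    env = os.environ if env is None else env
    return env.get(GUARDED_VAR, "").strip() != ""
```

A CI step could call `delete_unreachable_is_set()` and exit with a non-zero status when it returns True, so an unreachable cluster produces a failed preview rather than resources dropped from state.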

gorgeous-minister-41131

02/15/2023, 8:04 PM

Hmm yeah, maybe that's not what we want... Interesting though that this is cropping up for them all of a sudden.

gorgeous-minister-41131

02/15/2023, 8:05 PM

Previewing update (prd-bravo):
  pulumi:pulumi:Stack: (same)
    [urn=urn:pulumi:prd-bravo::k8s-aws-auth-config::pulumi:pulumi:Stack::k8s-aws-auth-config-prd-bravo]
    > kubernetes:core/v1:Namespace: (read)
        [id=kube-system]
        [urn=urn:pulumi:prd-bravo::k8s-aws-auth-config::kubernetes:core/v1:Namespace::kube-system]
        [provider=urn:pulumi:prd-bravo::k8s-aws-auth-config::pulumi:providers:kubernetes::eks_prd-bravo::8a15f8ed-f5ed-4537-a7bf-41d1aed334b8]
warning: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: Get "https://<REDACTED>.sk1.us-gov-west-1.eks.amazonaws.com1/openapi/v2?timeout=32s": dial tcp: lookup <REDACTED>.sk1.us-gov-west-1.eks.amazonaws.com1 on 10.0.0.2:53: no such host
error: Preview failed: failed to read resource state due to unreachable cluster. If the cluster has been deleted, you can edit the pulumi state to remove this resource or retry with the PULUMI_K8S_DELETE_UNREACHABLE environment variable set to true.
error: preview failed
Resources:
    2 unchanged
warning: A new version of Pulumi is available. To upgrade from version '3.53.0' to '3.55.0', visit <https://p>

gorgeous-minister-41131

02/15/2023, 8:06 PM

I only mentioned it since I was able to repro by using a bogus hostname in my kubeconfig for this named context

👍 1

gorgeous-minister-41131

02/15/2023, 8:08 PM

and if I blank it out I definitely get a different error, as expected:

warning: configured Kubernetes cluster is unreachable: unable to load Kubernetes client configuration from kubeconfig file. Make sure you have: 

         • set up the provider as per https://www.pulumi.com/registry/packages/kubernetes/installation-configuration/

gorgeous-minister-41131

02/15/2023, 8:14 PM

Whatever is generating `server` in the context is definitely injecting private IPs. You mentioned this value was coming from an output of another thing, perhaps another Pulumi state/project that is exporting the API server endpoint? As Mike indicated, did someone make this endpoint private [which is a good idea of course], and now the output contains this IP instead of the publicly accessible hostname that 'worked' before? Do you use EKS?

dazzling-oxygen-84405

02/15/2023, 8:15 PM

No, that's definitely the same IP it's always used. It's a bare-metal cluster running in an on-prem DC.

gorgeous-minister-41131

02/15/2023, 8:18 PM

Very odd. I mean nothing stands out as obvious, since the code around `clusterUnreachable` hasn't changed in 2-4 years, and the only recent change made to that provider for that exception is the mention of the env var I put above. 🤷

🤷 1

gorgeous-minister-41131

02/15/2023, 8:20 PM

// We use this information to read the live version of a Kubernetes resource. This is sometimes
// then checkpointed (e.g., in the case of `refresh`). Specifically:
//
// * The return is formatted as a "checkpoint object", i.e., an object of the form
//   {inputs: {...}, live: {...}}. This is important both for `Diff` and for `Update`. See
//   comments in those methods for details.
//

gorgeous-minister-41131

02/15/2023, 8:20 PM

only thing I can think of is something did an implicit refresh which invalidated all of the checkpoints and that's what it had been relying on in the past.
