# general
eager-coat-38141
Hey everyone, I am using Python to provision a Kubernetes cluster on DigitalOcean, install ingress-nginx on it, and then point DNS at its load balancer IP via AWS Route53. Installing ingress-nginx also creates a Kubernetes Service of type `LoadBalancer`, after which the cloud provider automatically provisions a DigitalOcean load balancer for the cluster. Unfortunately this is outside of Pulumi's control, so I wrote a snippet which fetches the load balancer ID from the service's annotation, queries the DigitalOcean API for the load balancer, and outputs its IP. The issue is that once the Service of type `LoadBalancer` is created and the load balancer is deployed, it takes up to ~3 minutes for it to be assigned an IP. I need that IP for subsequent steps in the same script, so I have been trying to find ways to get Pulumi to wait for the IP to be available. Here is the code I've got so far (suggested by several iterations with Pulumi AI):
```python
import pulumi
import pulumi_digitalocean as digitalocean
from pulumi_kubernetes.core.v1 import Service

# Get the existing Kubernetes Service created by ingress-nginx
opts_existing_service = pulumi.ResourceOptions(
    depends_on=[cluster, kubernetes_provider, ingress_nginx_workload],
    provider=kubernetes_provider,
)
existing_service = Service.get(
    "existing-service",
    "ingress-nginx/ingress-nginx-controller",
    opts=opts_existing_service,
)

# Read the load balancer ID from the service's annotation
load_balancer_id = existing_service.metadata["annotations"]["kubernetes.digitalocean.com/load-balancer-id"]

# Grab the DO load balancer
load_balancer = digitalocean.LoadBalancer.get("ingress-nginx-load-balancer", id=load_balancer_id)

# Grab the IP from the DO load balancer
load_balancer_ip = pulumi.Output.from_input(load_balancer.ip).apply(lambda ip: ip if ip else 'IP not assigned yet')

pulumi.export("load_balancer_id", load_balancer_id)
pulumi.export("load_balancer_ip", load_balancer_ip)
```
What I get from this is always an output saying `load_balancer_ip: "IP not assigned yet"`, because the Pulumi script finishes running before the DigitalOcean load balancer has an IP address. I understand that using `time.sleep` or `while` loops to continuously poll for the IP is not best practice; plus, that didn't really work for me anyway due to the asynchronous nature of the program. What is the correct way for me to wait for the load balancer IP to be available and then continue the program, while using Pulumi tools?
For the record, here we are trying to migrate from a terragrunt/terraform project to Pulumi. In the tf project we achieved the above with the following module:
```hcl
provider "digitalocean" {}

resource "null_resource" "previous" {}

// The DigitalOcean load balancer is not managed by Terraform.
// We wait for the load balancer to be ready so we can read its IP and output it.
resource "time_sleep" "wait_for_load_balancer" {
  depends_on = [null_resource.previous]

  create_duration = "180s"
}

data "digitalocean_loadbalancer" "loadbalancer" {
  depends_on = [time_sleep.wait_for_load_balancer]

  id = var.id
}
```
With this I get a data object for the load balancer, from which I can output its IP and other attributes. Of course, the entire script pauses for 3 minutes until the IP is available.
Apologies for the wall of text 😅
dry-keyboard-94795
I'm not sure it works with `.get()`, but if you access the IP address via the status of the service, it should wait for you. Example of what the status object looks like: https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer
Something like `lb_ip = existing_service.status.load_balancer.ingress[0].ip`
Ah, seems it won't work: https://github.com/pulumi/pulumi-kubernetes/blob/master/provider/pkg/await/await.go#L301 This would be a good feature request, as it will happen whenever using `helm.Release`. Related bug report: https://github.com/pulumi/pulumi-kubernetes/issues/1915
Interestingly, `sleep` is missing from the Pulumi time provider wrapper, so you can't replicate your TF directly: https://www.pulumi.com/registry/packages/time/ This should also have an issue raised.
eager-coat-38141
Hey Anthony, thanks. This is also not viable for my specific use case, because when installing ingress-nginx I also set the annotation `service.beta.kubernetes.io/do-loadbalancer-hostname` on the service, with the final DNS hostname. So the status of the service will never show me the IP; it will show me the hostname I've set up before.
> Interestingly, `sleep` is missing from the pulumi time provider wrapper, so you can't replicate your TF directly
Yep, sleep is missing altogether, but I seem to understand it's because we should not be relying on sleep but on other, more asynchronous-friendly approaches. Problem is, I am missing something here and I cannot really understand what 😅 I am also pretty sure I am not the first one with this problem…
dry-keyboard-94795
You could do a `local.Command` to run sleep yourself, and have the DO LB resource depend on it.
Re hostname: this is a separate field on the status, so it should have both an IP + hostname. However, I'm not familiar with how the DO LB controller works in this regard. Sounds buggy if it's not set, imo. Update: here's an explanation of how it's a workaround, not a bug: https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/examples/README.md#accessing-pods-over-a-managed-load-balancer-from-inside-the-cluster
> Should not rely on sleep
The real world is rarely so kind as to allow the ideal 🤣
When doing your `.get()`, you can pass in `opts=pulumi.ResourceOptions(depends_on=[sleeper])`, where `sleeper` is this resource: https://www.pulumi.com/registry/packages/command/api-docs/local/command/
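A minimal sketch of that suggestion, assuming the `pulumi_command` package (the resource name and duration here are placeholders):
```python
import pulumi_command as command

# A no-op resource whose create step simply sleeps; anything that
# depends_on it waits out the duration during `pulumi up`.
sleeper = command.local.Command("sleeper", create="sleep 180")
```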
eager-coat-38141
Sounds hacky, but I'll give it a try 😂 Thank you for helping, @dry-keyboard-94795
dry-keyboard-94795
Is the Ingress service set up with Helm before Pulumi is run? You could do a `kubectl wait` before running Pulumi.
eager-coat-38141
No, ingress-nginx is set up in the same script, in the step before:
```python
import pulumi
from pulumi_kubernetes import Provider as k8s_provider  # assumed import behind the alias
from pulumi_kubernetes.yaml import ConfigFile

# Prepare the kubernetes provider
kubeconfig = cluster.kube_configs[0].raw_config
kubernetes_provider = k8s_provider("kubernetes_provider", kubeconfig=kubeconfig)

opts = pulumi.ResourceOptions(
    depends_on=[cluster, kubernetes_provider],
    provider=kubernetes_provider,
)


def set_load_balancer_annotation(obj, opts):
    """If the kubernetes object is a Service of type LoadBalancer, add the hostname annotation to it"""
    if obj["kind"] == "Service" and obj["apiVersion"] == "v1":
        try:
            t = obj["spec"]["type"]
            if t == "LoadBalancer":
                obj["metadata"]["annotations"][
                    "service.beta.kubernetes.io/do-loadbalancer-hostname"
                ] = DUMMY_SERVICE_URL
        except KeyError:
            pass


# ingress nginx controller
ingress_nginx_workload = ConfigFile(
    "ingress-nginx",
    opts=opts,
    skip_await=True,
    transformations=[set_load_balancer_annotation],
    file="kubernetes_manifests/ingress-nginx/v1.5.1/deploy.yaml",
)
```
dry-keyboard-94795
How come you need `skip_await`?
eager-coat-38141
Great question. I didn't give that one a lot of thought until now. I can probably just omit that param and wait for the resources to be ready. Unsure if this will help with the original issue, though.
dry-keyboard-94795
If you remove it, Pulumi has await logic to wait for LBs to have their ingress set.
There's also `ConfigFile().get_resource()`, which lets you access the service directly. So:
`existing_service = ingress_nginx_workload.get_resource("core/v1/Service", "ingress-nginx-controller", namespace="ingress-nginx")`
eager-coat-38141
I now remember why I used `skip_await`: at this stage, when installing the nginx controller, the deployment times out because it waits for application pods to actually send traffic to. The application workloads are, however, installed on the cluster at a later step and outside of Pulumi, with ArgoCD.
d
Skip_await sets a transformation to apply the annotation: https://github.com/pulumi/pulumi-kubernetes/blob/master/sdk/python/pulumi_kubernetes/yaml/yaml.py#L385 Something you can try is have the annotation on everything except the Service object (or only on the failing Deployment), and remove
skip_await
Actually that might not work, as it looks like the await checks if the service has associated pods too. Depends if it wants the pods to be healthy or not. Guess you'll need the sleep hack after all
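A minimal sketch of that selective approach, assuming the standard `pulumi.com/skipAwait` annotation and reusing the names from the snippet above; the blanket `skip_await=True` is dropped in favour of a transformation that targets only the Deployment:
```python
def skip_await_on_deployment(obj, opts):
    """Annotate only Deployments with pulumi.com/skipAwait, so the Service
    is still awaited and gets its LoadBalancer ingress populated."""
    if obj.get("kind") == "Deployment":
        obj.setdefault("metadata", {}).setdefault("annotations", {})[
            "pulumi.com/skipAwait"
        ] = "true"


# Hypothetical usage: same ConfigFile as before, without skip_await=True.
ingress_nginx_workload = ConfigFile(
    "ingress-nginx",
    opts=opts,
    transformations=[set_load_balancer_annotation, skip_await_on_deployment],
    file="kubernetes_manifests/ingress-nginx/v1.5.1/deploy.yaml",
)
```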
I've opened an issue regarding `time_sleep` being missing for you: https://github.com/pulumiverse/pulumi-time/issues/36
eager-coat-38141
I've managed to get the sleeper solution to work 🙂 I've run the whole script end to end and the output works as required :D
```python
import pulumi
import pulumi_command as command
import pulumi_digitalocean as digitalocean
from pulumi_kubernetes.core.v1 import Service

# Get the existing Kubernetes Service created by ingress-nginx
opts_existing_service = pulumi.ResourceOptions(
    depends_on=[cluster, kubernetes_provider, ingress_nginx_workload],
    provider=kubernetes_provider,
)
existing_service = Service.get(
    "existing-service",
    "ingress-nginx/ingress-nginx-controller",
    opts=opts_existing_service,
)

# Read the load balancer ID from the service's annotation
load_balancer_id = existing_service.metadata["annotations"]["kubernetes.digitalocean.com/load-balancer-id"]

# Sleep for 180s on create so the DO load balancer has time to get an IP;
# "true" is a no-op on delete.
wait_cmd = command.local.Command("wait_cmd",
    create="sleep 180",
    delete="true",
    opts=pulumi.ResourceOptions(depends_on=[existing_service])
)

# Grab the DO load balancer
load_balancer = digitalocean.LoadBalancer.get(
    "ingress-nginx-load-balancer",
    id=load_balancer_id,
    opts=pulumi.ResourceOptions(depends_on=[wait_cmd])
)

# Grab the IP from the DO load balancer
load_balancer_ip = pulumi.Output.from_input(load_balancer.ip).apply(lambda ip: ip if ip else 'IP not assigned yet')

pulumi.export("load_balancer_id", load_balancer_id)
pulumi.export("load_balancer_ip", load_balancer_ip)
```
I'd prefer using a Pulumi provider for sleep like Terraform has, but this also works. Thanks again, @dry-keyboard-94795 🤝
dry-keyboard-94795
Nice. Hopefully it'll get added to the provider soon. I think removing `skipAwait` from the service should work too, but it's sometimes better knowing you have fewer edge cases with the sleeper.
rhythmic-secretary-1287
This works better than I expected 🙂 I'm using it to create transit gateways in AWS, because Pulumi doesn't wait for them to be ready, and it works really well. Thanks!
dry-keyboard-94795
@rhythmic-secretary-1287 can you share an example of your usage? This may need raising as an issue to the provider.
rhythmic-secretary-1287
When you create a transit gateway, it is created in a "pending" state. From Pulumi's perspective it is created, but if you try to attach a VPC to it, the attachment fails because the transit gateway is not ready.
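A minimal sketch of that pattern, assuming `pulumi_aws` and `pulumi_command` are installed; `vpc` and `subnet_ids` are hypothetical placeholders for resources defined elsewhere:
```python
import pulumi
import pulumi_aws as aws
import pulumi_command as command

tgw = aws.ec2transitgateway.TransitGateway("tgw")

# Workaround: give the TGW time to leave its "pending" state before attaching.
tgw_wait = command.local.Command(
    "tgw_wait",
    create="sleep 120",
    opts=pulumi.ResourceOptions(depends_on=[tgw]),
)

attachment = aws.ec2transitgateway.VpcAttachment(
    "tgw_attachment",
    transit_gateway_id=tgw.id,
    vpc_id=vpc.id,          # hypothetical VPC defined elsewhere
    subnet_ids=subnet_ids,  # hypothetical subnet IDs defined elsewhere
    opts=pulumi.ResourceOptions(depends_on=[tgw_wait]),
)
```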
eager-coat-38141
Yeah, been there too @rhythmic-secretary-1287, but with Terraform. The same case applies: `time_sleep`, at least that's how I dealt with TGWs in the past. I'm surprised a provider for waiting doesn't exist out of the box for Pulumi.
rhythmic-secretary-1287
Me too! Although I agree that it's better to fix the provider 🙂 This is what I hit: https://github.com/hashicorp/terraform-provider-aws/issues/21255
billowy-army-68599
@eager-coat-38141 can you save me a few minutes and post your Python code for creating your Kubernetes cluster and your load balancer here? There is a much more Pulumi-esque way to do this that I can show you 🙂
eager-coat-38141
Yup, here it is. As I said earlier, the load balancer is automatically created when we apply the manifests in line 59. The manifest is the same as the one here.
dry-keyboard-94795
@eager-coat-38141 it's worth adding comments to the workarounds you've done, to remind your future self (`skip_await`, the sleeper). It also helps others review, both for intent and for flagging non-standard patterns 😉
billowy-army-68599
dry-keyboard-94795
@billowy-army-68599 in your example, is the provider continuously querying the Service to get status updates?
billowy-army-68599
No, I just realised I probably need to use the Kubernetes client. I'll fix it up shortly.
dry-keyboard-94795
I think that in their case (using `ConfigFile`), doing a selective skipAwait on the Deployment, so that the Service will await its endpoints + status, is the way to go anyway. Unsure if it'll work with the pods failing healthchecks, though. @eager-coat-38141, up for trying some code if I refactor yours?
Untitled.py
eager-coat-38141
Hey @dry-keyboard-94795 🙂 trying this out. It seems the services `ingress-nginx/ingress-nginx-controller` and `ingress-nginx/ingress-nginx-controller-admission` are the ones waiting for pods.
```
+      └─ kubernetes:core/v1:Service         ingress-nginx/ingress-nginx-controller  creating (416s)...  [1/3] Finding Pods to direct traffic to
...
```
It feels like adding the skip-await annotation to them will still cause the original issue with the load balancer IP, though. In any case, I tried that and so far bumped into another issue where one of the ingress-nginx jobs is unable to complete. I kind of feel like the sleep command resource solution has yielded the most results so far 😅 I will also try out @billowy-army-68599's solution and see where I end up.
dry-keyboard-94795
@eager-coat-38141 the `Sleep` resource has now been added to the time provider as of `v0.0.16`.
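A minimal sketch of how the earlier workaround could look with it, assuming the Python package is `pulumiverse_time` and that the resource mirrors Terraform's `time_sleep` with a `create_duration` argument:
```python
import pulumi
import pulumiverse_time as time

# Assumed API: a Sleep resource whose creation takes create_duration to complete.
sleeper = time.Sleep(
    "wait_for_load_balancer",
    create_duration="180s",
    opts=pulumi.ResourceOptions(depends_on=[existing_service]),
)

# The DO load balancer lookup then depends on the Sleep instead of a shell command.
load_balancer = digitalocean.LoadBalancer.get(
    "ingress-nginx-load-balancer",
    id=load_balancer_id,
    opts=pulumi.ResourceOptions(depends_on=[sleeper]),
)
```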