# general
s
Hey all! Is anyone aware of a way to set a delay time for creation/update/deletion of a resource? Use case: a K8s cluster with the external-dns (ED) and aws-load-balancer-controller (ALB) Helm charts installed. The ED chart tracks your ingresses/services and creates DNS records for them. The ALB chart creates load balancers for them. (I don’t think the ALB chart is relevant to this question, but noting it just in case.) If I decide to tear down the ingresses and the ED Helm chart, it seems there can be a race condition between the ED chart being deleted and it deleting any records it created for the ingresses. I have tried setting `dependsOn` on the ingresses to `[ALB, ED]` so that Pulumi understands that the ingresses have a dependency on these 2 Helm charts. It hasn’t completely helped. A delay option would be nice, because then the 2 Helm charts could delete the resources within the given delay time, and only then be removed from the cluster.
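For illustration, a minimal sketch of the `dependsOn` arrangement described above, in Pulumi TypeScript. The chart names, repos, and ingress details are placeholders, not the original code:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Chart names, repos, and values below are placeholders for this sketch.
const alb = new k8s.helm.v3.Chart("aws-load-balancer-controller", {
    chart: "aws-load-balancer-controller",
    fetchOpts: { repo: "https://aws.github.io/eks-charts" },
    namespace: "kube-system",
});

const externalDns = new k8s.helm.v3.Chart("external-dns", {
    chart: "external-dns",
    fetchOpts: { repo: "https://kubernetes-sigs.github.io/external-dns" },
    namespace: "kube-system",
});

// dependsOn makes Pulumi create the charts before the ingress and, on
// destroy, delete the ingress before the charts. It does NOT make Pulumi
// wait for the controllers to finish cleaning up the AWS resources or DNS
// records they created, which is the race discussed above.
const ingress = new k8s.networking.v1.Ingress("app", {
    metadata: { annotations: { "kubernetes.io/ingress.class": "alb" } },
    spec: { /* rules omitted */ },
}, { dependsOn: [alb, externalDns] });
```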
f
I've had a lot of issues with k8s Ingress in Pulumi. That's because the k8s package does not await the underlying resource creation. For example, with aws-load-balancer-controller that would be your ALB: when creating the controller, Pulumi doesn't await the ALB creation. That meant I couldn't get the hostname, for example, of an ingress resource. It also created a bunch of issues destroying the stack afterwards, because the stack is not aware there is another underlying resource using it. I have ditched the aws-load-balancer-controller because of this issue and instead started using traefik with custom resource definitions (which get correctly awaited). For reference: https://github.com/pulumi/pulumi-kubernetes/issues/1649 The issue has been added to a milestone, so I'm guessing it's on the roadmap to be fixed 🤷
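A rough sketch of the hostname problem being described, assuming a Pulumi TypeScript program; the ingress itself is a placeholder. With the ALB controller, the status read below can resolve to nothing because the provider doesn't await the external load balancer:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Hypothetical ingress; only the status read at the bottom matters here.
const ingress = new k8s.networking.v1.Ingress("app", {
    metadata: { annotations: { "kubernetes.io/ingress.class": "alb" } },
    spec: { /* rules omitted */ },
});

// With the ALB controller, this can come back undefined: the provider
// reports the ingress as ready before the external ALB exists, so the
// status may not be populated yet (see pulumi-kubernetes#1649).
export const hostname = ingress.status.apply(
    s => s?.loadBalancer?.ingress?.[0]?.hostname,
);
```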
s
Yes, I've experienced the same issues. Thanks for the tip about traefik! Is it easy to configure?
From a DM with @billowy-army-68599, recording here for the #general channel's sake:

My question: I'm asking for a feature that says don't destroy/create this resource for X amount of time. Preferably this delay would only start after its dependents have been destroyed. I need, say, the ALB controller to delete all load balancers before the controller is itself deleted. Pulumi doesn't know to wait for that. A delay would destroy the ingresses that depend on the ALB controller, then the ALB controller would be given, say, an extra 1m to delete the LBs, and only then would Pulumi delete the controller. Does this make sense?

Lee's reply: it makes sense, yes. this isn't possible though [9:27 PM] you can do some stuff inside an apply on create, and do a sleep if needed [9:28 PM] but not for a delete

My reply:
• Would this be worthy of a GitHub issue?
• [9:30 PM] Does it make sense that it is currently impossible to guarantee, because of a race condition, that the LBs would be deleted before the controller? [9:31 PM] Currently, the only way might be to have 2 separate deployments: one to destroy the ingresses, and then one to destroy the controller. [9:33 PM] Someone in a thread mentioned using traefik as a controller. Have you used this before? He mentioned this uses CRDs and that they get correctly awaited.

Lee's reply: it's not worthy of a GitHub issue I'm afraid, it's a subtlety in the way the alb load balancer controller works, creating a new load balancer for each ingress, and there's no await ability on the status I'm afraid, especially with a delete. [9:36 PM] i personally stick with the nginx ingress controller and a single ELB
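As a sketch of the apply-plus-sleep workaround Lee mentions (create only, not delete), assuming a Pulumi TypeScript program; the chart is a placeholder and the one-minute delay is arbitrary:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Placeholder chart; the apply-plus-sleep pattern is what matters.
const alb = new k8s.helm.v3.Chart("aws-load-balancer-controller", {
    chart: "aws-load-balancer-controller",
    fetchOpts: { repo: "https://aws.github.io/eks-charts" },
});

// A delay can be injected on the *create* path by sleeping inside an apply
// and having downstream resources depend on the result. There is no
// equivalent hook on the delete path, which is exactly the gap above.
const albReady = alb.ready.apply(async resources => {
    await new Promise(resolve => setTimeout(resolve, 60_000)); // arbitrary 1 minute
    return resources;
});

// Downstream resources can then use it, e.g.:
//   new k8s.networking.v1.Ingress("app", { /* ... */ }, { dependsOn: albReady });
```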
@billowy-army-68599 Do you have a gist/repo that has an example of the nginx controller?
s
Thank you!
f
@steep-portugal-37539 it's not hard to set up. It ends up being similar to NGINX: it will be one LB for all deployments. I prefer traefik because I don't have to deal with nginx configs for the routing 🙂
s
@future-refrigerator-88869 sounds great! Do you happen to have code deploying traefik that you are able to share?
f
Sure. I took a lot of inspiration from this project: https://github.com/aporia-ai/mlplatform-workshop/tree/main/infra
take a look at the `index` and `TraefikRoute`
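For reference, a minimal sketch of that pattern, loosely following the linked repo: traefik installed via Helm plus an `IngressRoute` custom resource. The chart repo, apiVersion, hostname, and service names are assumptions, not taken from the workshop code:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Traefik installed via Helm: one Service of type LoadBalancer for the
// whole cluster. Chart repo/values are assumptions for this sketch.
const traefik = new k8s.helm.v3.Chart("traefik", {
    chart: "traefik",
    fetchOpts: { repo: "https://traefik.github.io/charts" },
});

// Routing uses Traefik's IngressRoute CRD instead of a k8s Ingress. Pulumi
// treats it as a plain custom resource; there are no per-ingress AWS
// resources for the provider to miss. The apiVersion may be
// traefik.containo.us/v1alpha1 on older chart versions.
const route = new k8s.apiextensions.CustomResource("app-route", {
    apiVersion: "traefik.io/v1alpha1",
    kind: "IngressRoute",
    spec: {
        entryPoints: ["web"],
        routes: [{
            match: "Host(`app.example.com`)",
            kind: "Rule",
            services: [{ name: "app-svc", port: 80 }],
        }],
    },
}, { dependsOn: traefik });
```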
s
Awesome, thank you very much. Do you fully understand how it doesn't run into the same issues that the AWS ALB controller has?
b
@steep-portugal-37539 it doesn't have to provision AWS resources in order to work
s
Hey Lee, not sure I follow. Who is provisioning the load balancer(s)?
f
It does provision the load balancer, but it's done via a custom resource definition, which apparently gets awaited in Pulumi
The problem is the k8s Ingress resource, specifically the new versions (as far as I can tell)
s
OK, makes sense. I think this is a similar issue to what I'm having with the external-dns controller. So if you destroy external-dns and the ingresses at the same time, there can be a race condition between the controller deleting the DNS records and it itself being deleted. It may not have enough time to delete the records before it is destroyed. It seems there might not be any way for Pulumi to know when the records have been created/destroyed, because Pulumi itself is not touching them.
If a fix could be implemented for the ALB controller to detect the underlying AWS resources, perhaps the Route 53 records created by external-dns could also be awaited. Not sure if Pulumi could have this kind of visibility.
b
@steep-portugal-37539 The problem with these resources that run inside Kubernetes clusters is that they provision AWS resources, and the state is eventually consistent. So when you add a new ingress resource with the ALB controller, it provisions target groups etc. behind the scenes. When you delete an ingress, the ALB controller pod reconciles the external resources; Pulumi doesn't know anything about them. If the mechanism has a proper status field, you can sometimes await them, but they often don't, and it won't work on delete, because a delete just goes to the k8s API, deletes the ingress, and if that succeeds, it'll delete the controller deployment.

With the traefik/nginx ingress controllers, the controller operates as a reverse proxy inside the cluster. No external AWS resources are provisioned; you get a single load balancer with a service of `type=LoadBalancer`, which has a proper status field populated with a load balancer address. So when you delete an ingress, the controller deployment only has to update the backing traefik or nginx config; it doesn't have to reconcile any external resources.
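A minimal sketch of what that looks like from the Pulumi side, assuming a traefik Helm install; the chart repo and the Service's namespace/name are assumptions:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Minimal traefik install; chart repo is an assumption for this sketch.
const traefik = new k8s.helm.v3.Chart("traefik", {
    chart: "traefik",
    fetchOpts: { repo: "https://traefik.github.io/charts" },
});

// The chart creates a single Service of type LoadBalancer. Because that
// Service has a real status field, the provider's await logic waits for
// it, so the ELB hostname can be read reliably once the cloud LB exists.
// The namespace/name ("default"/"traefik") are assumptions here.
const svc = traefik.getResource("v1/Service", "default", "traefik");
export const lbHostname = svc.status.loadBalancer.ingress[0].hostname;
```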
I would recommend taking a look at the flow of adding a new resource, observe how it works, and it'll quickly make sense why this happens. Let's say you provision an ingress with the ALB controller. The operation is:
• the ALB controller sees the ingress API has a new resource
• the ALB controller creates backing target groups etc.
When you do a delete/destroy of the Helm chart, the process is:
• Pulumi deletes the ingress
• Pulumi waits until the ingress is deleted, then when that succeeds, it deletes the controller pod that reconciled the ingress
The correct Kubernetes way to deal with this is using finalizers. If you have finalizers on the ingress resource, this wouldn't happen
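As an illustration of the finalizer mechanism Lee refers to, a hedged sketch: the ALB controller manages its own finalizers, so this only shows the shape, not something you would normally set by hand, and the finalizer name is an assumption:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Illustration only: a finalizer in metadata keeps the object in a
// "Terminating" state until the controller removes the finalizer, i.e.
// until it has cleaned up the backing AWS resources. The ALB controller
// adds/removes its own finalizers; the name below is only an example and
// may differ between controller versions.
const ingress = new k8s.networking.v1.Ingress("app", {
    metadata: {
        annotations: { "kubernetes.io/ingress.class": "alb" },
        finalizers: ["ingress.k8s.aws/resources"],
    },
    spec: { /* rules omitted */ },
});
```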
I hope that helps; I'm not sure of a better way to explain it
s
Thanks for all that, Lee! Yes, this all makes sense. What I would say, though, is that Pulumi will delete the ingress and controller at the same time if you don't make the ingress dependent on the controller. Even if you make it dependent, there can be race conditions deleting the AWS resources, as we know. There are in fact finalizers added to the ALB CRD `targetgroupbindings.elbv2.k8s.aws`, and to the ingresses as well. I usually have to manually remove the finalizers, as Pulumi gets into a stuck state of not being able to delete the `targetgroupbindings.elbv2.k8s.aws` CRD. Once I remove the finalizers, it is destroyed, and then I have to do the same for the ingresses. Then Pulumi is able to move on. I also have to manually delete the load balancers in AWS, as well as the target groups.
It seems the best solution for now is to do 2 separate deployments: one to destroy the ingresses, and then one to destroy the controllers.
This issue describes a lot of what we are saying here: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1629