# general
s
Hey all! Is anyone aware of a way to set a delay time for creation/update/deletion of a resource? Use case: a K8s cluster with the external-dns (ED) and aws-load-balancer-controller (ALB) Helm charts installed. The ED chart tracks your ingresses/services and creates DNS records for them. The ALB chart creates load balancers for them. (I don’t think the ALB chart is relevant to this question, but noting it just in case.) If I decide to tear down the ingresses and the ED Helm chart, it seems there can be a race condition between the ED chart being deleted and it deleting any records it created for the ingresses. I have tried setting `dependsOn` on the ingresses to `[ALB, ED]` so that Pulumi understands that the ingresses have a dependency on these 2 Helm charts. It hasn’t completely helped. A delay option would be nice, because then the 2 Helm charts could delete the resources within the given delay time, and only then be removed from the cluster.
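For illustration, a minimal sketch of the `dependsOn` arrangement described above, in Pulumi TypeScript. The chart names, repos, and ingress details are placeholders, not the original code:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Chart names, repos, and values below are placeholders for this sketch.
const alb = new k8s.helm.v3.Chart("aws-load-balancer-controller", {
    chart: "aws-load-balancer-controller",
    fetchOpts: { repo: "https://aws.github.io/eks-charts" },
    namespace: "kube-system",
});

const externalDns = new k8s.helm.v3.Chart("external-dns", {
    chart: "external-dns",
    fetchOpts: { repo: "https://kubernetes-sigs.github.io/external-dns" },
    namespace: "kube-system",
});

// dependsOn makes Pulumi create the charts before the ingress and, on
// destroy, delete the ingress before the charts. It does NOT make Pulumi
// wait for the controllers to finish cleaning up the AWS resources or DNS
// records they created, which is the race discussed above.
const ingress = new k8s.networking.v1.Ingress("app", {
    metadata: { annotations: { "kubernetes.io/ingress.class": "alb" } },
    spec: { /* rules omitted */ },
}, { dependsOn: [alb, externalDns] });
```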
f
I've had a lot of issues with k8s Ingress in Pulumi. That's because the k8s package does not await the underlying resource creation. For example, with aws-load-balancer-controller that would be your ALB: when creating the controller, Pulumi doesn't await the ALB creation. That meant I couldn't get the hostname, for example, of an ingress resource. It also created a bunch of issues destroying the stack afterwards, because the stack is not aware there is another underlying resource using it. I have ditched the aws-load-balancer-controller because of this issue and instead started using traefik with custom resource definitions (which get correctly awaited). For reference: https://github.com/pulumi/pulumi-kubernetes/issues/1649 The issue has been added to a milestone, so I'm guessing it's on the roadmap to be fixed 🤷
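A rough sketch of the hostname problem being described, assuming a Pulumi TypeScript program; the ingress itself is a placeholder. With the ALB controller, the status read below can resolve to nothing because the provider doesn't await the external load balancer:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Hypothetical ingress; only the status read at the bottom matters here.
const ingress = new k8s.networking.v1.Ingress("app", {
    metadata: { annotations: { "kubernetes.io/ingress.class": "alb" } },
    spec: { /* rules omitted */ },
});

// With the ALB controller, this can come back undefined: the provider
// reports the ingress as ready before the external ALB exists, so the
// status may not be populated yet (see pulumi-kubernetes#1649).
export const hostname = ingress.status.apply(
    s => s?.loadBalancer?.ingress?.[0]?.hostname,
);
```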
s
Yes, I've experienced the same issues. Thanks for the tip about traefik! Is it easy to configure?
From a DM with @billowy-army-68599, recording here for the #general channel's sake:

My question: I'm asking for a feature that says don't destroy/create this resource for X amount of time. Preferably this delay would only start after its dependents have been destroyed. I need, say, the ALB controller to delete all load balancers before the controller is itself deleted. Pulumi doesn't know to wait for that. A delay would destroy the ingresses that depend on the ALB controller, then the ALB controller would be given, say, an extra 1m to delete the LBs, and only then would Pulumi delete the controller. Does this make sense?

Lee's reply: it makes sense, yes. this isn't possible though [9:27 PM] you can do some stuff inside an apply on create, and do a sleep if needed [9:28 PM] but not for a delete

My reply:
• Would this be worthy of a GitHub issue?
• [9:30 PM] Does it make sense that it is currently impossible to guarantee, because of a race condition, that the LBs would be deleted before the controller? [9:31 PM] Currently, the only way might be to have 2 separate deployments: one to destroy the ingresses, and then one to destroy the controller. [9:33 PM] Someone in a thread mentioned using traefik as a controller. Have you used this before? He mentioned this uses CRDs and that they get correctly awaited.

Lee's reply: it's not worthy of a GitHub issue I'm afraid, it's a subtlety in the way the alb load balancer controller works, creating a new load balancer for each ingress, and there's no await ability on the status I'm afraid, especially with a delete. [9:36 PM] i personally stick with the nginx ingress controller and a single ELB
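As a sketch of the apply-plus-sleep workaround Lee mentions (create only, not delete), assuming a Pulumi TypeScript program; the chart is a placeholder and the one-minute delay is arbitrary:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Placeholder chart; the apply-plus-sleep pattern is what matters.
const alb = new k8s.helm.v3.Chart("aws-load-balancer-controller", {
    chart: "aws-load-balancer-controller",
    fetchOpts: { repo: "https://aws.github.io/eks-charts" },
});

// A delay can be injected on the *create* path by sleeping inside an apply
// and having downstream resources depend on the result. There is no
// equivalent hook on the delete path, which is exactly the gap above.
const albReady = alb.ready.apply(async resources => {
    await new Promise(resolve => setTimeout(resolve, 60_000)); // arbitrary 1 minute
    return resources;
});

// Downstream resources can then use it, e.g.:
//   new k8s.networking.v1.Ingress("app", { /* ... */ }, { dependsOn: albReady });
```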
@billowy-army-68599 Do you have a gist/repo that has an example of the nginx controller?
s
Thank you!
f
@steep-portugal-37539 it's not hard to set up. It ends up being similar to NGINX: it will be one LB for all deployments. I prefer traefik because I don't have to deal with nginx configs for the routing 🙂
s
@future-refrigerator-88869 sounds great! Do you happen to have code deploying traefik that you are able to share?
f
Sure. I took a lot of inspiration from this project: https://github.com/aporia-ai/mlplatform-workshop/tree/main/infra
take a look at the `index` and `TraefikRoute`
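For reference, a minimal sketch of that pattern, loosely following the linked repo: traefik installed via Helm plus an `IngressRoute` custom resource. The chart repo, apiVersion, hostname, and service names are assumptions, not taken from the workshop code:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Traefik installed via Helm: one Service of type LoadBalancer for the
// whole cluster. Chart repo/values are assumptions for this sketch.
const traefik = new k8s.helm.v3.Chart("traefik", {
    chart: "traefik",
    fetchOpts: { repo: "https://traefik.github.io/charts" },
});

// Routing uses Traefik's IngressRoute CRD instead of a k8s Ingress. Pulumi
// treats it as a plain custom resource; there are no per-ingress AWS
// resources for the provider to miss. The apiVersion may be
// traefik.containo.us/v1alpha1 on older chart versions.
const route = new k8s.apiextensions.CustomResource("app-route", {
    apiVersion: "traefik.io/v1alpha1",
    kind: "IngressRoute",
    spec: {
        entryPoints: ["web"],
        routes: [{
            match: "Host(`app.example.com`)",
            kind: "Rule",
            services: [{ name: "app-svc", port: 80 }],
        }],
    },
}, { dependsOn: traefik });
```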
s
Awesome, thank you very much. Do you fully understand how it doesn't run into the same issues that the AWS ALB controller has?
b
@steep-portugal-37539 it doesn't have to provision AWS resources in order to work
s
Hey Lee, not sure I follow. Who is provisioning the load balancer(s)?
f
It does provision the load balancer, but it's done via a custom resource definition, which apparently gets awaited in Pulumi
The problem is the k8s Ingress resource, specifically the new versions (as far as I can tell)
s
OK, makes sense. I think this is a similar issue to what I'm having with the external-dns controller. So if you destroy external-dns and the ingresses at the same time, there can be a race condition between the controller deleting the DNS records and it itself being deleted. It may not have enough time to delete the records before it is destroyed. It seems there might not be any way for Pulumi to know when the records have been created/destroyed, because Pulumi itself is not touching them.
If a fix could be implemented for the ALB controller to detect the underlying AWS resources, perhaps the Route 53 records created by external-dns could also be awaited. Not sure if Pulumi could have this kind of visibility.
b
@steep-portugal-37539 The problem with these resources that run inside Kubernetes clusters is that they provision AWS resources, and the state is eventually consistent. So when you add a new ingress resource with the ALB controller, it provisions target groups etc. behind the scenes. When you delete an ingress, the ALB controller pod reconciles the external resources; Pulumi doesn't know anything about them. If the mechanism has a proper status field, you can sometimes await them, but they often don't, and it won't work on delete, because a delete just goes to the k8s API, deletes the ingress, and if that succeeds, it'll delete the controller deployment.

With the traefik/nginx ingress controllers, the controller operates as a reverse proxy inside the cluster. No external AWS resources are provisioned; you get a single load balancer with a service of `type=LoadBalancer`, which has a proper status field populated with a load balancer address. So when you delete an ingress, the controller deployment only has to update the backing traefik or nginx config; it doesn't have to reconcile any external resources.
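A minimal sketch of what that looks like from the Pulumi side, assuming a traefik Helm install; the chart repo and the Service's namespace/name are assumptions:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Minimal traefik install; chart repo is an assumption for this sketch.
const traefik = new k8s.helm.v3.Chart("traefik", {
    chart: "traefik",
    fetchOpts: { repo: "https://traefik.github.io/charts" },
});

// The chart creates a single Service of type LoadBalancer. Because that
// Service has a real status field, the provider's await logic waits for
// it, so the ELB hostname can be read reliably once the cloud LB exists.
// The namespace/name ("default"/"traefik") are assumptions here.
const svc = traefik.getResource("v1/Service", "default", "traefik");
export const lbHostname = svc.status.loadBalancer.ingress[0].hostname;
```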
I would recommend taking a look at the flow of adding a new resource, observe how it works, and it'll quickly make sense why this happens. Let's say you provision an ingress with the ALB controller. The operation is:
• the ALB controller sees the ingress API has a new resource
• the ALB controller creates backing target groups etc.
When you do a delete/destroy of the Helm chart, the process is:
• Pulumi deletes the ingress
• Pulumi waits until the ingress is deleted, then when that succeeds, it deletes the controller pod that reconciled the ingress
The correct Kubernetes way to deal with this is using finalizers. If you have finalizers on the ingress resource, this wouldn't happen
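As an illustration of the finalizer mechanism Lee refers to, a hedged sketch: the ALB controller manages its own finalizers, so this only shows the shape, not something you would normally set by hand, and the finalizer name is an assumption:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Illustration only: a finalizer in metadata keeps the object in a
// "Terminating" state until the controller removes the finalizer, i.e.
// until it has cleaned up the backing AWS resources. The ALB controller
// adds/removes its own finalizers; the name below is only an example and
// may differ between controller versions.
const ingress = new k8s.networking.v1.Ingress("app", {
    metadata: {
        annotations: { "kubernetes.io/ingress.class": "alb" },
        finalizers: ["ingress.k8s.aws/resources"],
    },
    spec: { /* rules omitted */ },
});
```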
I hope that helps; I'm not sure of a better way to explain it
s
Thanks for all that, Lee! Yes, this all makes sense. What I would say, though, is that Pulumi will delete the ingress and controller at the same time if you don't make the ingress dependent on the controller. Even if you make it dependent, there can be race conditions deleting the AWS resources, as we know. There are in fact finalizers added to the ALB CRD `targetgroupbindings.elbv2.k8s.aws`, and to the ingresses as well. I usually have to manually remove the finalizers, as Pulumi gets into a stuck state of not being able to delete the `targetgroupbindings.elbv2.k8s.aws` CRD. Once I remove the finalizers, it is destroyed, and then I have to do the same for the ingresses. Then Pulumi is able to move on. I also have to manually delete the load balancers in AWS, as well as the target groups.
It seems the best solution for now is to do 2 separate deployments: one to destroy the ingresses, and then one to destroy the controllers.
This issue describes a lot of what we are saying here: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1629