# azure
r
Updating an existing AKS cluster causes issues due to unset outbound IP (loadBalancerProfile) which worked before and might have been caused by a hidden update of the providers used.
I performed an update of a Pulumi program that didn’t change anything on the AKS cluster (it basically only added one unrelated Pulumi export), and now the process fails with the following error:
```
* updating urn:pulumi:dev::streamm::ajaegle:azureaks:AksCluster$azure:containerservice/kubernetesCluster:KubernetesCluster::streamm-dev-aks: updating Managed Kubernetes Cluster "streamm-dev-aks8b71e006" (Resource Group "streamm-devf3b8623a"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidLoadBalancerProfile" Message="Load balancer profile must specify one of ManagedOutboundIPs, OutboundIPPrefixes and OutboundIPs." Target="networkProfile.loadBalancerProfile"
```
I did several updates on an existing cluster today and didn’t encounter any issues. Tonight I created a new stack based on the same program, which was successful (initial `pulumi up`). Every additional update, though, ends up in this error.
We are on Standard SKU load balancers and didn’t specify the loadBalancerIP upfront, as this is handled afterwards by deploying the ingress controller.
If I’m not wrong, I saw some Pulumi provider updates when performing a `pulumi up` this afternoon. Is it possible that things change in the background even if I specify explicit versions and use `npm ci`, leveraging the package-lock.json?
For now I added a `loadBalancerProfile` with the ingress IP already set, like:
```typescript
loadBalancerProfile: {
  outboundIpAddressIds: [args.loadBalancerIpForEgress.id],
},
```
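For context, the 400 error is the API rejecting a profile that names none of the three outbound IP sources, which are mutually exclusive on a Standard-SKU load balancer. A minimal sketch of that check (a hypothetical helper mirroring the API-side validation, not actual Pulumi or Azure SDK code):

```typescript
// Hypothetical mirror of the check behind the "InvalidLoadBalancerProfile"
// 400: the profile must specify one of ManagedOutboundIPs,
// OutboundIPPrefixes, or OutboundIPs — and they are mutually exclusive,
// so exactly one source must be set.
interface OutboundProfile {
    managedOutboundIpCount?: number;
    outboundIpPrefixIds?: string[];
    outboundIpAddressIds?: string[];
}

function hasValidOutboundSource(p: OutboundProfile): boolean {
    const sources = [
        p.managedOutboundIpCount !== undefined,
        p.outboundIpPrefixIds !== undefined,
        p.outboundIpAddressIds !== undefined,
    ].filter(Boolean).length;
    return sources === 1;
}

console.log(hasValidOutboundSource({}));                               // false: the 400 case above
console.log(hasValidOutboundSource({ managedOutboundIpCount: 2 }));    // true
console.log(hasValidOutboundSource({ outboundIpAddressIds: ["id"] })); // true
```

An empty profile (what an update apparently sends when nothing was specified) is exactly the failing case.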
f
I think I had a similar issue, but used this workaround:
```typescript
loadBalancerProfile: {
  managedOutboundIpCount: 2,
}
```
r
If I remember correctly, specifying just the count didn’t solve it for me. What’s the effect of that? Does it create an egress IP implicitly?
Using the IP that was designated for the ingress also doesn’t solve the issue, as that IP address is then already claimed and can no longer be used for a Kubernetes service of type LoadBalancer. So for now I created an additional public IP for the egress.
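For reference, a minimal sketch of that workaround with `@pulumi/azure` — a dedicated static Standard-SKU public IP used only for egress, so the ingress IP stays free. Names like `resourceGroup` are assumed to exist in the surrounding program:

```typescript
import * as azure from "@pulumi/azure";

// Dedicated public IP used only for SNAT/egress traffic, so the ingress
// IP remains available for the LoadBalancer service created later.
// (resourceGroup is a hypothetical name from the surrounding program.)
const egressIp = new azure.network.PublicIp("egress-ip", {
    resourceGroupName: resourceGroup.name,
    allocationMethod: "Static",
    sku: "Standard", // must match the cluster's Standard-SKU load balancer
});

// Then, inside the cluster's networkProfile:
// loadBalancerProfile: {
//     outboundIpAddressIds: [egressIp.id],
// },
```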
To me it makes total sense that I need a public IP address assigned for egress traffic. But I’m wondering how that worked before. I always deployed clusters without specifying this and used the IP only when I deployed the first service (the ingress controller).
Strangely, the initial creation also worked without this setting (I was connected to it via `kubectl`, so it was created successfully). The error only occurs when performing another `pulumi up` against an existing cluster (even if there are no programmatic changes to the AKS part of the Pulumi program).
This issue popped up again. Now it even happens with the same code that worked last time. I really have the impression that there are some updates happening on my machine that are not version-managed properly. The only change I made in the meantime was creating a new Pulumi project (which downloaded newer versions of the providers and those additional things that are installed during `npm install`). I removed my `node_modules` again and did a fresh `npm ci` (not `npm install`) to make sure I get the locked versions. How can I find out what’s happening here?
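One place to look (just a guess at the cause): besides `node_modules`, the Pulumi CLI manages resource-provider plugin binaries on its own (`pulumi plugin ls` shows them), so what actually runs can differ from what npm installed. To compare against the lockfile, a small sketch that pulls the pinned version out of a `package-lock.json` (v1-style top-level `dependencies` map; the package versions here are hypothetical):

```typescript
// Extract the version that a package-lock.json (lockfile v1 layout, with
// a top-level "dependencies" map) pins for a given package, to compare
// with what `npm ls` and `pulumi plugin ls` report.
interface Lockfile {
    dependencies?: Record<string, { version: string }>;
}

function lockedVersion(lock: Lockfile, pkg: string): string | undefined {
    return lock.dependencies?.[pkg]?.version;
}

// In a real project this would come from:
//   JSON.parse(fs.readFileSync("package-lock.json", "utf8"))
const example: Lockfile = {
    dependencies: { "@pulumi/azure": { version: "3.4.0" } }, // hypothetical pinned version
};

console.log(lockedVersion(example, "@pulumi/azure"));  // prints "3.4.0"
console.log(lockedVersion(example, "@pulumi/random")); // prints "undefined"
```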
BTW: The bug I’m facing is somewhere in the Terraform azurerm provider: https://github.com/terraform-providers/terraform-provider-azurerm/issues/6525
The underlying reason for it sometimes working and then not working again might be this one, which sounds super ugly to work around… https://github.com/terraform-providers/terraform-provider-azurerm/issues/6525#issuecomment-617116612