`pulumi destroy` on an AKS cluster with custom net...
# general
r
`pulumi destroy` on an AKS cluster with custom networking (the AKS cluster references a self-created subnet) failed with what looks like a wrong destroy order: the subnet could not be deleted because there were still resources in it. So Pulumi removed the AKS cluster from its state, but the cluster still exists in Azure. Adding a `dependsOn` seems unnecessary, as the dependency chain should already exist through the subnet reference in the AKS cluster definition. How do I clean that up?
`pulumi up` would create an additional cluster, since the existing one is no longer referenced in the Pulumi state…
Manually deleting the AKS cluster and running `pulumi destroy` once more resolved the issue. But I'm wondering how I can express that the AKS cluster deletion must be awaited before the subnet is deleted.
w
In general, the dependency order should ensure that this does not happen. It is possible that you are hitting https://github.com/pulumi/pulumi/issues/2948, which is the one case I know of where ordering can be violated. It involves a failure during one update that leaves pending deletes, which then get processed too early on the next update. Is it possible that's the sequence of events you saw here?
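The ordering rule described above can be sketched as a small, self-contained TypeScript model. It assumes that each resource lists the resources whose outputs it consumed as inputs (plus anything in `dependsOn`), and that a destroy deletes dependents before their dependencies, i.e. the reverse of a creation order. This is an illustration only, not Pulumi's engine code; the short resource names are stand-ins for the ones in this thread.

```typescript
// Hypothetical model: each resource lists the resources it depends on, i.e. whose
// outputs it consumed as inputs (an explicit dependsOn would just add the same kind of edge).
type Graph = Record<string, string[]>;

// Returns an order in which every resource is deleted before anything it depends on.
// This is the reverse of a creation order (dependencies first), computed via DFS.
function destroyOrder(graph: Graph): string[] {
  const created: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (name: string): void => {
    const s = state.get(name);
    if (s === "done") return;
    if (s === "visiting") throw new Error(`dependency cycle at ${name}`);
    state.set(name, "visiting");
    for (const dep of graph[name] ?? []) visit(dep);
    state.set(name, "done");
    created.push(name); // all of this resource's dependencies were pushed before it
  };

  for (const name of Object.keys(graph)) visit(name);
  return created.reverse();
}

const graph: Graph = {
  vnet: [],
  subnet: ["vnet"],
  aksCluster: ["subnet"], // the cluster's agent pool references the subnet ID
  k8sProvider: ["aksCluster"], // built from the cluster's kubeconfig
};

console.log(destroyOrder(graph).join(" -> ")); // k8sProvider -> aksCluster -> subnet -> vnet
```

In this model the cluster lands before the subnet purely because of the subnet reference, which matches the order in the destroy log below; adding an explicit `dependsOn` from the cluster to the subnet would only duplicate an existing edge.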
r
Not really. I didn't observe any intermediate issues with pending deletes (at least I couldn't find anything like that in the log):
```
Resources:
    - 21 to delete

Do you want to perform this destroy? yes
Destroying (dev):

     Type                                         Name                                          Status                  Info
     pulumi:pulumi:Stack                          azure-infra-eval-dev                          failed                  1 error
 -   ├─ pulumi:providers:kubernetes               neo-azure-pulumi-eval-aks-provider            deleted
 -   ├─ azure:containerservice:KubernetesCluster  neo-pulumi-aks                                deleted
 -   ├─ azuread:index:ServicePrincipalPassword    neo-azure-pulumi-eval-aks-sp-password         deleted
 -   ├─ azure:role:Assignment                     neo-azure-pulumi-eval-acr-assignment          deleted
 -   ├─ azuread:index:ServicePrincipalPassword    neo-azure-pulumi-eval-acr-push-sp-password    deleted
 -   ├─ azure:role:Assignment                     neo-azure-pulumi-eval-aks-network-assignment  deleted
 -   ├─ azure:dns:ARecord                         dns-dummy-azureeval                           deleted
 -   ├─ azure:dns:ARecord                         dns-wildcard-azureeval                        deleted
 -   ├─ azure:role:Assignment                     neo-azure-pulumi-eval-acr-push-assignment     deleted
 -   └─ azure:network:Subnet                      neo-azure-pulumi-eval-aks-subnet              deleting failed         1 error

Diagnostics:
  azure:network:Subnet (neo-azure-pulumi-eval-aks-subnet):
    error: Plan apply failed: deleting urn:pulumi:dev::azure-infra-eval::azure:network/subnet:Subnet::neo-azure-pulumi-eval-aks-subnet: Error deleting Subnet "neo-azure-pulumi-eval-aks-subnet96be16e1" (Virtual Network "neo-azure-pulumi-eval-aks-vnet5012bfc3" / Resource Group "pulumi-azure-evalf9c2b004"): network.SubnetsClient#Delete: Failure sending request: StatusCode=400 -- Original Error: Code="InUseSubnetCannotBeDeleted" Message="Subnet neo-azure-pulumi-eval-aks-subnet96be16e1 is in use by /subscriptions/xxxxxxxx-xxxx-xxxx-b584-12e917417d0f/resourceGroups/MC_pulumi-azure-evalf9c2b004_neo-pulumi-aksb9e37d00_westeurope/providers/Microsoft.Network/networkInterfaces/aks-aksagentpool-28966608-nic-0/ipConfigurations/ipconfig1 and cannot be deleted. In order to delete the subnet, delete all the resources within the subnet. See aka.ms/deletesubnet." Details=[]

  pulumi:pulumi:Stack (azure-infra-eval-dev):
    error: update failed
```
Basically, the AKS cluster was reported as deleted, but I assume the real deletion takes much longer on Azure, so deleting the subnet fails because IPs of the cluster nodes are still attached to it. Is that what you mean by a "pending delete"?
So the failure appeared in the first run of the destroy, not in a subsequent one.
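If that hypothesis is right (Azure reports the cluster as deleted while node NICs such as `aks-aksagentpool-28966608-nic-0` from the error still hold IP configurations in the subnet), the subnet delete can only succeed once those NICs are gone. A common way to cope with this kind of eventual consistency is to retry the delete with backoff while Azure keeps returning `InUseSubnetCannotBeDeleted`. The sketch below assumes a hypothetical delete callback that rejects with an error carrying Azure's error `code`; `retryWhileInUse` and `fakeDeleteSubnet` are illustrative names, and this is not how the Terraform Azure provider actually handles it.

```typescript
// Hypothetical error shape: Azure reports a `code` such as "InUseSubnetCannotBeDeleted".
interface AzureError extends Error {
  code?: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `op` while it fails with one of the retryable Azure error codes,
// doubling the delay after each failed attempt; any other error is rethrown at once.
async function retryWhileInUse<T>(
  op: () => Promise<T>,
  retryableCodes: string[] = ["InUseSubnetCannotBeDeleted"],
  maxAttempts = 6,
  initialDelayMs = 1000,
): Promise<T> {
  let delay = initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const code = (err as AzureError).code;
      const retryable = code !== undefined && retryableCodes.includes(code);
      if (!retryable || attempt >= maxAttempts) throw err;
      await sleep(delay); // give Azure time to release the NICs still in the subnet
      delay *= 2;
    }
  }
}

// Usage with a simulated subnet delete that fails twice before the NICs are released.
let calls = 0;
const fakeDeleteSubnet = async (): Promise<string> => {
  calls++;
  if (calls < 3) {
    const e: AzureError = new Error("Subnet is in use");
    e.code = "InUseSubnetCannotBeDeleted";
    throw e;
  }
  return "deleted";
};

retryWhileInUse(fakeDeleteSubnet, undefined, 6, 10).then((result) => console.log(result, calls)); // deleted 3
```

Retrying only on the specific error code keeps genuine failures (wrong permissions, missing resources) from being masked by the backoff loop.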
w
I see.
> Basically the aks cluster was reported as deleted but I assume the real deletion takes way longer on azure and then when trying to delete the subnet this fails as there are still ips of the cluster nodes attached.
That sounds like a bug in Azure or in the upstream Terraform Azure Provider, then. I have definitely seen issues with Subnet deletion taking hours on Azure, and I know many of these have been fixed in the upstream provider. @broad-dog-22463 may be able to point at tracking issues, or make sure there is one tracking this upstream if it is reproducible.
r
I'm currently doing this for several subscriptions, so I'll get more insight into whether this issue is permanent/reproducible, and will follow up here.
That project is also a bit older and may not be using the latest versions of all plugins. I'll keep an eye on that, too.
This is getting stranger and stranger. I have now completely destroyed the stacks, reverted to the old version of the code, and done a fresh `npm install`, and I still see the same strange issue: no `Application`s are created. This is pretty frustrating. Could this be caused by an update of the azure-cli?
My previously working setup:
```
$ npm list --depth=0
azure-typescript@ /Users/ajaegle/dev/evals/pulumi-tests/azure-infra-eval
├── @pulumi/azure@0.19.5
├── @pulumi/azuread@0.18.4
├── @pulumi/kubernetes@0.25.6
├── @pulumi/pulumi@0.17.28
├── @pulumi/random@1.0.0
└── @types/node@12.11.1
```
OH CRAP. After hours of investigating, I realized that I was hit hard by some 🤬 default provider configuration falling back to the current `azure-cli` context, which pointed to a different Azure tenant… 😠 The `Application` was created in a different tenant and hence could not be referenced by the `Assignment`. It was caused by the `azuread` provider, which was extracted into its own module and no longer used the `azure:tenantId`, `azure:clientId`, … settings. Of course that worked the whole time my Pulumi programs targeted the same tenant as my CLI. This mechanism is pure evil and Russian roulette. We definitely need a way to disable this default behaviour globally! At least I had expected this to happen… https://pulumi-community.slack.com/archives/C84L4E3N1/p1567452303172900
Sorry for my French. Getting back to constructive mode: is this a bug in the `azuread` module, or do I need to configure the Azure provider twice, in `azure:xxx` and `azuread:xxx` (for `xxx` in `[tenantId, subscriptionId, clientId, clientSecret, location, environment]`)?
b
yea i hate the ambient provider thing, it's bitten me a million times
it's insane that a simple missing config would lead to it making changes to whatever cluster you last logged into with kubectl instead of the one that's managed by pulumi
b
@rhythmic-finland-36256 if you use env vars for those values, both providers will pick them up; if you register them manually, you will need to register both, I'm afraid
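When the values come from stack config rather than env vars, one way to keep the two namespaces from drifting apart is to keep a single set of values per stack and generate the `pulumi config set` commands for both `azure:` and `azuread:` from it. A minimal sketch: `AzureSettings` and `configCommands` are hypothetical names, the keys are a subset of the list in the question, and which of them the `azuread` provider actually reads is an assumption to verify against its configuration docs.

```typescript
// Hypothetical settings shared by both providers; verify each key against the provider docs.
interface AzureSettings {
  tenantId: string;
  clientId: string;
  clientSecret: string;
  environment: string;
}

const SECRET_KEYS = new Set<keyof AzureSettings>(["clientSecret"]);

// Emits one `pulumi config set` command per key and per provider namespace,
// so `azure:` and `azuread:` always receive identical tenant/client values.
function configCommands(
  stack: string,
  settings: AzureSettings,
  namespaces: string[] = ["azure", "azuread"],
): string[] {
  const commands: string[] = [];
  for (const ns of namespaces) {
    for (const key of Object.keys(settings) as (keyof AzureSettings)[]) {
      const secret = SECRET_KEYS.has(key) ? " --secret" : "";
      // Naive single-quote wrapping: values containing ' would need escaping.
      commands.push(`pulumi config set --stack ${stack} ${ns}:${key} '${settings[key]}'${secret}`);
    }
  }
  return commands;
}

const commands = configCommands("dev", {
  tenantId: "<tenant-guid>",
  clientId: "<app-id>",
  clientSecret: "<secret>",
  environment: "public",
});
commands.forEach((c) => console.log(c));
```

Generating both namespaces from one object means a tenant change can only ever be made in one place.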
r
That was exactly my fear when I first saw this: missing one explicit provider on a k8s resource and accidentally deploying it to the wrong cluster. Now it happened with the `azuread` module using my CLI credentials…
The drawback of env vars is that it will get harder to switch between stacks, right?
I would really like to see those providers disabled by default, plus maybe an API that lets me read the config explicitly and pass it to the default provider. That would allow using Pulumi config for different stacks with no accidental default provider, at the cost of a bit of verbosity (reading the config manually).
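The proposal above (no ambient fallback; read the config explicitly and hand it to the provider) amounts to replacing a lookup that silently falls back to the CLI login with one that fails. A self-contained sketch: `StackConfig` is a plain `Map` standing in for stack configuration, and `resolveProviderArgs` is a hypothetical helper, not a Pulumi API. In a real program the rough equivalent would be reading each value with `new pulumi.Config("azuread").require("tenantId")` and passing the results to an explicitly constructed provider.

```typescript
// Stand-in for stack configuration: fully qualified key -> value.
type StackConfig = Map<string, string>;

interface ProviderArgs {
  tenantId: string;
  clientId: string;
  clientSecret: string;
}

// Reads every required key from the given namespace and fails loudly if any is missing.
// It deliberately never consults env vars or the az CLI login, so a missing value cannot
// silently resolve to whichever tenant the CLI currently points at.
function resolveProviderArgs(config: StackConfig, namespace: string): ProviderArgs {
  const required = ["tenantId", "clientId", "clientSecret"] as const;
  const missing = required.filter((key) => !config.has(`${namespace}:${key}`));
  if (missing.length > 0) {
    throw new Error(`missing config for ${namespace}: ${missing.join(", ")}`);
  }
  const get = (key: (typeof required)[number]) => config.get(`${namespace}:${key}`) as string;
  return { tenantId: get("tenantId"), clientId: get("clientId"), clientSecret: get("clientSecret") };
}

// Only the `azure:` namespace is configured, as in this thread.
const config: StackConfig = new Map([
  ["azure:tenantId", "<tenant-guid>"],
  ["azure:clientId", "<app-id>"],
  ["azure:clientSecret", "<secret>"],
]);

console.log(resolveProviderArgs(config, "azure").tenantId); // <tenant-guid>
try {
  resolveProviderArgs(config, "azuread");
} catch (err) {
  console.log((err as Error).message); // missing config for azuread: tenantId, clientId, clientSecret
}
```

With this shape, the `azuread` misconfiguration from this thread surfaces as an error at preview time instead of a resource created in another tenant.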
b
Yes, with env vars you wouldn't be able to switch as easily; you would need to reset them.
😞
w
Re: default providers: we definitely want to provide an option to disable default providers, and to make it easier to manage a project without default providers. Although it's not a first-class feature, one thing I've seen (and used myself) is to configure the default provider with illegal values (a non-existent region, etc.) so that any accidental attempt to use it will fail. That said, we definitely want to solve this in a more first-class way!
b
that would be really great, even if it was just a `PULUMI_NO_AMBIENT_PROVIDERS` env var i could set, so that it didn't break any other setups
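A sketch of where such an opt-out would sit when a resource's provider is resolved. `PULUMI_NO_AMBIENT_PROVIDERS` is the name proposed in this thread, not an existing Pulumi setting, and `resolveProvider` is a hypothetical function that only models the decision: an explicit provider always wins, and the ambient default is used only when the switch is off.

```typescript
// Hypothetical model of the proposed opt-out; not Pulumi's actual provider resolution.
interface Provider {
  name: string;
  explicit: boolean;
}

type Env = Record<string, string | undefined>;

// Picks the provider a resource would use: the explicit one if given, otherwise the
// ambient default (CLI login / current kubeconfig context) unless the switch disables it.
function resolveProvider(pkg: string, explicit: Provider | undefined, env: Env): Provider {
  if (explicit !== undefined) return explicit;
  if (env["PULUMI_NO_AMBIENT_PROVIDERS"] === "true") {
    throw new Error(`no explicit provider for package "${pkg}" and ambient providers are disabled`);
  }
  return { name: `default_${pkg}`, explicit: false }; // placeholder name for the ambient provider
}

const azureadProvider: Provider = { name: "azuread-from-stack-config", explicit: true };
const strict: Env = { PULUMI_NO_AMBIENT_PROVIDERS: "true" };

console.log(resolveProvider("azuread", azureadProvider, strict).name); // azuread-from-stack-config
console.log(resolveProvider("azuread", undefined, {}).name); // default_azuread (silent fallback)
```

Calling `resolveProvider("azuread", undefined, strict)` throws, which is the behaviour asked for here. The environment is passed in as a parameter only to keep the sketch self-contained and testable.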
r
I used the setup with intentionally invalid default providers for Kubernetes and explicitly configured the Azure provider using the `azure:xxx` configs, but somehow missed that the `azuread` module uses different config properties. I would definitely vote for a "disable ALL default providers" option to avoid such dangerous scenarios.
w
Got it. I thought we already had an issue for this, but I don't see one. Opened https://github.com/pulumi/pulumi/issues/3383. Feel free to 👍.
r
Thanks a lot for tracking that issue. This is definitely important for production adoption.