# dotnet
w
Is there a way to tell Pulumi to retry before failing when creating a resource? I'm having trouble creating an API Management resource, and the problem seems timing-related, so one or two retries should fix it. I have set up dependencies, but for some reason that doesn't seem to help.
t
Normally, we retry automatically if the HTTP status code we get back warrants it.
Which error do you get and what’s the timing?
w
I'll check in more detail later tonight. It happens when updating APIM: sometimes it seems to get a 40x from the OpenAPI endpoint in the app. I'm pretty sure it would succeed if it just tried again after a couple of seconds.
It doesn't happen every time, and the app is up right after the error occurs.
Is this where I could use customTimeouts?
t
Custom timeouts won't help. A 40x means a client error, which normally doesn't make sense to retry.
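A minimal sketch of the kind of status-code-based retry decision described above. This is an illustration only, not Pulumi's or the Azure provider's actual retry logic: the set of retryable codes (429 plus common 5xx server/gateway errors) is an assumption, and `HttpError` and `call_with_retries` are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of transient status codes: throttling plus server/gateway errors.
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}


class HttpError(Exception):
    """Hypothetical error type carrying the HTTP status code of a failed call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def is_retryable(status: int) -> bool:
    # 4xx client errors (other than 429) mean the request itself was rejected,
    # so repeating it unchanged is not expected to help.
    return status in RETRYABLE_STATUS_CODES


def call_with_retries(action: Callable[[], T], attempts: int = 3, delay_s: float = 2.0) -> T:
    """Run `action`, retrying only on retryable status codes, with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except HttpError as err:
            # Give up immediately on client errors, or after the last attempt.
            if not is_retryable(err.status) or attempt == attempts:
                raise
            time.sleep(delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With this policy a 503 is retried and a 400 fails on the first attempt, which matches the reasoning that client errors are not worth retrying.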
w
We deploy the following setup with Pulumi (simplified):
* App (.NET Core 5) as a Kubernetes deployment
* Traefik ingress
* APIM

The APIM API resource depends on both the Kubernetes deployment resource and the ingress resource. In most of the deploys that failed, we made no changes to the ingress, only to the deployment. The Kubernetes deployment succeeds, but then the APIM API update fails. The error is actually a 503 (not a 40x as I mentioned earlier), returned when APIM tries to fetch the OpenAPI spec. I'm not sure why it's a 503, since the Kubernetes deployment has finished and we have liveness, startup, and readiness probes configured.
t
So API Management gets an error when trying to read the spec from k8s (a 503), and then passes that error on to Pulumi?
w
@tall-librarian-49374 That’s how I read the log output. I can share it later today.
The last part of the logs from one of the failed runs:
```
 ~  azure:apimanagement:Api ApimApiExternal updating [diff: ~import]
 ~  azure:apimanagement:Api ApimApiInternal updating [diff: ~import]
 ~  azure:apimanagement:Api ApimApiExternal updating [diff: ~import]; error: 1 error occurred:
 ~  azure:apimanagement:Api ApimApiExternal updating failed [diff: ~import]; error: 1 error occurred:
 ~  azure:apimanagement:Api ApimApiInternal updated [diff: ~import]
    pulumi:pulumi:Stack XXXXX.yyyyyyy.zzzzzzz.Deploy-XXXXX.yyyyyyy.zzzzzzz.deploy.dev running error: update failed
    pulumi:pulumi:Stack XXXXX.yyyyyyy.zzzzzzz.Deploy-XXXXX.yyyyyyy.zzzzzzz.deploy.dev failed 1 error

Diagnostics:
  azure:apimanagement:Api (ApimApiExternal):
    error: 1 error occurred:
    	* updating urn:pulumi:XXXXX.yyyyyyy.zzzzzzz.deploy.dev::XXXXX.yyyyyyy.zzzzzzz.Deploy::azure:apimanagement/api:Api::ApimApiExternal: creating/updating API Management API "XXXXX-yyyyyyy-zzzzzzz-api-external" (Resource Group "elkds-dev-cmn-rg"): apimanagement.APIClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="ValidationError" Message="One or more fields contain incorrect values:" Details=[{"code":"ValidationError","message":"Parsing error(s): Failed to import from specified resource https://zzzzzzz.elkds-dev.XXXXX.com/external/api/swagger/v1/swagger.json?guid=db534bd4-1e11-4590-9c50-c5f09d011921: Response status code does not indicate success: 503 (Service Unavailable)..","target":"representation"}]

  pulumi:pulumi:Stack (XXXXX.yyyyyyy.zzzzzzz.Deploy-XXXXX.yyyyyyy.zzzzzzz.deploy.dev):
    error: update failed
```
I'll look through the Traefik and APIM logs on Monday and see if I can find the reason for the 503.
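In the log above, the Azure provider receives `StatusCode=400` (a `ValidationError` from APIM); the 503 from the app appears only inside the error message text. A retry policy keyed on the outer status code would therefore likely treat this as a non-retryable client error. A hypothetical helper that pulls the inner status out of the message could look like the sketch below; the regex is tailored to the wording in this particular log and is an assumption, not a documented error format.

```python
import re
from typing import Optional

# Matches the wording seen in the APIM import error in this log; other messages may differ.
_INNER_STATUS = re.compile(r"Response status code does not indicate success: (\d{3})")


def inner_import_status(error_message: str) -> Optional[int]:
    """Return the status code APIM got when fetching the OpenAPI spec, if present."""
    match = _INNER_STATUS.search(error_message)
    return int(match.group(1)) if match else None


def is_transient_import_failure(error_message: str) -> bool:
    # Treat the outer 400 as worth retrying only when the wrapped spec fetch failed with a 5xx.
    status = inner_import_status(error_message)
    return status is not None and 500 <= status <= 599
```

A deployment script could, for example, check the error output of a failed run with `is_transient_import_failure` and rerun the deployment only when it returns `True`.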
t
Maybe ask in #kubernetes? I feel like it's more related to the await logic there.
w
Yeah. It could also be our probe returning OK before the Swagger endpoint is up.
On second thought, I don't see how that could happen: the probes are HTTP probes, so the app has to be up for them to return OK. Maybe there is a timing issue when Traefik starts routing traffic to the new pods.
I modified the startup probe so that it makes a request to the Swagger endpoint to make sure it responds. That seems to have fixed it.
I thought this had fixed it, but apparently not. We still see intermittent 503 and 502 errors when APIM is being updated; they occur when APIM fetches the OpenAPI spec.
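One possible workaround, sketched under assumptions: before the APIM API update runs, poll the Swagger URL until it returns 200 several times in a row, so that a short window where Traefik still routes to old or not-yet-ready pods is less likely to coincide with the import. `http_status` and `wait_for_stable_endpoint` are hypothetical helpers, and the thresholds are guesses. The gate would have to run between the Kubernetes rollout and the APIM update (for example as a separate pipeline step if the two are deployed in separate runs), and it cannot guarantee that APIM, fetching from its own network location, sees the same result.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout_s: float = 5.0) -> int:
    """Return the HTTP status for a GET on `url`, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 502/503 from the ingress) are raised as HTTPError.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, etc.
        return 0


def wait_for_stable_endpoint(
    url: str,
    required_successes: int = 5,
    interval_s: float = 2.0,
    max_checks: int = 60,
    fetch_status: Callable[[str], int] = http_status,
) -> bool:
    """Return True once `url` answers 200 `required_successes` times in a row."""
    consecutive = 0
    for _ in range(max_checks):
        if fetch_status(url) == 200:
            consecutive += 1
            if consecutive >= required_successes:
                return True
        else:
            consecutive = 0  # any 502/503 resets the streak
        time.sleep(interval_s)
    return False
```

For example, `wait_for_stable_endpoint("https://zzzzzzz.elkds-dev.XXXXX.com/external/api/swagger/v1/swagger.json")` would block until the endpoint has answered 200 five times in a row, or give up after 60 checks. Requiring consecutive successes rather than a single 200 is meant to catch the intermittent 502/503 responses described above.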