# automation-api
w
@bored-oyster-3147 following up from https://github.com/pulumi/pulumi/pull/7299, I did a quick test forcing an error which didn't work as expected and the next run has hung.
This is the forced failure:
Changes:
 
    Type                                                                   Name                                                                 Operation
>   pulumi:pulumi:StackReference                                           pharos/aws-eks/alpha                                                 read
-   kubernetes:core:ServiceAccount                                         kube-system/aws-load-balancer-controller                             delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                cert-manager-controller-issuers                                      delete
-   kubernetes:rbac.authorization.k8s.io:Role                              kube-system/cert-manager:leaderelection                              delete
-   kubernetes:rbac.authorization.k8s.io:RoleBinding                       cert-manager/cert-manager-webhook:dynamic-serving                    delete
-   kubernetes:rbac.authorization.k8s.io:Role                              kube-system/cert-manager-cainjector:leaderelection                   delete
-   kubernetes:admissionregistration.k8s.io:ValidatingWebhookConfiguration cert-manager-webhook                                                 delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                cert-manager-controller-orders                                       delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-controller-clusterissuers                               delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-controller-ingress-shim                                 delete
-   kubernetes:apps:Deployment                                             cert-manager/cert-manager                                            delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-controller-approve:cert-manager-io                      delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                cert-manager-controller-clusterissuers                               delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                aws-load-balancer-controller-rolebinding                             delete
-   kubernetes:admissionregistration.k8s.io:MutatingWebhookConfiguration   aws-load-balancer-webhook                                            delete
-   kubernetes:apiextensions.k8s.io:CustomResourceDefinition               kube-system/aws-load-balancer-selfsigned-issuer                      delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                external-dns                                                         delete
-   kubernetes:core:Service                                                cert-manager/cert-manager                                            delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-webhook:subjectaccessreviews                            delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                cert-manager-webhook:subjectaccessreviews                            delete
-   kubernetes:core:Service                                                alpha/internet-gateway                                               delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-edit                                                    delete
-   kubernetes:apps:Deployment                                             kube-system/external-dns                                             delete
-   kubernetes:rbac.authorization.k8s.io:RoleBinding                       kube-system/aws-load-balancer-controller-leader-election-rolebinding delete
-   kubernetes:networking.k8s.io:Ingress                                   alpha/internal-gateway                                               delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       aws-load-balancer-controller-role                                    delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                cert-manager-controller-ingress-shim                                 delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                cert-manager-cainjector                                              delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-cainjector                                              delete
-   kubernetes:admissionregistration.k8s.io:MutatingWebhookConfiguration   cert-manager-webhook                                                 delete
-   kubernetes:core:Service                                                cert-manager/cert-manager-webhook                                    delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-controller-certificates                                 delete
-   kubernetes:apps:Deployment                                             kube-system/aws-load-balancer-controller                             delete
-   kubernetes:apps:Deployment                                             cert-manager/cert-manager-cainjector                                 delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                cert-manager-controller-challenges                                   delete
-   kubernetes:core:Service                                                alpha/internal-gateway                                               delete
-   kubernetes:rbac.authorization.k8s.io:RoleBinding                       kube-system/cert-manager-cainjector:leaderelection                   delete
-   kubernetes:core:ServiceAccount                                         cert-manager/cert-manager-cainjector                                 delete
-   kubernetes:core:Service                                                kube-system/aws-load-balancer-webhook-service                        delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       external-dns                                                         delete
-   kubernetes:rbac.authorization.k8s.io:Role                              cert-manager/cert-manager-webhook:dynamic-serving                    delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                cert-manager-controller-certificates                                 delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-controller-orders                                       delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-controller-issuers                                      delete
-   kubernetes:core:ServiceAccount                                         cert-manager/cert-manager-webhook                                    delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                cert-manager-controller-approve:cert-manager-io                      delete
-   kubernetes:apiextensions.k8s.io:CustomResourceDefinition               kube-system/aws-load-balancer-serving-cert                           delete
-   kubernetes:rbac.authorization.k8s.io:Role                              kube-system/aws-load-balancer-controller-leader-election-role        delete
-   kubernetes:rbac.authorization.k8s.io:RoleBinding                       kube-system/cert-manager:leaderelection                              delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-controller-challenges                                   delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding                cert-manager-controller-certificatesigningrequests                   delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-view                                                    delete
-   kubernetes:core:ServiceAccount                                         cert-manager/cert-manager                                            delete
-   kubernetes:core:ServiceAccount                                         kube-system/external-dns                                             delete
-   kubernetes:core:Service                                                kube-system/external-dns                                             delete
-   kubernetes:rbac.authorization.k8s.io:ClusterRole                       cert-manager-controller-certificatesigningrequests                   delete
-   kubernetes:admissionregistration.k8s.io:ValidatingWebhookConfiguration aws-load-balancer-webhook                                            delete
-   kubernetes:networking.k8s.io:Ingress                                   alpha/internet-gateway                                               delete
-   kubernetes:apps:Deployment                                             cert-manager/cert-manager-webhook                                    delete
-   kubernetes:apiextensions.k8s.io:CustomResourceDefinition               challenges.acme.cert-manager.io                                      delete
-   kubernetes:apiextensions.k8s.io:CustomResourceDefinition               issuers.cert-manager.io                                              delete
-   kubernetes:apiextensions.k8s.io:CustomResourceDefinition               clusterissuers.cert-manager.io                                       delete
 
Diagnostics:
 
pharos/k8s/alpha (pulumi:pulumi:Stack)
error: Running program 'D:\Devel\Mps\devops-gemini-pulumi\Gemini\bin\Debug\gemini.dll' failed with an unhandled exception:
Scriban.Syntax.ScriptRuntimeException: InternalGateway.yaml(5,17) : error : The variable or function `envName` was not found
   at void Scriban.TemplateContext.CheckVariableFound(ScriptVariable variable, bool found)
   at object Scriban.TemplateContext.GetValue(ScriptVariableGlobal variable)
   at object Scriban.Syntax.ScriptVariableGlobal.GetValue(TemplateContext context)
   at async ValueTask<object> Scriban.TemplateContext.GetOrSetValueAsync(ScriptExpression targetExpression, object valueToSet, bool setter)
   at async ValueTask<object> Scriban.TemplateContext.GetValueAsync(ScriptExpression target)
   at async ValueTask<object> Scriban.Syntax.ScriptVariable.EvaluateAsync(TemplateContext context)
   at async ValueTask<object> Scriban.TemplateContext.EvaluateAsync(ScriptNode scriptNode, bool aliasReturnedFunction) x 2
   at async ValueTask<object> Scriban.Syntax.ScriptExpressionStatement.EvaluateAsync(TemplateContext context)
   at async ValueTask<object> Scriban.TemplateContext.EvaluateAsync(ScriptNode scriptNode, bool aliasReturnedFunction) x 2
   at async ValueTask<object> Scriban.Syntax.ScriptBlockStatement.EvaluateAsync(TemplateContext context)
   at async ValueTask<object> Scriban.TemplateContext.EvaluateAsync(ScriptNode scriptNode, bool aliasReturnedFunction) x 2
   at async ValueTask<object> Scriban.Syntax.ScriptPage.EvaluateAsync(TemplateContext context)
   at async ValueTask<object> Scriban.TemplateContext.EvaluateAsync(ScriptNode scriptNode, bool aliasReturnedFunction) x 2
   at async ValueTask<object> Scriban.Template.EvaluateAndRenderAsync(TemplateContext context, bool render)
   at async ValueTask<string> Scriban.Template.RenderAsync(TemplateContext context)
   at async void Pulumi.Deployment+Runner+<>c__DisplayClass10_0.<WhileRunningAsync>g__HandleCompletion|0(?)+HandleCompletion(?) in /_/sdk/dotnet/Pulumi/Deployment/Deployment.Runner.cs:line 137
   at async Task<int> Pulumi.Deployment+Runner.WhileRunningAsync() in /_/sdk/dotnet/Pulumi/Deployment/Deployment.Runner.cs:line 177
 
Resources:
    - delete 61
    28 unchanged
 
Duration: 13s
Preview completed AFAICT
It still shows a bunch of deletes queued up as a result. Would that happen if I ran `up` instead? I guess I'll have to try an actual update, once I can work out why it now hangs, and see what happens.
b
are you using a pre-release version?
w
Yes, latest alpha.
Basically, I ran preview with a forced error, then I ran preview again, which has hung. It is still spinning 25m later.
b
I would like to verify that the latest alpha includes the changeset that I made
It says it went out 2 hours ago but I don't know what commit it was built from
w
I can navigate to source and see your changes
b
well I'm not sure why it would be hanging, especially on preview
was this an existing stack? Did you by chance already have pending deletes in it?
w
It was an existing stack that was fully baked. All I did was tweak it to force an error.
b
well if you can get a repro let me know
w
Still hanging. Strange it hasn't timed out.
Had to kill the debug session:
Changes:
 
    Type                         Name                 Operation
>   pulumi:pulumi:StackReference pharos/aws-eks/alpha read
 
Diagnostics:
 
pharos/k8s/alpha (pulumi:pulumi:Stack)
error: transport is closing
 
Resources:
    21 unchanged
 
Duration: 35m27s
@bored-oyster-3147 debugging into it, it seems the exception is never thrown
i.e. it returns a preview result
b
what does `LocalRuntimeService` look like
that implies there was no `CommandException`
w
@bored-oyster-3147 `_callerContext.ExceptionDispatchInfo` is null
Want to do a quick interactive session to debug it while I share my screen in slack?
b
I'm busy at the moment - is that EDI instance null? that would cause an exception to not be thrown
do you by chance have anything in your inline program that would cause the exception to not bubble out of it?
w
No, not catching at that scope.
b
but the EDI instance is still null?
w
`Deployment.Runner.WhileRunningAsync.HandleCompletion` sees the exception, which is `Scriban.Syntax.ScriptRuntimeException`
Calls `HandleExceptionAsync` which logs it and returns 32
`Deployment.RunInlineAsync` then has `null` `exceptionDispatchInfo` and returns 1 in lambda, returns `null` at end
`LanguageRuntimeService.Run` then returns `new RunResponse()`
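The flow in this trace can be reduced to a small sketch. None of the names below are Pulumi's actual implementation: `SketchRunner`, `SketchInlineHost`, `RunInlineSketchAsync`, and `LoggedFailureExitCode` are hypothetical stand-ins that only mirror the shape described here. The runner catches and logs the exception and reports it as an exit code (32 in the trace), so the caller ends up with a `null` `ExceptionDispatchInfo` unless it also looks at that exit code.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading.Tasks;

// Hypothetical runner: logs a failure and reports it only as an exit code.
public static class SketchRunner
{
    // 32 is the value seen in the trace; the constant name is made up.
    public const int LoggedFailureExitCode = 32;

    public static async Task<int> RunAsync(Func<Task> program)
    {
        try
        {
            await program();
            return 0;
        }
        catch (Exception ex)
        {
            // Logged here and never rethrown, so callers see no exception.
            Console.Error.WriteLine($"error: {ex.Message}");
            return LoggedFailureExitCode;
        }
    }
}

// Hypothetical inline host that wraps the runner.
public static class SketchInlineHost
{
    // Returns null when the host believes nothing went wrong.
    public static async Task<ExceptionDispatchInfo> RunInlineSketchAsync(
        Func<Task> program, bool checkExitCode)
    {
        ExceptionDispatchInfo edi = null;
        int exitCode;
        try
        {
            exitCode = await SketchRunner.RunAsync(program);
        }
        catch (Exception ex)
        {
            // Not reached for program failures: the runner already swallowed them.
            edi = ExceptionDispatchInfo.Capture(ex);
            exitCode = 1;
        }

        if (edi == null && checkExitCode && exitCode != 0)
        {
            // Threading the exit code through is one way to surface the failure.
            edi = ExceptionDispatchInfo.Capture(
                new InvalidOperationException($"inline program exited with code {exitCode}"));
        }
        return edi;
    }
}
```

With `checkExitCode` set to `false` the sketch reproduces the symptom (a failing program yields `null`), and with `true` the failure becomes visible to the caller.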
b
what is different about this exception
is it thrown in an apply or something?
w
It does happen inside `Output.Create`
b
and are you using an inline program delegate or the generic TStack?
w
// gateways
new ConfigGroup("internal-gateway",
    new ConfigGroupArgs { Yaml = RenderTemplate("InternalGateway.yaml", ReadResource, new { Aws = AwsConfig }) },
    new ComponentResourceOptions { Provider = k8sProvider });
new ConfigGroup("internet-gateway",
    new ConfigGroupArgs { Yaml = RenderTemplate("InternetGateway.yaml", ReadResource, new { Aws = AwsConfig }) },
    new ComponentResourceOptions { Provider = k8sProvider });
I'm deriving from `Pulumi.Stack` (not directly)
`ConfigGroupArgs.Yaml` is `InputList<string>`
`RenderTemplate` returns `Output<string>`
b
so you are using `PulumiFn.Create<TStack>`?
w
Yes... `PulumiFn Create(IServiceProvider serviceProvider, Type stackType)`
var stackName = $"{Config.Pulumi.Organization.Name}/{settings.Environment.ToLower()}";
var stackArgs = new InlineProgramArgs(info.ProjectName, stackName, PulumiFn.Create(ServiceProvider, info.StackType))
{
    Logger = LoggerFactory.CreateLogger<Pulumi.Deployment>()
};
var stack = await LocalWorkspace.CreateOrSelectStackAsync(stackArgs);
b
so not `PulumiFn.Create<TStack>`
ok just making sure I'm looking in the right place
w
No, the stack type is selected based on the "resources" to deploy, so not using the generic method.
b
OK I have a failing test
👍 1
w
My gut feel is `Pulumi.Deployment.Runner.RunAsync` and `WhileRunningAsync` are swallowing the exception when they should not:
https://github.com/pulumi/pulumi/blob/258fb00bc2ecbd489af6d694a2204468cc7ca729/sdk/dotnet/Pulumi/Deployment/Deployment.Runner.cs#L61-L64
https://github.com/pulumi/pulumi/blob/258fb00bc2ecbd489af6d694a2204468cc7ca729/sdk/dotnet/Pulumi/Deployment/Deployment.Runner.cs#L179-L183
i.e. the try/catch should be removed or should rethrow (they do log the exception, so maybe not removed)
Then any immediate exceptions in the stack ctor, or deferred via outputs, should propagate and be captured.
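As a rough illustration of the "log, then rethrow" idea, here is a minimal sketch. `RethrowSketch` and `RunLoggedAsync` are hypothetical names and are not part of the Pulumi SDK; the point is only that a bare `throw;` after logging lets both immediate and deferred (awaited) failures reach the caller.

```csharp
using System;
using System.Threading.Tasks;

public static class RethrowSketch
{
    // Hypothetical helper: logs the failure, then lets it propagate to the caller.
    public static async Task RunLoggedAsync(Func<Task> program, Action<string> log)
    {
        try
        {
            // Covers exceptions thrown synchronously and those surfaced by the awaited task.
            await program();
        }
        catch (Exception ex)
        {
            log($"unhandled exception: {ex.Message}");
            throw; // a bare throw keeps the original exception and its stack trace
        }
    }
}
```

Whether rethrowing is acceptable depends on who else calls the runner, which is the concern raised later in the thread.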
b
yes exceptions in the in-flight tasks are being swallowed
And finally `LanguageRuntimeService.Run` where it would no longer be `null` and so return the "bail" response: https://github.com/pulumi/pulumi/blob/258fb00bc2ecbd489af6d694a2204468cc7ca729/sdk/dotnet/Pulumi.Automation/Runtime/LanguageRuntimeService.cs#L61
The bail response only sets the exception message, so the inner exception handler should still be called to log the full exception and rethrow.
b
I went about it a little differently. Namely because `IRunner` is used by local programs too, and rethrowing in `IRunner` would cause other issues there
w
Sounds good. I think we understand it either way.
b
Also because the main issue here was the exit code not being threaded through for inline programs. At the very least we should've seen `CommandException` without an inline host exception, which was the first thing I fixed. Then needed to do some work to capture an aggregate of in-flight exceptions so that the explicit exception could bubble up
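One way to picture "capture an aggregate of in-flight exceptions" is the sketch below. It is not the code from the actual fix: `InFlightTaskTracker`, `Register`, and `DrainAsync` are hypothetical names, and the sketch ignores tasks that get registered while draining is already under way. Failures are recorded rather than rethrown inside the tracker, and the caller decides what to do with the resulting `AggregateException`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical collector for tasks that are still running when the program body returns.
public sealed class InFlightTaskTracker
{
    private readonly object _gate = new object();
    private readonly List<Task> _tasks = new List<Task>();

    public void Register(Task task)
    {
        lock (_gate) { _tasks.Add(task); }
    }

    // Waits for every registered task and records failures instead of throwing.
    // Returns null when all tasks succeeded.
    public async Task<AggregateException> DrainAsync()
    {
        Task[] snapshot;
        lock (_gate) { snapshot = _tasks.ToArray(); }

        var failures = new List<Exception>();
        foreach (var task in snapshot)
        {
            try { await task; }
            catch (Exception ex) { failures.Add(ex); }
        }
        return failures.Count == 0 ? null : new AggregateException(failures);
    }
}
```

A host could call `DrainAsync` after the program finishes and, if the result is not `null`, hand the original exceptions to its caller without changing how the shared runner behaves for local programs.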
what a pain in the ass though. Me not touching `Deployment_Runner` sooner coming back to bite me
Glad you caught that!
🍺 1
w
FWIW, I like your fix. LGTM.
🙌 1
b
That PR to fix the swallowed exceptions was merged btw!
w
Yeah, I'm already using the latest alpha. Much better now!