# general
m
How does the Pulumi engine batch up resource provisioning operations for execution and then wait on them? I'm seeing some behaviour I wouldn't expect when doing some large deployments, which I suspect is blowing out the runtime for `pulumi up` (taking up to hours longer than I would expect). I'll see a number of resources (say 20, though I haven't counted the exact number) being provisioned at once, 19 of which are quick to provision (e.g. S3 buckets) and one of which takes a long time (e.g. an RDS instance). There are many other resources in the stack that still need to be provisioned, which don't depend on the RDS instance in any way, and may or may not depend on the 19 S3 buckets. I would expect that at any given time I would have 20 provisioning operations running at once (as the dependency graph allows), and that as each S3 bucket finishes provisioning, the provisioning operation for another resource would take its place. However, what seems to happen is that the engine waits for all 20 resources in this batch to finish provisioning before starting a new batch of 20, regardless of what the dependency graph would allow. Is this expected? Is it due to how the engine runs and waits on resource creation?
w
That's not expected; the behaviour should match your expectation. The number of parallel resource creation operations is controlled with the `--parallel` flag.
For my own curiosity, what CLI version are you using and what language?
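For reference, the flag mentioned above is passed directly to `pulumi up`; the flag name is real, but the value 30 here is just an illustrative cap:

```shell
# Limit pulumi up to at most 30 concurrent resource operations.
pulumi up --parallel 30
```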
m
Sorry Robbie, been off sick and didn't get back to you quickly. Currently using Python with Pulumi `v3.66.0` and `pulumi-aws` `5.36.0`.
Without having looked too far into it, some of the team suspect that it might be related to how the python libraries communicate with the engine, and wait for resources to create. I unfortunately don't have code to point to at the moment though.
w
Hope you're feeling better! It's also possible that somewhere along the way the Python runtime added a blocking future to the code, which would also cause the outcome you're describing (I think). If your code is written such that the 20 resources are e.g. in a component resource, and the next 20 depend on that component, then you'd also get this same behavior.
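To illustrate the "blocking future" hypothesis in isolation (this is a generic asyncio sketch, not Pulumi's actual internals): if any coroutine makes a synchronous blocking call instead of awaiting, the whole event loop stalls and otherwise-concurrent work serializes, which would produce exactly the batch-like behaviour described above.

```python
import asyncio
import time


async def blocking_provision(name: str) -> str:
    # time.sleep blocks the event loop: while this coroutine sleeps,
    # no other coroutine can make progress, so the work serializes.
    time.sleep(0.1)
    return name


async def yielding_provision(name: str) -> str:
    # asyncio.sleep suspends this coroutine and yields to the loop,
    # so the other "provisioning" coroutines run during the wait.
    await asyncio.sleep(0.1)
    return name


async def timed(provision) -> float:
    # Launch five "provisioning operations" concurrently and time them.
    start = time.monotonic()
    await asyncio.gather(*(provision(f"res-{i}") for i in range(5)))
    return time.monotonic() - start


blocking_elapsed = asyncio.run(timed(blocking_provision))
yielding_elapsed = asyncio.run(timed(yielding_provision))
# The blocking version takes ~5x longer despite "running concurrently".
print(f"blocking: {blocking_elapsed:.2f}s, yielding: {yielding_elapsed:.2f}s")
```

A single stray blocking call buried in a provider or SDK code path is enough to turn the first pattern into the second, which is why it can look like the engine is batching when it isn't.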
m
I don't believe that in this case it'll be the component resources. I'll try to make a reproduction with just dynamic resources that sleep, to eliminate any dependencies. I don't have a wealth of experience with asyncio, let alone the deeper internals of the AWS Pulumi Python libraries - what could possibly cause some resources to be blocking futures and not others?
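A minimal sketch of that reproduction idea, using Pulumi's Python dynamic provider API (`pulumi.dynamic.ResourceProvider` / `CreateResult` are real APIs; the `Sleep` resource, timings, and counts here are hypothetical, and this only runs inside a `pulumi up` of an actual Pulumi program):

```python
import time

import pulumi
from pulumi import dynamic


class SleepProvider(dynamic.ResourceProvider):
    def create(self, props):
        # Simulate provisioning time with a plain sleep in the
        # provider's create call.
        time.sleep(props["seconds"])
        return dynamic.CreateResult(id_=f"sleep-{props['seconds']}", outs=props)


class Sleep(dynamic.Resource):
    def __init__(self, name, seconds, opts=None):
        super().__init__(SleepProvider(), name, {"seconds": seconds}, opts)


# 19 fast resources plus one slow one, with no dependencies between them.
# If the engine schedules purely by dependency graph, the later resources
# should start as soon as fast ones finish, without waiting on "slow".
fast = [Sleep(f"fast-{i}", seconds=1) for i in range(19)]
slow = Sleep("slow", seconds=120)
more = [Sleep(f"more-{i}", seconds=1) for i in range(20)]
```

Watching the CLI's progress display during `pulumi up` should then show whether new creates start while `slow` is still provisioning.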
w
> what could possibly cause some resources to be blocking futures and not others?
Programmer error on the Pulumi side of things! Sorry if that was unclear. I'm saying it's possible it's a bug.