We are facing some performance bottlenecks with pulumi and n Pulumi Community #general

proud-animal-2285

09/25/2025, 12:02 PM

We are facing some performance bottlenecks with pulumi and need some help to identify what we can do to solve these
• The main issue is that pulumi preview and up take 40-50 mins for the production environment (stack)
• This causes all sorts of problems for us
  ◦ We can't apply things immediately using pulumi (it will take 40 mins)
  ◦ If for some reason there is an error in up or preview, another hour is needed
  ◦ Slower development cycle
  ◦ Merging the pulumi PR into main is so slow (since we run previews for each environment (stack) before merging)

Some technical information about our production stack
• We have over 21000 resources in the stack. Fewer resources in non-production stacks
• We have a lot of edge devices (say 1000s) and we create a bunch of things for these edge devices (certs, keys, AWS IoT thing, etc). A lot of those 21000 resources are because of these edge device AWS resources
• We run previews in CI but up is run from a local laptop
• When deploying the production stack we do use PULUMI_SKIP_CHECKPOINTS=1
• We are using the s3 backend

Some questions I have
1. Is 40-50 mins the expected time here?
2. What can we do to improve the speed of previews and applies?

white-vase-18996

09/25/2025, 4:08 PM

do you need to refresh each time you are updating the stack?

white-vase-18996

09/25/2025, 4:12 PM

you could also try increasing the parallelism flag to run more read operations concurrently if you can't turn off refresh

many-telephone-49025

09/25/2025, 4:42 PM

To give my 2ct: I am a huge fan of micro stacks. The Pulumi CLI has some good functionality to help split an existing monolithic stack into several stacks. You can then use StackReferences to access values from one stack in another stack. Not only does this give a performance boost and quicker feedback, it also enables separation of concerns and even different ownership of the stacks (network folks, DBAs, etc.) As described here: https://www.pulumi.com/blog/iac-best-practices-structuring-pulumi-projects/ https://www.pulumi.com/blog/iac-best-practices-applying-stack-references/
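One way the micro-stack idea could apply to a large fleet of edge devices is to assign each device to one of a fixed number of smaller stacks. The sketch below is only an illustration of that partitioning step, not something described in the thread: the device IDs, the shard count of 10, and the `edge-prod-NN` naming scheme are all hypothetical, and it does not call Pulumi at all.

```python
import hashlib


def shard_for(device_id: str, shard_count: int) -> int:
    """Map a device ID to a stable shard index in [0, shard_count)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def stack_name_for(device_id: str, shard_count: int, prefix: str = "edge-prod") -> str:
    """Hypothetical naming scheme: one stack per shard, e.g. 'edge-prod-03'."""
    return f"{prefix}-{shard_for(device_id, shard_count):02d}"


# Hypothetical device IDs; a real program would load these from its own inventory.
devices = [f"device-{n:04d}" for n in range(1000)]
SHARDS = 10  # assumption for illustration only

# Group devices by the stack that would own their resources.
by_stack: dict[str, list[str]] = {}
for device in devices:
    by_stack.setdefault(stack_name_for(device, SHARDS), []).append(device)

for name in sorted(by_stack):
    print(name, len(by_stack[name]))
```

Because the assignment depends only on the device ID and the shard count, every run places a device in the same stack. Changing the shard count later would reassign devices, so under this scheme it is a value to settle up front.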

proud-animal-2285

09/25/2025, 5:11 PM

1. Do you mean pulumi refresh? No, we don't need refresh every time we update the stack. We just run pulumi up and hit yes
2. I will try with increased parallelism. Also, I will check what the current limit is

white-vase-18996

09/25/2025, 5:12 PM

1. no I mean when you are running preview. Have you tried running pulumi preview --refresh=false

white-vase-18996

09/25/2025, 5:12 PM

2. the default is 16

proud-animal-2285

09/25/2025, 5:15 PM

1. yes, we run with refresh = false in CI using the action

- name: Preview ${{ inputs.stack-name }}
  uses: pulumi/actions@v6
  with:
    pulumi-version: 3.193.0
    command: preview
    refresh: false
    stack-name: ${{ inputs.stack-name }}
    work-dir: ${{ inputs.work-dir }}
    comment-on-pr: true
    diff: true

white-vase-18996

09/25/2025, 5:18 PM

so even with refresh false it is taking 40-50 mins to preview?

proud-animal-2285

09/25/2025, 5:18 PM

2. How high can I set the parallelism to (practically)? I am running with 64 right now

white-vase-18996

09/25/2025, 5:18 PM

you can increase it until you start getting rate limited by aws

proud-animal-2285

09/25/2025, 5:18 PM

Yes, 40-50 mins with refresh = false

white-vase-18996

09/25/2025, 5:19 PM

If neither of these helps, then I think the answer would be to isolate into smaller stacks

white-vase-18996

09/25/2025, 5:19 PM

@echoing-dinner-19531 is it expected for preview to take 50 mins even if it's not doing reads? seems pretty hefty even for 21k resources

echoing-dinner-19531

09/25/2025, 5:30 PM

Some parts of the system are currently sequential (changing that is an area of investigation). So all 21000 resources have to go through 1 by 1.

echoing-dinner-19531

09/25/2025, 5:30 PM

That could probably explain the slowness for so many resources, I don't think we've really got perf testing at that level of resources.
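As a rough back-of-envelope check on the numbers quoted in this thread, the reported run time can be turned into an average cost per resource. The sketch below uses only the 21000-resource and 40-50 minute figures from the thread. The 2,000-resource stack size is a hypothetical example, and the assumption that run time scales linearly with resource count is a simplification, not a measured property of Pulumi.

```python
# Back-of-envelope estimate using only the figures quoted in the thread.
RESOURCES = 21_000      # resources in the production stack
RUN_MINUTES = (40, 50)  # reported preview/up duration range


def seconds_per_resource(minutes: float, resources: int) -> float:
    """Average wall-clock seconds per resource for a run of the given length."""
    return minutes * 60 / resources


def estimated_minutes(resources: int, per_resource_s: float) -> float:
    """ASSUMPTION: run time scales linearly with resource count."""
    return resources * per_resource_s / 60


low, high = (seconds_per_resource(m, RESOURCES) for m in RUN_MINUTES)
print(f"per resource: {low:.3f}s to {high:.3f}s")
# prints "per resource: 0.114s to 0.143s"

# Hypothetical smaller stack of 2,000 resources (size chosen for illustration only).
small = 2_000
print(
    f"{small} resources: "
    f"{estimated_minutes(small, low):.1f} to {estimated_minutes(small, high):.1f} min"
)
# prints "2000 resources: 3.8 to 4.8 min"
```

Roughly a tenth of a second per resource is small in isolation, which fits the explanation that the total is dominated by the sheer number of resources passing through a sequential step.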

proud-animal-2285

09/25/2025, 5:32 PM

What is the typical number of resources you do your perf testing with, and what are the expected times for previews (and applies)? That will be helpful if we decide to break down the stacks into smaller ones

echoing-dinner-19531

09/25/2025, 5:42 PM

The ones I know of are a few hundred resources and vary by runtime from a few seconds to maybe a minute (a lot of that can be due to package manager overhead). From talking to users I think a few thousand generally runs well, but 21000 is high.
