#general

wonderful-lunch-8542

12/26/2023, 7:40 PM
any guidance on size of pulumi stacks being too large / ways to speed things up? Context:
• pulumi stack of ~3000 resources w/ AWS backend
  ◦ All resources are Datadog items (monitors, SLOs, etc)
  ◦ File size of stack is ~23 MB
• pulumi up takes > 1 hr to complete, even when there are < 10 diffs between updates
For reference, a subset of that stack (160 resources) still takes >= 30s to pulumi up w/ 0 changes in any of the resources. From searching GitHub and various places it seems like the AWS backend is slow in general in comparison to pulumi's, but it's difficult to understand why it would be this slow when there are no actual changes to make

billowy-army-68599

12/26/2023, 8:08 PM
run it like so:
PULUMI_EXPERIMENTAL=1 PULUMI_SKIP_CHECKPOINTS=true pulumi up
it’s slow because the round trip time for each checkpoint adds latency for each trip
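A rough back-of-the-envelope sketch of why that adds up. This assumes one checkpoint per resource step and that each checkpoint rewrites the whole state file to the backend; neither is confirmed in this thread. The bandwidth and round-trip numbers are made-up illustrative values, and the function name is hypothetical.

```python
def checkpoint_overhead_seconds(steps, state_mb, mb_per_second, round_trip_seconds):
    """Estimate total time spent writing checkpoints.

    Assumption: every step uploads the full state file once and pays
    one network round trip.
    """
    per_checkpoint = round_trip_seconds + state_mb / mb_per_second
    return steps * per_checkpoint


# Stack size from the thread: ~3000 resources, ~23 MB state file.
# 25 MB/s and 80 ms are illustrative guesses, not measurements.
estimate = checkpoint_overhead_seconds(
    steps=3000, state_mb=23, mb_per_second=25, round_trip_seconds=0.08
)
print(f"{estimate / 60:.1f} minutes of checkpoint overhead")
```

Under those assumptions the overhead scales with both the number of resources and the state file size, so it can be large even when the diff itself is empty.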

wonderful-lunch-8542

12/26/2023, 8:08 PM
does pulumi still checkpoint even when the diff is 0?

billowy-army-68599

12/26/2023, 8:08 PM
bear in mind, if you kill pulumi with skip checkpoints you’ll lose data
yes
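Given that risk, one cautious pattern is to export the state before a run that skips checkpoints, so there is at least a snapshot to fall back on. A minimal sketch, assuming `pulumi` is on the PATH and a stack is already selected. The function names and the `state-backup.json` path are hypothetical; whether importing that backup is the right recovery step depends on what failed and is not settled in this thread.

```python
import os
import subprocess


def skip_checkpoint_env(base_env=None):
    """Copy the environment and add the two variables from the command above."""
    env = dict(os.environ if base_env is None else base_env)
    env["PULUMI_EXPERIMENTAL"] = "1"
    env["PULUMI_SKIP_CHECKPOINTS"] = "true"
    return env


def planned_commands(backup_path="state-backup.json"):
    """Return the commands in order: export the state, then update."""
    return [
        ["pulumi", "stack", "export", "--file", backup_path],
        ["pulumi", "up", "--yes"],
    ]


def run_with_backup(backup_path="state-backup.json"):
    """Export the state normally, then run the update with checkpoints skipped."""
    export_cmd, up_cmd = planned_commands(backup_path)
    subprocess.run(export_cmd, check=True)
    subprocess.run(up_cmd, check=True, env=skip_checkpoint_env())
```

Calling `run_with_backup()` stops before the update if the export fails, since `check=True` raises on a non-zero exit code.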

wonderful-lunch-8542

12/26/2023, 8:10 PM
huh - i presumed (naively) that the diffing happens locally after pulling the stack, and then the only info being written -> AWS stack json is for changes. Aside from reading through the codebase, is there any block diagram somewhere in the docs that shows this whole workflow? i.e. what actually happens on up / preview
Trying to understand what is serial, parallel, etc., and doing that across preview, up, and different backends isn't the most intuitive. For example, one thing that's unintuitive to me is that creating resources w/ an empty stack is much faster than updating a stack w/ resources, even if there's no diff. ex: pulumi up 800 new resources to an empty stack = 20s, while pulumi up w/ 800 unchanged resources is > 3 minutes
also another follow up (ty for help here @billowy-army-68599 🙏 )
bear in mind, if you kill pulumi with skip checkpoints you'll lose data
What is the best way to get back to a healthy stack state if a failure occurs and we aren't checkpointing? Cancel, refresh, up?

millions-parrot-88279

12/26/2023, 8:15 PM
try with stack references: keep the networking part in a centralized stack that the other resource stacks reference

wonderful-lunch-8542

12/26/2023, 8:22 PM
@millions-parrot-88279 can you elaborate on that? Right now we use a single stack, and my limited understanding is that stack references are a way to optimize working w/ a multi-stack setup

icy-controller-6092

12/26/2023, 11:07 PM
@wonderful-lunch-8542 I believe by default the CLI performs a refresh before up which you can also disable
But it would be great to have traces for a pulumi up, it would help to see what’s causing slowdowns. I’m about to start breaking up my large stack into multiple micro stacks, and having the info about which resources are costing the most time during a deploy would help me decide which areas to focus on first
Here’s the docs for the --refresh flag, it looks like the default is true from the description right? https://www.pulumi.com/docs/cli/commands/pulumi_up/

wonderful-lunch-8542

12/26/2023, 11:11 PM
i think it would default to false if not provided, but it's honestly hard to say 🤔

icy-controller-6092

12/26/2023, 11:13 PM
It says “string=true” which at least in most languages indicates that true is the default, you might want to set that to false

wonderful-lunch-8542

12/26/2023, 11:58 PM
based on https://github.com/pulumi/pulumi/blob/ab17473110911fbe23be159c58f9d8a38a35ba71/pkg/cmd/pulumi/util.go#L962 which is called from the cmd, i think it defaults to false unless the project specifies otherwise. but just educated guessing here. i'll test it out w/ explicit falses

billowy-army-68599

12/26/2023, 11:59 PM
refresh is off by default

icy-controller-6092

12/27/2023, 2:45 AM
Huh, that =true is confusing then because the equals sign is how you declare a default value in JavaScript and Python
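One possible reading of the confusing `=true`: the docs may be showing the value the flag takes when it is passed bare (`--refresh` with no value), not the value used when the flag is left out entirely. That is an assumption about how the help text is generated, not something confirmed in this thread. A minimal Python sketch of the same distinction using argparse; the `demo-up` parser is hypothetical and is not Pulumi's actual code.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="demo-up")
    # default: value when the flag is absent.
    # const:   value when the flag is present but given no value.
    parser.add_argument("--refresh", nargs="?", const="true", default="false")
    return parser


def refresh_value(argv):
    """Parse argv and return the resulting --refresh value as a string."""
    return build_parser().parse_args(argv).refresh


print(refresh_value([]))                   # flag omitted -> false
print(refresh_value(["--refresh"]))        # bare flag -> true
print(refresh_value(["--refresh=false"]))  # explicit value -> false
```

With a flag shaped like this, "off by default" and a documented `=true` can both hold at once.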