any guidance on size of pulumi stacks being too la...
# general
w
any guidance on the size of pulumi stacks being too large / ways to speed things up? Context:
• pulumi stack of ~3000 resources w/ AWS backend
  ◦ All resources are datadog items (monitors, SLOs, etc.)
  ◦ File size of stack is ~23 MB
• pulumi up takes > 1 hr to complete, even when there are < 10 diffs between updates
For reference, a subset of that stack (160 resources) still takes >= 30s to pulumi up w/ 0 changes in any of the resources. From searching GitHub and various places it seems like the AWS backend is slow in general compared to pulumi's, but it's difficult to understand why it would be this slow when there are no actual changes to make
b
run it like so:
PULUMI_EXPERIMENTAL=1 PULUMI_SKIP_CHECKPOINTS=true pulumi up
it’s slow because every checkpoint is a separate round trip to the backend, and the latency of each one adds up
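To put rough numbers on that, the figures from the question can be turned into an average cost per resource. This is only a back-of-the-envelope sketch: it assumes the wall-clock time is dominated by per-resource backend round trips (not measured here), and the helper name is made up for illustration.

```python
def seconds_per_resource(total_seconds: float, resource_count: int) -> float:
    """Average wall-clock time spent per resource during an update."""
    return total_seconds / resource_count


# Figures quoted in the thread; both are lower bounds (">" and ">=" values).
full_stack = seconds_per_resource(60 * 60, 3000)  # ~3000 resources, > 1 hr
subset = seconds_per_resource(30, 160)            # 160 resources, >= 30 s

print(f"full stack: {full_stack:.2f} s/resource")  # 1.20
print(f"subset:     {subset:.2f} s/resource")      # 0.19
```

The per-resource cost is several times higher on the large stack than on the subset. That would be consistent with each checkpoint getting more expensive as the state file grows, but that reading is an assumption, not something confirmed in this thread.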
w
does pulumi still checkpoint even when the diff is 0?
b
bear in mind, if you kill pulumi with skip checkpoints you’ll lose data
yes
w
huh - i presumed (naively) that the diffing happens locally after pulling the stack, and then the only info being written -> AWS stack json is for changes. Aside from reading through the codebase, is there any block diagram somewhere in the docs that shows this whole workflow? i.e. what actually happens on up / preview
Trying to understand what is serial, parallel, etc., and with preview, up, and different backends it isn't the most intuitive. For example, one thing that's unintuitive to me is that creating resources w/ an empty stack is much faster than updating a stack w/ resources, even if there's no diff. ex: pulumi up 800 new resources to an empty stack = 20s, while pulumi up w/ 800 unchanged resources is > 3 minutes
also another follow up (ty for help here @billowy-army-68599 🙏 )
bear in mind, if you kill pulumi with skip checkpoints you'll lose data
What is the best way to get back to a healthy stack state if a failure occurs and we aren't checkpointing? Cancel, refresh, up?
m
try stack references: keep the networking part in a centralized stack that the other resource stacks reference
w
@millions-parrot-88279 can you elaborate on that? Right now we use a single stack, and my limited understanding is that stack references are a way to optimize working w/ a multi-stack setup
i
@wonderful-lunch-8542 I believe by default the CLI performs a refresh before up which you can also disable
But it would be great to have traces for a pulumi up, it would help to see what’s causing slowdowns. I’m about to start breaking up my large stack into multiple micro stacks, and having the info about which resources are costing the most time during a deploy would help me decide which areas to focus on first
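Short of real traces, one rough way to see where a big stack is concentrated is to count resources per type in the exported state. A minimal sketch, assuming the JSON has the `deployment.resources[]` shape that `pulumi stack export` writes; the sample data and the function name are hand-written for illustration, and counts are only a proxy for deploy time, not a measurement of it.

```python
import json
from collections import Counter


def count_by_type(state: dict) -> Counter:
    """Count resources per type token in an exported stack state."""
    resources = state.get("deployment", {}).get("resources", [])
    return Counter(r.get("type", "unknown") for r in resources)


# Tiny hand-written stand-in for an exported state file (illustrative only).
sample = json.loads("""
{"version": 3, "deployment": {"resources": [
  {"urn": "urn:pulumi:dev::demo::pulumi:pulumi:Stack::demo-dev",
   "type": "pulumi:pulumi:Stack"},
  {"urn": "urn:pulumi:dev::demo::datadog:index/monitor:Monitor::cpu",
   "type": "datadog:index/monitor:Monitor"},
  {"urn": "urn:pulumi:dev::demo::datadog:index/monitor:Monitor::mem",
   "type": "datadog:index/monitor:Monitor"},
  {"urn": "urn:pulumi:dev::demo::datadog:index/serviceLevelObjective:ServiceLevelObjective::api",
   "type": "datadog:index/serviceLevelObjective:ServiceLevelObjective"}
]}}
""")

# Largest groups first: candidates to look at when splitting the stack.
for type_token, n in count_by_type(sample).most_common():
    print(n, type_token)
```

To try it on a real stack, something like `pulumi stack export --file state.json` followed by `json.load(open("state.json"))` in place of the sample should work.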
Here’s the docs for the --refresh flag, it looks like the default is true from the description right? https://www.pulumi.com/docs/cli/commands/pulumi_up/
w
i think it would be default false if not provided, but it's honestly hard to say 🤔
i
It says “string=true” which at least in most languages indicates that true is the default, you might want to set that to false
w
based on https://github.com/pulumi/pulumi/blob/ab17473110911fbe23be159c58f9d8a38a35ba71/pkg/cmd/pulumi/util.go#L962 (which is called from the cmd), i think it defaults to false unless the project specifies otherwise. but just educated guessing here. i'll test it out w/ explicit falses
b
refresh is off by default
i
Huh, that =true is confusing then because the equals sign is how you declare a default value in JavaScript and python
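One way to read that help text: in this style of CLI help, a `[="true"]` after the type usually means "the value used when the flag is passed with no value", which is separate from the default used when the flag is absent. The sketch below mimics that behaviour with Python's argparse; it is an analogy only, not Pulumi's actual flag handling (the CLI is written in Go), and the `"false"` default is taken from the replies above rather than from the source code.

```python
import argparse


def parse_refresh(argv):
    parser = argparse.ArgumentParser(prog="demo")
    # nargs="?" makes the value optional:
    #   const   -> used when the flag is given bare (--refresh)
    #   default -> used when the flag is not given at all
    parser.add_argument("--refresh", nargs="?", const="true", default="false")
    return parser.parse_args(argv).refresh


print(parse_refresh([]))                   # false (flag absent)
print(parse_refresh(["--refresh"]))        # true  (bare flag)
print(parse_refresh(["--refresh=false"]))  # false (explicit value)
```

So under this reading, `=true` describes what a bare `--refresh` means, and leaving the flag off keeps refresh disabled.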