# azure
b
Hi, I'm experiencing that `pulumi up` is very slow for a particular project of mine, which has 1500+ resources and a state file over 8 MB. It's unclear to me why Pulumi is taking so long (perhaps it's making a lot of network calls to the state file). What is the cause of this issue, and how can I speed it up? I've seen some comments online saying to try
```
PULUMI_EXPERIMENTAL=true PULUMI_SKIP_CHECKPOINTS=true
```
but it's unclear to me what happens if I cancel the update before it completes.
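For reference, I assume those are just set on the `pulumi up` invocation, i.e. something like:
```
PULUMI_EXPERIMENTAL=true PULUMI_SKIP_CHECKPOINTS=true pulumi up
```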
c
For a stack that large, it's also possible that you are getting throttled by Azure, depending on how often you run stack updates. Do you also run stack refreshes? Normally, turning up verbose logging would let you examine the Azure API calls and check whether Azure is throttling you, but given the size of the project that's probably not going to be practical.

Also, when you say the update is slow: can you see the timings for the resources in your stack and tell whether certain resources contribute most of the lengthy update time? You'll need to narrow down a few things, as there are several variables that could be causing the slowness. It could also very well be Pulumi Cloud contributing in some way (depending on where you run updates from and your connectivity to the Pulumi API), but it's unclear without more info.
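If you do want to attempt the verbose-logging route anyway, the usual sketch is to bump verbosity and redirect the engine logs to a file (adjust `-v` as needed):
```
pulumi up --logtostderr -v=9 2> verbose.log
```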
By the way, hitting Ctrl+C the first time signals the Pulumi engine to bail and try to shut down gracefully, but a second Ctrl+C forces cancellation and will almost certainly lead to inconsistencies, depending on the type of updates being carried out. If you are canceling a running CI job, there's no telling what that will do; it almost always depends on the type of update in flight. It's possible that Pulumi has reduced the likelihood of corruption with the frequent checkpoint patches the CLI issues, which I assume would be disabled if you used `PULUMI_SKIP_CHECKPOINTS`. (I don't know what that does; just taking a guess based on its naming.)
b
> For a stack that large, it's also possible that you are getting throttled by Azure, depending on how often you run stack updates.

Right now we do it manually, so it's not often (a couple of times a day). In addition, nothing in the blob storage limits could explain why this is so slow, unless each update is downloading the state file locally every time.

> Also, when you say the update is slow: can you see the timings for the resources in your stack and tell whether certain resources contribute most of the lengthy update time?

What's one way of doing this that you'd recommend? I've tried printing things out (using Python's `print`), but it gets lost when applying over this many resources.

> It could also very well be Pulumi Cloud

We use a self-hosted backend (Azure).
Is Pulumi really uploading the entire state file for each resource update? (This was after 2 minutes.)

*(screenshot attached: image.png)*
c
> We use a self-hosted backend (Azure)
Ah well, that changes things. Yeah I don't know if Pulumi does JSON-patch for self-managed backends. I believe they do that for Pulumi Cloud. That allows the CLI to only upload smaller patch-style objects to their API.
> Is Pulumi really uploading the entire state file for each resource update? (This was after 2 minutes.)
Yeah it's quite possible due to the CLI communicating with a non-Pulumi Cloud backend.
> Also, when you say the update is slow: can you see the timings for the resources in your stack and tell whether certain resources contribute most of the lengthy update time?

> What's one way of doing this that you'd recommend? I've tried printing things out (using Python's `print`), but it gets lost when applying over this many resources.
If you run the update interactively through the CLI, it emits the timing for each resource at the end of the line.
Just curious, have you thought about breaking out some of those resources into other stacks? You can now use stack references with self-managed backends too.
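As a rough sketch of what that looks like in Python (the stack name and the `subnet_id` output here are hypothetical):
```python
import pulumi

# Reference another stack in the same backend by its fully qualified name.
# "myorg/networking/prod" is a hypothetical example.
network = pulumi.StackReference("myorg/networking/prod")

# Consume an output that the other stack exports via pulumi.export("subnet_id", ...).
subnet_id = network.get_output("subnet_id")
```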
b
Yeah, before doing that I'd like to understand why it's slow. I'm guessing that if we broke it up into other stacks, we could do things in parallel?
> If you run the update interactively through the CLI, it emits the timing for each resource at the end of the line.
ah, so looking at some previous results, nothing really sticks out
probably just due to the volume of resources it has to go through (and uploading the state file each time, I guess)
c
> Yeah, before doing that I'd like to understand why it's slow.
Agreed.
> I'm guessing that if we broke it up into other stacks, we could do things in parallel?
Yes, mostly. You'd want to sequence them in some cases, like when you expect changes to stack outputs that downstream stacks depend on, etc.
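As a sketch of that sequencing, assuming a hypothetical split into a base `networking` stack and a dependent `application` stack in sibling directories:
```
# Update the base stack first so its outputs are current...
pulumi -C networking up --yes

# ...then the stack that consumes them through stack references.
pulumi -C application up --yes
```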
I am curious whether the `PULUMI_SKIP_CHECKPOINTS` env var helps in your case, since you are using a self-managed backend. Lastly, is your storage account in a region close to where you are running the update from, since you mentioned you run updates manually?
b
> Lastly, is your storage account in a region close to where you are running the update from, since you mentioned you run updates manually?
yes
c
Here's the announcement about the skip checkpoints env var: https://www.pulumi.com/blog/pulumi-release-notes-80/#skip-checkpoints-experimental-flag. (Not sure if it's still experimental since that blog post is from two years ago.)
t
You could try writing a trace file and inspecting the result: https://www.pulumi.com/docs/support/troubleshooting/#tracing
I expect you will see a lot of state writes.
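Something like this (the trace file name is arbitrary); `pulumi view-trace` then serves a local UI for browsing the recorded spans:
```
pulumi up --tracing=file:./up.trace
pulumi view-trace ./up.trace
```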
> I don't know if Pulumi does JSON-patch for self-managed backends.
No, every operation writes the full file.