# azure
b
Hi, I'm experiencing that `pulumi up` is very slow for a particular project of mine, which has 1500+ resources and a state file over 8 MB. It's unclear to me why Pulumi is taking so long (perhaps it's making a lot of network calls to the state file). What is the cause of this issue, and how can I speed it up? I've seen some comments online saying to try
```
PULUMI_EXPERIMENTAL=true PULUMI_SKIP_CHECKPOINTS=true
```
but it's unclear to me what happens if I cancel the update before it completes.
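For reference, I assume those are just set on the `pulumi up` invocation, i.e. something like:
```
PULUMI_EXPERIMENTAL=true PULUMI_SKIP_CHECKPOINTS=true pulumi up
```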
c
For a stack that large, it's also possible that you are getting throttled by Azure, depending on how often you run stack updates. Do you also run stack refreshes? Normally, turning up verbose logging would let you examine the Azure API calls and check whether Azure is throttling you, but given the size of the project that's probably not going to be practical.

Also, when you say the update is slow: can you see the timings for the resources in your stack and tell whether certain resources contribute most of the lengthy update time? You'll need to narrow down a few things, as there are several variables that could be causing the slowness. It could also very well be Pulumi Cloud contributing in some way (depending on where you run updates from and your connectivity to the Pulumi API), but it's unclear without more info.
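If you do want to attempt the verbose-logging route anyway, the usual sketch is to bump verbosity and redirect the engine logs to a file (adjust `-v` as needed):
```
pulumi up --logtostderr -v=9 2> verbose.log
```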
By the way, hitting Ctrl+C the first time signals the Pulumi engine to bail and try to shut down gracefully, but a second Ctrl+C forces cancellation and will almost certainly lead to inconsistencies, depending on the type of updates being carried out. If you are canceling a running CI job, there's no telling what that will do; it almost always depends on the type of update in flight. It's possible that Pulumi has reduced the likelihood of corruption with the frequent checkpoint patches the CLI issues, which I assume would be disabled if you used `PULUMI_SKIP_CHECKPOINTS`. (I don't know what that does; just taking a guess based on its naming.)
b
> For a stack that large, it's also possible that you are getting throttled by Azure, depending on how often you run stack updates.

Right now we do it manually, so it's not often (a couple of times a day). In addition, nothing in the blob storage limits could explain why this is so slow, unless each update is downloading the state file locally every time.

> Also, when you say the update is slow: can you see the timings for the resources in your stack and tell whether certain resources contribute most of the lengthy update time?

What's one way of doing this that you'd recommend? I've tried printing things out (using Python's `print`), but it gets lost when applying over this many resources.

> It could also very well be Pulumi Cloud

We use a self-hosted backend (Azure).
Is Pulumi really uploading the entire state file for each resource update? (This was after 2 minutes.)

*(screenshot attached: image.png)*
c
> We use a self-hosted backend (Azure)
Ah well, that changes things. Yeah I don't know if Pulumi does JSON-patch for self-managed backends. I believe they do that for Pulumi Cloud. That allows the CLI to only upload smaller patch-style objects to their API.
> Is Pulumi really uploading the entire state file for each resource update? (This was after 2 minutes.)
Yeah it's quite possible due to the CLI communicating with a non-Pulumi Cloud backend.
> Also, when you say the update is slow: can you see the timings for the resources in your stack and tell whether certain resources contribute most of the lengthy update time?

> What's one way of doing this that you'd recommend? I've tried printing things out (using Python's `print`), but it gets lost when applying over this many resources.
If you run the update interactively through the CLI, it emits the timing for each resource at the end of the line.
Just curious, have you thought about breaking out some of those resources into other stacks? You can now use stack references with self-managed backends too.
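As a rough sketch of what that looks like in Python (the stack name and the `subnet_id` output here are hypothetical):
```python
import pulumi

# Reference another stack in the same backend by its fully qualified name.
# "myorg/networking/prod" is a hypothetical example.
network = pulumi.StackReference("myorg/networking/prod")

# Consume an output that the other stack exports via pulumi.export("subnet_id", ...).
subnet_id = network.get_output("subnet_id")
```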
b
Yeah, before doing that I'd like to understand why it's slow. I'm guessing that if we broke it up into other stacks, we could do things in parallel?
> If you run the update interactively through the CLI, it emits the timing for each resource at the end of the line.
ah, so looking at some previous results, nothing really sticks out
probably just due to the volume of resources it has to go through (and uploading the state file each time, I guess)
c
> Yeah, before doing that I'd like to understand why it's slow.
Agreed.
> I'm guessing that if we broke it up into other stacks, we could do things in parallel?
Yes, mostly. You'd want to sequence them in some cases, like when you expect changes to stack outputs that downstream stacks depend on, etc.
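As a sketch of that sequencing, assuming a hypothetical split into a base `networking` stack and a dependent `application` stack in sibling directories:
```
# Update the base stack first so its outputs are current...
pulumi -C networking up --yes

# ...then the stack that consumes them through stack references.
pulumi -C application up --yes
```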
I am curious whether the `PULUMI_SKIP_CHECKPOINTS` env var helps in your case, since you are using a self-managed backend. Lastly, is your storage account in a region close to where you are running the update from, since you mentioned you run updates manually?
b
> Lastly, is your storage account in a region close to where you are running the update from, since you mentioned you run updates manually?
yes
c
Here's the announcement about the skip checkpoints env var: https://www.pulumi.com/blog/pulumi-release-notes-80/#skip-checkpoints-experimental-flag. (Not sure if it's still experimental since that blog post is from two years ago.)
t
You could try writing a trace file and inspecting the result: https://www.pulumi.com/docs/support/troubleshooting/#tracing
I expect you will see a lot of state writes.
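Something like this (the trace file name is arbitrary); `pulumi view-trace` then serves a local UI for browsing the recorded spans:
```
pulumi up --tracing=file:./up.trace
pulumi view-trace ./up.trace
```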
> I don't know if Pulumi does JSON-patch for self-managed backends.
No, every operation writes the full file.