# general
g:
Help tracing a slow `pulumi up`… We’re trying to understand an extremely slow `pulumi up` run. When we run `pulumi refresh`, it takes about a minute to get the data back, and that feels reasonable given that we have a few hundred resources. But when we then run a `pulumi up` where only 1 resource changes, the run takes over 9 minutes. The primary API we are interfacing with is the PagerDuty API, via the Pulumi PagerDuty provider. I’ve run `pulumi up --tracing…` and I have some trace files. Using AppDash is a little confusing though, and I’m not sure how to get much use out of it. The one thing I see is the Profile View at the bottom; sorting it by `Cumulative Time(ms)` gives a MASSIVE output of `RegisterResource` calls, like thousands and thousands of lines. We don’t have thousands of resources. There are over 11,500 lines in this output: 932 `RegisterResourceOutputs` calls and 1864 `RegisterResource` calls.
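For context, the trace was collected and opened roughly like this (the file name is just an example, not our actual path):

```
# Collect a trace while running the update (trace file name is an example)
pulumi up --tracing=file:./up.trace

# Open the trace in the bundled AppDash viewer
# (in some CLI versions this command is hidden behind PULUMI_DEBUG_COMMANDS=true)
pulumi view-trace ./up.trace
```

Counting the Profile View export by span name gives: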
```
%  cat pulumi.cumulative.out | awk '{print $1}' | sort | uniq -c | sort -n 
   1 /pulumirpc.Engine/SetRootResource
   1 execNodejs
   1 locateModule
   1 pf.CheckConfig
   1 pf.Configure
   1 pf.ValidateProviderConfig
   1 pulumi-plan
   1 sdkv2.CheckConfig
   1 sdkv2.Configure
   1 sdkv2.GetPluginInfo
   1 sdkv2.ValidateProviderConfig
   2 /pulumirpc.LanguageRuntime/GetPluginInfo
   2 /pulumirpc.LanguageRuntime/GetRequiredPlugins
   2 /pulumirpc.LanguageRuntime/Run
   2 Cumulative
   2 Name
   3 newPlugin
   4 /pulumirpc.ResourceProvider/CheckConfig
   4 /pulumirpc.ResourceProvider/Configure
   4 /pulumirpc.ResourceProvider/DiffConfig
   4 /pulumirpc.ResourceProvider/GetPluginInfo
   4 Time
  12 /pulumirpc.ResourceMonitor/SupportsFeature
 130 sdkv2.Invoke
 260 /pulumirpc.ResourceMonitor/Invoke
 260 /pulumirpc.ResourceProvider/Invoke
 459 sdkv2.Check
 459 sdkv2.Diff
 928 /pulumirpc.ResourceProvider/Check
 928 /pulumirpc.ResourceProvider/Diff
 932 /pulumirpc.ResourceMonitor/RegisterResourceOutputs
1864 /pulumirpc.ResourceMonitor/RegisterResource
5240 /pulumirpc.Engine/Log
```
I don’t see much detail in the trace view though… is there something I’m missing in order to figure out why we see so many calls? Also, it feels like the calls get cumulatively slower as the run goes on. Looking at the `--debug --verbose=7` logs, I would swear that the `pulumi up` command is making S3 calls after each and every resource update. Is that possible?
Looking through the output, for example, if we grep for `state.go` and the noisy string `error copying`, we get this:
```
I0416 17:55:02.213363   33154 state.go:342] error copying .pulumi/stacks/ourapp/production.json.gz to .pulumi/stacks/ourapp/production.json.gz.bak: blob (key ".pulumi/stacks/ourapp/production.json.gz -> .pulumi/stacks/ourapp/production.json.gz.bak") (code=NotFound): NoSuchKey: The specified key does not exist.
I0416 17:55:03.660768   33154 state.go:342] error copying .pulumi/stacks/ourapp/production.json.gz to .pulumi/stacks/ourapp/production.json.gz.bak: blob (key ".pulumi/stacks/ourapp/production.json.gz -> .pulumi/stacks/ourapp/production.json.gz.bak") (code=NotFound): NoSuchKey: The specified key does not exist.
I0416 17:55:04.526660   33154 state.go:342] error copying .pulumi/stacks/ourapp/production.json.gz to .pulumi/stacks/ourapp/production.json.gz.bak: blob (key ".pulumi/stacks/ourapp/production.json.gz -> .pulumi/stacks/ourapp/production.json.gz.bak") (code=NotFound): NoSuchKey: The specified key does not exist.
I0416 17:55:05.615803   33154 state.go:342] error copying .pulumi/stacks/ourapp/production.json.gz to .pulumi/stacks/ourapp/production.json.gz.bak: blob (key ".pulumi/stacks/ourapp/production.json.gz -> .pulumi/stacks/ourapp/production.json.gz.bak") (code=NotFound): NoSuchKey: The specified key does not exist....
I0416 18:03:56.379596   33154 state.go:342] error copying .pulumi/stacks/ourapp/production.json.gz to .pulumi/stacks/ourapp/production.json.gz.bak: blob (key ".pulumi/stacks/ourapp/production.json.gz -> .pulumi/stacks/ourapp/production.json.gz.bak") (code=NotFound): NoSuchKey: The specified key does not exist.
I0416 18:03:57.366905   33154 state.go:342] error copying .pulumi/stacks/ourapp/production.json.gz to .pulumi/stacks/ourapp/production.json.gz.bak: blob (key ".pulumi/stacks/ourapp/production.json.gz -> .pulumi/stacks/ourapp/production.json.gz.bak") (code=NotFound): NoSuchKey: The specified key does not exist.
I0416 18:03:59.225002   33154 state.go:342] error copying .pulumi/stacks/ourapp/production.json.gz to .pulumi/stacks/ourapp/production.json.gz.bak: blob (key ".pulumi/stacks/ourapp/production.json.gz -> .pulumi/stacks/ourapp/production.json.gz.bak") (code=NotFound): NoSuchKey: The specified key does not exist.
```
You can see how that goes on for 9 minutes; there are over 450 lines like that.
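For reference, the filter was something like this (the log file name is an assumption):

```
# Hypothetical invocation: pull the checkpoint-copy errors out of the verbose log
grep 'state.go' pulumi-up-verbose.log | grep 'error copying'
```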
e:
> I would swear that the `pulumi up` command is making S3 calls after each and every resource update

Yeah, you might want to enable checkpoint skipping: https://www.pulumi.com/blog/pulumi-release-notes-80/#skip-checkpoints-experimental-flag. I don’t think this needs EXPERIMENTAL set anymore; just `PULUMI_SKIP_CHECKPOINTS=true` will skip all but the final checkpoint write, so it should be much faster, at the cost that if something causes your pulumi process to be hard-killed halfway through, you lose all track of what had been done so far.
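A minimal sketch of the invocation:

```
# Skip intermediate checkpoint writes; only the final state snapshot is persisted.
# Trade-off: if pulumi is hard-killed mid-run, in-flight progress is not recorded.
PULUMI_SKIP_CHECKPOINTS=true pulumi up
```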
g:
Ok, thank you. If we do lose track, a `pulumi refresh`/`pulumi up` should usually be fine, right?
```
Resources:
    ~ 1 updated
    933 unchanged

Duration: 9s
```
^^ that made a difference!
e:
Yeah, `refresh` will normally fix up most things, but if things had started to create and then you killed Pulumi, it’ll have totally lost track that it was even trying to create something.
b:
This is how long it takes for me using a GCS bucket. So the only way to speed this up is by setting `PULUMI_SKIP_CHECKPOINTS=true`?
```
Resources:
    ~ 1 to update
    616 unchanged

Do you want to perform this update? yes
Updating (live):
     Type                              Name                                   Status            Info
     pulumi:pulumi:Stack               upcoupons-live                                           
     └─ gcp:utils:UpfeatAlert          live-alert-backend-requests-component                    
 ~      └─ gcp:monitoring:AlertPolicy  live-alert-backend-requests            updated (12s)     [diff: ~conditions]

Resources:
    ~ 1 updated
    616 unchanged

Duration: 6m40s
```