# general
b
hello, I have a project with 489 resources (mostly pulumi-kubernetes) that takes ~3 minutes just to run preview. is this expected? I took a trace and it has a lot of stuff like this
c
Several minutes for a preview is common for our projects. Running pulumi as close as possible to the resources in question (in our case AWS + EKS) to reduce round-trip latency has had the biggest impact on reducing the time.
b
thank you for confirming! glad it's not just me. it just seems kind of bizarre because while I can understand refresh taking a lot of time (since it is updating state), I would have thought preview ought to be fast (since it could just compare local state, not sure if it does). it kind of looks to me from this trace and some logs that it's registering resources one at a time sequentially, but not sure if I am reading it correctly.
@cuddly-computer-18851 what language are you using? I'm using Python
c
Typescript. They both hand over work to the same backend (at least for AWS, maybe k8s is different). Do you run preview with --refresh? Also you can set the concurrency of requests, which is unlimited by default, but this can sometimes be slower if it's causing rate-limiting at the k8s API.
b
ah was wondering if there's some weird async badness in the python lib, but if you're seeing the same thing with TS then nevermind
no, this is without --refresh, just pulumi preview
I have been setting KubeClientSettingsArgs, started with medium values, tried very high values, will try unset now. what are the other concurrency knobs?
c
-p, --parallel int                          Allow P resource operations to run in parallel at once (1 for no parallelism). Defaults to unbounded. (default 2147483647)
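For readers unfamiliar with what a cap like -p means in practice, here is a minimal Go sketch of bounded parallelism using a buffered channel as a semaphore. It is only an illustration of the general idea, not Pulumi's implementation; runBounded and its parameters are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runBounded runs n operations with at most p in flight at once (p must be
// at least 1) and returns the highest concurrency it observed.
func runBounded(n, p int, op func(i int)) int64 {
	sem := make(chan struct{}, p) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup
	var inFlight, peak int64

	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while p operations are already running
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot once this operation is done

			cur := atomic.AddInt64(&inFlight, 1)
			// Record the high-water mark of concurrent operations.
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			op(i)
			atomic.AddInt64(&inFlight, -1)
		}(i)
	}
	wg.Wait()
	return peak
}

func main() {
	peak := runBounded(10, 2, func(i int) {})
	fmt.Println("peak concurrency:", peak)
}
```

A lower cap trades throughput for fewer simultaneous requests, which is the rate-limiting trade-off mentioned above.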
b
Where are you storing state?
b
S3, was about to try pulling the file locally
no change with local file, trying to get a network trace with mitmproxy but struggling with certificate problems
c
Another thing which can make pulumi very slow is if you're running it in a docker container, but I think that's mostly a Typescript issue due to loading a million node_modules, and specifically on OSX
b
hitting this with Linux on an EC2 machine unfortunately
c
How are you deploying most of your k8s resources, directly or w/ helm charts?
b
directly, no helm
b
What size ec2 instance are you using?
c
We've certainly found the newer Release resource type is much better than the older Chart resource, but not an issue here I guess. Ultimately, after doing a huge k8s project w/ pulumi, I would not use it again. The double handling of k8s state fails too often.
b
r6i.xlarge instance
yeah that's been a whole other issue haha. but we have a minimal amount of bring-up to create RBAC users and whatnot I was hoping to make work
yeah preview is blocking on a lot of the requests to the clusters. if I blackhole all the connection attempts then it runs basically instantly
I am guessing this is partly pebkac: preview generates high QPS to the clusters, and (still trying to verify) I think the connection to the cluster is too slow, as Baz suggested. I am still surprised preview generates queries to the cluster given I am not using --refresh. maybe because of SSA? but that seems to be the preferred default of pulumi-kubernetes
b
Up the QPS and it’ll likely speed up
b
I've set it to like 100, I think it's the network connection but still trying to verify
I really do think it's entering Diff in the pulumi-kubernetes plugin sequentially. I changed some logging and am getting
that log was hard to read, here's a simpler trace from step_generator.go:generateSteps. totally sequential
so as far as I can tell, the step generator processes one event at a time. since resource registration is an event, and it requires diffing in the provider, it is processed one at a time. the QPS (in pulumi-k8s) and parallelism (-p) controls have no effect here
as someone who knows the least, it seems like pulumi-kubernetes breaks the assumption baked into the pulumi side logic: diffing is assumed to be near instant. either step generation should allow processing events in parallel or pulumi-k8s shouldn't block in Diff
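To make the argument concrete, here is a small Go sketch of the difference between handling events one at a time and handling them concurrently when each diff blocks on a remote call. It is a toy model, not Pulumi code: diffFunc, generateSequential and generateConcurrent are hypothetical names, the 50 ms sleep stands in for a round trip to the cluster, and it ignores any ordering that real step generation may need between dependent resources.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// diffFunc stands in for a provider Diff call that blocks on a remote API.
type diffFunc func(urn string) string

// generateSequential handles one event at a time, so total time is the sum
// of all diff latencies.
func generateSequential(urns []string, diff diffFunc) []string {
	out := make([]string, len(urns))
	for i, urn := range urns {
		out[i] = diff(urn)
	}
	return out
}

// generateConcurrent runs every diff in its own goroutine and waits for all
// of them. Results keep the input order because each goroutine writes only
// to its own slot.
func generateConcurrent(urns []string, diff diffFunc) []string {
	out := make([]string, len(urns))
	var wg sync.WaitGroup
	for i, urn := range urns {
		wg.Add(1)
		go func(i int, urn string) {
			defer wg.Done()
			out[i] = diff(urn)
		}(i, urn)
	}
	wg.Wait()
	return out
}

func main() {
	urns := []string{"ns", "role", "binding", "sa"}
	slowDiff := func(urn string) string {
		time.Sleep(50 * time.Millisecond) // pretend round trip to the cluster
		return urn + ":same"
	}

	start := time.Now()
	generateSequential(urns, slowDiff)
	fmt.Println("sequential:", time.Since(start).Round(10*time.Millisecond))

	start = time.Now()
	generateConcurrent(urns, slowDiff)
	fmt.Println("concurrent:", time.Since(start).Round(10*time.Millisecond))
}
```

With a blocking diff, the sequential version scales with the number of resources times the latency, while the concurrent version is bounded by roughly the slowest single diff.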
here's a (potentially unsafe?) patch to pulumi that makes it run in 30 seconds instead of 3 minutes. I don't know golang so please forgive me
diff --git a/pkg/resource/deploy/deployment_executor.go b/pkg/resource/deploy/deployment_executor.go
index cf6b3738d..1337ffcbf 100644
--- a/pkg/resource/deploy/deployment_executor.go
+++ b/pkg/resource/deploy/deployment_executor.go
@@ -19,6 +19,7 @@ import (
 	"errors"
 	"fmt"
 	"strings"
+	"sync"

 	"github.com/pulumi/pulumi/pkg/v3/resource/deploy/providers"
 	"github.com/pulumi/pulumi/pkg/v3/resource/graph"
@@ -38,6 +39,7 @@ type deploymentExecutor struct {

 	stepGen  *stepGenerator // step generator owned by this deployment
 	stepExec *stepExecutor  // step executor owned by this deployment
+	workers  sync.WaitGroup
 }

 // checkTargets validates that all the targets passed in refer to existing resources.  Diagnostics
@@ -271,6 +273,7 @@ func (ex *deploymentExecutor) Execute(callerCtx context.Context, opts Options, p
 		}
 	}()

+	ex.workers.Wait()
 	ex.stepExec.WaitForCompletion()
 	logging.V(4).Infof("deploymentExecutor.Execute(...): step executor has completed")

@@ -411,25 +414,34 @@ func (ex *deploymentExecutor) performDeletes(
 func (ex *deploymentExecutor) handleSingleEvent(event SourceEvent) result.Result {
 	contract.Requiref(event != nil, "event", "must not be nil")

-	var steps []Step
-	var res result.Result
 	switch e := event.(type) {
-	case RegisterResourceEvent:
-		logging.V(4).Infof("deploymentExecutor.handleSingleEvent(...): received RegisterResourceEvent")
-		steps, res = ex.stepGen.GenerateSteps(e)
-	case ReadResourceEvent:
-		logging.V(4).Infof("deploymentExecutor.handleSingleEvent(...): received ReadResourceEvent")
-		steps, res = ex.stepGen.GenerateReadSteps(e)
 	case RegisterResourceOutputsEvent:
 		logging.V(4).Infof("deploymentExecutor.handleSingleEvent(...): received register resource outputs")
 		return ex.stepExec.ExecuteRegisterResourceOutputs(e)
 	}

-	if res != nil {
-		return res
-	}
+	ex.workers.Add(1)
+	go func() {
+		defer ex.workers.Done()
+
+		var steps []Step
+		var res result.Result
+		switch e := event.(type) {
+		case RegisterResourceEvent:
+			logging.V(4).Infof("deploymentExecutor.handleSingleEvent(...): received RegisterResourceEvent")
+			steps, res = ex.stepGen.GenerateSteps(e)
+		case ReadResourceEvent:
+			logging.V(4).Infof("deploymentExecutor.handleSingleEvent(...): received ReadResourceEvent")
+			steps, res = ex.stepGen.GenerateReadSteps(e)
+		}
+
+		if res != nil {
+			return
+		}
+
+		ex.stepExec.ExecuteSerial(steps)
+	}()

-	ex.stepExec.ExecuteSerial(steps)
 	return nil
 }

@@ -453,6 +465,7 @@ func (ex *deploymentExecutor) importResources(
 		preview:    preview,
 	}
 	res := importer.importResources(ctx)
+	ex.workers.Wait()
 	stepExec.SignalCompletion()
 	stepExec.WaitForCompletion()

@@ -503,6 +516,7 @@ func (ex *deploymentExecutor) refresh(callerCtx context.Context, opts Options, p
 	ctx, cancel := context.WithCancel(callerCtx)
 	stepExec := newStepExecutor(ctx, cancel, ex.deployment, opts, preview, true)
 	stepExec.ExecuteParallel(steps)
+	ex.workers.Wait()
 	stepExec.SignalCompletion()
 	stepExec.WaitForCompletion()
b
I would open an issue for that, with the suggested change. That’s really appreciated
c
I guess since this is in the core pulumi package, this actually affects all providers, not just k8s?
b
yes, but my guess is most don't block in Diff. I saw a drop from 70s to 50s for a project that is more weighted towards AWS/GCP resources and less k8s