This message was deleted Pulumi Community #general

# general

sparse-intern-71089

03/03/2023, 11:27 PM

This message was deleted.

cuddly-computer-18851

03/04/2023, 12:06 AM

Several minutes for a `preview` is common for our projects. Running Pulumi as close as possible to the resources in question (in our case AWS + EKS), to cut round-trip latency, has the biggest impact on reducing that time.

breezy-lawyer-65638

03/04/2023, 12:21 AM

thank you for confirming! glad it's not just me. it just seems kind of bizarre because while I can understand `refresh` taking a lot of time (since it is updating state), I would have thought `preview` ought to be fast (since it could just compare local state, not sure if it does). it kind of looks to me from this trace and some logs that it's registering resources one at a time sequentially, but not sure if I am reading it correctly.

breezy-lawyer-65638

03/04/2023, 12:25 AM

@cuddly-computer-18851 what language are you using? I'm using Python

cuddly-computer-18851

03/04/2023, 12:26 AM

TypeScript. They both hand work over to the same backend (at least for AWS; maybe k8s is different). Do you run `preview` with --refresh? Also, you can set the concurrency of requests, which is unlimited by default, but unlimited concurrency can sometimes be slower if it's causing rate-limiting at the k8s API.

breezy-lawyer-65638

03/04/2023, 12:29 AM

ah was wondering if there's some weird async badness in the python lib, but if you're seeing the same thing with TS then never mind. no, this is without --refresh, just `pulumi preview`. I have been setting `KubeClientSettingsArgs`: started with medium values, tried very high values, will try unset now. what are the other concurrency knobs?

cuddly-computer-18851

03/04/2023, 12:30 AM

-p, --parallel int                          Allow P resource operations to run in parallel at once (1 for no parallelism). Defaults to unbounded. (default 2147483647)

🙌 1

billowy-army-68599

03/04/2023, 12:36 AM

Where are you storing state?

breezy-lawyer-65638

03/04/2023, 12:37 AM

S3, was about to try pulling the file locally

breezy-lawyer-65638

03/04/2023, 12:43 AM

no change with local file, trying to get a network trace with mitmproxy but struggling with certificate problems

cuddly-computer-18851

03/04/2023, 12:47 AM

Another thing that can make Pulumi very slow is running it in a Docker container, but I think that's mostly a TypeScript issue due to loading a million node_modules, and specifically on OSX

breezy-lawyer-65638

03/04/2023, 12:52 AM

hitting this with Linux on an EC2 machine unfortunately

cuddly-computer-18851

03/04/2023, 1:04 AM

How are you deploying most of your k8s resources, directly or w/ helm charts?

breezy-lawyer-65638

03/04/2023, 1:05 AM

directly, no helm

billowy-army-68599

03/04/2023, 1:06 AM

What size ec2 instance are you using?

cuddly-computer-18851

03/04/2023, 1:07 AM

We've certainly found the newer `Release` resource type is much better than the older `Chart` resource, but not an issue here I guess. Ultimately, after doing a huge k8s project w/ Pulumi, I would not use it again. The double handling of k8s state just fails too often.

breezy-lawyer-65638

03/04/2023, 1:08 AM

r6i.xlarge instance

breezy-lawyer-65638

03/04/2023, 1:08 AM

yeah that's been a whole other issue haha. but we have a minimal amount of bring-up to create RBAC users and whatnot I was hoping to make work

breezy-lawyer-65638

03/04/2023, 1:17 AM

yeah preview is blocking on a lot of the requests to the clusters. if I blackhole all the connection attempts then it runs basically instantly

breezy-lawyer-65638

03/04/2023, 1:27 AM

I am guessing this is partly PEBKAC: preview generates high QPS to the clusters, and (still trying to verify) I think the connection to the cluster is too slow, as Baz suggested. I am still surprised preview generates queries to the cluster given I am not using --refresh. maybe because of SSA (server-side apply)? but that seems to be the preferred default of pulumi-kubernetes

billowy-army-68599

03/04/2023, 1:29 AM

You can configure that: https://www.pulumi.com/registry/packages/kubernetes/api-docs/provider/#kubeclientsettings

billowy-army-68599

03/04/2023, 1:29 AM

Up the QPS and it’ll likely speed up

breezy-lawyer-65638

03/04/2023, 1:29 AM

I've set it to like 100, I think it's the network connection but still trying to verify

breezy-lawyer-65638

03/04/2023, 1:59 AM

I really do think it's entering Diff in the pulumi-kubernetes plugin sequentially. I changed some logging and am getting

breezy-lawyer-65638

03/04/2023, 2:13 AM

that log was hard to read, here's a simpler trace from step_generator.go:generateSteps. totally sequential

breezy-lawyer-65638

03/04/2023, 2:27 AM

so as far as I can tell, the step generator processes one event at a time. since resource registration is an event, and it requires diffing in the provider, it is processed one at a time. the QPS (in pulumi-k8s) and parallelism (-p) controls have no effect here

breezy-lawyer-65638

03/04/2023, 2:29 AM

as someone who knows the least, it seems like pulumi-kubernetes breaks the assumption baked into the pulumi side logic: diffing is assumed to be near instant. either step generation should allow processing events in parallel or pulumi-k8s shouldn't block in Diff

breezy-lawyer-65638

03/04/2023, 3:32 AM

here's a (potentially unsafe?) patch to pulumi that makes it run in 30 seconds instead of 3 minutes. I don't know golang so please forgive me

diff --git a/pkg/resource/deploy/deployment_executor.go b/pkg/resource/deploy/deployment_executor.go
index cf6b3738d..1337ffcbf 100644
--- a/pkg/resource/deploy/deployment_executor.go
+++ b/pkg/resource/deploy/deployment_executor.go
@@ -19,6 +19,7 @@ import (
 	"errors"
 	"fmt"
 	"strings"
+	"sync"

 	"github.com/pulumi/pulumi/pkg/v3/resource/deploy/providers"
 	"github.com/pulumi/pulumi/pkg/v3/resource/graph"
@@ -38,6 +39,7 @@ type deploymentExecutor struct {

 	stepGen  *stepGenerator // step generator owned by this deployment
 	stepExec *stepExecutor  // step executor owned by this deployment
+	workers  sync.WaitGroup
 }

 // checkTargets validates that all the targets passed in refer to existing resources.  Diagnostics
@@ -271,6 +273,7 @@ func (ex *deploymentExecutor) Execute(callerCtx context.Context, opts Options, p
 		}
 	}()

+	ex.workers.Wait()
 	ex.stepExec.WaitForCompletion()
 	logging.V(4).Infof("deploymentExecutor.Execute(...): step executor has completed")

@@ -411,25 +414,34 @@ func (ex *deploymentExecutor) performDeletes(
 func (ex *deploymentExecutor) handleSingleEvent(event SourceEvent) result.Result {
 	contract.Requiref(event != nil, "event", "must not be nil")

-	var steps []Step
-	var res result.Result
 	switch e := event.(type) {
-	case RegisterResourceEvent:
-		logging.V(4).Infof("deploymentExecutor.handleSingleEvent(...): received RegisterResourceEvent")
-		steps, res = ex.stepGen.GenerateSteps(e)
-	case ReadResourceEvent:
-		logging.V(4).Infof("deploymentExecutor.handleSingleEvent(...): received ReadResourceEvent")
-		steps, res = ex.stepGen.GenerateReadSteps(e)
 	case RegisterResourceOutputsEvent:
 		logging.V(4).Infof("deploymentExecutor.handleSingleEvent(...): received register resource outputs")
 		return ex.stepExec.ExecuteRegisterResourceOutputs(e)
 	}

-	if res != nil {
-		return res
-	}
+	ex.workers.Add(1)
+	go func() {
+		defer ex.workers.Done()
+
+		var steps []Step
+		var res result.Result
+		switch e := event.(type) {
+		case RegisterResourceEvent:
+			logging.V(4).Infof("deploymentExecutor.handleSingleEvent(...): received RegisterResourceEvent")
+			steps, res = ex.stepGen.GenerateSteps(e)
+		case ReadResourceEvent:
+			logging.V(4).Infof("deploymentExecutor.handleSingleEvent(...): received ReadResourceEvent")
+			steps, res = ex.stepGen.GenerateReadSteps(e)
+		}
+
+		if res != nil {
+			return
+		}
+
+		ex.stepExec.ExecuteSerial(steps)
+	}()

-	ex.stepExec.ExecuteSerial(steps)
 	return nil
 }

@@ -453,6 +465,7 @@ func (ex *deploymentExecutor) importResources(
 		preview:    preview,
 	}
 	res := importer.importResources(ctx)
+	ex.workers.Wait()
 	stepExec.SignalCompletion()
 	stepExec.WaitForCompletion()

@@ -503,6 +516,7 @@ func (ex *deploymentExecutor) refresh(callerCtx context.Context, opts Options, p
 	ctx, cancel := context.WithCancel(callerCtx)
 	stepExec := newStepExecutor(ctx, cancel, ex.deployment, opts, preview, true)
 	stepExec.ExecuteParallel(steps)
+	ex.workers.Wait()
 	stepExec.SignalCompletion()
 	stepExec.WaitForCompletion()
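
One thing to note about the patch as posted: inside the goroutine a non-nil `res` leads to a bare `return`, and `handleSingleEvent` then returns nil for these events, so a failure from step generation is no longer passed back to the caller. Below is a minimal sketch of one way to keep such errors while still fanning out. It assumes nothing about Pulumi's internals; the `dispatcher` type, `handle`, `wait`, and `failOdd` are all hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// dispatcher is a hypothetical stand-in for an executor that fans events out
// to goroutines and remembers any errors they report.
type dispatcher struct {
	workers sync.WaitGroup
	mu      sync.Mutex
	errs    []error
}

// handle runs generate(event) in its own goroutine and records a failure
// instead of silently dropping it.
func (d *dispatcher) handle(event int, generate func(int) error) {
	d.workers.Add(1)
	go func() {
		defer d.workers.Done()
		if err := generate(event); err != nil {
			d.mu.Lock()
			d.errs = append(d.errs, err)
			d.mu.Unlock()
		}
	}()
}

// wait blocks until every goroutine has finished and returns the errors seen.
func (d *dispatcher) wait() []error {
	d.workers.Wait()
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.errs
}

// failOdd is a toy generator that fails for odd-numbered events.
func failOdd(event int) error {
	if event%2 == 1 {
		return fmt.Errorf("event %d failed", event)
	}
	return nil
}

func main() {
	d := &dispatcher{}
	for i := 0; i < 5; i++ {
		d.handle(i, failOdd)
	}
	fmt.Println("errors collected:", len(d.wait()))
}
```

The mutex is needed because several goroutines may append to the shared slice at once; whether the real step generator is safe to call from multiple goroutines is a separate question this sketch does not address.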

billowy-army-68599

03/04/2023, 3:35 AM

I would open an issue for that, with the suggested change. That’s really appreciated

🙌 1

breezy-lawyer-65638

03/04/2023, 4:01 AM

https://github.com/pulumi/pulumi/issues/12351

cuddly-computer-18851

03/04/2023, 4:54 AM

I guess since this is in the core pulumi package, this actually affects all providers, not just k8s?

breezy-lawyer-65638

03/04/2023, 7:16 AM

yes, but my guess is most don't block in Diff. I saw a drop from 70s to 50s for a project that is more weighted towards AWS/GCP resources and less k8s

👍 1
