# general
e
hey, I have a stack with over 100 resources (mostly k8s stuff).
pulumi up --yes
without any changes takes around 2-5 min, which is not acceptable. I'm trying to debug the performance with
pulumi up --tracing=file:./trace --yes
. The resulting file is 170mb in size. When using
PULUMI_DEBUG_COMMANDS=1 pulumi view-trace trace
the UI is very buggy displaying all the traces. Then I tried
docker run -d --name jaeger -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 -p 16686:16686 -p 9411:9411 jaegertracing/all-in-one:1.22
and
pulumi up --yes --tracing=http://localhost:9411/api/v1/spans
but according to
sudo tcpdump -i any port 9411
pulumi is not sending any traffic to port 9411 (
curl localhost:9411
shows traffic). 1.) What's wrong with HTTP tracing? 2.) Is this performance expected?
b
are you using an OSS backend? We've had lots of folks manage to get traces out correctly, I'll try to repro when the long weekend is over
e
I used the gcp bucket backend. Switched to the local file system and now it's down to 7s. I was missing localhost in the
hosts
file, which seems to have caused the tracing issues...
/pulumirpc.ResourceMonitor/RegisterResource
seems to be the performance issue in the gcp bucket backend. There are some funky operations, but at least I have a starting point. Will also look into the hosted service 🚀
h
if you work with TypeScript, this project might be interesting for you: https://www.npmjs.com/package/kubernate For full disclosure, I am the author of this library and I've just published it this week 🙂 It was received very well though; I had the same issues as you with performance, and that was one of my main objectives with this
c
Isn't Pulumi also capable of generating YAML files somehow? https://www.pulumi.com/docs/guides/adopting/from_kubernetes/#rendering-kubernetes-yaml (rough sketch at the end of this message). I actually like the runtime DAG and the promise-based dependencies Pulumi uses. Nevertheless, I agree with the statement that a second state on top of Kubernetes is “weird”. Maybe it makes sense to add the Kubernetes API as a state backend. I also agree about the
ImagePullBackOff
waiting issue. How about making the timeouts for errors configurable? What are “feature-branch preview environments”?
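For reference, a minimal sketch of that YAML-rendering option using the renderYamlToDirectory setting of the Kubernetes provider (the directory and resource below are just examples):
```typescript
import * as k8s from "@pulumi/kubernetes";

// Provider that writes manifests into a local directory instead of applying
// them to a cluster ("rendered" is an example path).
const renderProvider = new k8s.Provider("render-yaml", {
    renderYamlToDirectory: "rendered",
});

// Any resource created with this provider gets rendered as YAML on `pulumi up`.
new k8s.core.v1.ConfigMap("example", {
    metadata: { name: "example" },
    data: { hello: "world" },
}, { provider: renderProvider });
```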
h
Yes, you can render resources as YAML files, but they are still included in the state of the stack and it becomes noisy very fast. (I used to generate YAML for seed scripts that are only run as a Job on demand, and I had to commit the rendered files to the repository, otherwise it would fail at deleting them when they had to be replaced.)
By "feature-branch preview environment" I mean that for every branch in my project I can create an environment that runs that version of the code (it builds the images and creates a namespace and all the required deployments, services, the ingress, etc. for that specific commit), which can then be opened in the browser to be previewed (and even tested) before it gets merged into the main branch. After the merge, I delete this environment to make room for another one. I have 10-20 of these environments at any given time, sometimes even more (my development team has about 15 people, so quite large), so the resources Pulumi has to manage just for these environments add up quickly (around 1200 at the moment).
c
I guess that's on the hosted backend; what's your experience with the performance for a
pulumi up
with no changes? BTW it's also possible to split a stack up with e.g.
pulumi.StackReference("someStack").getOutput("kubeconfig");
h
With no changes it is better but still pretty bad 😕
I could split the stack, but then I would have to know in an external program (the script that runs the deployment) what stacks need to be updated depending on what the changes are
And... why would I do this?
Logically it would make sense to have one stack for my feature-branch deployments (if you reason about the different components of my deployment), but the performance (and the other things I pointed out in the README) makes it a pain.
b
@high-answer-18213 thanks for sharing your library. It would be appreciated if we could keep the thread on topic, and not use this as an opportunity to promote your own work
b
We've also been hit by long runtimes when the state grows (especially the CRDs from cert-manager are huge) with the backend on S3. Because of this we're wrapping pulumi with a script that syncs the state to a local file, runs pulumi locally, and syncs it back up (combined with locking in DynamoDB).
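A rough sketch of that kind of wrapper, assuming an S3 bucket called my-pulumi-state and leaving out the DynamoDB locking (all names and paths are placeholders):
```typescript
import { execSync } from "child_process";

// Run a shell command and stream its output.
const run = (cmd: string) => execSync(cmd, { stdio: "inherit" });

run("aws s3 sync s3://my-pulumi-state ./state");   // pull the remote state down
run("pulumi login file://$PWD/state");             // point Pulumi at the local copy
run("pulumi up --yes");                            // low-latency update against local files
run("aws s3 sync ./state s3://my-pulumi-state");   // push the state back up
```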
👀 1
e
that's a good one 😄 @billowy-army-68599 What's your opinion on all these remote
RegisterResource
calls on the bucket? AFAIK the open source backend deletes the stack.json during
pulumi up
anyways and is not capable of parallel
pulumi up
s
b
RegisterResource writes the current state of the resource from the engine; it's very, very latency-sensitive