# general
e
Hello, our team is experiencing something odd that we feel shouldn't be happening. We're not sure if it's something we're doing wrong or a limitation of Pulumi (though we're pretty sure it's the former).
We have multiple people on our team doing deployments, and we've noticed that when a 'new' developer deploys (meaning the deployment happens from a different machine than the previous deployment), the deployment takes a very long time. For example, if I had previously deployed to our production environment and the new deployment only changes about 7-8 lambdas, it takes me a few minutes to deploy. However, if a different developer on my team had previously deployed to production and then I deploy, it will redeploy every resource we have and take significantly longer: approximately 1.5 to 2 hours on average, and last week (on a day when we were deploying to our staging environment) it for some reason took 6 hours and 41 minutes. We have about 800 resources (most of them Lambda functions and aliases).
We're not sure why this is happening, but we understand it's not ideal (deployments shouldn't take this long, and if a similar issue happened in production it would mean our systems are down for a whole work day). Why could this be happening? What are some factors that could affect it? Is there a way to make sure this doesn't happen when a new machine is deploying, and how can we set that up? Would appreciate anyone's help, and I'm willing to share snippets or more information about how we're setting Pulumi up. Thank you 🙂
b
you’re using the object store backend too right? 🙂
e
Maybe file path changes are causing the provider to think everything has diffed and needs an update?
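(One common way that kind of spurious diff shows up with Lambdas: if the function code is a zip that each developer rebuilds locally, the archive bytes can differ between machines even when the source is identical, since zips embed file timestamps. A minimal sketch of more machine-independent packaging, assuming a TypeScript program; the names here are hypothetical, not from this thread:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Hypothetical execution role for the function.
const lambdaRole = new aws.iam.Role("handler-role", {
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Action: "sts:AssumeRole",
            Effect: "Allow",
            Principal: { Service: "lambda.amazonaws.com" },
        }],
    }),
});

const fn = new aws.lambda.Function("handler", {
    role: lambdaRole.arn,
    runtime: "nodejs18.x",
    handler: "index.handler",
    // Rebuilding a zip per machine can change its bytes (timestamps), so
    // prefer pointing FileArchive at the built directory: Pulumi hashes the
    // file contents, which stay stable across machines for identical builds.
    // Not: new pulumi.asset.FileArchive("./dist/handler.zip")
    code: new pulumi.asset.FileArchive("./dist"),
});

If that's the cause, running pulumi preview --diff on two different machines should show exactly which property Pulumi thinks changed.)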
h
are you using Pulumi SaaS or something else to coordinate state? And are you all logged into the same place? I'd have each person run
pulumi whoami -v
and make sure the backend url is the same all around.
e
Thanks for looking into it guys. @hallowed-shoe-53735 I am not using Pulumi SaaS; my state files are stored in an S3 bucket. We all log in to the same place using
export PULUMI_CONFIG_PASSPHRASE="<SECRET_PASSPHRASE_HERE>"
pulumi login s3://<URL>?region=us-east-1
We've confirmed that upon using
pulumi whoami -v
we get the same backend url
e
Thanks @echoing-dinner-19531. I've posted my issues here
Does anyone have any idea why this could be happening?
b
yes, you're using the S3 backend, which is notoriously bad at doing lots of small objects:
https://github.com/pulumi/pulumi/issues/15218
https://github.com/pulumi/pulumi/issues/14967
https://github.com/pulumi/pulumi/issues/10057
https://github.com/pulumi/pulumi/issues/8872
Basically, it checkpoints every resource multiple times per operation, and each checkpoint involves a round trip between the client and the bucket. Add all of those up and it'll take a long time.
You have 800 resources, and Pulumi checkpoints each one multiple times. Each S3 checkpoint involves a read, lock, write, and unlock, with a round-trip latency of around 150ms. Multiplied out across all the resources and checkpoints, that's on the order of 2 hours spent just on checkpoint round trips to S3. You should really focus on breaking your stack up into smaller chunks.
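(If you do split the stack up, outputs from one stack can be consumed in another via stack references, so the pieces stay wired together. A minimal sketch, assuming a TypeScript program; the project and stack names are hypothetical:

import * as pulumi from "@pulumi/pulumi";

// In a smaller per-team stack, read outputs exported by a shared stack.
const shared = new pulumi.StackReference("myorg/shared-infra/production");

// requireOutput fails fast if the shared stack hasn't exported the value.
const apiId = shared.requireOutput("apiGatewayId");

// Use it like any other Output when declaring this stack's resources.
export const wiredApiId = apiId;

Each smaller stack then checkpoints far fewer resources per deployment, so the per-resource S3 round trips stop dominating the update time.)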