# aws
q
Hi all, first time here. I joined because I've begun using Pulumi, and I find that when an update has errors, the update process seems to hang forever (30+ min now). I'm using Python 3.12, Pulumi v3.177.0, pulumi-aws 6.82.2, and pulumi-awsx 2.21.1. I've been unable to find info on this behavior anywhere in the docs or elsewhere. The infrastructure I'm trying to provision is attached. The screenshot shows what happens when I try to update and an error occurs.
Pulumi code attached
This is what I end up having to do every time: `ctrl-c` twice.
I'd appreciate any help on this 🙏
l
Do the tasks in ECS ever finish booting and reach a healthy state? To me, that error message implies that the ECS service isn't ready to shut down, and Pulumi has no choice but to keep waiting.
q
In this instance they didn't, because I misconfigured an environment value, so the instance couldn't start up and kept getting re-created. But I hit this hang for any error, e.g. the ECR Docker build failing because Docker Desktop wasn't running, or the RDS instance creation failing because I didn't meet AWS RDS's password requirements.
l
Well, given that it's the ECS service that's causing the problem, it might be worth breaking up your project so that the service is not being updated so frequently. If you group your resources by deployment cycle and have one project per group of resources that need to be updated together, you may find the problem can be avoided most of the time. For example, a service tends to need deployment a lot more frequently than the cluster in which it lives, so they could be in separate projects. The VPC would likely be deployed only once ever, so it too could sit in its own small project. The DB is harder to call; it depends on your architecture: you may want to recreate your RDS instance each time you redeploy your service, or more likely just each time you deploy your cluster.
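As a rough sketch of how that split could look (the project names, stack names, and output names below are made up, and it assumes the Pulumi Cloud backend naming for stack references):

```python
# service project: __main__.py
# Sketch only: the "network" and "cluster" projects and their outputs
# ("vpc_id", "private_subnet_ids", "cluster_arn") are illustrative names.
import pulumi

env = pulumi.get_stack()  # e.g. "dev"

# Pull outputs exported by the other projects' stacks.
network = pulumi.StackReference(f"my-org/network/{env}")
cluster = pulumi.StackReference(f"my-org/cluster/{env}")

vpc_id = network.get_output("vpc_id")
subnet_ids = network.get_output("private_subnet_ids")
cluster_arn = cluster.get_output("cluster_arn")

# Only the frequently-changing pieces live here: the task definition,
# the ECS service, listener rules, etc. The VPC/cluster projects are
# deployed rarely, so a failed service update never blocks them.
```

That way `pulumi up` in the service project only ever touches the fast-moving resources.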
q
Noted. I'll try to break up the code. Any insights on the CLI process issue?
l
Just from reading the error message, it looks normal to me. There's a 20-minute timeout waiting for the Service to become stable so that it can be updated, and you're cancelling before the timeout is hit. You could skip updating the service by passing its URN to the `--exclude` or `--exclude-dependents` flags of `pulumi up`.
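Something along these lines (the URN below is made up; list your real URNs with `pulumi stack --show-urns`):

```sh
# Illustrative only: substitute the service's actual URN from
#   pulumi stack --show-urns
pulumi up --exclude 'urn:pulumi:dev::my-infra::aws:ecs/service:Service::app-svc'
# Add --exclude-dependents if resources that depend on the service
# should also be skipped.
```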
You may want to consider switching from AWSX to AWS, and defining the resources yourself so that you have more control over the configuration. I'm not au fait with the configurability of Service and TaskDefinition, but there may be something in there that you can change to make it work better in your context.
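For example, the raw `aws.ecs.Service` resource exposes knobs that awsx may not surface. A sketch under assumptions (all names, ARNs, and subnet IDs below are placeholders, and whether these settings suit your rollout strategy is for you to judge):

```python
# Sketch only: defining the ECS service with pulumi_aws directly so you can
# control how long Pulumi waits for it. All identifiers are placeholders.
import pulumi
import pulumi_aws as aws

# Placeholders; substitute your real cluster, task definition, and subnets.
cluster_arn = "arn:aws:ecs:us-east-1:123456789012:cluster/example"
task_def_arn = "arn:aws:ecs:us-east-1:123456789012:task-definition/example:1"
subnet_ids = ["subnet-0123456789abcdef0"]

service = aws.ecs.Service(
    "app-svc",
    cluster=cluster_arn,
    task_definition=task_def_arn,
    desired_count=1,
    launch_type="FARGATE",
    network_configuration=aws.ecs.ServiceNetworkConfigurationArgs(
        subnets=subnet_ids,
        assign_public_ip=True,
    ),
    # Don't block the update waiting for the deployment to reach steady state.
    wait_for_steady_state=False,
    # And/or cap how long Pulumi will wait before failing fast.
    opts=pulumi.ResourceOptions(
        custom_timeouts=pulumi.CustomTimeouts(create="10m", update="10m"),
    ),
)
```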
q
Ok yes that's what I was thinking as well. Good to know.