# aws
l
Hi. I’m trying to destroy a stack, but there is a security group that has dependents, and this causes the destroy to stall for a very long time. Can’t I put a timeout on this? And how can I resolve the issue?
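On the timeout question, here is a minimal sketch of Pulumi’s `customTimeouts` resource option, which caps how long a delete is allowed to block (the resource name and the one-minute value are arbitrary examples). It only bounds the wait; the group still can’t be deleted until its dependents (ENIs, rules in other groups that reference it) are removed.

```typescript
import * as aws from "@pulumi/aws";

// Example only: cap how long `pulumi destroy` waits for this security group.
// The timeout bounds the wait; the dependents still have to be removed or
// detached before AWS will actually delete the group.
const appSg = new aws.ec2.SecurityGroup(
  "app-sg",
  { description: "example security group" },
  { customTimeouts: { delete: "1m" } }
);
```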
There is also a problem when starting an ECS service with, for example, the wrong Docker image: if it can’t find the image, it will keep retrying for probably 15 minutes.
And if I cancel the deployment midway using `pulumi cancel`, everything gets messed up, and the next time I run `pulumi up` it tries to create a whole new cluster, service, and everything.
g
Keep in mind that `pulumi cancel` does not stop the ECS deployment. You should use the deployment circuit breaker, but it has a catch (you need at least 2 containers) and other issues: https://github.com/aws/containers-roadmap/issues/1247
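For reference, a minimal sketch of what the circuit breaker looks like on `aws.ecs.Service`; the cluster and task definition names are placeholders for resources assumed to already exist.

```typescript
import * as aws from "@pulumi/aws";

// Sketch: enable the deployment circuit breaker so a rollout that cannot
// reach steady state is stopped and rolled back instead of retrying for ages.
const service = new aws.ecs.Service("svc", {
  cluster: "my-cluster",        // placeholder: existing ECS cluster name
  taskDefinition: "my-task:1",  // placeholder: existing task definition (family:revision)
  desiredCount: 2,              // the catch mentioned above: it needs more than one task
  deploymentCircuitBreaker: {
    enable: true,
    rollback: true,             // roll back to the last deployment that worked
  },
});
```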
As much as awsx looks like a shortcut, it is very opinionated and can break a lot of things; I prefer writing my own components. At the moment I am migrating a huge portion of awsx code to AWS Classic.
l
Damn. That’s bad news. I might be better off just manually creating the security groups, load balancers, etc. The goal was to just get ECS hosting with deployments from GitHub Actions working ASAP.
This health check fails for some reason, and yet the Pulumi deployment keeps waiting. Surely there is a solution to this problem…
image.png
g
ECS and ASAP do not mix well. Are you sure it is the container health check failing (the one in your screenshot) and not the LB health check? The service has to respond to the load balancer so that it does not get deregistered from the target group.
l
I don’t see any reason why the health check would fail; I was quite stumped. It makes more sense if it’s the LB. Is there any other way you know of that would result in an unhealthy status? I don’t know why it wouldn’t respond to the LB, though.
Really, the biggest issue is not being able to have a reasonable timeout. It makes it impossible to iterate on the system design.
I may just give up on Pulumi for ECS deployments altogether, but if the AWS Classic provider can be configured with timeouts (seconds, not minutes), then I might still consider it a viable option.
g
Depends on what your goal is. Generally any ECS deployment takes a good 5 minutes due to rolling updates and the like. It is possible to make it slightly faster, but you do not want to rely on ECS to surface application-level problems, because it is painfully slow. You can get somewhat faster results with the settings below, but to really tweak the deployment you have to understand all the moving parts between ECS and the LB, and the time it takes your application to become responsive to its first request.
```typescript
// aws.lb.TargetGroup args: tighten the health-check timing and the
// deregistration delay so failures surface sooner (values come from the caller)
healthCheck: {
  path: args.healthCheckPath,
  timeout: args.healthCheckTimeoutSeconds,
  interval: args.healthCheckIntervalSeconds,
},
deregistrationDelay: args.deregistrationDelay,
```
Also, you could drop the wait for steady state or configure the timeouts, but I do not know how off-hand; they were added here: https://github.com/hashicorp/terraform-provider-aws/pull/25641 I am not sure what you are working on, but you may get a faster dev cycle with App Runner or GCP Cloud Run.
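A rough sketch of those two knobs, assuming a plain `aws.ecs.Service`: `waitForSteadyState` controls whether the provider blocks on the rollout at all, and Pulumi’s `customTimeouts` resource option should correspond to the timeouts added in that PR. The cluster, task definition, and five-minute values are placeholders.

```typescript
import * as aws from "@pulumi/aws";

// Sketch: either stop blocking on the rollout entirely, or keep waiting
// but with an explicit cap instead of the provider default.
const service = new aws.ecs.Service(
  "svc",
  {
    cluster: "my-cluster",        // placeholder: existing cluster
    taskDefinition: "my-task:1",  // placeholder: existing task definition
    desiredCount: 2,
    waitForSteadyState: false,    // return as soon as AWS accepts the deployment
  },
  {
    // If you do wait, cap how long create/update may block (example values).
    customTimeouts: { create: "5m", update: "5m" },
  }
);
```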
l
App Runner looks good for this kind of work, actually. I will consider it. Thanks a lot for the help!
g
App Runner is pretty straightforward, however it has a serious limitation: you can only have a single container, not a task/pod. Then it looks something like this:
```typescript
// Excerpt from a component; `pars`, `args.imageTag`, `instanceRole`,
// `connector`, and `secrets` are defined elsewhere in that component.
const appRunner = new aws.apprunner.Service(
  "svc",
  {
    serviceName: "svc",
    sourceConfiguration: {
      autoDeploymentsEnabled: false,
      imageRepository: {
        imageRepositoryType: "ECR_PUBLIC",
        imageConfiguration: {
          runtimeEnvironmentSecrets: { ...pars },
          runtimeEnvironmentVariables: {
            SYSTEM_REQUIREMENT_CHECK_ENABLED: "false",
            ALPINE_DATABASE_MODE: "external",
            ALPINE_DATABASE_DRIVER: "org.postgresql.Driver",
            LOGGING_LEVEL: "INFO",
          },
          port: "8080",
        },
        imageIdentifier: args.imageTag,
      },
    },
    instanceConfiguration: {
      cpu: "4 vCPU",
      memory: "8 GB",
      instanceRoleArn: instanceRole.arn,
    },
    networkConfiguration: {
      egressConfiguration: {
        egressType: "VPC",
        vpcConnectorArn: connector.arn,
      },
    },
  },
  { dependsOn: [...secrets] }
);
```
l
It seems to have decent autoscaling capabilities though (https://docs.aws.amazon.com/apprunner/latest/dg/manage-autoscaling.html), and it can be scaled across multiple availability zones (https://aws.amazon.com/blogs/containers/architecting-for-resiliency-on-aws-app-runner/). If there aren’t 2 or more tasks, how does it then handle failed deployments? Surely an old instance will keep running and accepting traffic until a new instance becomes healthy?
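For the autoscaling part, a minimal sketch (the name and the limits are example values) of a custom `aws.apprunner.AutoScalingConfigurationVersion` that a service like the one above could reference via `autoScalingConfigurationArn`:

```typescript
import * as aws from "@pulumi/aws";

// Example autoscaling configuration: App Runner keeps at least `minSize`
// instances provisioned and scales out when concurrent requests per
// instance exceed `maxConcurrency`.
const scaling = new aws.apprunner.AutoScalingConfigurationVersion("svc-scaling", {
  autoScalingConfigurationName: "svc-scaling",
  minSize: 2,
  maxSize: 10,
  maxConcurrency: 100,
});

// Wired into the Service args from the earlier snippet:
//   autoScalingConfigurationArn: scaling.arn,
```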