Hey guys, every time I try to update my ECS stack I...
# general
a
Hey guys, every time I try to update my ECS stack I get a timeout after 10 minutes. It doesn't matter how I set it up: the initial run goes well, but every update after that results in a timeout:
Diagnostics:
  aws:ecs:Service (django-app-worker):
    error: Plan apply failed: 1 error occurred:

    * updating urn:pulumi:backend-dev::django-deploy::cloud:service:Service$aws:ecs/service:Service::django-app-worker: timeout while waiting for state to become 'true' (last state: 'false', timeout: 10m0s)

  aws:ecs:Service (django-app):
    error: Plan apply failed: 1 error occurred:

    * updating urn:pulumi:backend-dev::django-deploy::cloud:service:Service$aws:ecs/service:Service::django-app: timeout while waiting for state to become 'true' (last state: 'false', timeout: 10m0s)
Any idea where to start debugging this issue?
w
If you can share your code, I'd be happy to take a look. But for debugging yourself, I'd suggest opening the ECS Service in question in the AWS console and looking at its Events tab. It's most likely something like the Tasks crashing repeatedly, or ECS not being able to place new Tasks, etc. The information in the Events tab should help here. We hope to expose some of this information through status events in Pulumi in the near future to make this diagnostics/debugging experience easier.
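If you prefer the CLI, something along these lines should pull the same events (just a sketch - the cluster and service names are placeholders, substitute the ones from your stack):
  # List the most recent service events (same info as the Events tab)
  aws ecs describe-services \
    --cluster <your-cluster> \
    --services <your-service> \
    --query 'services[].events[:10]'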
a
Thanks so much, the Events tab helped a lot. I'm getting
service django-app-e087161 was unable to place a task because no container instance met all of its requirements. The closest matching container-instance c02faa85-e355-43c2-af55-925ffeba89ce encountered error "RESOURCE:ENI"
with the following relevant config:
cloud-aws:acmCertificateARN            arn:foo
cloud-aws:ecsAutoCluster               true
cloud-aws:ecsAutoClusterInstanceType   t2.small
cloud-aws:ecsAutoClusterMinSize        3
and my code looks like this:
const cloud = require("@pulumi/cloud-aws");

const REDIS_URL = "foo";
const DATABASE_URL = "foo";
const AWS_ACCESS_KEY_ID = "bar";
const AWS_SECRET_ACCESS_KEY = "bar";

const dockerArgs = {
  DATABASE_URL: DATABASE_URL,
  AWS_ACCESS_KEY_ID: AWS_ACCESS_KEY_ID,
  AWS_SECRET_ACCESS_KEY: AWS_SECRET_ACCESS_KEY
};

let service = new cloud.Service("django-app", {
  containers: {
    django: {
      build: {
        context: "../",
        args: dockerArgs
      },
      memoryReservation: 128,
      ports: [{ port: 443, protocol: "https", targetPort: 8000 }],
      // ports: [{ port: 8000 }],
      environment: {
        DATABASE_URL: DATABASE_URL,
        CELERY_BROKER_URL: REDIS_URL,
        CELERY_RESULT_BACKEND: REDIS_URL,
        AWS_ACCESS_KEY_ID: AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY: AWS_SECRET_ACCESS_KEY
      }
    }
  }
  // replicas: 2
});

let workerService = new cloud.Service("django-app-worker", {
  containers: {
    celery: {
      build: {
        context: "../",
        args: dockerArgs
      },
      command: ["celery", "-A", "app", "worker", "-l", "info"],
      memoryReservation: 256,
      environment: {
        DATABASE_URL: DATABASE_URL,
        CELERY_BROKER_URL: REDIS_URL,
        CELERY_RESULT_BACKEND: REDIS_URL,
        AWS_ACCESS_KEY_ID: AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY: AWS_SECRET_ACCESS_KEY
      }
    }
  }
  // replicas: 2
});

// export the frontend URL built from the service endpoint's hostname
exports.url = service.defaultEndpoint.apply(e => `http://${e.hostname}`);
w
Ahh - yes - this is an unfortunate ECS limitation. When using the awsvpc networking mode, each task needs an ENI, and EC2 instances have very few ENIs. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html:
Each task that uses the awsvpc network mode receives its own elastic network interface, which is attached to the container instance that hosts it. EC2 instances have a limit to the number of elastic network interfaces that can be attached to them, and the primary network interface counts as one. For example, a c4.large instance may have up to three elastic network interfaces attached to it. The primary network adapter for the instance counts as one, so you can attach two more elastic network interfaces to the instance. Because each awsvpc task requires an elastic network interface, you can only run two such tasks on this instance type. For more information about how many elastic network interfaces are supported per instance type, see IP Addresses Per Network Interface Per Instance Type in the Amazon EC2 User Guide for Linux Instances.
So if you are running many very "small" tasks, this sometimes forces you to have a larger cluster than you would like. The reason this is likely causing you trouble during updates is that you need room for a second instance of your Task while the update is rolling out.
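If you want to double-check the ENI limit for a given instance type from the CLI, something like this should do it (rough sketch):
  # Max network interfaces for the instance type, including the primary one
  aws ec2 describe-instance-types \
    --instance-types t2.small \
    --query 'InstanceTypes[].NetworkInfo.MaximumNetworkInterfaces'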
a
weirdly I just have two tasks and 3 t2.small instances
would you recommend to add more nodes to the cluster or upgrade from t2.small to something bigger?
w
Yeah - that is surprising. That should give you 6 ENIs for Tasks, and it's not clear how you would be using them all if the above is all you have running.
would you recommend to add more nodes to the cluster or upgrade from t2.small to something bigger?
When you run into this particular limitation there isn't necessarily a simple answer, though more, smaller instances generally give more bang for the buck on this metric (ENI allocation doesn't scale at all linearly with instance size/cost).
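If you do go with more nodes, it should just be a config change on the stack, roughly like this (the new size here is only an example):
  # Bump the auto-cluster size and redeploy
  pulumi config set cloud-aws:ecsAutoClusterMinSize 4
  pulumi up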
(Random aside - I worked on the EC2 team at AWS for a while before Pulumi, and one of the first tiny features I delivered was to increase the number of ENIs per t2.small from 2 => 3 :-))
a
well, thanks for that and for taking the time to answer here 🙂
I'll try and add more nodes to the cluster and see what happens from there