# automation-api
f
Hey All, I'm running into some issues running Pulumi inside a containerized app (ECS Fargate). The `refresh` call works fine, but I'm getting one of two errors each time an `up` is attempted on an inline program:
• `aws:ecs:Cluster error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:32825: connect: connection refused"`. I've received this on resources other than ECS within the stack as well.
• Or a generic: `Command was killed with SIGKILL (Forced termination): pulumi up --yes --skip-preview --client=127.0.0.1:36833 --exec-kind auto.inline --stack dev --non-interactive`
Initially, I thought it had to do with Pulumi not being able to write to the workDir. I'm mounting a `/tmp` volume to the container and passing the workDir accordingly into the `createOrSelectStack` call. Any ideas what either of these could mean? Searching Slack for the `connection refused` error doesn't turn up any specific solutions; it seems to be different for each case.
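For context, a minimal sketch of the kind of inline Automation API flow described above, assuming Node.js (which the stack trace later in the thread suggests); the project name, stack name, workDir path, and the placeholder ECS resource are illustrative, not the poster's actual code:
```typescript
import { LocalWorkspace } from "@pulumi/pulumi/automation";
import * as aws from "@pulumi/aws";

async function deploy() {
  // Inline Pulumi program; the ECS cluster is just a placeholder resource.
  const program = async () => {
    const cluster = new aws.ecs.Cluster("app-cluster");
    return { clusterArn: cluster.arn };
  };

  // workDir points at the mounted /tmp volume so the CLI can write its scratch files.
  // Backend credentials (e.g. PULUMI_ACCESS_TOKEN) are assumed to come from the environment.
  const stack = await LocalWorkspace.createOrSelectStack(
    { projectName: "inline-ecs", stackName: "dev", program },
    { workDir: "/tmp/pulumi-work" }
  );

  await stack.refresh({ onOutput: console.log }); // works fine per the report
  await stack.up({ onOutput: console.log });      // this is the call that fails
}

deploy().catch((err) => {
  console.error(err);
  process.exit(1);
});
```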
l
Refresh doesn't actually run your Pulumi program, so that might explain some of the difference in behavior you're seeing. Pulumi programs use a couple of local gRPC servers to orchestrate different processes. I would not expect this to be a problem, but perhaps there is some sort of network policy blocking these local services from talking to each other?
The only other thing I would ask is whether your Fargate container has enough memory. Consider making sure you have at least 8 GB, and bump to 16 GB just to eliminate that as a factor.
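One quick way to sanity-check the memory point is to log what the container actually sees at startup; this is plain Node, nothing Pulumi-specific, and on Fargate the reported totals should roughly match the task size:
```typescript
import * as os from "os";

// Log total and free memory at startup to confirm the container actually has
// the memory you think the Fargate task was provisioned with (values in bytes).
const toGiB = (bytes: number) => (bytes / 1024 ** 3).toFixed(2);

console.log(`total memory: ${toGiB(os.totalmem())} GiB`);
console.log(`free memory:  ${toGiB(os.freemem())} GiB`);
```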
f
Great thoughts. Thanks for taking a look. I've opened all network traffic (temporarily) for troubleshooting, which didn't make a difference, so I'm not feeling like that's the issue. After digging deeper, it looks like it actually is throwing on the gRPC call, as you suggested.
```
/home/node/app/node_modules/@grpc/grpc-js/src/call.ts:81
  return Object.assign(new Error(message), status);
                       ^
Error: Resource monitor is terminating
    at Object.callErrorFromStatus (/home/node/app/node_modules/@grpc/grpc-js/src/call.ts:81:24)
    at Object.onReceiveStatus (/home/node/app/node_modules/@grpc/grpc-js/src/client.ts:338:36)
    at Object.onReceiveStatus (/home/node/app/node_modules/@grpc/grpc-js/src/client-interceptors.ts:426:34)
    at Object.onReceiveStatus (/home/node/app/node_modules/@grpc/grpc-js/src/client-interceptors.ts:389:48)
    at /home/node/app/node_modules/@grpc/grpc-js/src/call-stream.ts:276:24
    at processTicksAndRejections (node:internal/process/task_queues:77:11)
```
I'll bump the specs to see if that makes a difference. Not sure where this would leave me if traffic is open and specs don't work though.
The error stack isn't all that helpful
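If more detail is needed than the raw gRPC stack, one hedged option (assuming the Node Automation API as in the earlier sketch) is to capture raw CLI output and structured engine events during `up`; `upWithDiagnostics` is an illustrative helper, not part of Pulumi:
```typescript
import { Stack } from "@pulumi/pulumi/automation";

// Run `up` while capturing CLI output and structured engine events,
// so failures surface more context than the gRPC stack trace alone.
async function upWithDiagnostics(stack: Stack) {
  return stack.up({
    onOutput: (line) => process.stdout.write(line),
    onEvent: (event) => {
      // diagnosticEvent carries per-resource errors and warnings from the engine.
      if (event.diagnosticEvent?.severity === "error") {
        console.error("engine error:", event.diagnosticEvent.message);
      }
    },
  });
}
```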
Looks like bumping the specs got me moving! 👍
On to another error, albeit one that makes sense to me and that I can troubleshoot on my own. Thanks for the help @lemon-agent-27707
l
`Resource monitor is terminating` typically indicates resource exhaustion in my experience. Glad to hear you're unblocked :partypus-8bit:
o
@few-carpenter-12885 I'd like to ask some follow-up questions, if you don't mind. I'm keenly interested in improving our memory consumption to enable higher-scale use of Automation API. Could you let me know here or via DM how much memory the Fargate instance had available initially, and which providers were being used (AWS, GCP, etc.)?
f
Initially, Fargate was provisioned with 0.5 vCPU and 2 GB RAM. I bumped it to 1 vCPU and 8 GB, and the issue was resolved. Not sure where the sweet spot in between might be.
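For reference, that sizing maps onto Fargate task-level CPU/memory units roughly as in the sketch below; the resource name, family, and image are placeholders, not the actual workload:
```typescript
import * as aws from "@pulumi/aws";

// Fargate task sized at 1 vCPU / 8 GB, the combination that resolved the
// "Resource monitor is terminating" / SIGKILL failures above.
// (The original, too-small configuration was 0.5 vCPU / 2 GB, i.e. cpu "512" / memory "2048".)
const taskDefinition = new aws.ecs.TaskDefinition("automation-runner", {
    family: "automation-runner",
    requiresCompatibilities: ["FARGATE"],
    networkMode: "awsvpc",
    cpu: "1024",    // 1 vCPU
    memory: "8192", // 8 GB
    containerDefinitions: JSON.stringify([{
        name: "automation",
        image: "my-registry/automation-app:latest", // placeholder image
        essential: true,
    }]),
    // executionRoleArn / taskRoleArn omitted for brevity.
});

export const taskDefinitionArn = taskDefinition.arn;
```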