Hello everyone, I often find it difficult to get the connectivity in my Pulumi deployments right, e.g., ensuring that two containerised services in AWS can talk to each other (and nobody else can). I usually end up iteratively deploying, testing, and trying to fix the (network) configuration. How do you go about this? Does anyone have some cool advice, e.g., a testing tool or some methodology that helps me identify these connectivity issues earlier, reducing the number of iterations needed?
03/15/2022, 1:54 PM
Speaking as an IaC practitioner (rather than a Pulumi employee), this is a problem that's plagued me the entire time I've been doing cloud. To my knowledge, there is no easy solution. The stuff I've learned is:
1. It's nearly always security groups, so check them first.
2. If you author your infra so that you use SG IDs instead of inbound CIDR blocks for rules, and each service gets a unique SG, you do not need to worry about other services talking to each other in an unauthorized fashion. I think it would not be terribly tough to add some unit tests against the Pulumi code to verify that SGs have certain paths open/closed.
3. You almost never need NACLs - I haven't seen a necessary use case yet, personally.
I also vaguely remember AWS coming out (recently, like 2021) with a testing tool for "can this entity talk to this other entity across the network", but I cannot find it for the life of me. Maybe this will ring a bell for someone reading this thread.
I've tried stuff like the Chef testing framework whose name I forget, but that requires you to provision the tool on your infra and often adds network connections you don't actually want in prod infra, which was a nonstarter for me.
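To make point 2 a bit more concrete, here is a minimal sketch of the kind of unit test I have in mind. It assumes you keep your ingress rules as plain data before handing them to the Pulumi security group resources. `IngressRule`, `allows`, `cidr_rules`, and the SG names are all hypothetical, not part of any Pulumi or AWS API, and the model deliberately ignores egress rules, CIDR matching, and the "all protocols" case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IngressRule:
    """Simplified stand-in for one security group ingress rule (hypothetical model)."""
    protocol: str
    from_port: int
    to_port: int
    source_sg: Optional[str] = None  # rule references another SG (preferred here)
    cidr: Optional[str] = None       # rule references a CIDR block (avoided here)


def allows(rules, source_sg, protocol, port):
    """Return True if any rule admits traffic from source_sg on protocol/port."""
    return any(
        r.source_sg == source_sg
        and r.protocol == protocol
        and r.from_port <= port <= r.to_port
        for r in rules
    )


def cidr_rules(rules):
    """Return the rules that open a path by CIDR block instead of by SG."""
    return [r for r in rules if r.cidr is not None]


# Hypothetical layout: one SG per service, keyed by a logical name.
INGRESS = {
    "api-sg": [IngressRule("tcp", 443, 443, source_sg="lb-sg")],
    "db-sg": [IngressRule("tcp", 5432, 5432, source_sg="api-sg")],
}


def test_api_can_reach_db():
    assert allows(INGRESS["db-sg"], "api-sg", "tcp", 5432)


def test_lb_cannot_reach_db():
    assert not allows(INGRESS["db-sg"], "lb-sg", "tcp", 5432)


def test_no_cidr_based_rules():
    assert all(not cidr_rules(rules) for rules in INGRESS.values())
```

The tests only check the declared rules, so they catch "forgot to open the port" or "opened it to a CIDR" mistakes, not routing or NACL problems.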
03/16/2022, 8:58 AM
Thanks for your reply @stocky-restaurant-98004! Do you mean the “AWS Reachability Analyzer”?: https://docs.aws.amazon.com/vpc/latest/reachability/what-is-reachability-analyzer.html
As far as I can see, this actually requires deploying the stack, so it might speed up the testing, but it would not reduce the number of iterations, as it is not “early detection” based on the deployment description. I assume it may be useful for regression testing, though.
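For the regression-testing idea, a rough sketch of how such a post-deployment check could look. It assumes `ec2` is a boto3 EC2 client (e.g. `boto3.client("ec2")`) and that the source and destination IDs come from the stack outputs. `path_is_reachable` and its parameters are hypothetical names of mine, and the sketch does not clean up the path it creates.

```python
import time


def path_is_reachable(ec2, source_id, destination_id, port,
                      poll_seconds=5.0, max_polls=60):
    """Run one Reachability Analyzer analysis and return whether a path was found.

    `ec2` is assumed to be a boto3 EC2 client; the resource IDs are assumed to
    belong to an already deployed stack.
    """
    # Describe the path to analyse (TCP from source to destination on `port`).
    path = ec2.create_network_insights_path(
        Source=source_id,
        Destination=destination_id,
        Protocol="tcp",
        DestinationPort=port,
    )
    path_id = path["NetworkInsightsPath"]["NetworkInsightsPathId"]

    # Start the analysis; it runs asynchronously on the AWS side.
    started = ec2.start_network_insights_analysis(NetworkInsightsPathId=path_id)
    analysis_id = started["NetworkInsightsAnalysis"]["NetworkInsightsAnalysisId"]

    # Poll until the analysis finishes or we give up.
    for _ in range(max_polls):
        result = ec2.describe_network_insights_analyses(
            NetworkInsightsAnalysisIds=[analysis_id]
        )["NetworkInsightsAnalyses"][0]
        if result["Status"] == "succeeded":
            return bool(result.get("NetworkPathFound", False))
        if result["Status"] == "failed":
            raise RuntimeError(result.get("StatusMessage", "analysis failed"))
        time.sleep(poll_seconds)
    raise TimeoutError("analysis did not finish in time")
```

A regression test would then assert `path_is_reachable(...)` for the paths that should be open and `not path_is_reachable(...)` for the ones that should stay closed.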
03/16/2022, 1:47 PM
RE Reachability Analyzer: YES! Thank you!
For early detection, I don't think that you're going to find a satisfying answer, because in order to, e.g., answer a question about IAM access, you'd have to duplicate the entire IAM machinery, y'know? So AFAIK, the best you can do to get a read on how things will work pre-deployment is some unit tests, and I've always questioned the value of unit tests for most IaC. (This is sometimes a controversial opinion, and an exception would be packages/modules/components that are intended for wide distribution.)