# aws
p
I'm finding that repeated pulumi up and pulumi refresh is constantly editing and destroying rules in my AWS security groups, and then restoring them on the next cycle of refresh and up. From looking at the first refresh, it seems to be finding, say, 5 rules in the security group which match what the code generates, but out of order: e.g. 1,2,3,4,5 in the code and 1,2,4,3,5 in AWS, so Pulumi thinks it needs to update rules 3 and 4, essentially morphing each into the other. I think this update fails somehow (AWS SGs can be fussy about adding rules which match rules that already exist), and the result is that one rule is morphed into the other and the other is broken. Then another round of refresh and up will fix it. As far as I know, the order in which the AWS API returns security group rules is based on their name, which is sg- plus a random hex string; as I can't know this before the SGs are created, I can't order the statements in my code to match the order in which they will appear in the API, especially when deploying the same code to multiple stacks. Is this a common issue? Can I solve it easily? Do I need to look at using AWS-Native over AWS-Classic?
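To illustrate the shape of what I mean (everything here is made up, not my actual code), the rules are declared inline on the SG in a fixed order, roughly like this:
```typescript
import * as aws from "@pulumi/aws";

// Made-up placeholder; the real program derives this elsewhere.
const vpcId = "vpc-0123456789abcdef0";

// Five ingress rules declared in a fixed order in code. The API can report
// them back in a different order, which looks like the 1,2,3,4,5 vs 1,2,4,3,5
// mismatch described above.
const exampleSg = new aws.ec2.SecurityGroup("exampleSg", {
    vpcId: vpcId,
    ingress: [
        { protocol: "tcp", fromPort: 22,   toPort: 22,   cidrBlocks: ["10.0.1.0/24"], description: "ssh from jumphost" },
        { protocol: "tcp", fromPort: 443,  toPort: 443,  cidrBlocks: ["10.0.2.0/24"], description: "kubectl from jumphost" },
        { protocol: "tcp", fromPort: 443,  toPort: 443,  cidrBlocks: ["10.0.3.0/24"], description: "kubectl from CICD" },
        { protocol: "tcp", fromPort: 80,   toPort: 80,   cidrBlocks: ["10.0.4.0/24"], description: "http from CICD" },
        { protocol: "tcp", fromPort: 8080, toPort: 8080, cidrBlocks: ["10.0.5.0/24"], description: "app port from CICD" },
    ],
});
```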
Hmmm, maybe that's not the complete story. After the first refresh, Pulumi wants to remove the rule in the SG added by the EKS module (I think), which is
"Allow pods to communicate with the cluster API Server"
and which, when removed, will break the cluster.
The next pulumi refresh then removes, from the state, the rule that pulumi up deleted,
and the next up recreates the rule.
l
Why are you refreshing? And why is one SG being managed by two pieces of code?
p
I am refreshing to ensure that the infra and state match up, and every time I do so they appear to not match up
As to the 2 pieces of code: I am creating a security group on line 65 of a TypeScript program and giving it some ingress and egress rules so I can access my cluster from the ssh jumphost and the CICD system, and then on line 158 I use that group as the value for
clusterSecurityGroup:
in the
new eks.Cluster
call. Whereupon the EKS cluster module adds a rule to that SG which lets the nodes talk to the cluster.
So I don't really see how I can avoid that, unless I just use the default security group, which has full internet egress (something I'd rather control) and also doesn't have ingress for kubectl commands from the jumphost etc.
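Schematically it's this pattern; a minimal sketch with made-up ids and names, not the real program:
```typescript
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

// Made-up placeholders; the real program looks these up elsewhere.
const vpcId = "vpc-0123456789abcdef0";
const subnetIds = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"];
const jumphostSgId = "sg-0123456789abcdef0";
const cicdSgId = "sg-0fedcba9876543210";

// The SG we manage ourselves (line 65 in the real program): kubectl access
// from the jumphost and CICD, plus controlled egress.
const clusterSg = new aws.ec2.SecurityGroup("clusterSg", {
    vpcId: vpcId,
    ingress: [
        { protocol: "tcp", fromPort: 443, toPort: 443, securityGroups: [jumphostSgId], description: "kubectl from jumphost" },
        { protocol: "tcp", fromPort: 443, toPort: 443, securityGroups: [cicdSgId], description: "kubectl from CICD" },
    ],
    egress: [
        { protocol: "-1", fromPort: 0, toPort: 0, cidrBlocks: ["10.0.0.0/16"], description: "restricted egress" },
    ],
});

// That SG is then handed to the EKS component (line 158 in the real program),
// which adds its own "Allow pods to communicate with the cluster API Server"
// rule to it.
const cluster = new eks.Cluster("cluster", {
    vpcId: vpcId,
    privateSubnetIds: subnetIds,
    clusterSecurityGroup: clusterSg,
});
```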
l
If you lock down your target account so that no one except Pulumi has update permissions, then you can skip the refresh step. Refreshing is good for recovering from accidental changes, but typically it is not needed. Refreshing exposes you to the risk of state change due to unimportant changes made by your cloud service. Things like unpredictable ordering (e.g. security group rules, WAF rules), default values, and so on, which do not affect the normal preview-up-repeat cycle, become potentially destructive and certainly inconvenient problems when you switch to a refresh-preview-up cycle. To get this working, you need to extend your dev work to include manually refreshing after each dev up and then updating your code to match whatever the refresh changed. If refreshing pulls an unimportant default value from the cloud into your state, you need to update your code to match the state, or else Pulumi will try to delete that default value. And removing some default values can be a destructive action, causing a replace rather than an update.
Can you post a link to the definition of
clusterSecurityGroup
? I can't see it at https://www.pulumi.com/registry/packages/eks/api-docs/cluster/
The EKS module updating your SG is a bit of a problem. Typically I allow access between SGs only (no CIDRs, only source SGs), and a module editing my SG behind my back gets in the way of that. Is there another way to create the same resources, but without it modifying your SG? Maybe you can use NodeGroupSecurityGroup? Does that allow you to maintain full control over all resources?
p
So the solution was to let the EKS module create the SG, and then add rules to it so I could access 443 from my AdminVM, with code like
```typescript
// add a rule allowing the AdminVM in to kubectl the cluster
const adminClusterRule = new aws.ec2.SecurityGroupRule("adminClusterRule", {
    type: "ingress",
    fromPort: 443,
    toPort: 443,
    protocol: "tcp",
    securityGroupId: clusterSecurityGroupId,
    sourceSecurityGroupId: adminsg.ids[0],
    description: `Allow AdminVM to communicate with the ${nam}-eksCluster API Server`,
});
```
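In case it's not obvious, clusterSecurityGroupId there is the id of the SG that the EKS module created; something along these lines should pull it off the cluster object (the exact output property may vary between pulumi/eks versions, so treat this as a sketch):
```typescript
// Sketch: assumes `cluster` is the eks.Cluster instance created earlier and that
// it exposes the SG it created; check the Cluster outputs for your pulumi/eks version.
const clusterSecurityGroupId = cluster.clusterSecurityGroup.id;
```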
If you try to create the SG first with a rule like that and then pass it to the
new eks.Cluster
call as clusterSecurityGroup, then you get into the state I describe.
I should probably work out a minimal example of this and post a bug on the EKS module's GitHub, but I'm off to Mont Blanc tomorrow for 3 weeks, so I don't have the time.
l
I don't think it's a bug, unfortunately. It functions as designed. An alternative design that works better for IaC would be good: this design appears to have been created for console users.