# general
i
I think I have found a bug. While prototyping, I am tearing down the app and infrastructure (cluster, db) stacks but leaving up the identity stack (gcp). It seems roles are disappearing from BOTH pulumi-related gcp service accounts and NON-pulumi-related service accounts. We did not see this issue before using pulumi, so I am correlating it with my activity in the same gcp project.
the non-pulumi service account `build-sa` loses the storage admin role
c
cc @white-balloon-205 @stocky-spoon-28903 I have seen this too.
I don’t have a repro, but it is a thing.
s
Hmm, this sounds serious.
i
the pulumi-based service account `api-<stack name>` has also lost the `roles/storage.admin` role
re-running `pulumi up` on the identity stack does not re-add the iam role to the service account
the service account does continue to be confirmed as existing
What is strange is that one sa is pulumi, one is not
both service accounts are `roles/storage.admin` only
c
I’m guessing this is a bug in the TF GCP provider
It’s hard to diagnose because it’s not clear that something happened until a bit later.
i
it is not finding a diff though, otherwise `pulumi up` would add it back
s
That would be my guess also. @important-leather-28796 Is this a program you are able to share (by DM is fine if necessary)?
That might get us to a quicker repro
s
Yup, this is almost certainly a bug in the GCP Terraform provider that `pulumi-gcp` is based on.
i
here is the component and execution of the identity stack: https://gist.github.com/rosskevin/00f05766829a9b45888c508949399f0a
I’ll add a result to that, hang on
I updated it with some sample output
I am not sure which `destroy` removes the role, the app one or the cluster one. I’ll see if I can narrow that down
App destroy left them intact. I’m destroying the infrastructure stack now
tearing down infrastructure stack did not change the roles this time.
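Since it is hard to tell which `up` or `destroy` drops the role, one way to narrow it down is to snapshot the project IAM policy before and after each operation (for example with `gcloud projects get-iam-policy ${GOOGLE_CLOUD_PROJECT} --format=json > before.json`) and diff the two snapshots. The sketch below is only an illustration of that idea: the helper names and the sample member emails are hypothetical, and it assumes the snapshots use the standard policy shape of a `bindings` list with `role` and `members`.

```python
import json
from typing import Dict, Set


def members_by_role(policy: dict) -> Dict[str, Set[str]]:
    """Map each role in an IAM policy document to the set of members bound to it."""
    result: Dict[str, Set[str]] = {}
    for binding in policy.get("bindings", []):
        result.setdefault(binding["role"], set()).update(binding.get("members", []))
    return result


def removed_members(before: dict, after: dict) -> Dict[str, Set[str]]:
    """Return, per role, the members present in `before` but missing from `after`."""
    old, new = members_by_role(before), members_by_role(after)
    removed: Dict[str, Set[str]] = {}
    for role, members in old.items():
        gone = members - new.get(role, set())
        if gone:
            removed[role] = gone
    return removed


def load_policy(path: str) -> dict:
    """Load a policy snapshot saved from `gcloud projects get-iam-policy --format=json`."""
    with open(path) as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Hypothetical in-memory snapshots; real ones would come from load_policy().
    sample_before = {
        "bindings": [
            {
                "role": "roles/storage.admin",
                "members": [
                    "serviceAccount:build-sa@example-project.iam.gserviceaccount.com",
                    "serviceAccount:api-development@example-project.iam.gserviceaccount.com",
                ],
            }
        ]
    }
    sample_after = {"bindings": []}
    for role, members in sorted(removed_members(sample_before, sample_after).items()):
        for member in sorted(members):
            print(f"{role}: lost {member}")
```

Taking a snapshot around every `pulumi up` and `pulumi destroy` on each stack, then running the diff, would show which operation the missing `roles/storage.admin` members line up with, even when the breakage is only noticed later.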
s
Hmm, my first experiment did not spot this
@creamy-potato-29402 do you have any leads on this?
i
fyi - I’ll raise the stack again in the morning, and check again - perhaps it is `up` and not `destroy` that is changing things.
s
How were the role assignments that are being incorrectly destroyed originally created?
i
one was pulumi, one was:
gcloud iam service-accounts create ${BUILD_SA} \
    --project=${GOOGLE_CLOUD_PROJECT} \
    --display-name ${BUILD_SA}

gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
    --role roles/storage.admin \
    --member serviceAccount:${BUILD_SA_EMAIL}
but note that for that one, the role was re-added through the console (several times this week)
the sa names are different. the outside one is `build-sa`, the pulumi one is `api-development`
This bit us again today in production, this time on yet another service account (though the same target role). This sa was not pulumi-managed, but I am operating on a different cluster with different resources in the same gcp project while prototyping with pulumi.
s
This is top of my list to investigate on Monday
i
Sorry I couldn’t narrow it down. I checked and rechecked after destroy and up, assumed I couldn’t recreate it, and then got a notification that production was down.
i
so on that note, I have only been adding new app level deployment/ingress/services
I’ll add some notes
s
Great, thanks. Anything that might narrow it down is useful!