Hello. We have been hit fairly frequently (as a percentage of deployments) by an issue whereby the layer version associated with an AWS Lambda function, built from source code uploaded to S3 after a manual deployment step, does not appear to be updated. I've been casually documenting occurrences of this issue and have noticed a correlation between failures during a Pulumi deployment (such as an AWS service validation message, an auth issue, or another deployment error unrelated to the specific resources we have issues with) and this issue appearing in the subsequent successful deployment.
How do we perceive the issue? Lambda functions appear to be running out-of-date code from a previous version.
How do we spot it? Typically the runtime code is out of sync with the database schema and we see runtime SQL errors.
Observation #1: The latest version of the source code IS uploaded as a new version of an s3.BucketObject
Observation #2: A new lambda.LayerVersion replacement IS created successfully
Observation #3: Our lambda.Function depends on the lambda.LayerVersion and a DB migration task. If that migration task fails, the LayerVersion has already been updated, but the lambda.Function update does not happen because the DB is not ready and a dependency has failed.
Observation #4: When we fix the other failing dependency and get a successful deployment, we see the issues described above. It would seem that, although the logs indicate our lambda.Function was updated, it was not given the correct lambda.LayerVersion (the new one, which we have verified was uploaded), perhaps because some bad state persists from the failed deployment.
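In case it helps anyone trying to confirm the same symptom, here is a minimal sketch of the check we could run after a deployment: compare the layer ARNs attached to the function against the newest published version of each layer. The helper names (`splitLayerVersionArn`, `findStaleLayers`) are hypothetical, and the input shapes are assumed to mirror the JSON printed by `aws lambda get-function-configuration` (the `Layers` array) and `aws lambda list-layer-versions` (the `LayerVersions` array). It only detects the mismatch; it says nothing about why Pulumi produced it.

```typescript
// Hypothetical diagnostic helper, not part of Pulumi or the AWS SDK.
// Field names are assumed to match the AWS CLI JSON output.
interface AttachedLayer { Arn: string }
interface PublishedLayerVersion { LayerVersionArn: string; Version: number }
interface StaleLayer { attachedArn: string; latestArn: string }

// Split "arn:aws:lambda:<region>:<account>:layer:<name>:<version>"
// into the unversioned layer ARN and the numeric version suffix.
function splitLayerVersionArn(arn: string): { layerArn: string; version: number } {
  const idx = arn.lastIndexOf(":");
  return { layerArn: arn.slice(0, idx), version: Number(arn.slice(idx + 1)) };
}

// Return every attached layer for which a newer published version exists.
function findStaleLayers(
  attached: AttachedLayer[],
  published: PublishedLayerVersion[],
): StaleLayer[] {
  // Newest published version per unversioned layer ARN.
  const latest: Record<string, PublishedLayerVersion> = {};
  for (const p of published) {
    const { layerArn } = splitLayerVersionArn(p.LayerVersionArn);
    const current = latest[layerArn];
    if (current === undefined || p.Version > current.Version) {
      latest[layerArn] = p;
    }
  }

  const stale: StaleLayer[] = [];
  for (const a of attached) {
    const { layerArn, version } = splitLayerVersionArn(a.Arn);
    const newest = latest[layerArn];
    if (newest !== undefined && newest.Version > version) {
      stale.push({ attachedArn: a.Arn, latestArn: newest.LayerVersionArn });
    }
  }
  return stale;
}
```

A non-empty result after a deployment that Pulumi reports as successful would match what we describe in Observation #4: the new LayerVersion exists, but the function still points at an older one.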
I have previously raised some issues about Pulumi's implementation of asset hashing, its state persistence, and its interplay with AWS S3 (most notably where the Terraform provider cannot be overridden), and I have a feeling that we're seeing the issue again here. Has anyone seen this? Has anyone resolved this?