# aws
c
I have noticed a strange problem when running a refresh on a stack that has an S3 bucket, while also having intermittent DNS failures during the refresh. Let's say I have a bucket called `my-s3-bucket` that is created by Pulumi and correctly managed in the stack state. This should map to a bucket at https://my-s3-bucket.s3.us-west-2.amazonaws.com. I ran refresh, and for whatever reason I had a random DNS resolution failure during the refresh. This resulted in a "no such host" error, but the refresh still appears to have updated my state file. Re-running `pulumi up` then gives me an error that it can't create the bucket because it already exists. Is it deciding that the bucket doesn't exist because it can't resolve DNS? Note that I am using the (Python) Automation API to do both refresh and up. Is there any way I can properly fail the refresh when DNS doesn't resolve for that bucket? I'll add the error message in a 🧵
```
Diagnostics:
  aws:s3:BucketServerSideEncryptionConfiguration (my-s3-bucket-encryption-3):
    error:   sdk-v2/provider2.go:572: sdk.helper_schema: reading S3 Bucket Server-side Encryption Configuration (my-s3-bucket): operation error S3: GetBucketEncryption, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get "https://my-s3-bucket.s3.us-west-2.amazonaws.com/?encryption=": dial tcp: lookup my-s3-bucket.s3.us-west-2.amazonaws.com: no such host: provider=aws@7.7.0
    error: refreshing urn:pulumi:mystack::myworkspace::foo:stacks:MyStack$foo:storage:S3BucketSet$aws:s3/bucketServerSideEncryptionConfiguration:BucketServerSideEncryptionConfiguration::my-s3-bucket-encryption-3: 1 error occurred:
        * reading S3 Bucket Server-side Encryption Configuration (my-s3-bucket): operation error S3: GetBucketEncryption, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get "https://my-s3-bucket.s3.us-west-2.amazonaws.com/?encryption=": dial tcp: lookup my-s3-bucket.s3.us-west-2.amazonaws.com: no such host
```
I can reproduce this 100% of the time by blocking the bucket's DNS with my firewall.
More concise summary:
• bucket named `my-s3-bucket`, managed by Pulumi
• block DNS resolution of https://my-s3-bucket.s3.us-west-2.amazonaws.com at my firewall
• run refresh, see an error about "no such host"
• re-enable DNS
• run `pulumi up`
• Pulumi tries to create the bucket again
This even happens if I set `expect_no_changes` to true when calling refresh, which is unexpected.
One thought I had: I could run `preview_refresh` first and fail on any error, but it seems like there is still a race there, since I can only persist the refresh by re-running `refresh()`.
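A minimal sketch of that guard, assuming a local workspace (the stack name and `work_dir` here are placeholders) and that a failed dry run surfaces as a `CommandError`:

```python
import pulumi.automation as auto

# Placeholder stack selection; adjust stack_name/work_dir as needed.
stack = auto.create_or_select_stack(stack_name="mystack", work_dir=".")

try:
    # Dry-run the refresh first: a DNS failure should surface here
    # without the state file being rewritten.
    stack.preview_refresh()
except auto.CommandError as exc:
    raise SystemExit(f"refresh preview failed, aborting: {exc}")

# Persist the refresh only after a clean dry run. Note that DNS can
# still fail in the window between these two calls; that is the race.
stack.refresh(expect_no_changes=True, on_output=print)
```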
m
I don't have a solution here, but I've encountered this as well with other providers (Kubernetes and MinIO): when a resource cannot be reached during a refresh or update, it is treated as lost/deleted. I would also be interested in learning more about this and how to work around it. In our specific case it was usually not a connection issue but missing credentials on a user's machine, and we were able to solve it by adding a quick connectivity/permission check before launching the Pulumi operation (see the sketch below). This was good enough for us, but obviously not a true solution.
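A minimal sketch of such a pre-flight check for the S3 case, assuming boto3 is available and credentials are already configured; the bucket name and region are placeholders:

```python
import socket

import boto3
from botocore.exceptions import BotoCoreError, ClientError

BUCKET = "my-s3-bucket"
ENDPOINT = f"{BUCKET}.s3.us-west-2.amazonaws.com"

def preflight() -> None:
    # Fail fast if the bucket endpoint does not resolve; this is
    # exactly the "no such host" condition from the refresh error.
    try:
        socket.getaddrinfo(ENDPOINT, 443)
    except socket.gaierror as exc:
        raise SystemExit(f"DNS check failed for {ENDPOINT}: {exc}")

    # Fail fast on missing credentials or permissions: HeadBucket is a
    # cheap call that errors loudly instead of looking like "not found".
    try:
        boto3.client("s3").head_bucket(Bucket=BUCKET)
    except (BotoCoreError, ClientError) as exc:
        raise SystemExit(f"S3 connectivity/permission check failed: {exc}")

preflight()
# ...only now launch stack.refresh() / stack.up()
```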
l
Another workaround might be to use the `--refresh` flag with `pulumi up`, so that it re-finds the lost resources?
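In the Automation API, the equivalent (as far as I know) is the `refresh` keyword argument on `up`; a minimal sketch, reusing the `stack` object from the earlier sketch:

```python
# Fold the refresh into the update itself instead of running a
# separate refresh step beforehand.
stack.up(refresh=True, on_output=print)
```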
c
Oh, interesting idea for recovery, I'll give that a shot.
m
If that does not work because the resource is gone from the state entirely and cannot be refreshed, you can import it back into the state with `pulumi import`. Maybe a refresh with `--run-program` is an additional option, but I haven't tried it yet in this scenario.
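For the import route, a sketch of the CLI call for the bucket itself. The resource type token here is an assumption (it varies by AWS provider version), so double-check it against the provider docs:

```
# pulumi import <type> <name> <id>
pulumi import aws:s3/bucket:Bucket my-s3-bucket my-s3-bucket
```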
c
Well, as far as I can tell, refreshing during up is basically the same as just refreshing: it seems to only delete things it can't find rather than trying to find things that should be in the state but aren't.
l
Ah, I missed that. You need to do a read-only refresh before a read-write one to check the DNS... this is not a problem I've ever had to deal with, but many of my colleagues assert that the problem is always DNS. So at least you can say a problem shared is a problem halved!
c
Yeah, I think that's what we're going with at this point. There is still a race between the refresh dry run and the actual refresh, but it's a small one at least. IMO it is a bug to depend on DNS resolution to update state; Pulumi should just ask AWS whether the bucket exists and trust the answer.