# aws
c
I have noticed a strange problem when running a refresh on a stack that has an S3 bucket, while also having intermittent DNS failures during the refresh. Let's say I have a bucket called `my-s3-bucket` that is created by Pulumi and correctly managed in the stack state. This should map to a bucket at https://my-s3-bucket.s3.us-west-2.amazonaws.com. I ran refresh, and for whatever reason I had a random DNS resolution failure during the refresh. This resulted in a "no such host" error, but the refresh still appears to have updated my state file. Re-running `pulumi up` then gives me an error that it can't create the bucket because it already exists. Is it deciding that the bucket doesn't exist because it can't resolve DNS? Note that I am using the (Python) Automation API to do both refresh and up. Is there any way I can properly fail the refresh when DNS doesn't resolve for that bucket? I'll add the error message in a 🧵
```
Diagnostics:
  aws:s3:BucketServerSideEncryptionConfiguration (my-s3-bucket-encryption-3):
    error:   sdk-v2/provider2.go:572: sdk.helper_schema: reading S3 Bucket Server-side Encryption Configuration (my-s3-bucket): operation error S3: GetBucketEncryption, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get "https://my-s3-bucket.s3.us-west-2.amazonaws.com/?encryption=": dial tcp: lookup my-s3-bucket.s3.us-west-2.amazonaws.com: no such host: provider=aws@7.7.0
    error: refreshing urn:pulumi:mystack::myworkspace::foo:stacks:MyStack$foo:storage:S3BucketSet$aws:s3/bucketServerSideEncryptionConfiguration:BucketServerSideEncryptionConfiguration::my-s3-bucket-encryption-3: 1 error occurred:
        * reading S3 Bucket Server-side Encryption Configuration (my-s3-bucket): operation error S3: GetBucketEncryption, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get "https://my-s3-bucket.s3.us-west-2.amazonaws.com/?encryption=": dial tcp: lookup my-s3-bucket.s3.us-west-2.amazonaws.com: no such host
```
I can reproduce this 100% of the time by blocking the bucket's DNS with my firewall.
More concise summary:
• bucket named `my-s3-bucket`, managed by Pulumi
• block DNS resolution of https://my-s3-bucket.s3.us-west-2.amazonaws.com at my firewall
• run refresh, see an error about "no such host"
• re-enable DNS
• run `pulumi up`
• Pulumi tries to create the bucket again
This even happens if I set `expect_no_changes` to true when calling refresh, which is unexpected.
One thought I had: I could run `preview_refresh` first and fail on any error, but it seems like there is still a race there, since I can only persist the refresh by re-running `refresh()`.
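A minimal sketch of that guard, assuming a local workspace (the stack name and `work_dir` here are placeholders) and that a failed dry run surfaces as a `CommandError`:

```python
import pulumi.automation as auto

# Placeholder stack selection; adjust stack_name/work_dir as needed.
stack = auto.create_or_select_stack(stack_name="mystack", work_dir=".")

try:
    # Dry-run the refresh first: a DNS failure should surface here
    # without the state file being rewritten.
    stack.preview_refresh()
except auto.CommandError as exc:
    raise SystemExit(f"refresh preview failed, aborting: {exc}")

# Persist the refresh only after a clean dry run. Note that DNS can
# still fail in the window between these two calls; that is the race.
stack.refresh(expect_no_changes=True, on_output=print)
```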
m
I don't have a solution here, but I've encountered this as well with other providers (Kubernetes and MinIO): when a resource cannot be reached during a refresh or update, it is treated as lost/deleted. I would also be interested in learning more about this and how to work around it. In our specific case it was usually not a connection issue but missing credentials on a user's machine, and we were able to solve it by adding a quick connectivity/permission check before launching the Pulumi operation (see the sketch below). This was good enough for us, but obviously not a true solution.
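A minimal sketch of such a pre-flight check for the S3 case, assuming boto3 is available and credentials are already configured; the bucket name and region are placeholders:

```python
import socket

import boto3
from botocore.exceptions import BotoCoreError, ClientError

BUCKET = "my-s3-bucket"
ENDPOINT = f"{BUCKET}.s3.us-west-2.amazonaws.com"

def preflight() -> None:
    # Fail fast if the bucket endpoint does not resolve; this is
    # exactly the "no such host" condition from the refresh error.
    try:
        socket.getaddrinfo(ENDPOINT, 443)
    except socket.gaierror as exc:
        raise SystemExit(f"DNS check failed for {ENDPOINT}: {exc}")

    # Fail fast on missing credentials or permissions: HeadBucket is a
    # cheap call that errors loudly instead of looking like "not found".
    try:
        boto3.client("s3").head_bucket(Bucket=BUCKET)
    except (BotoCoreError, ClientError) as exc:
        raise SystemExit(f"S3 connectivity/permission check failed: {exc}")

preflight()
# ...only now launch stack.refresh() / stack.up()
```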
l
Another workaround might be to use the `--refresh` flag with `pulumi up`, so that it re-finds the lost resources?
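In the Automation API, the equivalent (as far as I know) is the `refresh` keyword argument on `up`; a minimal sketch, reusing the `stack` object from the earlier sketch:

```python
# Fold the refresh into the update itself instead of running a
# separate refresh step beforehand.
stack.up(refresh=True, on_output=print)
```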
c
Oh, interesting idea for recovery, I'll give that a shot.
m
If that does not work because the resource is gone from the state entirely and cannot be refreshed, you can import it back into the state with `pulumi import`. Maybe a refresh with `--run-program` is an additional option, but I haven't tried it yet in this scenario.
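For the import route, a sketch of the CLI call for the bucket itself. The resource type token here is an assumption (it varies by AWS provider version), so double-check it against the provider docs:

```
# pulumi import <type> <name> <id>
pulumi import aws:s3/bucket:Bucket my-s3-bucket my-s3-bucket
```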
c
Well, as far as I can tell, refreshing during up is basically the same as just refreshing: it seems to only delete things it can't find rather than trying to find things that should be in the state but aren't.
l
Ah, I missed that. You need to do a read-only refresh before a read-write one to check the DNS... this is not a problem I've ever had to deal with, but many of my colleagues assert that the problem is always DNS. So at least you can say a problem shared is a problem halved!
c
Yeah, I think that's what we're going with at this point. There is still a race between the refresh dry run and the actual refresh, but it's a small one at least. IMO it is a bug to depend on DNS resolution to update state; Pulumi should just ask AWS whether the bucket exists and trust the answer.