adamant-terabyte-3965
11/09/2023, 5:33 PM
pulumi up creates a new environment, namespaced accordingly, in that cluster. However, the pulumi up sometimes fails because the cluster doesn't yet have enough resources: a new node takes several minutes to spin up, and the resources time out before they become live. I get this error:
kubernetes:core/v1:Service (wox-parcel-api-svc):
error: 2 errors occurred:
* resource pr-1312/wox-parcel-api-svc was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: 'wox-parcel-api-svc' timed out waiting to be Ready
* Service does not target any Pods. Selected Pods may not be ready, or field '.spec.selector' may not match labels on any Pods
kubernetes:core/v1:Service (home-value-api-svc):
error: 2 errors occurred:
* resource pr-1312/home-value-api-svc was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: 'home-value-api-svc' timed out waiting to be Ready
* Service does not target any Pods. Selected Pods may not be ready, or field '.spec.selector' may not match labels on any Pods
kubernetes:core/v1:ServiceAccount (pr-1312-homevalue):
error: 1 error occurred:
* resource pr-1312/pr-1312-homevalue was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: Timeout occurred polling for 'pr-1312-homevalue'
kubernetes:core/v1:ServiceAccount (pr-1312-woxapi):
error: 1 error occurred:
* resource pr-1312/pr-1312-woxapi was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: Timeout occurred polling for 'pr-1312-woxapi'
kubernetes:core/v1:Service (market-insights-api-svc):
error: 2 errors occurred:
* resource pr-1312/market-insights-api-svc was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: 'market-insights-api-svc' timed out waiting to be Ready
* Service does not target any Pods. Selected Pods may not be ready, or field '.spec.selector' may not match labels on any Pods
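For reference, the "timed out waiting to be Ready" above is Pulumi's per-resource readiness timeout, which is raised via the customTimeouts resource option; a minimal sketch (the selector and ports here are assumptions, only the options argument matters):

```typescript
import * as k8s from "@pulumi/kubernetes";

// Hypothetical Service mirroring wox-parcel-api-svc from the errors above.
// customTimeouts.create extends how long "pulumi up" waits for the Service
// to become Ready before failing with the timeout shown in the log.
const svc = new k8s.core.v1.Service("wox-parcel-api-svc", {
    metadata: { namespace: "pr-1312" },
    spec: {
        selector: { app: "wox-parcel-api" },     // assumed labels
        ports: [{ port: 80, targetPort: 8080 }], // assumed ports
    },
}, { customTimeouts: { create: "15m" } });
```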
I've tried increasing customTimeouts on the Pulumi-provisioned resources to wait for the new node to become live, but this hasn't fixed the issue. Is there some way to get Pulumi to retry, or to wait for additional compute?
cuddly-computer-18851
11/09/2023, 11:19 PM
import * as pulumi from '@pulumi/pulumi';
import { fromNodeProviderChain } from '@aws-sdk/credential-providers';
import { AutoScalingClient, DescribeInstanceRefreshesCommand } from '@aws-sdk/client-auto-scaling';
import { backOff } from 'exponential-backoff';

async function waitForInstanceRefresh(name: string): Promise<boolean> {
  // Only poll during an actual deployment, not during preview.
  if (!pulumi.runtime.isDryRun()) {
    // awsProfile and awsRegion are defined elsewhere in the program.
    const credentials = fromNodeProviderChain({ profile: awsProfile });
    const config = { credentials, region: awsRegion };
    const autoScalingClient = new AutoScalingClient(config);
    const refreshCommand = new DescribeInstanceRefreshesCommand({
      AutoScalingGroupName: name,
      MaxRecords: 100,
    });
    // Retry until no instance refresh on the ASG is still in progress.
    await backOff(
      async () => {
        const { InstanceRefreshes } = await autoScalingClient.send(refreshCommand);
        const inProgress = InstanceRefreshes?.filter((e) => e.Status === 'InProgress');
        if (inProgress && inProgress.length === 0) {
          return true;
        } else if (inProgress) {
          throw new Error(`ASG refresh still in progress`);
        } else {
          throw new Error('ASG client failed?');
        }
      },
      {
        retry: async (e, attemptNumber) => {
          await pulumi.log.info(`checking ASG refresh for ${name}: ${attemptNumber}`);
          return true;
        },
        numOfAttempts: 30,
        startingDelay: 10000,
        maxDelay: 60000,
        delayFirstAttempt: true,
        jitter: 'none',
      },
    );
  }
  return true;
}
then call this in an apply on some property before the thing you want to deploy, like
asg.name.apply((name) => waitForInstanceRefresh(name));
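For readers without the exponential-backoff package handy, the polling pattern above can be sketched dependency-free; checkReady here is a stand-in for the DescribeInstanceRefreshes call, and the names and delay parameters are illustrative:

```typescript
// Dependency-free sketch of the backOff(...) pattern used above: poll a
// readiness check with capped, exponentially growing delays until it
// reports true or the attempt budget is exhausted.
async function pollUntilReady(
  checkReady: () => Promise<boolean>,
  numOfAttempts = 30,
  startingDelayMs = 10_000,
  maxDelayMs = 60_000,
): Promise<boolean> {
  let delay = startingDelayMs;
  for (let attempt = 1; attempt <= numOfAttempts; attempt++) {
    if (await checkReady()) {
      return true; // e.g. no ASG instance refresh is still InProgress
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxDelayMs); // exponential backoff, capped
  }
  return false;
}

// Demo with millisecond delays: the check succeeds on the third poll.
let polls = 0;
pollUntilReady(async () => ++polls >= 3, 5, 1, 8).then((ready) =>
  console.log(`ready=${ready} after ${polls} polls`), // prints "ready=true after 3 polls"
);
```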