This message was deleted Pulumi Community #general

# general

sparse-intern-71089

12/04/2023, 9:30 PM

This message was deleted.

busy-napkin-83700

12/04/2023, 9:32 PM

aws:ssoadmin:PermissionSet (GRASP):
    error: 1 error occurred:
    	* updating urn:pulumi:identity-center-mapping::Identity-Center::aws:ssoadmin/permissionSet:PermissionSet::GRASP: 1 error occurred:
    	* reading SSO Permission Set (arn,arn): operation error SSO Admin: DescribePermissionSet, failed to get rate limit token, retry quota exceeded, 1 available, 5 requested
 
  aws:ssoadmin:PermissionSetInlinePolicy (Data-Developer-inline):
    error: 1 error occurred:
    	* updating urn:pulumi:identity-center-mapping::Identity-Center::aws:ssoadmin/permissionSetInlinePolicy:PermissionSetInlinePolicy::Data-Developer-inline: 1 error occurred:
    	* putting SSO Permission Set (arn) Inline Policy: operation error SSO Admin: PutInlinePolicyToPermissionSet, failed to get rate limit token, retry quota exceeded, 1 available, 5 requested
 
  pulumi:pulumi:Stack (Identity-Center-identity-center-mapping):
    error: Error: invocation of aws:ssoadmin/getPermissionSet:getPermissionSet returned an error: invoking aws:ssoadmin/getPermissionSet:getPermissionSet: 1 error occurred:
    	* listing SSO Permission Sets: operation error SSO Admin: ListPermissionSets, failed to get rate limit token, retry quota exceeded, 0 available, 5 requested

dry-potato-52542

12/04/2023, 10:05 PM

Aha. I used to work for AWS SSO. That particular error means you have been throttled by either AWS SSO or the underlying IAM service. FYI, AWS SSO is heavily dependent on IAM. Every single time you perform an operation on a permission set (aka an IAM policy), AWS SSO makes calls to IAM on your behalf. If the overall number of calls to IAM breaches the threshold, they will start throttling your calls. They do not care whether it's a direct call or one made on your behalf.

dry-potato-52542

12/04/2023, 10:05 PM

I would suggest increasing the retry limit of your provider

dry-potato-52542

12/04/2023, 10:06 PM

Throttling exceptions are considered retryable

dry-potato-52542

12/04/2023, 10:06 PM

So the underlying SDK should retry

dry-potato-52542

12/04/2023, 10:06 PM

Tweak retry and retry timeout and you should be good to go
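
For readers unfamiliar with what a retry limit and retry timeout control, here is a minimal sketch of retrying a throttled call with capped exponential backoff. It is not the AWS SDK's or the provider's implementation; `withRetry`, `RetryOptions`, `makeThrottlingError`, and `isThrottling` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical marker for a throttling failure; real SDKs expose their own.
function makeThrottlingError(): Error {
  const err = new Error("Rate exceeded");
  err.name = "ThrottlingException";
  return err;
}

function isThrottling(err: unknown): boolean {
  return err instanceof Error && err.name === "ThrottlingException";
}

interface RetryOptions {
  maxRetries: number;  // retries allowed after the first attempt
  baseDelayMs: number; // upper bound of the delay before the first retry
  maxDelayMs: number;  // cap on any single delay
}

async function withRetry<T>(
  op: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Only throttling is treated as retryable in this sketch.
      if (!isThrottling(err) || attempt >= opts.maxRetries) {
        throw err;
      }
      // Exponential backoff with jitter, capped at maxDelayMs.
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, attempt));
      const delay = Math.random() * ceiling;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With `maxRetries: 0` the operation is attempted exactly once and the first throttling error is passed straight to the caller.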

busy-napkin-83700

12/04/2023, 10:07 PM

where can you tweak the retry? We have a hard limit of 20 TPS

busy-napkin-83700

12/04/2023, 10:07 PM

and AWS Support said that we were throwing 50, but I didn't set that anywhere

busy-napkin-83700

12/04/2023, 10:08 PM

seems like the provider recently got updated to 6.13.0 and that's when this started

dry-potato-52542

12/04/2023, 10:09 PM

All SDKs come with a default retry limit of up to 10. Your AWS provider has a property called retry limit. Check out their interface

dry-potato-52542

12/04/2023, 10:09 PM

That is used directly to configure the underlying SDK

dry-potato-52542

12/04/2023, 10:12 PM

If you do not supply any property there

dry-potato-52542

12/04/2023, 10:12 PM

Then the default properties are set

dry-potato-52542

12/04/2023, 10:13 PM

Just FYI, the most common case is when multiple people are operating in the same AWS account. Although they are different people, they are treated as one entity

busy-napkin-83700

12/04/2023, 10:14 PM

this particular account only deals with AWS SSO, so no other operations could be used here, def not enough to hit 20 TPS

dry-potato-52542

12/04/2023, 10:14 PM

See this

dry-potato-52542

12/04/2023, 10:14 PM

https://www.pulumi.com/registry/packages/aws/api-docs/provider/#maxretries_nodejs

dry-potato-52542

12/04/2023, 10:15 PM

https://www.pulumi.com/registry/packages/aws/api-docs/provider/#retrymode_nodejs

dry-potato-52542

12/04/2023, 10:16 PM

Well, that may be possible, since one call of yours to AWS SSO can potentially be equivalent to 5 calls to IAM

dry-potato-52542

12/04/2023, 10:16 PM

Also pulumi doesn’t sequence those calls

dry-potato-52542

12/04/2023, 10:16 PM

It runs them in parallel

busy-napkin-83700

12/04/2023, 10:16 PM

that would make sense why it was seeing 50 TPS

dry-potato-52542

12/04/2023, 10:16 PM

Yeah it’s never one to one

busy-napkin-83700

12/04/2023, 10:18 PM

ok, was trying to put in some async calls to go sequentially to slow it down

dry-potato-52542

12/04/2023, 10:18 PM

SSO is a higher-level interface on top of IAM. So they have to make multiple downstream calls to achieve with one click whatever you would have to do in IAM directly with five clicks

busy-napkin-83700

12/04/2023, 10:18 PM

what is the difference between standard and adaptive retryMode?

dry-potato-52542

12/04/2023, 10:18 PM

Yep that’s the only option you have with SSO

busy-napkin-83700

12/04/2023, 10:19 PM

const permissionSet = pulumi.output(aws.ssoadmin.getPermissionSet({
                instanceArn: config.require("instanceArn"), 
                name: "PermissionSet",
            }));

How would you put that in an async call?
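
One way to picture "sequential async calls" is a plain loop that awaits each lookup before starting the next. The sketch below assumes a hypothetical `lookup` function standing in for a promise-returning call such as `aws.ssoadmin.getPermissionSet`; `PermissionSetResult`, `lookupSequentially`, and the placeholder ARN string are illustrative only.

```typescript
// Hypothetical stand-in for a promise-returning lookup; it only simulates latency.
interface PermissionSetResult {
  name: string;
  arn: string;
}

async function lookup(name: string): Promise<PermissionSetResult> {
  await new Promise((resolve) => setTimeout(resolve, 5));
  return { name, arn: "arn:example:" + name }; // placeholder, not a real ARN format
}

// Awaiting inside the loop means the next call starts only after the
// previous one has finished, so at most one request is in flight.
async function lookupSequentially(
  names: string[],
  fetchOne: (name: string) => Promise<PermissionSetResult> = lookup,
): Promise<PermissionSetResult[]> {
  const results: PermissionSetResult[] = [];
  for (const name of names) {
    results.push(await fetchOne(name));
  }
  return results;
}
```

The trade-off is that total time grows linearly with the number of lookups, in exchange for never having more than one request outstanding from this loop.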

dry-keyboard-94795

12/04/2023, 10:29 PM

Pulumi runs all resources in parallel, and there could be multiple requests for each resource. Pulumi should also be handling this rate limit error, so a github issue needs to be raised: https://github.com/pulumi/pulumi-aws/issues

dry-keyboard-94795

12/04/2023, 10:33 PM

@busy-napkin-83700 see here for their differences: https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html It'd be worth trying adaptive

busy-napkin-83700

12/04/2023, 10:38 PM

thanks, will try out adaptive mode, and see if I can downgrade to 6.12 if that still doesn't work

busy-napkin-83700

12/04/2023, 10:46 PM

adaptive didn't help

dry-keyboard-94795

12/04/2023, 10:53 PM

There's a couple more things you can try. Try pulumi up --parallel 4, which may reduce the amount of concurrent calls to aws. Pulumi runs all resources in parallel by default. You could also change from the getPermissionSet function and instead use .get methods. This will track the data in state instead of querying aws each time. https://www.pulumi.com/docs/concepts/resources/get/
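
To illustrate what capping parallelism means in general, here is a small worker-pool sketch that runs a list of async tasks with at most `limit` in flight. It is not how the Pulumi engine is implemented; `runWithLimit` is a hypothetical helper shown only to make the idea concrete.

```typescript
// Runs `tasks` with at most `limit` of them in flight at once and
// returns the results in the same order as the tasks.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next unclaimed task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

With a limit of 4, a burst of many tasks is smoothed into at most four concurrent requests, which is the effect the lower parallel setting is meant to have on calls to AWS.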

busy-napkin-83700

12/04/2023, 10:57 PM

will try that too, what is the default parallel switch that is used?

busy-napkin-83700

12/04/2023, 10:58 PM

(default 2147483647)

lol

😂 1

busy-napkin-83700

12/04/2023, 11:00 PM

def something with the AWS version and 6.13. running up locally with 6.8 and no errors

busy-napkin-83700

12/04/2023, 11:01 PM

but i do see rate limiting in the AWS UI

dry-keyboard-94795

12/04/2023, 11:01 PM

Interesting that you say 6.8. Can you try 6.9? There was a similar issue in the terraform provider 5.25, which pulumi 6.9 is based on

busy-napkin-83700

12/04/2023, 11:02 PM

with github actions, can i add the parallel to the command?

- uses: pulumi/actions@v3
  with:
    color: auto
    command: up --parallel 4

busy-napkin-83700

12/04/2023, 11:07 PM

i found the docs

dry-potato-52542

12/04/2023, 11:48 PM

Sorry, was offline for a bit. Yeah, the retry strategy works 90 percent of the time. But you have to understand that the initial surge of calls may go out together

dry-potato-52542

12/04/2023, 11:48 PM

I would recommend building dependencies between those calls and resolving them when you need them

dry-potato-52542

12/04/2023, 11:49 PM

it will be the safest way, without any performance downgrade

dry-potato-52542

12/04/2023, 11:51 PM

let's say you have code

const permissionSet = pulumi.output(
  aws.ssoadmin.getPermissionSet({
    instanceArn: config.require("instanceArn"),
    name: "PermissionSet",
  })
);

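// OtherResource is a placeholder for any resource that consumes the ARN.
const otherResource = new OtherResource("name", {
    arg: permissionSet.arn,
});

otherResource in this case depends on your permission set.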

dry-potato-52542

12/04/2023, 11:51 PM

highly likely you have a list of permission sets and some for loop to create those resources

dry-potato-52542

12/04/2023, 11:52 PM

you can do the following

dry-potato-52542

12/04/2023, 11:57 PM

let psNames = ["PermissionSet", "PermissionSet2", "PermissionSet3"];
let dep = null;
let psList = [];
psNames.forEach(ps => {
    // Chain each lookup to the previous one so they run one at a time.
    let dependency = [];
    if (dep) {
        dependency.push(dep);
    }
    let pulumiPs = aws.ssoadmin.getPermissionSet({instanceArn: config.require("instanceArn"), name: ps}, {dependsOn: dependency});
    psList.push(pulumiPs);
    dep = pulumiPs;
});

// OtherResource is a placeholder for any resource that consumes the ARN.
const otherResource = new OtherResource("name", {
    arg: psList[0].arn
});

dry-potato-52542

12/04/2023, 11:57 PM

Here ^ you are creating a logical dependency between the permission set fetches

dry-potato-52542

12/04/2023, 11:58 PM

so Pulumi needs to execute them one at a time: the first will have no dependency, the second will depend on the first one, and so on.

dry-potato-52542

12/04/2023, 11:59 PM

Also

const permissionSet = pulumi.output(aws.ssoadmin.getPermissionSet({
                instanceArn: config.require("instanceArn"), 
                name: "PermissionSet",
            }));

seems like you are resolving this right away

dry-potato-52542

12/05/2023, 12:00 AM

I am not sure what use case you have, but usually it is done when you are passing this property to non-Pulumi structs.

dry-potato-52542

12/05/2023, 12:02 AM

What I am trying to say is that using pulumi.output is like await call(). Better to use call.then(data => {})

dry-potato-52542

12/05/2023, 12:03 AM

Same with pulumi

const permissionSet = aws.ssoadmin.getPermissionSet({
                instanceArn: config.require("instanceArn"), 
                name: "PermissionSet",
            });

dry-potato-52542

12/05/2023, 12:03 AM

then using permissionSet.arn will make Pulumi resolve the values rather than you doing it manually

dry-potato-52542

12/05/2023, 12:03 AM

does that make sense?

dry-keyboard-94795

12/05/2023, 8:29 AM

@dry-potato-52542 Invokes behave differently to resources. There's no dependency chaining support. They're also promises, so can't really be used directly unless you're already in an async context using await

dry-keyboard-94795

12/05/2023, 8:32 AM

I did some digging, and it sounds like the rate limit error is introduced as hashicorp switches resources to using the aws v2 sdk. https://github.com/hashicorp/terraform-provider-aws/issues/34669 In theory, this rate limit regression for ssoadmin is in tf provider v5.29, which means the pulumi aws provider v6.12 will work fine, with the regression in v6.13

dry-potato-52542

12/05/2023, 12:01 PM

Wait, are you saying that dependsOn does not work? https://www.pulumi.com/docs/concepts/options/dependson/

dry-potato-52542

12/05/2023, 12:02 PM

Also, I may be missing something, but what does Control Tower have to do with AWS SSO?

dry-keyboard-94795

12/05/2023, 12:03 PM

dependsOn isn't supported for functions, only resources

dry-keyboard-94795

12/05/2023, 12:06 PM

Also, I may be missing something, but what does Control Tower have to do with AWS SSO?

It's a recently migrated aws api that's showing the same symptoms, a regression from changing the underlying sdk for that resource, which ssoadmin has also just undergone in TF.

dry-potato-52542

12/05/2023, 12:08 PM

But the SDK is just an interface on top of the API itself. It doesn't have any intelligence.

dry-potato-52542

12/05/2023, 12:09 PM

It doesn't perform any rate limiting as such.

dry-potato-52542

12/05/2023, 12:09 PM

The error is generated by the API itself. I don't think there is a relation between those

dry-keyboard-94795

12/05/2023, 12:12 PM

It's a major change in the code flow, which means the previous rate limiting code that was in place client side may not be effective anymore. The issue needs raising in github for Pulumi, and likely terraform as well.

dry-potato-52542

12/05/2023, 12:17 PM

Hm... but again, rate limiting cannot be performed on the client side. It's purely the AWS SSO control plane's responsibility. Here is what the JS SDK interface looks like: https://github.com/aws/aws-sdk-js/blob/master/clients/ssoadmin.d.ts

dry-potato-52542

12/05/2023, 12:18 PM

Throttling errors were always there

dry-potato-52542

12/05/2023, 12:19 PM

I can catch them even with CloudFormation. You can get those errors even with the SDK using the API directly

dry-potato-52542

12/05/2023, 12:19 PM

The SDK is just a set of interfaces to help you communicate with APIs, nothing more

dry-keyboard-94795

12/05/2023, 12:30 PM

And it's the client that should handle the exception + retry requests when available. The error from the api isn't the problem here, it's the client side that's not handling it correctly that's the regression.
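
The wording in the errors at the top of the thread ("retry quota exceeded, 1 available, 5 requested") reads like a client-side retry quota: a token bucket that each retry has to draw from before it is allowed to go out. The sketch below is a simplified model of that idea, not SDK source code. The class name `RetryQuota`, the default capacity of 500, and the refund of 1 token are assumptions; the cost of 5 is chosen to mirror "5 requested" in the log.

```typescript
// Simplified model of a client-side retry quota (token bucket).
// Capacity and refund values are assumptions for illustration.
class RetryQuota {
  private readonly capacity: number;
  private available: number;

  constructor(capacity: number = 500) {
    this.capacity = capacity;
    this.available = capacity;
  }

  // Take `cost` tokens before a retry; throw when the bucket runs short.
  acquire(cost: number = 5): void {
    if (cost > this.available) {
      throw new Error(
        "retry quota exceeded, " + this.available + " available, " + cost + " requested",
      );
    }
    this.available -= cost;
  }

  // Return tokens after a success, never exceeding capacity.
  refund(amount: number = 1): void {
    this.available = Math.min(this.capacity, this.available + amount);
  }

  remaining(): number {
    return this.available;
  }
}

// Usage: a nearly drained bucket rejects the next retry locally.
const quota = new RetryQuota(6);
quota.acquire(); // 1 token left
try {
  quota.acquire();
} catch (e) {
  console.log((e as Error).message); // retry quota exceeded, 1 available, 5 requested
}
```

Under this model, a burst of throttled calls drains the bucket quickly because many parallel operations retry at once, after which further retries are refused on the client before any request is sent.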

dry-potato-52542

12/05/2023, 12:35 PM

You mean the code where the SDK client is used? Like the TF handlers?

dry-keyboard-94795

12/05/2023, 12:36 PM

Yep. Or the sdk itself if "adaptive" mode is used

dry-potato-52542

12/05/2023, 12:37 PM

Yes, that's one part. But it has nothing to do with the SDK changes.

dry-potato-52542

12/05/2023, 12:38 PM

This is what the CFN handlers look like: https://github.com/aws-cloudformation/aws-cloudformation-resource-providers-sso/bl[…]rc/main/java/software/amazon/sso/permissionset/ReadHandler.java

dry-potato-52542

12/05/2023, 12:39 PM

Those retries were added on purpose, since AWS SSO knew that people were getting throttled after reaching a certain call capacity.

dry-keyboard-94795

12/05/2023, 12:39 PM

It does if tf migrated to a different sdk. But those are just implementation details for them to work out.

dry-potato-52542

12/05/2023, 12:40 PM

@dry-keyboard-94795 Can you give some examples? How, in your opinion, does the AWS SDK influence throttling? I just don't understand your point

dry-keyboard-94795

12/05/2023, 12:41 PM

The adaptive part reminds me, @busy-napkin-83700 during the preview phase, changes to the provider aren't taken into consideration, so setting the mode to adaptive wouldn't have worked. Instead, you could use the environment variable AWS_RETRY_MODE=adaptive until pulumi up succeeds with the new parameter. Though I know you've worked around the issue by rolling back to an older version

dry-keyboard-94795

12/05/2023, 1:03 PM

Can you give some examples? How, in your opinion, does the AWS SDK influence throttling? I just don't understand your point

Different SDKs have different implementations. What I'm saying is that the integration of the new sdk could be incorrect. Maybe because the v1 sdk didn't implement retries, tf had its own implementation that worked, and the v2 sdk doesn't retry as effectively as tf's implementation. https://github.com/hashicorp/terraform-provider-aws/commit/3de914de01c8ae37665b6fa8933c1cbb0699b097 Regardless, the reason behind the error doesn't matter.

dry-potato-52542

12/05/2023, 4:03 PM

OK, then this makes more sense now. It's not about the SDK change. They could have kept the same version of the SDK. It's about this line: https://github.com/hashicorp/terraform-provider-aws/blame/e8723a97dc0bca4802b2474e8ea36da46134aac8/internal/service/ssoadmin/service_package.go#L25

dry-potato-52542

12/05/2023, 4:04 PM

If they had just switched the SDK without adding those errors to the retryable ones, you would be fine

dry-potato-52542

12/05/2023, 4:05 PM

What's happening now: you have X retries on the Terraform side and Y built into the SDK. That's why you end up generating X*Y calls (around 50 TPS)

dry-potato-52542

12/05/2023, 4:05 PM

It works like a nested loop
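
The nested-loop point can be written out as arithmetic. Counting the first attempt at each layer, an outer loop with X retries wrapped around an inner loop with Y retries produces up to (X + 1) * (Y + 1) requests for one operation that keeps getting throttled. The function names and the example values below are hypothetical and are not taken from the provider or the SDK.

```typescript
// Worst-case number of requests for one logical operation when an outer
// retry loop wraps an inner one and every attempt is throttled.
function worstCaseAttempts(outerRetries: number, innerRetries: number): number {
  return (outerRetries + 1) * (innerRetries + 1);
}

// The same count obtained by actually walking the two nested loops.
function simulateNestedRetries(outerRetries: number, innerRetries: number): number {
  let attempts = 0;
  for (let outer = 0; outer <= outerRetries; outer++) {
    for (let inner = 0; inner <= innerRetries; inner++) {
      attempts++; // every attempt fails in the worst case
    }
  }
  return attempts;
}

console.log(worstCaseAttempts(4, 9)); // 50
console.log(worstCaseAttempts(4, 0)); // 5
```

Setting the inner retry count to 0 collapses the product to just the outer layer's attempts, which is the reasoning behind letting only one layer retry.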

dry-potato-52542

12/05/2023, 4:06 PM

I think you need to go in the other direction: set the SDK retry count to 0 (since you can control that property) and let Terraform retry on its own

busy-napkin-83700

12/05/2023, 6:41 PM

retryMode: adaptive
maxRetries: 20

busy-napkin-83700

12/05/2023, 6:41 PM

so set maxRetries to 0?

dry-potato-52542

12/05/2023, 7:10 PM

Yeah try that and let terraform handle retry

dry-potato-52542

12/05/2023, 7:10 PM

Ideally you would want the SDK to manage those retries, but it is what it is.
