#general

busy-napkin-83700

12/04/2023, 9:30 PM
I had a block of SSOAdmin calls that were working last week. Today I am unable to run the same code because AWS is rate limiting me. See thread.
aws:ssoadmin:PermissionSet (GRASP):
    error: 1 error occurred:
    	* updating urn:pulumi:identity-center-mapping::Identity-Center::aws:ssoadmin/permissionSet:PermissionSet::GRASP: 1 error occurred:
    	* reading SSO Permission Set (arn,arn): operation error SSO Admin: DescribePermissionSet, failed to get rate limit token, retry quota exceeded, 1 available, 5 requested
 
  aws:ssoadmin:PermissionSetInlinePolicy (Data-Developer-inline):
    error: 1 error occurred:
    	* updating urn:pulumi:identity-center-mapping::Identity-Center::aws:ssoadmin/permissionSetInlinePolicy:PermissionSetInlinePolicy::Data-Developer-inline: 1 error occurred:
    	* putting SSO Permission Set (arn) Inline Policy: operation error SSO Admin: PutInlinePolicyToPermissionSet, failed to get rate limit token, retry quota exceeded, 1 available, 5 requested
 
  pulumi:pulumi:Stack (Identity-Center-identity-center-mapping):
    error: Error: invocation of aws:ssoadmin/getPermissionSet:getPermissionSet returned an error: invoking aws:ssoadmin/getPermissionSet:getPermissionSet: 1 error occurred:
    	* listing SSO Permission Sets: operation error SSO Admin: ListPermissionSets, failed to get rate limit token, retry quota exceeded, 0 available, 5 requested

dry-potato-52542

12/04/2023, 10:05 PM
Aha. I used to work on AWS SSO. That particular error means you have been throttled by either AWS SSO or the underlying IAM service. FYI, AWS SSO depends heavily on IAM: every time you perform an operation on a permission set (aka IAM policy), AWS SSO makes calls to IAM on your behalf. If the overall number of calls to IAM breaches the threshold, your calls start getting throttled, and it does not matter whether a call is direct or made on your behalf.
I would suggest increasing the retry limit of your provider.
Throttling exceptions are considered retryable,
so the underlying SDK should retry.
Tweak the retry count and retry timeout and you should be good to go.
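The advice above (treat throttling as retryable, then tune the retry count and delay) can be sketched roughly as follows. This is only an illustration of the idea, not the provider's or the SDK's actual code: `withRetry`, `isThrottle`, and the option names are hypothetical, and the error matching is an assumption.

```typescript
// Hypothetical helper: retry an async operation when it fails with a throttling error.
// Names and defaults are illustrative assumptions, not Pulumi or AWS SDK APIs.
interface RetryOptions {
  maxRetries: number;   // retries allowed after the first attempt
  baseDelayMs: number;  // ceiling of the first backoff delay
  maxDelayMs: number;   // cap on any single delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not have to wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: throttling errors can be recognised from their name or message.
function isThrottle(err: unknown): boolean {
  const text = err instanceof Error ? `${err.name} ${err.message}` : String(err);
  return /throttl|rate limit|too many requests/i.test(text);
}

async function withRetry<T>(op: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on non-throttling errors, or once the retry budget is spent.
      if (!isThrottle(err) || attempt >= opts.maxRetries) throw err;
      // Exponential backoff with full jitter, capped at maxDelayMs.
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * ceiling);
    }
  }
}
```

The jitter matters here because, as discussed further down the thread, many calls start at the same moment; without it they would all retry at the same moment too.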

busy-napkin-83700

12/04/2023, 10:07 PM
Where can you tweak the retry? We have a hard limit of 20 TPS,
and AWS Support said that we were throwing 50, but I didn't set that anywhere.
Seems like the provider recently got updated to 6.13.0 and that's when this started.

dry-potato-52542

12/04/2023, 10:09 PM
All SDKs come with a default of up to 10 retries. Your AWS provider has a property called retry limit. Check out its interface.
That is used directly to configure the underlying SDK.
If you do not supply a value there,
then the default is used.

busy-napkin-83700

12/04/2023, 10:12 PM
[2 attached images: image.png, image.png]

dry-potato-52542

12/04/2023, 10:13 PM
Just FYI, the most common case is multiple people operating in the same AWS account. Although they are different people, they are treated as one entity.

busy-napkin-83700

12/04/2023, 10:14 PM
this particular account only deals with AWS SSO, so no other operations could be used here, def not enough to hit 20 TPS

dry-potato-52542

12/04/2023, 10:14 PM
See this
Well, that may be possible, since one of your calls to AWS SSO can potentially be equivalent to 5 calls to IAM.
Also, Pulumi doesn't sequence those calls.
It runs them in parallel.

busy-napkin-83700

12/04/2023, 10:16 PM
that would make sense why it was seeing 50 TPS

dry-potato-52542

12/04/2023, 10:16 PM
Yeah it’s never one to one

busy-napkin-83700

12/04/2023, 10:18 PM
ok, was trying to put in some async calls to go sequentially to slow it down

dry-potato-52542

12/04/2023, 10:18 PM
SSO is a higher-level interface on top of IAM, so it has to make multiple downstream calls to achieve with one click what you would otherwise do directly in IAM with five clicks.

busy-napkin-83700

12/04/2023, 10:18 PM
what is the difference between standard and adaptive retryMode?

dry-potato-52542

12/04/2023, 10:18 PM
Yep that’s the only option you have with SSO

busy-napkin-83700

12/04/2023, 10:19 PM
const permissionSet = pulumi.output(aws.ssoadmin.getPermissionSet({
    instanceArn: config.require("instanceArn"),
    name: "PermissionSet",
}));
How would you put that in an async call?

dry-keyboard-94795

12/04/2023, 10:29 PM
Pulumi runs all resources in parallel, and there could be multiple requests for each resource. Pulumi should also be handling this rate limit error, so a github issue needs to be raised: https://github.com/pulumi/pulumi-aws/issues
@busy-napkin-83700 see here for their differences: https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html It'd be worth trying adaptive.

busy-napkin-83700

12/04/2023, 10:38 PM
Thanks, will try out adaptive mode, and see if I can downgrade to 6.12 if that still doesn't work.
Adaptive didn't help.

dry-keyboard-94795

12/04/2023, 10:53 PM
There are a couple more things you can try. Try pulumi up --parallel 4, which may reduce the number of concurrent calls to AWS. Pulumi runs all resources in parallel by default. You could also switch from the getPermissionSet function to the .get methods. This will track the data in state instead of querying AWS each time. https://www.pulumi.com/docs/concepts/resources/get/
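The effect of --parallel 4 is to cap how many operations are in flight at once. The same idea, applied in user code to a batch of promise-returning lookups, might look like the sketch below. `mapWithLimit` is a hypothetical helper, not a Pulumi API; it only illustrates the capping behaviour.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane keeps claiming the next item until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const lanes: Promise<void>[] = [];
  for (let n = 0; n < Math.min(limit, items.length); n++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results; // same order as `items`
}
```

With a limit of 4, a list of ten lookups never has more than four requests outstanding, at the cost of a longer total run time.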

busy-napkin-83700

12/04/2023, 10:57 PM
will try that too, what is the default parallel switch that is used?
(default 2147483647)
lol
def something with the AWS version and 6.13. running up locally with 6.8 and no errors
but i do see rate limiting in the AWS UI

dry-keyboard-94795

12/04/2023, 11:01 PM
Interesting that you say 6.8. Can you try 6.9? There was a similar issue in the terraform provider 5.25, which pulumi 6.9 is based on

busy-napkin-83700

12/04/2023, 11:02 PM
with github actions, can i add the parallel to the command?
- uses: pulumi/actions@v3
  with:
    color: auto
    command: up --parallel 4
i found the docs

dry-potato-52542

12/04/2023, 11:48 PM
Sorry, was offline for a bit. Yeah, the retry strategy works 90 percent of the time. But you have to understand that the initial surge of calls may go out together.
I would recommend building a dependency between those calls and resolving them when you need them.
it will be safest way with any performance downgrade
let's say you have code
const permissionSet = pulumi.output(
  aws.ssoadmin.getPermissionSet({
    instanceArn: config.require("instanceArn"),
    name: "PermissionSet",
  })
);

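// OtherResource is a placeholder for any resource that consumes the ARN
const otherResource = new OtherResource("name", {
    arg: permissionSet.arn,
});
otherResource in this case depends on your permission set.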
Most likely you have a list of permission sets and some for loop to create those resources.
You can do the following:
let psNames = ["PermissionSet", "PermissionSet2", "PermissionSet3"];
let dep = null;
let psList = [];
psNames.forEach(ps => {
    // Intended to chain each lookup to the previous one
    let dependency = [];
    if (dep) {
        dependency.push(dep);
    }
    let pulumiPs = aws.ssoadmin.getPermissionSet({instanceArn: config.require("instanceArn"), name: ps}, {dependsOn: dependency});
    psList.push(pulumiPs);
    dep = pulumiPs;
});

// OtherResource is a placeholder for any resource that consumes the ARN
const otherResource = new OtherResource("name", {
    arg: psList[0].arn,
});
Here ^ you are creating a logical dependency between the permission set fetches,
so Pulumi needs to execute them one at a time. The first will not have a dependency, the second will depend on the first one, and so on.
Also
const permissionSet = pulumi.output(aws.ssoadmin.getPermissionSet({
    instanceArn: config.require("instanceArn"),
    name: "PermissionSet",
}));
Seems like you are resolving this right away.
I am not sure what your use case is, but usually that is done when you are passing this property to non-Pulumi structs.
What I am trying to say is that using pulumi.output is like await call(). Better to use call.then(data => {}).
Same with Pulumi:
const permissionSet = aws.ssoadmin.getPermissionSet({
    instanceArn: config.require("instanceArn"),
    name: "PermissionSet",
});
Then using permissionSet.arn will make Pulumi resolve the values rather than you doing it manually.
Does that make sense?

dry-keyboard-94795

12/05/2023, 8:29 AM
@dry-potato-52542 Invokes behave differently to resources. There's no dependency chaining support. They're also promises, so can't really be used directly unless you're already in an async context using await.
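Since invokes return promises, one way to force them to run one at a time is to await each call inside an async function before starting the next. A minimal sketch, where `lookup` stands in for a promise-returning call such as aws.ssoadmin.getPermissionSet; `Lookup` and `fetchSequentially` are hypothetical names used only for illustration:

```typescript
// Hypothetical stand-in type for a promise-returning invoke.
type Lookup<T> = (name: string) => Promise<T>;

// Resolve each name in order; the next call starts only after the previous one settles.
async function fetchSequentially<T>(
  names: readonly string[],
  lookup: Lookup<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (const name of names) {
    results.push(await lookup(name));
  }
  return results;
}
```

In a Pulumi program the resulting promise could then be wrapped with pulumi.output(...), as the snippets earlier in this thread do, so that resources can consume the values.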
I did some digging, and it sounds like the rate limit error is introduced as hashicorp switches resources to using the aws v2 sdk. https://github.com/hashicorp/terraform-provider-aws/issues/34669 In theory, this rate limit regression for ssoadmin is in tf provider v5.29, which means the pulumi aws provider v6.12 will work fine, with the regression in v6.13

dry-potato-52542

12/05/2023, 12:01 PM
Wait, are you saying that dependsOn does not work? https://www.pulumi.com/docs/concepts/options/dependson/
Also, I may be missing something, but what does Control Tower have to do with AWS SSO?

dry-keyboard-94795

12/05/2023, 12:03 PM
dependsOn isn't supported for functions, only resources.
> Also, I may be missing something, but what does Control Tower have to do with AWS SSO?
It's a recently migrated AWS API that's showing the same symptoms, a regression from changing the underlying SDK for that resource, which ssoadmin has also just undergone in TF.

dry-potato-52542

12/05/2023, 12:08 PM
But the SDK is just an interface on top of the API itself. It doesn't have any intelligence.
It doesn't perform any rate limiting as such.
The error is generated by the API itself. I don't think there is a relation between those.

dry-keyboard-94795

12/05/2023, 12:12 PM
It's a major change in the code flow, which means the previous rate limiting code that was in place client side may not be effective anymore. The issue needs raising in github for Pulumi, and likely terraform as well.

dry-potato-52542

12/05/2023, 12:17 PM
Hm... but again, rate limiting cannot be performed on the client side. It's purely the AWS SSO control plane's responsibility. Here is what the JS SDK interface looks like - https://github.com/aws/aws-sdk-js/blob/master/clients/ssoadmin.d.ts
Throttling errors were always there.
I can catch them even with CloudFormation. You can get those errors even with the SDK using the API directly.
The SDK is just a set of interfaces to help you communicate with APIs, nothing more.

dry-keyboard-94795

12/05/2023, 12:30 PM
And it's the client that should handle the exception + retry requests when available. The error from the api isn't the problem here, it's the client side that's not handling it correctly that's the regression.
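The wording of the original error ("retry quota exceeded, 1 available, 5 requested") reads like a client-side retry budget: each retry has to withdraw tokens from a shared bucket, and the retry is refused when the bucket runs too low. The sketch below illustrates only that mechanism. The retry cost of 5 is taken from the error text; the capacity of 500 and the refund of 1 token per success are assumptions for illustration, and `RetryQuota` is a hypothetical class, not the SDK's implementation.

```typescript
// Hypothetical token bucket that limits how many retries a client may issue.
class RetryQuota {
  private readonly capacity: number;
  private readonly retryCost: number;
  private available: number;

  constructor(capacity: number = 500, retryCost: number = 5) {
    this.capacity = capacity;   // assumed default, for illustration only
    this.retryCost = retryCost; // matches "5 requested" in the error text
    this.available = capacity;
  }

  // Pay for one retry; throw if the bucket cannot cover the cost.
  acquireRetry(): void {
    if (this.available < this.retryCost) {
      throw new Error(
        `retry quota exceeded, ${this.available} available, ${this.retryCost} requested`,
      );
    }
    this.available -= this.retryCost;
  }

  // Successful calls slowly refill the bucket, up to its capacity.
  recordSuccess(refund: number = 1): void {
    this.available = Math.min(this.capacity, this.available + refund);
  }

  get tokens(): number {
    return this.available;
  }
}
```

Under this model, a burst of throttled calls drains the bucket quickly, after which further retries fail immediately on the client until enough calls succeed again.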

dry-potato-52542

12/05/2023, 12:35 PM
You mean the code where the SDK client is used? Like the TF handlers?

dry-keyboard-94795

12/05/2023, 12:36 PM
Yep. Or the sdk itself if "adaptive" mode is used

dry-potato-52542

12/05/2023, 12:37 PM
Yes, that's one part. But it has nothing to do with SDK changes.
Those retries were added on purpose, since AWS SSO knew that people get throttled after reaching a certain volume of calls.

dry-keyboard-94795

12/05/2023, 12:39 PM
It does if tf migrated to a different sdk. But those are just implementation details for them to work out.

dry-potato-52542

12/05/2023, 12:40 PM
@dry-keyboard-94795 Can you give some examples? How, in your opinion, does the AWS SDK influence throttling? I just don't understand your point.

dry-keyboard-94795

12/05/2023, 12:41 PM
The adaptive part reminds me, @busy-napkin-83700: during the preview phase, changes to the provider aren't taken into consideration, so setting the mode to adaptive wouldn't have worked. Instead, you could use the environment variable AWS_RETRY_MODE=adaptive until pulumi up succeeds with the new parameter. Though I know you've worked around the issue by rolling back to an older version.
> Can you give some examples? How, in your opinion, does the AWS SDK influence throttling? I just don't understand your point.
Different SDKs have different implementations. What I'm saying is that the integration of the new SDK could be incorrect. Maybe because the v1 SDK didn't implement retries, TF had its own implementation that worked, and the v2 SDK doesn't retry as effectively as TF's implementation. https://github.com/hashicorp/terraform-provider-aws/commit/3de914de01c8ae37665b6fa8933c1cbb0699b097 Regardless, the reason behind the error doesn't matter.

dry-potato-52542

12/05/2023, 4:03 PM
OK, then this makes more sense now. It's not about the SDK change; they could have kept the same version of the SDK. It's about this line - https://github.com/hashicorp/terraform-provider-aws/blame/e8723a97dc0bca4802b2474e8ea36da46134aac8/internal/service/ssoadmin/service_package.go#L25
If they had just switched SDKs without adding those errors to the retryable list, you would be fine.
What's happening now: you have X retries on the Terraform side and Y built into the SDK. That's why you end up generating X*Y calls (around 50 TPS).
It works like a nested loop.
I think you need to go in the other direction: set the SDK retry count (since you can control that property) to 0 and let Terraform retry on its own.
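The nested-loop effect can be made concrete with a little arithmetic. If an outer layer retries X times and each of its attempts goes through an inner layer that retries Y times, a call that keeps getting throttled is sent (X + 1) * (Y + 1) times once first attempts are counted. The numbers used below are illustrative assumptions, not values measured from the provider, and both function names are hypothetical.

```typescript
// Worst-case number of API requests for one logical call with two nested retry layers.
function worstCaseRequests(outerRetries: number, innerRetries: number): number {
  return (outerRetries + 1) * (innerRetries + 1);
}

// Cross-check by simulation: count requests against an API that always throttles.
function simulateAlwaysThrottled(outerRetries: number, innerRetries: number): number {
  let requests = 0;
  for (let outer = 0; outer <= outerRetries; outer++) {
    for (let inner = 0; inner <= innerRetries; inner++) {
      requests++; // every request is rejected, so both loops run to exhaustion
    }
  }
  return requests;
}

// Example: 4 outer retries around 9 inner retries gives 50 requests for one call.
const nested = worstCaseRequests(4, 9);
// With the inner layer disabled (0 retries), the same call costs only 5 requests.
const outerOnly = worstCaseRequests(4, 0);
```

This is the reasoning behind setting the SDK-level retry count to 0: the multiplication collapses to the outer layer's X + 1 attempts.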

busy-napkin-83700

12/05/2023, 6:41 PM
retryMode: adaptive
maxRetries: 20
so set maxRetries to 0?

dry-potato-52542

12/05/2023, 7:10 PM
Yeah, try that and let Terraform handle the retries.
Ideally you would want the SDK to manage those retries, but it is what it is.