# general
b
I had a block of SSOAdmin calls that were working last week. Today, I am unable to run the same code because AWS is rate limiting me. See thread:
Copy code
aws:ssoadmin:PermissionSet (GRASP):
    error: 1 error occurred:
    	* updating urn:pulumi:identity-center-mapping::Identity-Center::aws:ssoadmin/permissionSet:PermissionSet::GRASP: 1 error occurred:
    	* reading SSO Permission Set (arn,arn): operation error SSO Admin: DescribePermissionSet, failed to get rate limit token, retry quota exceeded, 1 available, 5 requested
 
  aws:ssoadmin:PermissionSetInlinePolicy (Data-Developer-inline):
    error: 1 error occurred:
    	* updating urn:pulumi:identity-center-mapping::Identity-Center::aws:ssoadmin/permissionSetInlinePolicy:PermissionSetInlinePolicy::Data-Developer-inline: 1 error occurred:
    	* putting SSO Permission Set (arn) Inline Policy: operation error SSO Admin: PutInlinePolicyToPermissionSet, failed to get rate limit token, retry quota exceeded, 1 available, 5 requested
 
  pulumi:pulumi:Stack (Identity-Center-identity-center-mapping):
    error: Error: invocation of aws:ssoadmin/getPermissionSet:getPermissionSet returned an error: invoking aws:ssoadmin/getPermissionSet:getPermissionSet: 1 error occurred:
    	* listing SSO Permission Sets: operation error SSO Admin: ListPermissionSets, failed to get rate limit token, retry quota exceeded, 0 available, 5 requested
d
Ahha. I used to work for AWS SSO, so that particular error means you have been throttled by either AWS SSO or the underlying IAM service. FYI, AWS SSO is heavily dependent on IAM: every time you do an operation on a permission set (which is essentially an IAM policy), AWS SSO makes calls to IAM on your behalf. If the overall number of calls to IAM breaches the threshold, they will start throttling your calls. They don't care whether it's a direct call or one made on your behalf.
I would suggest increasing the retry limit of your provider.
Throttling exceptions are considered retryable,
so the underlying SDK should retry.
Tweak the retry count and retry timeout and you should be good to go.
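A minimal sketch of what that could look like in a TypeScript Pulumi program with @pulumi/aws; the provider name, region, and retry value below are illustrative assumptions, not recommendations:
Copy code
import * as aws from "@pulumi/aws";

// Explicit provider for the SSO Admin resources with a larger retry budget.
// maxRetries (and, on newer provider versions, retryMode) is passed through
// to the underlying AWS SDK configuration.
const ssoProvider = new aws.Provider("sso-provider", {
    region: "us-east-1", // assumption: the region hosting your Identity Center instance
    maxRetries: 20,      // raise the retry limit the SDK is allowed to use
});

// Pass it explicitly to the resources that call SSO Admin, for example:
// new aws.ssoadmin.PermissionSet("GRASP", { /* ...args... */ }, { provider: ssoProvider });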
b
Where can you tweak the retry? We have a hard limit of 20 TPS.
AWS Support said that we were sending 50, but I didn't set that anywhere.
It seems like the provider recently got updated to 6.13.0, and that's when this started.
d
All SDKs come with a default of up to 10 retries. Your AWS provider has a property for the retry limit (maxRetries); check out its interface.
That is used directly to configure the underlying SDK.
If you do not supply any property there,
the defaults are used.
b
image.png
image.png
d
Just FYI, the most common case is multiple people operating in the same AWS account. Although they are different people, they are treated as one entity for throttling purposes.
b
This particular account only deals with AWS SSO, so no other operations could be involved here, and definitely not enough to hit 20 TPS.
d
See this.
Well, that may well be the case, since one of your calls to AWS SSO can potentially be equivalent to 5 calls to IAM.
Also, Pulumi doesn't sequence those calls;
it runs them in parallel.
b
That would explain why it was seeing 50 TPS.
d
Yeah it’s never one to one
b
OK, I was trying to put in some async calls to run things sequentially and slow it down.
d
SSO is a higher-level interface on top of IAM, so they have to make multiple downstream calls to achieve in one click what you would otherwise have to do directly in IAM with five clicks.
b
what is the difference between standard and adaptive retryMode?
d
Yep that’s the only option you have with SSO
b
Copy code
const permissionSet = pulumi.output(aws.ssoadmin.getPermissionSet({
    instanceArn: config.require("instanceArn"),
    name: "PermissionSet",
}));
How would you put that in an async call?
d
Pulumi runs all resources in parallel, and there could be multiple requests for each resource. Pulumi should also be handling this rate limit error, so a GitHub issue needs to be raised: https://github.com/pulumi/pulumi-aws/issues
@busy-napkin-83700 see here for their differences: https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html It'd be worth trying
adaptive
b
Thanks, will try out adaptive mode, and see if I can downgrade to 6.12 if that still doesn't work.
Adaptive didn't help.
d
There are a couple more things you can try. Try
pulumi up --parallel 4
, which may reduce the number of concurrent calls to AWS. Pulumi runs all resources in parallel by default. You could also change from the
getPermissionSet
function and instead use
.get
methods. This will track the data in state instead of querying aws each time. https://www.pulumi.com/docs/concepts/resources/get/
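For reference, a minimal sketch of the .get approach, assuming the permission set ARN is already known; the config key and the comma-separated id format (visible in the error output above) are assumptions to verify against your provider version:
Copy code
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const config = new pulumi.Config();
const instanceArn = config.require("instanceArn");
const permissionSetArn = config.require("permissionSetArn"); // hypothetical config key

// The static .get() adopts the existing permission set into state rather than
// invoking the getPermissionSet data source on every run.
const permissionSet = aws.ssoadmin.PermissionSet.get(
    "existing-permission-set",
    `${permissionSetArn},${instanceArn}`,
);

// permissionSet.arn and permissionSet.name are Outputs usable as inputs elsewhere.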
b
Will try that too. What is the default value for the parallel switch?
Copy code
(default 2147483647)
lol
Definitely something with the AWS provider version and 6.13. Running up locally with 6.8 gives no errors.
But I do see rate limiting in the AWS UI.
d
Interesting that you say 6.8. Can you try 6.9? There was a similar issue in the Terraform provider 5.25, which Pulumi 6.9 is based on.
b
With GitHub Actions, can I add the parallel flag to the command?
Copy code
- uses: pulumi/actions@v3
  with:
    color: auto
    command: up --parallel 4
I found the docs.
d
Sorry, was offline for a bit. Yeah, the retry strategy works 90 percent of the time, but you have to understand that the initial surge of calls may still go out together.
I would recommend building a dependency between those calls and resolving them only when you need them.
It will be the safest way, albeit with some performance downgrade.
Let's say you have code:
Copy code
const permissionSet = pulumi.output(
  aws.ssoadmin.getPermissionSet({
    instanceArn: config.require("instanceArn"),
    name: "PermissionSet",
  })
);

const otherResource = new SomeResource("name", { // SomeResource is a placeholder
    arg: permissionSet.arn,
});
otherResource
in this example depends on your permission set.
Most likely you have a list of permission sets and some for loop to create those resources.
You can do the following:
Copy code
let psNames = ["PermissionSet", "PermissionSet2", "PermissionSet3"];
let dep = null;
let psList = [];
psNames.forEach(ps => {
    // each lookup depends on the previous one, so they run one at a time
    let dependency = [];
    if (dep) {
        dependency.push(dep);
    }
    let pulumiPs = aws.ssoadmin.getPermissionSet(
        { instanceArn: config.require("instanceArn"), name: ps },
        { dependsOn: dependency }
    );
    psList.push(pulumiPs);
    dep = pulumiPs;
});

const otherResource = new SomeResource("name", { // SomeResource is a placeholder
    arg: psList[0].arn,
});
Here ^ you are creating a logical dependency between the permission set fetches,
so Pulumi needs to execute them one at a time: the first has no dependency, the second depends on the first, and so on.
Also
Copy code
const permissionSet = pulumi.output(aws.ssoadmin.getPermissionSet({
    instanceArn: config.require("instanceArn"),
    name: "PermissionSet",
}));
it seems like you are resolving this right away.
I am not sure what your use case is, but usually that is done when you are passing the property to non-Pulumi structs.
What I am trying to say is that using pulumi.output here is like
await call()
Better to use
call.then(data => {})
Same with Pulumi:
Copy code
const permissionSet = aws.ssoadmin.getPermissionSet({
    instanceArn: config.require("instanceArn"),
    name: "PermissionSet",
});
then using
permissionSet.arn
will make Pulumi resolve the values rather than you doing it manually.
Does that make sense?
d
@dry-potato-52542 Invokes behave differently to resources. There's no dependency chaining support. They're also promises, so can't really be used directly unless you're already in an async context using
await
.
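A minimal sketch of what that can look like in practice: since dependsOn doesn't apply to invokes, the lookups can be serialised in plain async code and the result lifted into an Output afterwards; the permission set names below are illustrative:
Copy code
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const config = new pulumi.Config();
const instanceArn = config.require("instanceArn");
const psNames = ["PermissionSet", "PermissionSet2", "PermissionSet3"];

async function lookupSequentially(): Promise<aws.ssoadmin.GetPermissionSetResult[]> {
    const results: aws.ssoadmin.GetPermissionSetResult[] = [];
    for (const name of psNames) {
        // awaiting inside the loop keeps only one SSO Admin lookup in flight at a time
        results.push(await aws.ssoadmin.getPermissionSet({ instanceArn, name }));
    }
    return results;
}

// Lift the promise into an Output so the ARNs can feed resource inputs later.
const permissionSetArns = pulumi.output(lookupSequentially())
    .apply(list => list.map(ps => ps.arn));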
I did some digging, and it sounds like the rate limit error was introduced as HashiCorp switched resources over to the AWS SDK v2. https://github.com/hashicorp/terraform-provider-aws/issues/34669 In theory, this rate limit regression for ssoadmin is in TF provider v5.29, which means the Pulumi AWS provider v6.12 will work fine, with the regression in v6.13.
d
Wait, are you saying that
dependsOn
doesn't work? https://www.pulumi.com/docs/concepts/options/dependson/
Also, I may be missing something, but what does Control Tower have to do with AWS SSO?
d
dependsOn
isn't supported for functions, only resources
Also, I may be missing something, but what does Control Tower have to do with AWS SSO?
It's a recently migrated AWS API that's showing the same symptoms: a regression from changing the underlying SDK for that resource, which ssoadmin has also just undergone in TF.
d
But the SDK is just an interface on top of the API itself. It doesn't have any intelligence.
It doesn't perform any rate limiting as such.
The error is generated by the API itself. I don't think there is a relation between those.
d
It's a major change in the code flow, which means the previous rate limiting code that was in place client side may not be effective anymore. The issue needs raising in GitHub for Pulumi, and likely Terraform as well.
d
Hm... but again, rate limiting cannot be performed on the client side. It's purely the AWS SSO control plane's responsibility. Here is what the JS SDK interface looks like - https://github.com/aws/aws-sdk-js/blob/master/clients/ssoadmin.d.ts
Throttling errors were always there.
I can catch them even with CloudFormation. You can get those errors even when the SDK calls the API directly.
The SDK is just a set of interfaces to help you communicate with the APIs, nothing more.
d
And it's the client that should handle the exception and retry requests when possible. The error from the API isn't the problem here; the regression is the client side not handling it correctly.
d
You mean the code where the SDK client is used? Like the TF handlers?
d
Yep. Or the SDK itself if "adaptive" mode is used.
d
Yes, that's one part. But it has nothing to do with SDK changes.
Those retries were added on purpose, since AWS SSO knew that people get throttled after reaching a certain call capacity.
d
It does if TF migrated to a different SDK. But those are just implementation details for them to work out.
d
@dry-keyboard-94795 Can you give some examples? How, in your opinion, does the AWS SDK influence throttling? I just don't understand your point.
d
The adaptive part reminds me, @busy-napkin-83700: during the preview phase, changes to the provider aren't taken into consideration, so setting the mode to adaptive wouldn't have worked. Instead, you could use the environment variable
AWS_RETRY_MODE=adaptive
until
pulumi up
succeeds with the new parameter. Though I know you've worked around the issue by rolling back to an older version
Can you give some examples? How, in your opinion, does the AWS SDK influence throttling? I just don't understand your point.
Different SDKs have different implementations. What I'm saying is that the integration of the new SDK could be incorrect. Maybe because the v1 SDK didn't implement retries, TF had its own implementation that worked, and the v2 SDK doesn't retry as effectively as TF's implementation did. https://github.com/hashicorp/terraform-provider-aws/commit/3de914de01c8ae37665b6fa8933c1cbb0699b097 Regardless, the reason behind the error doesn't matter.
d
OK, this makes more sense now. It's not about the SDK change; they could keep the same version of the SDK. It's about this line - https://github.com/hashicorp/terraform-provider-aws/blame/e8723a97dc0bca4802b2474e8ea36da46134aac8/internal/service/ssoadmin/service_package.go#L25
If they had just switched the SDK without adding those errors as retryable, you would be fine.
What's happening now: you have X retries on the Terraform side and Y built into the SDK, so you end up generating X*Y calls (around 50 TPS).
It works like a nested loop.
I think you need to go in the other direction: set the SDK retries (since you can control that property) to 0 and let Terraform retry on its own.
b
Copy code
retryMode: adaptive
  maxRetries: 20
so set maxRetries to 0?
d
Yeah, try that and let Terraform handle the retries.
Ideally you would want the SDK to manage those retries, but it is what it is.
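For completeness, a minimal sketch of that last suggestion as explicit provider configuration in TypeScript rather than stack config; whether 0 actually disables the SDK-level retries is an assumption worth verifying against your provider version:
Copy code
import * as aws from "@pulumi/aws";

// Zero the SDK retry budget and leave retrying to the provider layer.
const ssoProvider = new aws.Provider("sso-provider-no-sdk-retries", {
    maxRetries: 0,
    retryMode: "standard",
});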