
qKe Request failed with status code 403 #499

Open
sjribe opened this issue Jan 17, 2024 · 22 comments
Labels
bug Something isn't working


@sjribe

sjribe commented Jan 17, 2024

If your issue relates to the Discovery Process, please first follow the steps described in the implementation guide Debugging the Discovery Component


Describe the bug
When I click on Resources, I get the error "qKe Request failed with status code 403".

To Reproduce
Steps to reproduce the behavior:

  1. While logged in as admin, click on Resources under Explore
  2. An error message appears at the top
  3. No resources are discovered

Expected behavior
Resources are listed.

Screenshots
[screenshot]

Browser (please complete the following information):

Reproducible on the latest versions of Edge and Chrome.


@sjribe sjribe added the bug Something isn't working label Jan 17, 2024
@svozza
Contributor

svozza commented Jan 17, 2024

Open up your browser dev tools and paste any errors you see there into this issue.

@sjribe
Author

sjribe commented Jan 17, 2024

{
"errors" : [ {
"errorType" : "WAFForbiddenException",
"message" : "403 Forbidden"
} ]
}

Oh, I think I know now. Where the parameter says "Comma separated list of CIDR ranges to manage access the API. To allow all the entire internet, use 0.0.0.0/1,128.0.0.0/1", does that mean I should allow the internet because the solution itself needs to use the internet?
If that's true, what's the best way to fix this without having to redo the whole thing?

@svozza
Contributor

svozza commented Jan 17, 2024

Yeah, because the Fargate task talks to AppSync, it needs to access the internet. Just update the CloudFormation (CFN) stack and change that parameter back to 0.0.0.0/1,128.0.0.0/1; the stack update will apply the change and everything will work.
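A minimal, IPv4-only sketch of what that parameter value means, using Python's standard `ipaddress` module: the two /1 ranges split the IPv4 address space in half, so together they admit every IPv4 address, while a narrower list rejects any caller outside it (consistent with the WAFForbiddenException above, although which source address the WAF actually sees is not covered in this thread). The helpers `parse_cidrs`, `is_allowed` and `covers_all_ipv4` are hypothetical names for illustration, not part of Workload Discovery, and 203.0.113.0/24 is just a documentation range standing in for a restricted list.

```python
import ipaddress

def parse_cidrs(param_value: str) -> list:
    """Parse a comma-separated CIDR parameter value into network objects (IPv4 assumed)."""
    return [ipaddress.ip_network(c.strip()) for c in param_value.split(",") if c.strip()]

def is_allowed(ip: str, param_value: str) -> bool:
    """Return True if the address falls inside any of the configured ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in parse_cidrs(param_value))

def covers_all_ipv4(param_value: str) -> bool:
    """Return True if the ranges, merged together, span the whole IPv4 space."""
    merged = ipaddress.collapse_addresses(parse_cidrs(param_value))
    return sum(net.num_addresses for net in merged) == 2 ** 32

internet = "0.0.0.0/1,128.0.0.0/1"
restricted = "203.0.113.0/24"  # example restricted list (RFC 5737 documentation block)

print(covers_all_ipv4(internet))                # True: 0.0.0.0/1 + 128.0.0.0/1 merge to 0.0.0.0/0
print(covers_all_ipv4(restricted))              # False
print(is_allowed("198.51.100.7", restricted))   # False: an address outside the list is rejected
```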

@sjribe
Author

sjribe commented Jan 17, 2024

Yeah, easy enough. Thanks.

The error's gone, but now no resources are discovered... different problem, I guess...

@svozza
Contributor

svozza commented Jan 17, 2024

The discovery task runs every 15 minutes, so it won't run for another 5 minutes (assuming you've deployed the CloudFormation templates to the various accounts you want to import).
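The arithmetic behind "won't run for another 5 minutes" can be sketched as follows. It assumes a fixed 15-minute interval measured from the previous run; that is an assumption, since the actual schedule is whatever the deployed stack defines, and `time_until_next_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Assumed fixed interval between discovery runs.
INTERVAL = timedelta(minutes=15)

def time_until_next_run(last_run: datetime, now: datetime,
                        interval: timedelta = INTERVAL) -> timedelta:
    """Return the wait until the next run, assuming runs every `interval` from `last_run`."""
    elapsed_in_cycle = (now - last_run) % interval
    # A zero remainder means a run is due right now.
    return interval - elapsed_in_cycle if elapsed_in_cycle else timedelta(0)

last = datetime(2024, 1, 17, 23, 0)
print(time_until_next_run(last, datetime(2024, 1, 17, 23, 10)))  # 0:05:00
```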

@sjribe
Author

sjribe commented Jan 17, 2024

I'm running it with CrossAccountDiscovery set to AWS_ORGANIZATIONS.
So maybe I have the wrong OrganizationUnitId.
I used the r- value for the root OU, but should it be the o- value of the organization?
[screenshot]

@svozza
Contributor

svozza commented Jan 17, 2024

No, the r- value will work. Check the ECS logs (don't worry about Lambda) for any errors; instructions are at this link: https://aws-solutions.github.io/workload-discovery-on-aws/workload-discovery-on-aws/2.0/debugging-the-discovery-component.html.

@sjribe
Author

sjribe commented Jan 17, 2024

Thanks. It was the r- value, and it did discover some resources. However, going through the debugging steps, I'm getting quite a lot (22 per discovery) of:
{
"error": {
"name": "TooManyRequestsException",
"$fault": "client",
"$metadata": {
"httpStatusCode": 429,
"requestId": "c36e55ed-eb1c-43e9-8415-b1826ee017e0",
"attempts": 4,
"totalRetryDelay": 1856
},
"retryAfterSeconds": null
},
"level": "error",
"message": "Error discovering API Gateway integration for resource: arn:aws:apigateway:us-west-2::/restapis/fqsoha0aq2/resources/89lfy2",
"timestamp": "2024-01-17T23:31:05.569Z"
}

I'm also getting one of these:
{
"message": "Access denied assuming role: arn:aws:iam::922409771208:role/WorkloadDiscoveryRole-922409771208. This is the management account, ensure the global resources template has been deployed to the account.",
"level": "error",
"timestamp": "2024-01-17T23:30:37.747Z"
}
It is true that I haven't deployed the global resources template, though.
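To see which errors dominate a discovery run, something like the following could tally the error-level entries from the ECS task logs. It assumes the log events have already been exported as one JSON object per line (how you export them is up to you) and that entries use the `level`, `message` and `error.name` fields seen in the two entries above; `tally_errors` is a hypothetical helper, not part of Workload Discovery.

```python
import json
from collections import Counter

def tally_errors(log_lines):
    """Count error-level events, keyed by SDK error name or by the message prefix."""
    counts = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(event, dict) or event.get("level") != "error":
            continue
        error = event.get("error")
        if isinstance(error, dict) and error.get("name"):
            key = error["name"]
        else:
            # "Access denied assuming role: arn:..." -> "Access denied assuming role"
            key = event.get("message", "<no message>").split(":", 1)[0]
        counts[key] += 1
    return counts

throttled = ('{"error": {"name": "TooManyRequestsException"}, "level": "error", '
             '"message": "Error discovering API Gateway integration for resource: '
             'arn:aws:apigateway:us-west-2::/restapis/fqsoha0aq2/resources/89lfy2"}')
denied = ('{"message": "Access denied assuming role: '
          'arn:aws:iam::922409771208:role/WorkloadDiscoveryRole-922409771208.", "level": "error"}')
sample = [throttled, throttled, denied, "plain text line"]

print(tally_errors(sample))
# Counter({'TooManyRequestsException': 2, 'Access denied assuming role': 1})
```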

@sjribe
Author

sjribe commented Jan 18, 2024

So some additional information:

  1. In our Audit account (682880543195), Resource Explorer shows 514 resources.
  2. In our Org account, Resource Explorer filtered to the Audit account shows 513 resources.
  3. In our Org account, Config Aggregators shows OK for 682880543195, and all the Regions show as OK.
     [screenshots]
  4. But in Config Aggregators, Resources filtered to 682880543195 shows no resources.
     [screenshot]

So it seems like it's connecting fine, but it's not discovering anything there. Maybe there's a security setting in account 682880543195 limiting API calls? But I'm not sure where I would look for that.

@svozza
Contributor

svozza commented Jan 18, 2024

In AWS_ORGANIZATIONS mode, Workload Discovery does not manage enablement of Config. We leave that to customers, as managing deployment of Config is different for every organization, based on what they want to monitor and the potential costs incurred by enabling it across a large number of accounts and regions. If one of your accounts doesn't have resources in it, then either Config is not enabled in any region in that account or, as you mentioned, there is some permission error or SCP that is preventing it from doing so.

@svozza
Contributor

svozza commented Jan 18, 2024

The API errors occur because the discovery process is being rate limited when it makes SDK calls to API Gateway. API Gateway limits are account wide (rather than regional), so if there are a large number of API Gateway resources in an account, these sorts of throttling errors are unavoidable.
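The `"attempts": 4` and `"totalRetryDelay": 1856` fields in the earlier log entry show that the SDK already retried before giving up. The general technique, capped exponential backoff with full jitter, looks roughly like the sketch below. It is an illustration only, not Workload Discovery's code: `ThrottledError` is a hypothetical stand-in for the SDK's 429 error, and the delay parameters are arbitrary. Backoff spreads calls out over time but cannot raise an account's quota, which is why some throttling can remain unavoidable.

```python
import random
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 TooManyRequestsException."""

def call_with_backoff(fn, max_attempts=4, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(); on ThrottledError, wait a random time below a doubling cap and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller log it and move on
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * cap)  # full jitter: delay drawn from [0, cap)

# Usage: a fake call that is throttled twice, then succeeds.
outcomes = iter([ThrottledError(), ThrottledError(), "integration"])

def flaky():
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result

print(call_with_backoff(flaky, sleep=lambda s: None))  # integration
```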

The IAM error you are seeing is because of the way organization-wide StackSets work: they do not allow you to deploy a stack instance to the management account. In AWS_ORGANIZATIONS mode, the deployment process uses StackSets to deploy the global resources stack on your behalf in all the accounts in your organization. There should be an error dialog box on the Accounts page of the Workload Discovery UI with a link to the template, which you can deploy manually in the management account using CloudFormation.

@sjribe
Author

sjribe commented Jan 18, 2024

> The API errors occur because the discovery process is being rate limited when it makes SDK calls to API Gateway. API Gateway limits are account wide (rather than regional), so if there are a large number of API Gateway resources in an account, these sorts of throttling errors are unavoidable.

Is this something that AWS Support can temporarily increase or lift? It looks like discovery stops at the same point each time, so it's not discovering new resources.
Alternatively, if I add each account manually, can I stagger the discovery for each account so as not to trigger the throttle?

> The IAM error you are seeing is because of the way organization-wide StackSets work: they do not allow you to deploy a stack instance to the management account. In AWS_ORGANIZATIONS mode, the deployment process uses StackSets to deploy the global resources stack on your behalf in all the accounts in your organization. There should be an error dialog box on the Accounts page of the Workload Discovery UI with a link to the template, which you can deploy manually in the management account using CloudFormation.

I installed the template and so that's sorted now.

@svozza
Contributor

svozza commented Jan 18, 2024

> Is this something that AWS Support can temporarily increase or lift? It looks like discovery stops at the same point each time, so it's not discovering new resources.

Do you mean the discovery process is crashing? Those throttling errors should only affect API Gateway, they should be skipped over and the process should move on to the next set of resources. Can you attach the ECS logs here so I can have a look?
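The skip-and-continue behaviour described here can be sketched as a loop that catches throttling errors per resource, logs them and moves on, so that one throttled API Gateway resource does not abort the whole run. This is a hypothetical illustration (`discover_all`, `fake_discover` and `ThrottledError` are made-up names), not Workload Discovery's implementation; the log text simply mirrors the entries shown earlier in this thread.

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger("discovery-sketch")

class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error raised by an SDK call."""

def discover_all(resource_arns, discover_one):
    """Discover each resource; log and skip throttled ones instead of stopping."""
    discovered, skipped = [], []
    for arn in resource_arns:
        try:
            discovered.append(discover_one(arn))
        except ThrottledError:
            logger.error("Error discovering API Gateway integration for resource: %s", arn)
            skipped.append(arn)
    return discovered, skipped

# Usage with a fake discoverer that is throttled on one resource.
def fake_discover(arn):
    if arn.endswith("/b"):
        raise ThrottledError()
    return {"arn": arn}

found, missed = discover_all(["res/a", "res/b", "res/c"], fake_discover)
print(len(found), missed)  # 2 ['res/b']
```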

@sjribe
Author

sjribe commented Jan 18, 2024

I don't know if the process is crashing, but I do know not all of my resources are being discovered. In the account mentioned before, each region shows "Not Discovered", but I know that account has 514 resources across 18 regions according to Resource Explorer. Or are there default resources in each region that the discovery process filters out? I've attached the ECS logs for the most recent discovery job.
log-events-viewer-result.csv

@svozza
Contributor

svozza commented Jan 18, 2024

The discovery process is not crashing, but it looks like there are only 1734 resources in the entire aggregator, which seems very low for an organization-wide aggregator. When you say 'resource explorer', do you mean the service or the Resources section on the AWS Config console page? Can you go to the aggregator that WD deployed (it will be called aws-perspective-<wd-region>-<wd-account-id>-aggregator) and run the following query in the Advanced queries section:

SELECT * WHERE accountId = '<account-id-with-514-resources>'

Make sure the query scope is the aggregator as per the screenshot:
[Screenshot 2024-01-18 at 23 29 50]

What results do you see when you run the query?
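If you want to script this check, a small helper could build the same query for a given account ID. This is a sketch, and `config_query_for_account` is a hypothetical name. Its `per_region` form, which counts resources per region with `COUNT(*)` and `GROUP BY awsRegion`, goes beyond what this thread suggests, although AWS Config advanced queries do support aggregate functions with GROUP BY. The resulting string could be passed as the `Expression` argument of boto3's `select_aggregate_resource_config`, together with `ConfigurationAggregatorName`.

```python
def config_query_for_account(account_id: str, per_region: bool = False) -> str:
    """Build an AWS Config advanced query scoped to a single account."""
    # Validate first so the ID can be safely embedded in the query string.
    if not (len(account_id) == 12 and account_id.isascii() and account_id.isdigit()):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    if per_region:
        return f"SELECT awsRegion, COUNT(*) WHERE accountId = '{account_id}' GROUP BY awsRegion"
    return f"SELECT * WHERE accountId = '{account_id}'"

print(config_query_for_account("682880543195"))
# SELECT * WHERE accountId = '682880543195'
```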

@sjribe
Author

sjribe commented Jan 18, 2024

Yes, the AWS Resource Explorer service. This is viewing the account 682880543195:
[screenshot]

It looks like the query has no output.
[screenshot]

@svozza
Contributor

svozza commented Jan 19, 2024

The results of the SQL query suggest that the issue is that AWS Config is not enabled in any region in that account. Try enabling it in us-east-1 of 682880543195, and you should see IAM roles and a few other global resource types when you run that query again (note that it can take several minutes for Config to find the resources after enablement).

If Config doesn't know about a resource there's no way for WD to discover it as we get 90% of our resources from their APIs (under the hood we also use the SQL syntax you are using there for your ad hoc query).

@sjribe
Author

sjribe commented Feb 14, 2024

Thanks. That's showing up now. Does AWS Config need to be enabled in every region in use, or only one per account? For 682880543195, us-east-1 and ap-southeast-2 are in use.

@svozza
Contributor

svozza commented Feb 14, 2024

Yeah, it needs to be enabled in each region you're interested in.
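Because Config is enabled per region, one way to spot gaps is to compare the regions an account uses against the regions where a Config recorder is actually recording. The sketch below assumes you have already collected the recording status per region (for example from Config's `DescribeConfigurationRecorderStatus` API, called in each region); `regions_missing_config` is a hypothetical helper, and the example data models account 682880543195 with Config enabled only in us-east-1.

```python
def regions_missing_config(regions_in_use, recording_by_region):
    """Return the regions in use where no Config recorder is recording, sorted."""
    return sorted(
        region for region in set(regions_in_use)
        if not recording_by_region.get(region, False)
    )

in_use = ["us-east-1", "ap-southeast-2"]
recording = {"us-east-1": True}  # Config enabled only in us-east-1 so far
print(regions_missing_config(in_use, recording))  # ['ap-southeast-2']
```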

@sjribe
Author

sjribe commented Feb 16, 2024

Great. That's solved most of my problems!
I do have one account (949247560096) where I've enabled Config in all 17 regions enabled on that account. However, discovery only finds resources in 3 regions; for the other regions it says "Not Discovered", as it did when Config was not enabled in a region.
Do you know why that would be?

@svozza
Contributor

svozza commented Feb 16, 2024

That's strange. Are there any errors in the discovery process logs?

@sjribe
Author

sjribe commented Feb 29, 2024

I think I've sorted it. I found out that Config was not actually enabled in the other regions, and that the admin account for some reason can't add it to those regions. I've also realized there are only default resources in those regions without Config, so at the moment it's not necessary.

Is there a way to filter out the default resources?

2 participants