CloudWatch - logs from lambda/update_lease_status reporting incorrect cost #340

Open
rafabnunes opened this issue Apr 15, 2020 · 9 comments
Labels
bug Something isn't working

Comments

@rafabnunes

rafabnunes commented Apr 15, 2020

Version information

  • DCE version: v0.29.0
  • OS and version: macOS, Darwin Kernel Version 19.3.0
  • Go version: 1.0
  • Terraform version (if using directly):
    Terraform v0.12.24 / + provider.archive v1.3.0 / + provider.aws v2.43.0 / + provider.template v2.1.2

Describe the bug
When I receive the budget notification, the cost it reports differs from AWS Billing. According to the CloudWatch logs, the user had spent $9.03, but AWS Billing showed a higher value. My question is: can DCE get all of the cost information from AWS Billing?
I've deployed services such as EC2 and VPC Endpoints in the leased account.

Another scenario: when a leased account reaches its budget limit, DCE should reset the account. According to the CloudWatch logs below, the account reached the limit, but no account reset was performed.

2020/04/15 15:39:27 Principal dcepocuser has spent $14.44 of their current principal budget
2020/04/15 15:39:27 OverBudget. Updating lease as ready to be reclaimed...
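
To compare the spend that DCE logs against the figure in AWS Billing, the amount can be pulled out of log lines like the ones above. This is a minimal Python sketch, assuming the log format matches the quoted sample exactly; `SPEND_RE` and `parse_spend` are hypothetical names used for illustration and are not part of DCE.

```python
import re
from decimal import Decimal

# Pattern modelled on the single log line quoted in this issue.
# Assumption: other DCE versions may word this line differently.
SPEND_RE = re.compile(
    r"Principal (?P<principal>\S+) has spent \$(?P<amount>\d+(?:\.\d+)?) "
    r"of their current principal budget"
)


def parse_spend(line):
    """Return (principal, amount) from a spend log line, or None if no match."""
    match = SPEND_RE.search(line)
    if match is None:
        return None
    # Decimal avoids float rounding when comparing against billing totals.
    return match.group("principal"), Decimal(match.group("amount"))
```

The parsed amount can then be compared by hand with the total shown in the AWS Billing console for the same account and period.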

I also checked the CodeBuild logs, but they contain no information about a reset of any account on the date mentioned above.

How can I fix this issue?

@eschwartz
Contributor

Hi @rafabnunes -- we're aware of some issues in the way cost/usage are being reported in the system today. We're actually in the middle of a big overhaul of cost reporting (see #316), that will hopefully address any issues we're seeing.

I apologize that we haven't better documented some of the known buggy behavior here. I'm still trying to get a handle on all of the moving parts myself. But I intend to document the current situation in more detail for posterity.

Note that we're probably looking at some significant breaking changes with PR #316 (e.g., it will destroy and recreate the usage DB table), so if this is a feature you're relying on, it may be worthwhile to wait for that release to land before investing too heavily in your current setup.

@eschwartz eschwartz added the bug Something isn't working label Apr 16, 2020
@rafabnunes
Author

Hey @eschwartz, thanks for all the information.

It seems that PR #316 will solve this issue.

My current setup is a proof of concept, so I might wait a little while for the new version.

@eschwartz
Contributor

Sure @rafabnunes , I think that's a wise path.

I will update this ticket when the PR is merged and released. Hopefully it won't be too long now...

@rafabnunes
Author

Hey @eschwartz

Just to confirm what you mentioned before: will the issue below be fixed in the new DCE version?

"When a leased account reaches its budget limit, DCE should reset the account. According to the CloudWatch logs below, the account reached the limit, but no account reset was performed.

2020/04/15 15:39:27 Principal dcepocuser has spent $14.44 of their current principal budget
2020/04/15 15:39:27 OverBudget. Updating lease as ready to be reclaimed...

I also checked the CodeBuild logs, but they contain no information about a reset of any account on the date mentioned above."

@rafabnunes
Author

Hi @eschwartz

I ran a new test by leasing a new account. I deployed a few resources such as EC2 and VPC Endpoints. After a few hours the budget was exceeded and the account was changed to "OverBudget", but CodeBuild did not perform the reset process. It seems that the Lambda "Update lease status" does not push the "OverBudget" event to the SQS "Reset queue".

To force the cleanup of the account, I manually changed the account status to "NotReady" in DynamoDB. After that, the Lambda "Process reset queue" was able to trigger the CodeBuild project "Reset AWS codebuild", so the account was cleaned and returned to the pool with status "Ready".
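
The manual status change described above could also be scripted. The sketch below only builds the request parameters for a DynamoDB `UpdateItem` call. The table name `Accounts`, the key attribute `Id`, and the status attribute `AccountStatus` are assumptions made for illustration, so check them against the table in your own deployment before using this.

```python
def build_status_update(account_id, new_status="NotReady", table_name="Accounts"):
    """Build UpdateItem parameters that set an account's status.

    Assumptions (not confirmed in this thread): the table is named
    "Accounts", its partition key is "Id", and the status is stored
    in an attribute called "AccountStatus".
    """
    return {
        "TableName": table_name,
        "Key": {"Id": {"S": account_id}},
        "UpdateExpression": "SET AccountStatus = :status",
        "ExpressionAttributeValues": {":status": {"S": new_status}},
    }
```

With boto3 installed and credentials configured, the parameters could be passed as `boto3.client("dynamodb").update_item(**build_status_update("123456789012"))`, where the account ID is a placeholder.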

I haven't found any errors in the CloudWatch logs. Do you have any idea how to fix this issue?
Screen Shot 2020-04-24 at 10 18 18

Screen Shot 2020-04-24 at 12 39 52

@rafabnunes
Author

Hey @eschwartz
I noticed that version 0.30.1 was released. I've already deployed it, but I'm still facing the issue: when an account becomes "OverBudget", the reset or cleanup of the account is not performed.
Do you have any idea or tip for solving this issue?
Screen Shot 2020-04-28 at 14 18 47

@eschwartz
Contributor

@rafabnunes I want to let you know that I'm moving off the DCE team at Optum this week, so I want to pass this PR off to @robologic to shepherd through.

@robologic can you take the ball on this one, please? I'm hoping this will all be resolved by #302 , but it's worth a follow-up

@rafabnunes
Author

@eschwartz thanks for everything and good luck!

@robologic it seems that the Lambda "Update Leases", after updating the DynamoDB table, is unable to send the information for accounts with status "OverBudget" to SQS.
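
One way to confirm this symptom is to compare the over-budget leases with the account IDs that actually reached the reset queue. The sketch below is a diagnostic helper only, not DCE code: the field names `AccountId` and `Status` and the function name `missing_resets` are hypothetical, and the inputs are assumed to have been exported from DynamoDB and SQS beforehand.

```python
def missing_resets(leases, queued_account_ids):
    """Return account IDs that are over budget but have no reset queued.

    `leases` is a list of dicts with hypothetical "AccountId" and "Status"
    fields; `queued_account_ids` holds the account IDs seen in the reset
    queue. Input order is preserved in the result.
    """
    queued = set(queued_account_ids)
    return [
        lease["AccountId"]
        for lease in leases
        if lease.get("Status") == "OverBudget" and lease["AccountId"] not in queued
    ]
```

A non-empty result would support the observation that the status update happens without a matching message being sent to SQS.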

@rafabnunes
Author

Hi @robologic,

I've uninstalled DCE version 0.30.1 and installed the version from Feature/usage2.0 #302.
After some tests with leased accounts, the Lambda "end_over_budget_lease" ran and triggered SQS, so the account was cleaned and returned to the pool with status "Ready".

Thanks a lot.

Rafael Nunes
