aws-samples/operational-insights-using-amazon-devops-guru-innovate

Gaining operational insights with AIOps using Amazon DevOps Guru

This lab is provided as part of AWS Innovate AI/ML and Data Edition.

ℹ️ You will run this lab in your own AWS account. Please follow directions at the end of the lab to remove resources to avoid future costs.

ℹ️ Please let us know what you thought of this session and how we can improve the experience for you in the future by completing the survey at the end of the lab. Participants who complete the surveys from AWS Innovate Online Conference will receive a gift code for USD25 in AWS credits (see notes 1, 2, and 3 in the Survey section).

Overview

Amazon DevOps Guru offers a fully managed AIOps platform service that enables developers and operators to improve application availability and resolve operational issues faster. It minimizes manual effort by leveraging machine learning (ML) powered recommendations. DevOps Guru automatically detects operational issues, predicts impending resource exhaustion, details likely causes, and recommends remediation actions.

This lab is adapted from an AWS Blog post.

It walks you through how to enable DevOps Guru for your account in a typical serverless environment and observe the insights and recommendations generated for various activities. These insights are generated for operational events that could pose a risk to your application availability. DevOps Guru uses AWS CloudFormation stacks as the application boundary to detect resource relationships and correlate them with deployment events.

Setup

Create Cloud9 environment via AWS CloudFormation

  1. Log in to your AWS account.
  2. Click this link and open a new browser tab.
  3. Click Next until you reach the stack review page, tick the "I acknowledge that AWS CloudFormation might create IAM resources" box, and click Create stack. [Image: Acknowledge Stack Capabilities]
  4. Wait a few minutes for stack creation to complete.
  5. Select the stack and note the outputs (Cloud9EnvironmentId and InstanceProfile) on the Outputs tab for the next step. [Image: Cloud9 Stack Output]

Assign instance role to Cloud9 instance

  1. Open the Amazon EC2 console.
  2. Use the stack output value of Cloud9EnvironmentId as a filter to find the Cloud9 instance. [Image: Locate Cloud9 Instance]
  3. Right-click the instance and choose Security -> Modify IAM Role.
  4. Choose the profile name that matches the InstanceProfile value from the stack output, and click Apply. [Image: Set Instance Role]

Disable Cloud9 Managed Credentials

  1. Launch AWS Cloud9 Console

  2. Locate the Cloud9 environment created for this lab and click "Open IDE". The environment title should start with DevOpsGuruCloud9.

  3. In the top menu of the Cloud9 IDE, click AWS Cloud9 and choose Preferences.

  4. In the left menu, under AWS SETTINGS, click Credentials.

  5. Disable AWS managed temporary credentials:

    [Image: Disable Cloud 9 Managed Credentials]

Install prerequisite packages

Run the commands below in the Cloud9 terminal to install the prerequisite packages:

sudo yum install jq -y

export AWS_REGION=$(curl -s \
169.254.169.254/latest/dynamic/instance-identity/document | jq -r '.region')

sudo pip3 install requests

[Image: Install Packages]
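For readers who prefer to see what the AWS_REGION lookup does, here is a minimal Python sketch of the same idea. It is not part of the lab scripts. The function names are made up for illustration, the sample document is invented, and the fetch helper assumes the instance metadata service is reachable and uses the IMDSv2 token flow, so it only works when called on an EC2 instance such as the Cloud9 host.

```python
import json
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def region_from_identity_document(document_text):
    """Return the 'region' field of an instance identity document (JSON text)."""
    return json.loads(document_text)["region"]


def fetch_identity_document(timeout=2):
    """Fetch the identity document with an IMDSv2 session token.

    Hypothetical helper: only call this on an EC2 instance.
    """
    token_request = urllib.request.Request(
        IMDS_BASE + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()
    document_request = urllib.request.Request(
        IMDS_BASE + "/dynamic/instance-identity/document",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(document_request, timeout=timeout) as response:
        return response.read().decode()


# Offline demonstration with an invented document (values are illustrative only).
sample_document = '{"region": "us-east-1", "instanceId": "i-0123456789abcdef0"}'
print(region_from_identity_document(sample_document))
```

On the Cloud9 instance, `region_from_identity_document(fetch_identity_document())` should yield the same value that the shell command stores in AWS_REGION.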

Clone the git repository to download the required CloudFormation templates:

git clone https://github.com/aws-samples/amazon-devopsguru-samples
cd amazon-devopsguru-samples/generate-devopsguru-insights/

Deploy Serverless Infrastructure

As depicted in the following diagram, we use a CloudFormation stack to create a serverless infrastructure comprising Amazon API Gateway, AWS Lambda, and Amazon DynamoDB, and then send HTTP requests at a high rate to the API that lists records.

[Image: architecture]

Deploy the CloudFormation template using the following command in Cloud9:

aws cloudformation create-stack --stack-name myServerless-Stack \
--template-body file:///$PWD/cfn-shops-monitoroper-code.yaml \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
--region ${AWS_REGION}

The AWS CloudFormation deployment creates an API Gateway, a DynamoDB table, and a Lambda function with sample code. When it’s complete, go to the AWS CloudFormation console, select the stack named myServerless-Stack, and choose its Outputs tab. Record the links for the two APIs: one lists the table contents and the other populates them.

[Image: stack-output]
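If you would rather read the two links from the terminal, the JSON printed by `aws cloudformation describe-stacks --stack-name myServerless-Stack` carries the same Outputs. The sketch below shows one way to turn that JSON into a dictionary. The sample response is invented: only the ListRestApiEndpointMonitorOper key appears in this lab, while the second key name and both URLs are placeholders.

```python
import json


def stack_outputs(describe_stacks_text):
    """Map OutputKey to OutputValue for the first stack in a describe-stacks response."""
    response = json.loads(describe_stacks_text)
    outputs = response["Stacks"][0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


# Invented sample response; the second key and both URLs are hypothetical.
sample_response = json.dumps({
    "Stacks": [{
        "StackName": "myServerless-Stack",
        "Outputs": [
            {"OutputKey": "ListRestApiEndpointMonitorOper",
             "OutputValue": "https://example.invalid/prod/list"},
            {"OutputKey": "HypotheticalPopulateEndpoint",
             "OutputValue": "https://example.invalid/prod/populate"},
        ],
    }]
})

for key, value in stack_outputs(sample_response).items():
    print(key, value)
```

Saving the real command output to a file and passing its contents to `stack_outputs` gives you the list URL needed later for sendAPIRequest.py.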

Populating the DynamoDB table

Run the following commands (simply copy and paste them) to populate the DynamoDB table. The three commands identify the name of the DynamoDB table from the CloudFormation stack, write that name into the populate-shops-dynamodb-table.json file, and load the file into the table.

dynamoDBTableName=$(aws cloudformation list-stack-resources \
--stack-name myServerless-Stack \
--region ${AWS_REGION} | \
jq '.StackResourceSummaries[]|select(.ResourceType == "AWS::DynamoDB::Table").PhysicalResourceId' | tr -d '"')
sudo sed -i s/"<YOUR-DYNAMODB-TABLE-NAME>"/$dynamoDBTableName/g \
populate-shops-dynamodb-table.json
aws dynamodb batch-write-item \
--request-items file://populate-shops-dynamodb-table.json --region ${AWS_REGION}

This populates the DynamoDB table with a few sample records, which you can verify by accessing the ListRestApiEndpointMonitorOper API URL published on the Outputs tab of the CloudFormation stack. The following screenshot shows the output.

[Image: listrest]
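The jq filter and the sed substitution used above can be hard to read at a glance. The following Python sketch performs the same two steps on in-memory text, as an illustration only. The resource list, table name, and item attributes in the sample data are invented; the placeholder string and the resource type are the ones used in the commands above.

```python
import json

PLACEHOLDER = "<YOUR-DYNAMODB-TABLE-NAME>"


def dynamodb_table_name(list_stack_resources_text):
    """Return the physical ID of the first AWS::DynamoDB::Table in the stack."""
    response = json.loads(list_stack_resources_text)
    for resource in response["StackResourceSummaries"]:
        if resource["ResourceType"] == "AWS::DynamoDB::Table":
            return resource["PhysicalResourceId"]
    raise LookupError("no AWS::DynamoDB::Table resource found in the stack")


def fill_placeholder(request_items_text, table_name):
    """Replace every occurrence of the placeholder with the real table name."""
    return request_items_text.replace(PLACEHOLDER, table_name)


# Invented sample data: the IDs and item attributes are hypothetical.
sample_resources = json.dumps({
    "StackResourceSummaries": [
        {"ResourceType": "AWS::Lambda::Function",
         "PhysicalResourceId": "myServerless-Stack-Function-EXAMPLE"},
        {"ResourceType": "AWS::DynamoDB::Table",
         "PhysicalResourceId": "myServerless-Stack-Table-EXAMPLE"},
    ]
})
sample_request_items = json.dumps({
    PLACEHOLDER: [{"PutRequest": {"Item": {"id": {"S": "1"}}}}]
})

table_name = dynamodb_table_name(sample_resources)
print(fill_placeholder(sample_request_items, table_name))
```

Like the shell version, this picks the first DynamoDB table it finds, which is enough here because the stack is described as creating a single table.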

Lab

Enable DevOps Guru for the CloudFormation Stack

Run the command in Cloud9 to enable DevOps Guru for this CloudFormation stack:

aws cloudformation create-stack \
--stack-name EnableDevOpsGuruForServerlessCfnStack \
--template-body file:///$PWD/EnableDevOpsGuruForServerlessCfnStack.yaml \
--parameters ParameterKey=CfnStackNames,ParameterValue=myServerless-Stack \
--region ${AWS_REGION}

When the stack is created, navigate to the Amazon DevOps Guru console. Choose Settings. Under CloudFormation stacks, locate myServerless-Stack.

If you don’t see it, the CloudFormation stack was not deployed successfully. In that case, delete the EnableDevOpsGuruForServerlessCfnStack stack and deploy it again.
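The console check above can also be approximated from the terminal. The sketch below assumes that `aws devops-guru get-resource-collection --resource-collection-type AWS_CLOUD_FORMATION` returns JSON with a ResourceCollection.CloudFormation.StackNames list; treat that shape as an assumption and confirm it against the CLI reference before relying on it. The function names and the sample response are made up for illustration.

```python
import json


def monitored_stacks(resource_collection_text):
    """Return the stack names listed in a get-resource-collection response."""
    response = json.loads(resource_collection_text)
    collection = response.get("ResourceCollection", {})
    return collection.get("CloudFormation", {}).get("StackNames", [])


def is_monitored(resource_collection_text, stack_name):
    """Report whether the given stack is inside the DevOps Guru boundary."""
    return stack_name in monitored_stacks(resource_collection_text)


# Invented sample response, following the assumed shape described above.
sample_collection = json.dumps({
    "ResourceCollection": {"CloudFormation": {"StackNames": ["myServerless-Stack"]}}
})

print(is_monitored(sample_collection, "myServerless-Stack"))
```

A False result for myServerless-Stack corresponds to the case described above, where the enabling stack needs to be deleted and deployed again.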

Waiting for baselining of resources

This step is necessary so that DevOps Guru can finish baselining the resources and benchmark their normal behavior. For our serverless stack with 3 resources, we recommend waiting 2 hours before carrying out the next steps.

Proactive Insight from DevOps Guru for AWS Best Practice

From the DevOps Guru console, view the Dashboard and at the top you will see an ongoing proactive insight. You can click on the button to navigate to the insight dashboard. See the screenshot below for what that looks like.

[Image: db-proactive]

After you click on that, you are taken to the DevOps Guru console which displays the Insights. If you clicked on the Proactive insight button, you'll automatically be taken to the Proactive insight tab. You should see the following ongoing proactive insight for enabling Point In Time Recovery in your DynamoDB table. Click on the insight link to take a deeper look into what's going on (for reference, this is the bottom of the screenshot).

[Image: dg-proactivepage]

See the top of the insight details describing a summary of what DevOps Guru has detected.

[Image: dg-proactivedetails]

Scroll down on this page or click view recommendations to see what to do to fix it.

[Image: dg-proactiverecommendations]

DevOps Guru provides a recommendation of enabling Point In Time Recovery as this is an AWS best practice for production database tables. It provides you a link to the documentation for the next steps. For example, you can configure this setting in the console, via API or using Infrastructure as Code such as CloudFormation. For purposes of today's workshop, let's make this fix in the console. Go to the search bar and type DynamoDB. From the DynamoDB console page, on the left, click on Tables. You should see something like this.

[Image: ddb-tableview]

Click on the table. If there are multiple tables in your environment, click on the particular DynamoDB table that is referenced in the DevOps Guru insight/recommendation page. Its name should start with "myServerless-Stack". After clicking on the table, navigate to the Backups tab, where you will see that point-in-time recovery is currently disabled.

[Image: dg-ddbbackup]

Click the Edit button. This brings you to the page below, where you can check the Enable point-in-time-recovery checkbox and save the changes.

[Image: 07_DynamoDB_EnablePITR]
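As noted earlier, the same setting can be changed through the API instead of the console. A minimal sketch of the request parameters, assuming you would pass them to the DynamoDB UpdateContinuousBackups operation (for example through boto3's `update_continuous_backups`), looks like this. The helper name and the table name are hypothetical, and the block only builds the parameters so that it can run without AWS credentials.

```python
def pitr_request(table_name, enabled=True):
    """Build the parameters for a DynamoDB UpdateContinuousBackups call."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {
            "PointInTimeRecoveryEnabled": enabled,
        },
    }


# Hypothetical table name; use the one shown in the DevOps Guru insight.
params = pitr_request("myServerless-Stack-Table-EXAMPLE")
print(params)

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("dynamodb").update_continuous_backups(**params)
```

For this workshop the console path above is sufficient; the sketch is only meant to show what the console change corresponds to.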

Updating the CloudFormation stack

We will make a configuration change to simulate a typical operational event. In Cloud9, update the CloudFormation template cfn-shops-monitoroper-code.yaml in the generate-devopsguru-insights folder to change the read capacity for the DynamoDB table from 5 to 1.

[Image: read-capacity]
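The edit is a one-line change, so doing it by hand in the Cloud9 editor is the simplest route. For illustration, the sketch below applies the same change to template text with a regular expression. It assumes the template sets the value on a line of the form `ReadCapacityUnits: 5`, which is the standard CloudFormation property name; the YAML fragment is invented and is not copied from the lab template.

```python
import re

# Matches a whole "ReadCapacityUnits: <number>" line, keeping its indentation.
READ_CAPACITY_LINE = re.compile(
    r"^([ \t]*ReadCapacityUnits:[ \t]*)\d+[ \t]*$", re.MULTILINE
)


def set_read_capacity(template_text, new_value):
    """Return template_text with every ReadCapacityUnits value set to new_value."""
    updated, count = READ_CAPACITY_LINE.subn(
        lambda match: f"{match.group(1)}{new_value}", template_text
    )
    if count == 0:
        raise ValueError("no ReadCapacityUnits line found in the template")
    return updated


# Invented fragment for demonstration; the real template has more content.
sample_template = (
    "ProvisionedThroughput:\n"
    "  ReadCapacityUnits: 5\n"
    "  WriteCapacityUnits: 5\n"
)

print(set_read_capacity(sample_template, 1))
```

Calling the same function with 5 restores the original value, which is the rollback that DevOps Guru recommends later in the lab.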

Run the following command to deploy the updated CloudFormation template:

aws cloudformation update-stack --stack-name myServerless-Stack \
--template-body file:///$PWD/cfn-shops-monitoroper-code.yaml \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
--region ${AWS_REGION}

Injecting HTTP requests into your API

In sendAPIRequest.py, set the url parameter to the link for the ListRestApiEndpointMonitorOper API, listed on the Outputs tab of the CloudFormation stack.

[Image: api-pre]

[Image: api-after]

After the script is saved, trigger the script using the following command:

python sendAPIRequest.py

After approximately 10 minutes of the script running in a loop, an operational insight is generated in DevOps Guru.
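The contents of sendAPIRequest.py are not reproduced here. As a rough idea of what a request loop of this kind does, the sketch below sends a fixed number of GET requests and tallies the status codes. It is not the script from the repository: the function names are made up, it uses only the standard library, and the demonstration swaps in a fake sender so that it runs without a deployed API.

```python
import collections
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Send one GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Error responses (4xx/5xx) still carry a status code worth counting.
        return error.code


def send_requests(url, total, send=http_status):
    """Call send(url) total times and count how often each status occurs."""
    counts = collections.Counter()
    for _ in range(total):
        counts[send(url)] += 1
    return counts


# Offline demonstration: a fake sender replays invented status codes.
fake_statuses = iter([200, 200, 502])
demo_counts = send_requests(
    "https://example.invalid/prod/list", 3, send=lambda url: next(fake_statuses)
)
print(dict(demo_counts))
```

Against the real endpoint you would omit the `send` argument and raise `total`; a growing share of error responses is the kind of symptom that the reduced read capacity is meant to provoke.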

Review DevOps Guru Reactive Insight

While we keep sending HTTP requests to our API endpoint, DevOps Guru monitors for anomalies, logs insights that detail the metrics indicating an anomaly, and provides actionable recommendations to mitigate it. In this section, we review some of these insights.

Under normal conditions, the DevOps Guru dashboard shows an ongoing insights counter of zero. DevOps Guru monitors a large number of metrics behind the scenes, which relieves the operator of manually watching counters or graphs. It raises an alert in the form of an insight only when an anomaly occurs.

The following screenshot shows an ongoing reactive insight for the specific CloudFormation stack. When you choose the insight, you see further details. The number under the Total resources analyzed last hour may vary, so for this workshop, you can ignore this number.

[Image: dashboard-1]

  1. Aggregated metrics: The following screenshot shows the Aggregated metrics section, which lists metrics for all the resources in which DevOps Guru detected anomalies (DynamoDB, Lambda, and API Gateway). This helps us understand the cause and symptoms, and prioritize the right anomaly to investigate.

[Image: dynamodb-metrics]

Initially, you may see only two metrics listed; over time, the section adds more metrics that showed anomalies. You can see that the anomaly for DynamoDB started earlier than the anomalies for API Gateway and Lambda, which indicates that the latter are aftereffects. In addition to the information in the preceding screenshot, you may also see the Duration p90 and IntegrationLatency p90 metrics (for Lambda and API Gateway respectively, due to increased backend latency).

  2. Relevant events: Now we check the Relevant events section, which lists potential triggers for the issue. The events listed here depend on the set of operations carried out on this CloudFormation stack’s resources in this Region. This makes it easy for the operator to be reminded of a change that may have caused the issue. The dots (representing events) near the Insight start portion of the timeline are of particular interest.

[Image: dynamodb-events2]

If you need to look into any of these events, click its point on the timeline to see more details, as shown in the screenshot below.

[Image: dynamodb-events-updatestack]

You can choose the link for an event to view specific details about the operational event (configuration change via CloudFormation update stack operation).

  3. Recommendations: Navigate to the Recommendations section, where Amazon DevOps Guru provides recommendations on how to remediate issues. As seen in the following screenshot, it recommends rolling back the configuration change related to the read capacity for the DynamoDB table. It also lists specific metrics and the event as part of the recommendation for reference.

[Image: dynamodb-recommendations2]

You can follow the recommendations to remediate the issues.
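The reasoning used in the Aggregated metrics and Relevant events sections (find the anomaly that started first, then look for changes made around that time) can be sketched in a few lines of Python. This only illustrates the triage logic and is not how DevOps Guru works internally. All timestamps, the DynamoDB metric name, and the event names are invented sample data, and the earliest anomaly start is used as a stand-in for the insight start.

```python
from datetime import datetime, timedelta


def earliest_anomaly(anomalies):
    """Return the anomaly with the earliest start time (the likely origin)."""
    return min(anomalies, key=lambda a: datetime.fromisoformat(a["start"]))


def events_near(events, moment, window_minutes=30):
    """Return events whose timestamp lies within window_minutes of moment."""
    centre = datetime.fromisoformat(moment)
    window = timedelta(minutes=window_minutes)
    return [
        event for event in events
        if abs(datetime.fromisoformat(event["time"]) - centre) <= window
    ]


# Invented sample data for demonstration only.
anomalies = [
    {"resource": "API Gateway", "metric": "IntegrationLatency p90",
     "start": "2024-03-01T10:12:00"},
    {"resource": "DynamoDB", "metric": "ReadThrottleEvents",
     "start": "2024-03-01T10:05:00"},
    {"resource": "Lambda", "metric": "Duration p90",
     "start": "2024-03-01T10:11:00"},
]
events = [
    {"name": "UpdateStack", "time": "2024-03-01T09:50:00"},
    {"name": "CreateStack", "time": "2024-02-29T08:00:00"},
]

origin = earliest_anomaly(anomalies)
suspects = events_near(events, origin["start"])
print(origin["resource"], [event["name"] for event in suspects])
```

In this sample the DynamoDB anomaly comes first and the only nearby event is the stack update, which mirrors the conclusion DevOps Guru presents in its recommendation.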

Conclusion

In this lab, you have learned how to enable Amazon DevOps Guru for your account in a typical serverless environment and observe the insights and recommendations generated for various activities.

The AWS Well-Architected Framework recommends regularly analyzing the collected metrics to proactively identify and mitigate issues before they affect business outcomes.

Visit the AWS Well-Architected Framework white paper and the related best practices for more information.

Survey

Let us know what you thought of this session and how we can improve the presentation experience for you in the future by completing this event session poll. Participants who complete the surveys from AWS Innovate Online Conference will receive a gift code for USD25 in AWS credits (see notes 1, 2, and 3 below). AWS credits will be sent via email by March 29, 2024. Note: Only registrants of AWS Innovate Online Conference who complete the surveys will receive a gift code for USD25 in AWS credits via email.

1. AWS Promotional Credits terms and conditions apply: https://aws.amazon.com/awscredits/

2. Limited to 1 x USD25 AWS credits per participant.

3. Participants will be required to provide their business email addresses to receive the gift code for AWS credits.
