A space for uploading the materials I have studied to build expertise while doing DevOps E / SRE work. These are personal study notes, but I hope they can serve as a useful reference.
Updated Dec 12, 2022 - Go
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
Dev environment for SRE
A .NET Standard library for working with the Uptime Robot API.
Maia is a CLI that allows you to execute remote commands on multiple machines at once.
I'm a Professional Mistake Avoider, a.k.a. Strategic Advisor.
Script to monitor the Azure Traffic Manager service.
A combined introduction to operating systems and computer networks
Overall map of topics to cover for my “Engineering for Site Reliability” blog series.
Repository showing how PagerDuty can be managed using Terraform, with Terraform Cloud as a remote backend and GitHub Actions for the CI/CD pipeline
This GitHub repository contains a comprehensive tutorial on Site Reliability Engineering (SRE), covering topics such as SLAs, SLOs, SLIs, Chaos Engineering, monitoring, alerting, and much more. It also includes bonus content on SRE best practices. Follow along with the #100daysofSRE challenge and improve your reliability engineering skills.
👨‍💻 Blog with GitHub Pages | About SRE
A resource website dedicated to Reliability Engineering
🧪 Tutorials for running chaos experiments with LitmusChaos, Chaos Mesh, and Gremlin (includes k8s setup)
Great resources for learning Software and Site Reliability Engineering.
Gremlin chaos engineering
Roadmap (Data/ML/AI/Cloud/DevOps)
Keep Kubernetes Deployments up-to-date with the `latest` container images