{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":129443413,"defaultBranch":"main","name":"atlas-system-agent","ownerLogin":"Netflix-Skunkworks","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2018-04-13T19:10:32.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/1728142?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1648746415.8056488","currentOid":""},"activityList":{"items":[{"before":"e2732ef5befcb53adbaf39a67a5fa0ce29eeef35","after":"b127928febfbe645c57ad97ec97d7680c1fa7dc7","ref":"refs/heads/main","pushedAt":"2024-05-24T15:04:59.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"adjust setup-venv.sh script (#130)","shortMessageHtmlLink":"adjust setup-venv.sh script (#130)"}},{"before":"d3ee30660b4e6f451f469fe61ede7d30dadd6194","after":"e2732ef5befcb53adbaf39a67a5fa0ce29eeef35","ref":"refs/heads/main","pushedAt":"2024-05-23T18:51:32.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"latest conan 1.x (#129)","shortMessageHtmlLink":"latest conan 1.x (#129)"}},{"before":"ae234dcf8c6d4ccc916680c2594db1f706d11ab2","after":"d3ee30660b4e6f451f469fe61ede7d30dadd6194","ref":"refs/heads/main","pushedAt":"2024-05-23T16:44:01.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"extract functions for avail_cpu_time and num_cpu (#128)\n\nTo help ensure consistency between normal and peak cpu metrics.","shortMessageHtmlLink":"extract functions for avail_cpu_time and num_cpu (#128)"}},{"before":"9f4d4c334b49a730dad27db79361b11cbe5732a0","after":"ae234dcf8c6d4ccc916680c2594db1f706d11ab2","ref":"refs/heads/main","pushedAt":"2024-05-23T15:21:11.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"fix sys.cpu.utilization metrics for titus isolate v2 (#127)\n\nThe new 7th generation hardware sets a `cfs_quota` of `max`, which is then read\r\nas a `0` in the number vector. This was breaking utilization calculations.\r\n\r\n* Refactor code to remove all remaining traces of cgroup selection, since all\r\nTitus hosts are now on cgroupv2.\r\n* Remove extra `ifdef` macro for `TITUS_SYSTEM_SERVICE` that had no effect.\r\n* The `TITUS_NUM_CPU` environment variable is expected to always be present, and\r\nshould always be used to determine the number of CPUs allocated.\r\n* The number of CPUs can be used to determine the value of `cfs_quota`, which\r\ncan then be used to calculate available CPU time.\r\n* Refactor metrics collection for cpu time and cpu utilization, to reduce the\r\nnumber of duplicate calls and simplify the processing.\r\n* Shuffle the order of nvml library initialization, so that we can use the\r\nstandard logger to report status.\r\n* Refactor the `setup-venv.sh` script, to make it easier to bootstrap new dev\r\nenvironments on bionic hosts.","shortMessageHtmlLink":"fix sys.cpu.utilization metrics for titus isolate v2 (#127)"}},{"before":"3b38838650bce12a309c6a2039a5eba113fb0f68","after":"9f4d4c334b49a730dad27db79361b11cbe5732a0","ref":"refs/heads/main","pushedAt":"2024-02-29T13:31:36.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"remove cgroupv1 collection methods (#126)\n\nAll Titus hosts are now running on Jammy with cgroupv2. This change keeps the\r\ncgroupv2 test and boolean flags, to ensure that data collection is valid, and\r\nas a hedge against the development of cgroupv3. Any instances that are not\r\nrunning with cgroupv2 with will not report any cgroup metrics.","shortMessageHtmlLink":"remove cgroupv1 collection methods (#126)"}},{"before":"71d1e823abfed7256ce597d7101d4c68b94736d2","after":"3b38838650bce12a309c6a2039a5eba113fb0f68","ref":"refs/heads/main","pushedAt":"2024-02-23T22:36:37.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"add linux pressure stall metrics (#125)\n\nThis change adds the following new metrics, which can be used to provide\r\nfeedback on where a system is currently constrained. The metrics are\r\ncollected for both EC2 instances and Titus containers, except the `full:cpu`\r\nmetric, which is meaningless on EC2 instances.\r\n\r\nEC2 instances:\r\n\r\n```\r\nname=sys.pressure.some,id=[cpu|io|memory] counter unit=seconds/second\r\nname=sys.pressure.full,id=[io|memory] counter unit=seconds/second\r\n```\r\n\r\nTitus comtainers:\r\n\r\n```\r\nname=sys.pressure.some,id=[cpu|io|memory] counter unit=seconds/second\r\nname=sys.pressure.full,id=[cpu|io|memory] counter unit=seconds/second\r\n```\r\n\r\nhttps://docs.kernel.org/accounting/psi.html#pressure-interface\r\n\r\n> The \"some\" line indicates the share of time in which at least some tasks are\r\n> stalled on a given resource.\r\n\r\n> The \"full\" line indicates the share of time in which all non-idle tasks are\r\n> stalled on a given resource simultaneously. In this state actual CPU cycles\r\n> are going to waste, and a workload that spends extended time in this state\r\n> is considered to be thrashing.\r\n\r\n> The total absolute stall time (in us) is tracked and exported as well, to\r\n> allow detection of latency spikes which wouldn't necessarily make a dent in\r\n> the time averages, or to average trends over custom time frames.\r\n\r\nThe `total` stall time is a monotonic counter which is collected, transformed\r\ninto a base unit of seconds, and reported to the backend as a rate-per-second.","shortMessageHtmlLink":"add linux pressure stall metrics (#125)"}},{"before":"9d5a98e29972da36a4b5c90281a816871fa017d8","after":"71d1e823abfed7256ce597d7101d4c68b94736d2","ref":"refs/heads/main","pushedAt":"2024-02-13T19:38:14.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"publish `TITUS_NUM_CPU` value as `titus.cpu.requested` (#124)\n\nTitus is moving away from setting the `cgroup.cpu.shares` value. The shares is\r\nnot the same as the number of cpus, although it is often incorrectly used that\r\nway.\r\n\r\nIt is not a good cgroup field to use for a metric, because it is something that\r\nwe optionally set, rather than something system-provided, as is the case for\r\nevery other cgroup metric. The `cgroup.cpu.processingCapacity` metric is better\r\nin most cases, but some people want to know the requested cpu count.\r\n\r\nThis change reports the `TITUS_NUM_CPU` environment variable as the\r\n`titus.cpus.requested` metric.","shortMessageHtmlLink":"publish TITUS_NUM_CPU value as titus.cpu.requested (#124)"}},{"before":"8873127b9cb17e44def3c3ece4d43856bb820b77","after":"9d5a98e29972da36a4b5c90281a816871fa017d8","ref":"refs/heads/main","pushedAt":"2024-02-08T01:15:31.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"update build script (#123)\n\nThe clean function of the build script was removing the conan binary cache far\r\ntoo often, resulting in slower builds. This should be a special action that is\r\nonly done when switching between Debug and Release builds, where there is an\r\nimpact on the use of the address sanitizer.","shortMessageHtmlLink":"update build script (#123)"}},{"before":"9c7aeaf486b551259ee2d1ccfe5dcc3c466c8da1","after":"8873127b9cb17e44def3c3ece4d43856bb820b77","ref":"refs/heads/main","pushedAt":"2024-02-02T00:12:28.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"add net.perf.conntrackAllowanceAvailable metric (#121)\n\nIn a recent incident, the net.perf.conntrackAllowanceExceeded was a strong\r\nsignal for the cause. Adding the conntrackAllowanceAvailable metric will\r\nallow it to be used as an input to auto-scaling policies.","shortMessageHtmlLink":"add net.perf.conntrackAllowanceAvailable metric (#121)"}},{"before":"d1e33deb916d29d3fe217f226a3a3b5f68e19d82","after":"9c7aeaf486b551259ee2d1ccfe5dcc3c466c8da1","ref":"refs/heads/main","pushedAt":"2023-12-11T22:12:59.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"libcurl 8.4.0 (#120)\n\nFixes #118.","shortMessageHtmlLink":"libcurl 8.4.0 (#120)"}},{"before":"47f65a2a27b405e709d56ce09a8f7bd33231cb06","after":"d1e33deb916d29d3fe217f226a3a3b5f68e19d82","ref":"refs/heads/main","pushedAt":"2023-12-11T21:49:01.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"update dependencies (#119)\n\nThis change updates to the latest Conan 1.x, and updates the project\r\ndependencies. The Address Sanitizer is also properly engaged for the\r\nGitHub Actions, although it slows the build down a fair bit, due to\r\nthe need to recompile all dependencies.","shortMessageHtmlLink":"update dependencies (#119)"}},{"before":"6759013d5bfda0935498e2f24e936daf60fd67c2","after":"47f65a2a27b405e709d56ce09a8f7bd33231cb06","ref":"refs/heads/main","pushedAt":"2023-05-31T19:03:27.816Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"ensure CGROUP2_SUPER_MAGIC is defined (#115)\n\nWe build on such old versions of CentOS that this definition may not exist.","shortMessageHtmlLink":"ensure CGROUP2_SUPER_MAGIC is defined (#115)"}},{"before":"c7e066fe59ae148cec7894ea387155043cfc48ae","after":"6759013d5bfda0935498e2f24e936daf60fd67c2","ref":"refs/heads/main","pushedAt":"2023-05-31T18:26:20.896Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"cgroupv2 system metrics (#114)\n\nThis change adds support for cgroupv2 system metrics for containers, which\r\nis introduced in the Ubuntu Jammy release. The super magic number of\r\n`/sys/fs/cgroup` is checked to determnine whether or not the system supports\r\ncgroupv2.\r\n\r\nAll of the functions which returned metrics for cgroupv1 were duplicated\r\nwhen there were cgroupv2 equivalents, and the switch between the versions\r\nhappens at the entry points in the code.","shortMessageHtmlLink":"cgroupv2 system metrics (#114)"}},{"before":"61184c2c547ad324fb922d0eb9e2ecd2f1275ce0","after":"c7e066fe59ae148cec7894ea387155043cfc48ae","ref":"refs/heads/main","pushedAt":"2023-04-21T17:35:26.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"exclude /mnt/docker mount points from disk metrics (#113)\n\nWhen this runs on Titus Agent hosts, there are a very large number of\r\nthese metrics reporting (>9M), and we want to remove them. The containers\r\nthemselves will continue to report disk metrics through their own agent\r\nprocess.","shortMessageHtmlLink":"exclude /mnt/docker mount points from disk metrics (#113)"}},{"before":"21f63b7b62c35ce7ae4aaa9310215aa893d4f461","after":"61184c2c547ad324fb922d0eb9e2ecd2f1275ce0","ref":"refs/heads/main","pushedAt":"2023-04-05T23:54:23.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"add ena driver metrics (#112)\n\nThis change adds support for the following five Elastic Network Adapter\r\nperformance metrics, which are available on EC2 instances:\r\n\r\n* bw_in_allowance_exceeded\r\n* bw_out_allowance_exceeded\r\n* conntrack_allowance_exceeded\r\n* linklocal_allowance_exceeded\r\n* pps_allowance_exceeded\r\n\r\nThere are a few apps in our environment which may hit conntrack limits, and\r\nthese metrics are a good means of detecting that state. All of these metrics\r\nare configured as Monotonic Counters, since all of the statistics report the\r\ncumulative number of packets queued or dropped on each network interface since\r\nthe last driver reset.\r\n\r\nThe first time that metrics collection occurs, the ethernet interfaces on the\r\ninstance will enumerated, so that these statistics may be collected for each\r\none.","shortMessageHtmlLink":"add ena driver metrics (#112)"}},{"before":"085dbee001653ca787a11db8f8f05c94a2e6014f","after":"21f63b7b62c35ce7ae4aaa9310215aa893d4f461","ref":"refs/heads/main","pushedAt":"2023-03-22T02:43:14.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"copperlight","name":"Matthew Johnson","path":"/copperlight","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1192501?s=80&v=4"},"commit":{"message":"conan conversion (#111)\n\nThis change converts the build from Bazel to Conan, and removes the\r\nTITUS_AGENT conditionals, which have not been in use for some time.","shortMessageHtmlLink":"conan conversion (#111)"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEUymtPQA","startCursor":null,"endCursor":null}},"title":"Activity ยท Netflix-Skunkworks/atlas-system-agent"}