eks

Reserve CPU and memory for nodes

https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#kube-reserved https://eksctl.io/usage/customizing-the-kubelet/

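Note: this only works for unmanaged node groups. There is a way (on GitHub) to make it work with managed node groups, but it is fairly cumbersome.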
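The Kubernetes page linked above defines allocatable capacity as Node Capacity - kube-reserved - system-reserved - eviction-threshold. Below is a minimal Python sketch of that arithmetic for memory, in MiB. The numbers in the example are hypothetical, not EKS defaults.

```python
def allocatable_mib(capacity, kube_reserved, system_reserved, eviction_threshold):
    """Allocatable memory (MiB) = capacity - kube-reserved - system-reserved
    - hard eviction threshold, per the Kubernetes reserve-compute-resources docs."""
    result = capacity - kube_reserved - system_reserved - eviction_threshold
    if result < 0:
        raise ValueError("reservations exceed node capacity")
    return result


# Hypothetical 8 GiB node: 512Mi kube-reserved, 256Mi system-reserved, evict at 100Mi.
print(allocatable_mib(8192, 512, 256, 100))  # 7324
```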

eksctl

fluentbit

Raw log:

{"log":"{\"time\":\"2022-12-08T03:00:58.187+00:00\", \"level\":\"Trace\", \"pid\":1, \"cid\":\"68093n81\", \"message\":\"TCP: disposing #0 resource(HttpStream)(0x56484a6a4c30), conns=1, disposing=1, zombies=0\"}\n","stream":"stdout","time":"2022-12-08T03:00:58.187782553Z"}
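The log field is JSON-encoded twice: the application writes a JSON line, and the container runtime's json-file format wraps it as a string. In Fluent Bit this is usually unpacked by the kubernetes filter's Merge_Log On option. The Python sketch below performs the same two-step decode on the sample line, which is handy for checking a parser offline. Keeping the outer timestamp as docker_time is an arbitrary choice made for illustration.

```python
import json

# The sample line from above, split into adjacent raw-string pieces for readability.
RAW = (r'{"log":"{\"time\":\"2022-12-08T03:00:58.187+00:00\", \"level\":\"Trace\", '
       r'\"pid\":1, \"cid\":\"68093n81\", \"message\":\"TCP: disposing #0 '
       r'resource(HttpStream)(0x56484a6a4c30), conns=1, disposing=1, zombies=0\"}\n",'
       r'"stream":"stdout","time":"2022-12-08T03:00:58.187782553Z"}')


def decode_docker_line(line):
    """Decode a json-file line whose "log" field is itself a JSON document."""
    outer = json.loads(line)
    inner = json.loads(outer["log"])  # the trailing "\n" is whitespace, json accepts it
    inner["stream"] = outer.get("stream")
    inner["docker_time"] = outer.get("time")  # keep the runtime's timestamp separately
    return inner


record = decode_docker_line(RAW)
print(record["level"], record["cid"])  # Trace 68093n81
```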

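Using k8s

  • An exposed Service could not be reached
    • curl worked after logging into the VM (node)
    • After a while it became reachable; presumably the Service takes some time to take effect?
  • A StatefulSet's PVC is tied to its disk: if a pod first starts on machine A, it stays on A and cannot be rescheduled (strictly, an EBS volume is bound to one Availability Zone, so the pod is pinned to nodes in that AZ). To have the EBS volumes deleted automatically, a feature gate (StatefulSetAutoDeletePVC) has to be enabled.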
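The StatefulSetAutoDeletePVC feature gate enables the persistentVolumeClaimRetentionPolicy field on StatefulSets. Here is a hedged Python sketch that builds such a manifest as a dict and prints it as JSON (kubectl apply -f - accepts JSON). The name demo, the nginx image and the 10Gi size are placeholders. This policy controls whether PVCs, and therefore their EBS volumes, are deleted when the StatefulSet is deleted or scaled down. It does not by itself remove the zone constraint on rescheduling.

```python
import json


def statefulset_with_pvc_cleanup(name, image, storage="10Gi"):
    """Minimal StatefulSet whose PVCs are deleted along with the StatefulSet
    or with replicas removed by a scale-down."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "persistentVolumeClaimRetentionPolicy": {
                "whenDeleted": "Delete",  # PVCs removed when the StatefulSet is deleted
                "whenScaled": "Delete",   # PVCs of scaled-down replicas removed too
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }]},
            },
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


print(json.dumps(statefulset_with_pvc_cleanup("demo", "nginx:1.25"), indent=2))
```

Usage, assuming the script is saved under a hypothetical name: python3 make_sts.py | kubectl apply -f -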

nlb

  • NLB -> TargetGroupBinding -> NodePort
  • If the health check fails, check the node group's security group: the ports must be open to the 192.168.x.x and 172.x.x.x ranges.
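NLB health checks typically reach the NodePort from private addresses inside the VPC. A quick sanity check is therefore whether those source addresses fall inside the CIDRs that the node group's security group allows. Below is a minimal Python sketch using the standard ipaddress module. 192.168.0.0/16 is eksctl's default VPC range, and 172.16.0.0/12 stands in for the 172.x.x.x range above as an assumption.

```python
import ipaddress


def covered_by(source_ip, allowed_cidrs):
    """True if source_ip lies inside any CIDR allowed by the security group rules."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical inbound rules on the node group's security group.
rules = ["192.168.0.0/16", "172.16.0.0/12"]
print(covered_by("192.168.44.229", rules))  # True
print(covered_by("10.0.1.5", rules))        # False
```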

cloudwatch

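Attach the required policy to the worker nodes' IAM role:

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. Select one of the worker node instances, then choose the IAM role in its description.
  3. On the IAM role page, choose Attach policies.
  4. In the policy list, select the check box next to CloudWatchAgentServerPolicy. If necessary, use the search box to find the policy.
  5. Choose Attach policies.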
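The same attachment can be scripted. This is a minimal sketch that assumes boto3. The IAM client is passed in so the function can be exercised with a stub. The ARN is that of the AWS managed policy CloudWatchAgentServerPolicy, and the role name in the usage line is hypothetical.

```python
CLOUDWATCH_AGENT_POLICY_ARN = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"


def attach_cloudwatch_agent_policy(iam_client, role_name):
    """Attach the AWS managed CloudWatchAgentServerPolicy to a worker node role.

    iam_client is expected to behave like boto3.client("iam")."""
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=CLOUDWATCH_AGENT_POLICY_ARN,
    )
```

Usage (role name hypothetical): attach_cloudwatch_agent_policy(boto3.client("iam"), "my-nodegroup-NodeInstanceRole")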

VPC peering

Mounting EFS

  • Create the IAM OpenID Connect (OIDC) provider: eksctl utils associate-iam-oidc-provider --cluster prod-k8s --approve
  • Create the IAM policy:
aws iam create-policy \
    --policy-name AmazonEKS_EFS_CSI_Driver_Policy \
    --policy-document file://iam-policy-example.json
  • Create the IAM service account:
eksctl create iamserviceaccount \
    --name efs-csi-controller-sa \
    --namespace kube-system \
    --cluster prod-k8s \
    --attach-policy-arn arn:aws:iam::802625923695:policy/AmazonEKS_EFS_CSI_Driver_Policy \
    --approve \
    --override-existing-serviceaccounts \
    --region us-east-1
  • Install the driver with Helm:
helm upgrade -i aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
    --namespace kube-system \
    --set image.repository=602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-efs-csi-driver \
    --set controller.serviceAccount.create=false \
    --set controller.serviceAccount.name=efs-csi-controller-sa
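Once the driver is running, PVCs can be provisioned dynamically through a StorageClass. This Python sketch emits one as JSON and assumes the driver's access-point mode (provisioningMode: efs-ap). fs-0123456789abcdef0 is a placeholder file system ID and efs-sc is a hypothetical class name.

```python
import json


def efs_storage_class(file_system_id, name="efs-sc"):
    """StorageClass for the EFS CSI driver: each PVC gets its own EFS access point."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "efs.csi.aws.com",
        "parameters": {
            "provisioningMode": "efs-ap",
            "fileSystemId": file_system_id,
            "directoryPerms": "700",  # permissions of the access point's root directory
        },
    }


# Placeholder file system ID; use the ID of your own EFS file system.
print(json.dumps(efs_storage_class("fs-0123456789abcdef0"), indent=2))
```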

fsx

  • Note: change the S3 Resource in the IAM policy to * and it works!
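The workaround can also be applied programmatically to a policy document. In this sketch, every statement that grants an s3: action has its Resource widened to "*". The example policy is hypothetical, not the FSx CSI driver's actual policy. Note that "*" grants those S3 actions on every bucket.

```python
import copy
import json


def widen_s3_resources(policy):
    """Return a copy of an IAM policy in which every statement granting an
    s3: action gets Resource "*"; other statements are left untouched."""
    fixed = copy.deepcopy(policy)
    for stmt in fixed.get("Statement", []):
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # IAM allows a single action given as a string
            actions = [actions]
        if any(a.lower().startswith("s3:") for a in actions):
            stmt["Resource"] = "*"
    return fixed


# Hypothetical policy fragment; bucket name and actions are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": "arn:aws:s3:::my-bucket"},
        {"Effect": "Allow", "Action": "fsx:CreateFileSystem", "Resource": "*"},
    ],
}
print(json.dumps(widen_s3_resources(policy), indent=2))
```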

ebs

prometheus

image

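ca (Cluster Autoscaler)

  • I once changed the node group's maximum size to 1 in the console, so scale-out failed. After changing it to 10, CA still did not scale out; restarting the cluster-autoscaler pod fixed it.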

A pod stuck in Terminating

kubernetes/autoscaler#4966. Troubleshooting with the kubelet log:

Jun 15 06:55:57 ip-192-168-44-229.ec2.internal kubelet[4195]: I0615 06:55:57.895419    4195 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"volume-hls-disk\" (UniqueName: \"kubernetes.io/csi/fsx.csi.aws.com^fs-000f8f664d17faf82\") pod \"b2a61441-8129-4372-946c-cea170ae9979\" (UID: \"b2a61441-8129-4372-946c-cea170ae9979\") "
Jun 15 06:55:57 ip-192-168-44-229.ec2.internal kubelet[4195]: E0615 06:55:57.895504    4195 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/fsx.csi.aws.com^fs-000f8f664d17faf82 podName:b2a61441-8129-4372-946c-cea170ae9979 nodeName:}" failed. No retries permitted until 2022-06-15 06:57:59.895477052 +0000 UTC m=+84822.216273975 (durationBeforeRetry 2m2s). Error: "UnmountVolume.TearDown failed for volume \"volume-hls-disk\" (UniqueName: \"kubernetes.io/csi/fsx.csi.aws.com^fs-000f8f664d17faf82\") pod \"b2a61441-8129-4372-946c-cea170ae9979\" (UID: \"b2a61441-8129-4372-946c-cea170ae9979\") : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name fsx.csi.aws.com not found in the list of registered CSI drivers"
Jun 15 06:56:16 ip-192-168-44-229.ec2.internal kubelet[4195]: WARNING: 2022/06/15 06:56:16 grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins/fsx.csi.aws.com/csi.sock  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/fsx.csi.aws.com/csi.sock: connect: connection refused". Reconnecting...

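Sure enough, the pod could not release its PVC. kubelet could not find fsx.csi.aws.com in its list of registered CSI drivers, and the connection to the driver's csi.sock was refused, so UnmountVolume.TearDown kept failing. The likely cause: I had previously messed with the FSx CSI driver installation, and some part of it was probably left broken.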
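The two telling messages in the log above can be extracted mechanically. Both the "not found in the list of registered CSI drivers" error and the refused connection to csi.sock point to a node plugin that is not running or not registered. This is a minimal Python sketch, with regexes written against these particular messages; other kubelet versions may word them differently.

```python
import re

# Error kubelet logs when a CSI node plugin is not registered on the node.
UNREGISTERED = re.compile(
    r"driver name (\S+) not found in the list of registered CSI drivers")
# gRPC error when the plugin's Unix socket path exists but nothing is listening.
SOCKET_REFUSED = re.compile(r"dial unix (\S+?\.sock): connect: connection refused")


def diagnose(lines):
    """Return (unregistered CSI driver names, unreachable plugin sockets)."""
    drivers, sockets = set(), set()
    for line in lines:
        if m := UNREGISTERED.search(line):
            drivers.add(m.group(1))
        if m := SOCKET_REFUSED.search(line):
            sockets.add(m.group(1))
    return drivers, sockets


# Tails of two lines from the kubelet log above.
log = [
    'kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name '
    'fsx.csi.aws.com not found in the list of registered CSI drivers"',
    'Err :connection error: desc = "transport: Error while dialing dial unix '
    '/var/lib/kubelet/plugins/fsx.csi.aws.com/csi.sock: connect: connection refused".',
]
drivers, sockets = diagnose(log)
print(drivers)  # {'fsx.csi.aws.com'}
print(sockets)  # {'/var/lib/kubelet/plugins/fsx.csi.aws.com/csi.sock'}
```

To see which drivers kubelet has actually registered on a node, kubectl get csinode <node-name> -o yaml lists them under spec.drivers.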

argo, crossplane alpha1

❯ kubectl crossplane install configuration registry.upbound.io/xp/getting-started-with-aws:v1.10.1

kubectl crossplane: error: failed to create kube client: exec plugin: invalid apiVersion "client.authentication.k8s.io/v1alpha1"
  1. Upgrade the AWS CLI.
  2. Regenerate the kubeconfig: aws eks update-kubeconfig --region=us-east-2 --name=rdqa-k8s
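The error comes from a kubeconfig whose exec credential plugin still declares client.authentication.k8s.io/v1alpha1. Newer Kubernetes clients (1.24 and later) no longer accept that version, and regenerating the kubeconfig with an upgraded AWS CLI writes a newer apiVersion. This small Python sketch finds the affected users. It assumes the kubeconfig has already been parsed into a dict (for example with PyYAML's yaml.safe_load). The user name in the example is hypothetical.

```python
DEPRECATED_EXEC_API = "client.authentication.k8s.io/v1alpha1"


def users_with_deprecated_exec(kubeconfig):
    """Names of kubeconfig users whose exec plugin still uses the v1alpha1 API."""
    names = []
    for entry in kubeconfig.get("users", []):
        exec_cfg = (entry.get("user") or {}).get("exec") or {}
        if exec_cfg.get("apiVersion") == DEPRECATED_EXEC_API:
            names.append(entry.get("name", "<unnamed>"))
    return names


# Hypothetical parsed kubeconfig with one stale EKS user entry.
cfg = {
    "users": [{
        "name": "rdqa-k8s-user",
        "user": {"exec": {"apiVersion": DEPRECATED_EXEC_API, "command": "aws"}},
    }],
}
print(users_with_deprecated_exec(cfg))  # ['rdqa-k8s-user']
```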

kinesis firehose

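Kinesis received no data from Fluent Bit:

  1. Use stern to look at the logs of all the pods; at first I looked at only one pod, which led me to the wrong conclusion.
  2. Give the node group Kinesis Firehose permissions. Note: they must be Firehose permissions, not Kinesis, and they must go on the node group's role, not the EKS cluster's.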
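This sketch covers the permission in item 2. It assumes boto3 and that Fluent Bit uses its kinesis_firehose output, whose documentation lists firehose:PutRecordBatch as the required action. The inline policy goes on the node group's instance role. The policy name and ARNs are hypothetical, and the IAM client is injected so the code can be checked with a stub.

```python
import json

FIREHOSE_ACTIONS = ["firehose:PutRecordBatch"]


def firehose_put_policy(stream_arn):
    """Inline IAM policy allowing batch writes to one Firehose delivery stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": FIREHOSE_ACTIONS,
            "Resource": stream_arn,
        }],
    }


def attach_to_node_role(iam_client, role_name, stream_arn):
    """Put the policy on the node group's instance role, not the EKS cluster role.

    iam_client is expected to behave like boto3.client("iam")."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName="fluent-bit-firehose",  # hypothetical policy name
        PolicyDocument=json.dumps(firehose_put_policy(stream_arn)),
    )
```

Usage (role name, account ID and stream are hypothetical): attach_to_node_role(boto3.client("iam"), "my-nodegroup-NodeInstanceRole", "arn:aws:firehose:us-east-1:111122223333:deliverystream/my-stream")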

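kinesis data streams

  1. The official Fluent Bit documentation is wrong here; use plugin: kinesis.
  2. Lambda: on a Mac, install the dependencies with pip3 install and package them together with the function, because it depends on requests.
  3. In the rewrite filter, the regex must match a string value; an int does not work!
  4. Use the stdout output for debugging.
  5. https://aws.amazon.com/cn/blogs/china/build-a-logging-system-with-fluent-bit-and-amazon-opensearch-service/
  6. The region needs to be changed.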
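For item 2, the Lambda side of a Kinesis Data Streams pipeline receives each record base64-encoded under Records[].kinesis.data. This minimal Python handler sketch decodes those records. It assumes each record is one JSON document as emitted by Fluent Bit. Forwarding to OpenSearch, which is where requests comes in, is left out.

```python
import base64
import json


def handler(event, context):
    """Decode the Kinesis records in a Lambda event into Python objects."""
    docs = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])  # raw bytes from the stream
        docs.append(json.loads(payload))  # assumes one JSON document per record
    # A real function would send `docs` on to OpenSearch here.
    return docs


# Hand-built test event mimicking what Lambda delivers from a Kinesis stream.
sample = {"level": "Trace", "message": "hello"}
event = {"Records": [{"kinesis": {
    "data": base64.b64encode(json.dumps(sample).encode()).decode()}}]}
print(handler(event, None))  # [{'level': 'Trace', 'message': 'hello'}]
```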

aos (Amazon OpenSearch Service)

  1. With t3.small instances the Dashboards would not open; recreating the domain with t3.medium fixed it.

ec2

  • Bandwidth
The number reported is the number of bytes sent during the period. If you are using basic (5-minute) monitoring and the statistic is Sum, you can divide this number by 300 to find Bytes/second. If you have detailed (1-minute) monitoring and the statistic is Sum, divide it by 60.

i.e. bits per second = Sum / 300 * 8 (with detailed monitoring, Sum / 60 * 8; the * 8 converts bytes to bits).
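Here is the same conversion as a small Python helper. The NetworkOut (or NetworkIn) Sum is in bytes. Dividing by the period gives bytes per second, and multiplying by 8 gives bits per second.

```python
def bits_per_second(sum_bytes, period_seconds=300):
    """Average throughput from a CloudWatch NetworkIn/NetworkOut Sum.

    period_seconds is 300 for basic (5-minute) monitoring, 60 for detailed."""
    return sum_bytes / period_seconds * 8


print(bits_per_second(37_500_000))     # 1000000.0, about 1 Mbit/s over 5 minutes
print(bits_per_second(7_500_000, 60))  # 1000000.0
```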