Pod crashes during automatic raw disk discovery on Tencent Cloud #76

Open
guolong123 opened this issue Jul 16, 2022 · 44 comments
Labels
bug Something isn't working

Comments

@guolong123

Container log:
container.log

Output of fdisk -l on the machine:
image

csi-config:
image

@antmoveh
Contributor

This may be a real problem. Give us some time to observe and test it.

@antmoveh
Contributor

We are still going through this logic; we have set up a Tencent Cloud host to test on.

@guolong123
Author

Any progress on this? When can we expect a new release?

@antmoveh
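Contributor

antmoveh commented Jul 29, 2022

I'll look at it next week and fix this logic. The developer originally responsible for the raw disk feature is away on a business trip.
Barring surprises, it should be fixed next Monday. A new release will have to wait a while longer, but I will build a new usable image.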

@guolong123
Author

OK, thanks. Is there any workaround for now? At the moment I can only use loop devices.

@antmoveh
Contributor

Do you have to use raw disks? The LVM mode should not have this problem.

@guolong123
Author

I'm not sure I understand how the LVM mode works. Isn't it just creating some loop devices yourself and using them? Is there any documentation?

@antmoveh
Contributor

There is a ConfigMap configuration file at https://github.com/carina-io/carina/blob/main/debug/hack/config.json. Setting policy to Lvm makes carina manage the disks with LVM. Everything else is the usual PVC and StorageClass workflow.

@guolong123
Author

It turns out I am already using the LVM mode, but the log shows an error:
image
Is it because this partition is already mounted?

@antmoveh
Contributor

The disk must be a bare disk: it must not hold any data or partitions, and it must not be mounted.
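One rough way to check that requirement before handing a disk to carina is to look at what lsblk reports for it. The sketch below is not part of carina: disk_state is a hypothetical helper name and /dev/vdb in the usage note is a placeholder device. It prints bare only when the device has no child partitions, no filesystem signature, and no mount point.

```shell
# Hypothetical helper (not part of carina): classify a disk from lsblk output.
# Expects `lsblk -P -o NAME,TYPE,FSTYPE,MOUNTPOINT <device>` output on stdin.
disk_state() {
  awk '
    { rows++ }                            # one row per device or partition
    $0 !~ /FSTYPE=""/     { used = 1 }    # a filesystem or LVM signature is present
    $0 !~ /MOUNTPOINT=""/ { used = 1 }    # something is mounted
    END {
      if (rows == 1 && !used) print "bare"
      else print "in-use"
    }
  '
}
```

It could be used as `lsblk -P -o NAME,TYPE,FSTYPE,MOUNTPOINT /dev/vdb | disk_state`. Empty input (for example, a device that does not exist) is reported as in-use, which errs on the side of caution.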

@guolong123
Author

All right, I'll set up a new machine later and try the LVM mode.

@antmoveh
Contributor

Creating a few new loop devices would also work.

@guolong123
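Author

Yes, I tried that and loop devices work; the storage in my current cluster is set up that way. The catch is that losetup -f has to be run again every time the machine reboots, which is a hassle.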
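The reboot step could probably be scripted. The following is a minimal sketch, assuming the backing files live in one directory and end in .img; attach_loop_files and the directory in the usage note are hypothetical, and nothing here comes from carina itself.

```shell
# Hypothetical helper: re-attach every *.img backing file found in a directory.
# Set DRY_RUN=1 to print the losetup commands instead of running them.
attach_loop_files() {
  dir="$1"
  for img in "$dir"/*.img; do
    [ -e "$img" ] || continue                 # no .img files in the directory
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "losetup -f --show $img"
    elif [ -z "$(losetup -j "$img")" ]; then  # skip files that are already attached
      losetup -f --show "$img"                # needs root; prints the loop device used
    fi
  done
}
```

Running `attach_loop_files /data/carina-loop` as root from a boot-time hook (a systemd unit, for instance) would re-create the loop devices after a restart. The loop device numbers may differ between boots.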

@antmoveh
Contributor

A Tencent Cloud cloud disk costs only a few yuan a month for tens of GB.

@guolong123
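Author

It would be nice if it offered a StorageClass that could be used externally. From what I can see, its StorageClass is only usable if the cluster is created on its platform.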

@antmoveh
Contributor

I don't mean using its StorageClass. Just buy a cloud disk and attach it to the host; once attached, it is an ordinary disk.

@guolong123
Author

guolong123 commented Jul 29, 2022

1 node(s) didn't find available persistent volumes to bind
I added a new raw disk, and this error is reported when the pod is scheduled to the host. Why is that?

@antmoveh
Contributor

Check the PVC status. Is it not bound to a PV?

@guolong123
Author

image

@antmoveh
Contributor

Is there no carina-node service on the newly added node? Run kubectl get pvc to see the status of the PVC it needs.

@guolong123
Author

image
The carina-node service is running on the node.

@guolong123
Author

image

Its log looks normal too.

@antmoveh
Contributor

kubectl describe pvc xxx
kubectl get lv
kubectl describe lv xxx
See whether any of these report errors.

@guolong123
Author

(venv) PS D:\01Work\01GitProject\04Xishu\phoenix\distribution\helm> kubectl describe pvc mysql-data-pvc
Name:          mysql-data-pvc
Namespace:     default
StorageClass:  csi-carina-lvm
Status:        Bound
Volume:        pvc-8ad586fa-3531-44c8-86c0-02cd796a9070
Labels:        app.kubernetes.io/managed-by=Helm
Annotations:   meta.helm.sh/release-name: mysql
               meta.helm.sh/release-namespace: default
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: carina.storage.io
               volume.kubernetes.io/selected-node: vm-0-10-ubuntu
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      8Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       <none>
Name:          mysql-data-pvc
Namespace:     keta-presentation
StorageClass:  csi-carina-raw
Status:        Pending
Volume:
Labels:        app.kubernetes.io/managed-by=Helm
Annotations:   meta.helm.sh/release-name: mysql
               meta.helm.sh/release-namespace: keta-presentation
               volume.beta.kubernetes.io/storage-provisioner: carina.storage.io
               volume.kubernetes.io/selected-node: vm-0-9-ubuntu
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       mysql-c877dfb4b-bhrg6
Events:
  Type     Reason                Age                   From                                                                                           Message
  ----     ------                ----                  ----                                                                                           -------
  Normal   WaitForFirstConsumer  19m                   persistentvolume-controller                                                                    waiting for first consumer to be created before binding  
  Warning  ProvisioningFailed    13m                   carina.storage.io_csi-carina-controller-7d84985df8-xngtq_c1d3151f-4e17-4121-9c8b-f439a4e4f809  failed to provision volume with StorageClass "csi-carina-raw": error generating accessibility requirements: no topology key found on CSINode vm-0-9-ubuntu
  Warning  ProvisioningFailed    9m22s (x7 over 19m)   carina.storage.io_csi-carina-controller-7d84985df8-xngtq_c1d3151f-4e17-4121-9c8b-f439a4e4f809  failed to provision volume with StorageClass "csi-carina-raw": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   ExternalProvisioning  4m44s (x62 over 19m)  persistentvolume-controller                                                                    waiting for a volume to be created, either by external provisioner "carina.storage.io" or manually created by system administrator
  Normal   Provisioning          4m22s (x11 over 19m)  carina.storage.io_csi-carina-controller-7d84985df8-xngtq_c1d3151f-4e17-4121-9c8b-f439a4e4f809  External provisioner is provisioning volume for claim "keta-presentation/mysql-data-pvc"
  Warning  ProvisioningFailed    4m21s (x3 over 18m)   carina.storage.io_csi-carina-controller-7d84985df8-xngtq_c1d3151f-4e17-4121-9c8b-f439a4e4f809  failed to provision volume with StorageClass "csi-carina-raw": rpc error: code = Internal desc = exit status 3

@antmoveh
Contributor

kubectl get node vm-0-9-ubuntu -o wide

@antmoveh
Contributor

This looks like a topology scheduling constraint. Check what the pod's topology key is and whether that node has this topology key.
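Since the event says "no topology key found on CSINode vm-0-9-ubuntu", one place to look is the CSINode object for that node. The sketch below wraps a kubectl query in a hypothetical helper, csinode_topology_keys. The driver name carina.storage.io is taken from the provisioner name in the events; that the CSINode entry is registered under the same name is an assumption.

```shell
# Print the topology keys a CSI driver has registered on a node's CSINode object.
# Assumption: the driver is registered under the name carina.storage.io.
csinode_topology_keys() {
  node="$1"
  kubectl get csinode "$node" \
    -o jsonpath='{.spec.drivers[?(@.name=="carina.storage.io")].topologyKeys}'
}
```

Calling `csinode_topology_keys vm-0-9-ubuntu` and getting empty output would be consistent with the error above: the driver has not registered any topology keys on that node, or has not registered there at all.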

@guolong123
Author

image

@antmoveh
Contributor

antmoveh commented Jul 29, 2022

kubectl get node xxx --show-labels

@guolong123
Author

image

@antmoveh
Contributor

vm 15? Wasn't it vm9?

@antmoveh
Contributor

kubectl get pod xxx -o yaml

@guolong123
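Author

I set a taint, so the pod should be scheduled to vm15. I deleted the original namespace and recreated it; vm9 no longer shows up, but it still doesn't work.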

@guolong123
Author

(venv) PS D:\01Work\01GitProject\04Xishu\phoenix\distribution\helm> kubectl get pod -n keta-presentation3 -o yaml     
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: "2022-07-29T08:59:47Z"
    generateName: mysql-c877dfb4b-
    labels:
      name: mysql
      pod-template-hash: c877dfb4b
    name: mysql-c877dfb4b-nrl4n
    namespace: keta-presentation3
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: mysql-c877dfb4b
      uid: 6d107247-a18d-47e7-b40a-7950aa891963
    resourceVersion: "6353965"
    selfLink: /api/v1/namespaces/keta-presentation3/pods/mysql-c877dfb4b-nrl4n
    uid: 11757ba0-e283-4996-800e-39ff1cfc6f23
  spec:
    containers:
    - args:
      - --max-allowed-packet=1024000000
      env:
      - name: MYSQL_ROOT_PASSWORD
        value: "654321"
      image: mysql:5.7.34
      imagePullPolicy: Always
      livenessProbe:
        exec:
          command:
          - mysqladmin
          - -h127.0.0.1
          - -P3306
          - -uroot
          - -p654321
          - ping
        failureThreshold: 3
        initialDelaySeconds: 30
        periodSeconds: 5
        successThreshold: 1
        timeoutSeconds: 5
      name: mysql
      readinessProbe:
        exec:
          command:
          - mysql
          - -h127.0.0.1
          - -P3306
          - -uroot
          - -p654321
          - -e
          - SELECT 1
        failureThreshold: 3
        initialDelaySeconds: 5
        periodSeconds: 3
        successThreshold: 1
        timeoutSeconds: 3
      resources:
        limits:
          cpu: "2"
        requests:
          cpu: 600m
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/lib/mysql
        name: mysql-data
        subPath: mysql
      - mountPath: /etc/mysql/conf.d/mysql.cnf
        name: configurations
        subPath: mysql.cnf
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: default-token-dfgc9
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    priority: 0
    restartPolicy: Always
    schedulerName: carina-scheduler
    securityContext: {}
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoSchedule
      key: prod
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - name: mysql-data
      persistentVolumeClaim:
        claimName: mysql-data-pvc
    - configMap:
        defaultMode: 420
        name: mysql-configuration
      name: configurations
    - name: default-token-dfgc9
      secret:
        defaultMode: 420
        secretName: default-token-dfgc9
  status:
    phase: Pending
    qosClass: Burstable
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

@guolong123
Author

image
I think the problem is still here.

@antmoveh
Contributor

It is too hard to work from screenshots. Is there any way I can connect directly and take a look?

@guolong123
Author

How about you add me on WeChat and I send you the kubeconfig?
gl565169745

@guolong123
Author

Now it is suddenly working again...

@antmoveh
Contributor

You're in the carina discussion group, aren't you?

@guolong123
Author

No, I'm not. Where is the group? I'd like to join.

@guolong123
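Author

guolong123 commented Jul 29, 2022

Oh, I see now. The problem went away just now because I ran:
sudo modprobe dm_thin_pool

I hit the same problem earlier when using loop devices, later found this command, and it was fixed after running it. It may be something specific to Tencent's servers.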
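Note that modprobe only loads the module until the next reboot. A minimal sketch for checking and persisting it follows, assuming a systemd-based distribution where files under /etc/modules-load.d/ are read at boot. module_loaded is a hypothetical helper and the file name dm_thin_pool.conf is arbitrary.

```shell
# Report whether a kernel module appears in a /proc/modules-style listing.
# The second argument defaults to /proc/modules and exists mainly for testing.
module_loaded() {
  mod="$1"
  listing="${2:-/proc/modules}"
  if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$listing"; then
    echo "loaded"
  else
    echo "missing"
  fi
}

# Usage sketch (run as root on the node):
#   [ "$(module_loaded dm_thin_pool)" = "loaded" ] || modprobe dm_thin_pool
#   echo dm_thin_pool > /etc/modules-load.d/dm_thin_pool.conf   # load at boot
```

A module that is compiled into the kernel does not show up in /proc/modules, so "missing" only means it is not loaded as a module.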

@antmoveh
Contributor

There is a big WeChat QR code at the bottom of the README on the project home page.

@guolong123
Author

It really is big. I added it, but it doesn't look like a group.

@antmoveh
Contributor

antmoveh commented Aug 1, 2022

This module has been refactored, so in theory the problem is fixed.
I have built the latest image: registry.cn-hangzhou.aliyuncs.com/antmoveh/carina:latest

@guolong123
Author

OK, thanks. I'll switch to it and give it a try.

antmoveh added the duplicate and bug labels and removed the duplicate label on Aug 2, 2022