
Managed postgres - /var/lib/postgresql/data: permission denied #483

Open
flisak-robert opened this issue Jul 28, 2021 · 14 comments · Fixed by #485
Labels
component:operator type:bug Something isn't working

Comments

@flisak-robert

I am trying to install AWX using the awx-operator running on k3s and awx-postgres pod fails with the message:
mkdir: cannot create directory ‘/var/lib/postgresql/data’: Permission denied
Here is my awx.yml:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  ingress_type: Ingress
  route_tls_termination_mechanism: edge
  hostname: localhost
  postgres_storage_requirements:
    requests:
      storage: 3Gi
  projects_persistence: true
  projects_existing_claim: awx-projects-claim
  web_resource_requirements:
    requests:
      cpu: 250m
      memory: 2Gi
    limits:
      cpu: 750m
      memory: 4Gi
  task_resource_requirements:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi
  ee_resource_requirements:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi

What am I doing wrong here?

@Marwel

Marwel commented Jul 30, 2021

Have a look at this.
https://github.com/kurokobo/awx-on-k3s

@scott-vick

scott-vick commented Jul 30, 2021

I am following the exact instructions @Marwel linked above, but I'm getting the same error (mkdir: cannot create directory ‘/var/lib/postgresql/data’: Permission denied). I've been banging my head for four days with no success installing AWX.

@tchellomello
Contributor

tchellomello commented Jul 30, 2021

I'll give it a try this afternoon using k3s, since I cannot reproduce this in my current lab.

@tchellomello tchellomello self-assigned this Jul 30, 2021
@tchellomello
Contributor

Hello guys, I've deployed k3s with a single node on my testing machine, as described at https://rancher.com/docs/k3s/latest/en/quick-start/#install-script

$ kubectl get nodes                                                            
NAME              STATUS   ROLES                  AGE     VERSION
storm.tatu.home   Ready    control-plane,master   3m39s   v1.21.3+k3s1

$ kubectl get pods -A                                                                                          23:01:09
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-5ff76fc89d-4d7bn   1/1     Running     0          9m51s
kube-system   metrics-server-86cbb8457f-9fkt2           1/1     Running     0          9m51s
kube-system   coredns-7448499f4d-9t87w                  1/1     Running     0          9m51s
kube-system   helm-install-traefik-crd-mlrtg            0/1     Completed   0          9m51s
kube-system   helm-install-traefik-v5n5s                0/1     Completed   1          9m51s
kube-system   svclb-traefik-c9cgh                       2/2     Running     0          9m28s
kube-system   traefik-97b44b794-6dz4g                   1/1     Running     0          9m28s

Then I generated the latest devel operator image and deployed:

kubectl apply -f deploy/awx-operator.yaml                                                                     23:08:40
customresourcedefinition.apiextensions.k8s.io/awxs.awx.ansible.com created
customresourcedefinition.apiextensions.k8s.io/awxbackups.awx.ansible.com created
customresourcedefinition.apiextensions.k8s.io/awxrestores.awx.ansible.com created
clusterrole.rbac.authorization.k8s.io/awx-operator created
clusterrolebinding.rbac.authorization.k8s.io/awx-operator created
serviceaccount/awx-operator created
deployment.apps/awx-operator created

The operator started as expected:

kubectl get pods -A -w                                                                                       23:07:32
NAMESPACE     NAME                                      READY   STATUS              RESTARTS   AGE
kube-system   local-path-provisioner-5ff76fc89d-4d7bn   1/1     Running             0          11m
kube-system   metrics-server-86cbb8457f-9fkt2           1/1     Running             0          11m
kube-system   coredns-7448499f4d-9t87w                  1/1     Running             0          11m
kube-system   helm-install-traefik-crd-mlrtg            0/1     Completed           0          11m
kube-system   helm-install-traefik-v5n5s                0/1     Completed           1          11m
kube-system   svclb-traefik-c9cgh                       2/2     Running             0          10m
kube-system   traefik-97b44b794-6dz4g                   1/1     Running             0          10m
default       awx-operator-88b886454-9pq7w              0/1     ContainerCreating   0          15s
default       awx-operator-88b886454-9pq7w              1/1     Running             0          16s

So now to troubleshooting. I'm using an AWX spec similar to the one provided earlier, as follows below. Note that I had to extend it to create the PVC awx-projects-claim, which the AWX spec expects to exist.

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: awx-projects-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 2Gi
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  ingress_type: Ingress
  route_tls_termination_mechanism: edge
  hostname: localhost
  postgres_storage_requirements:
    requests:
      storage: 3Gi
  projects_persistence: true
  projects_existing_claim: awx-projects-claim
  web_resource_requirements:
    requests:
      cpu: 250m
      memory: 2Gi
    limits:
      cpu: 750m
      memory: 4Gi
  task_resource_requirements:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi
  ee_resource_requirements:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi
$ kubectl apply -f pg-k3s.yml                                                                                                   23:13:08
persistentvolumeclaim/awx-projects-claim created
awx.awx.ansible.com/awx created

# still pending because POD has not started yet
$ kubectl get pvc                                                                                                               23:14:05
NAME                      STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
awx-projects-claim        Pending                                                                        local-path     23s
postgres-awx-postgres-0   Bound     pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c   3Gi        RWO            local-path     4s

Then, watching the pods, I saw it crash:

$ kubectl get pods -A -w                                                                                          23:07:32
NAMESPACE     NAME                                      READY   STATUS              RESTARTS   AGE
kube-system   local-path-provisioner-5ff76fc89d-4d7bn   1/1     Running             0          11m
kube-system   metrics-server-86cbb8457f-9fkt2           1/1     Running             0          11m
kube-system   coredns-7448499f4d-9t87w                  1/1     Running             0          11m
kube-system   helm-install-traefik-crd-mlrtg            0/1     Completed           0          11m
kube-system   helm-install-traefik-v5n5s                0/1     Completed           1          11m
kube-system   svclb-traefik-c9cgh                       2/2     Running             0          10m
kube-system   traefik-97b44b794-6dz4g                   1/1     Running             0          10m
default       awx-operator-88b886454-9pq7w              0/1     ContainerCreating   0          15s
default       awx-operator-88b886454-9pq7w              1/1     Running             0          16s



default       awx-postgres-0                            0/1     Pending             0          0s
kube-system   helper-pod-create-pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c   0/1     Pending             0          0s
kube-system   helper-pod-create-pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c   0/1     ContainerCreating   0          0s
kube-system   helper-pod-create-pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c   0/1     Completed           0          3s
kube-system   helper-pod-create-pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c   0/1     Terminating         0          3s
kube-system   helper-pod-create-pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c   0/1     Terminating         0          3s
default       awx-postgres-0                                               0/1     Pending             0          4s
default       awx-postgres-0                                               0/1     ContainerCreating   0          4s
default       awx-76bdfc954c-jxvll                                         0/4     Pending             0          0s
kube-system   helper-pod-create-pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5   0/1     Pending             0          0s
kube-system   helper-pod-create-pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5   0/1     ContainerCreating   0          0s
default       awx-postgres-0                                               1/1     Running             0          15s
kube-system   helper-pod-create-pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5   0/1     Completed           0          6s
kube-system   helper-pod-create-pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5   0/1     Terminating         0          7s
kube-system   helper-pod-create-pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5   0/1     Terminating         0          7s
default       awx-postgres-0                                               0/1     Error               0          16s
default       awx-76bdfc954c-jxvll                                         0/4     Pending             0          7s
default       awx-76bdfc954c-jxvll                                         0/4     Init:0/1            0          8s
default       awx-postgres-0                                               0/1     Error               1          18s
default       awx-postgres-0                                               0/1     CrashLoopBackOff    1          18s
default       awx-76bdfc954c-jxvll                                         0/4     PodInitializing     0          18s
default       awx-postgres-0                                               1/1     Running             2          35s
default       awx-postgres-0                                               0/1     Error               2          35s
default       awx-postgres-0                                               0/1     CrashLoopBackOff    2          48s
default       awx-postgres-0                                               0/1     Error               3          64s
default       awx-postgres-0                                               0/1     CrashLoopBackOff    3          77s
default       awx-76bdfc954c-jxvll                                         4/4     Running             0          111s
default       awx-postgres-0                                               0/1     CrashLoopBackOff    4          2m11s

So basically the postgres StatefulSet did not start; however, the awx pod came up fine (of course not functional, due to the missing database).

$ kubectl get pvc
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
postgres-awx-postgres-0   Bound    pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c   3Gi        RWO            local-path     2m37s
awx-projects-claim        Bound    pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5   2Gi        RWO            local-path     2m56s

$ kubectl get pods                                                                                       23:17:32
NAME                           READY   STATUS    RESTARTS   AGE
awx-operator-88b886454-9pq7w   1/1     Running   0          8m47s
awx-76bdfc954c-jxvll           4/4     Running   0          3m21s
awx-postgres-0                 0/1     Error     5          3m30s

Then looking at the container: yes, I got the same error on k3s using the local-path-provisioner. It looks similar to #413; however, we need to address it for the postgresql StatefulSet as well.

$ kubectl logs awx-postgres-0                                                                       23:26:52
mkdir: cannot create directory ‘/var/lib/postgresql/data’: Permission denied

I'm working on it.

@tchellomello tchellomello added state:in_progress type:bug Something isn't working labels Jul 31, 2021
@tchellomello
Contributor

tchellomello commented Jul 31, 2021

So basically we will need to leverage an `initContainer` approach to fix the permissions so the database can be created. This snippet will do the job:

diff --git a/roles/installer/tasks/database_configuration.yml b/roles/installer/tasks/database_configuration.yml
index 2e99be5..470530a 100644
--- a/roles/installer/tasks/database_configuration.yml
+++ b/roles/installer/tasks/database_configuration.yml
@@ -80,8 +80,9 @@
 - block:
     - name: Create Database if no database is specified
       k8s:
-        apply: true
+        apply: yes
         definition: "{{ lookup('template', 'postgres.yaml.j2') }}"
+        wait: yes
       register: create_statefulset_result
 
   rescue:
diff --git a/roles/installer/templates/postgres.yaml.j2 b/roles/installer/templates/postgres.yaml.j2
index d17ee12..f87c842 100644
--- a/roles/installer/templates/postgres.yaml.j2
+++ b/roles/installer/templates/postgres.yaml.j2
@@ -37,10 +37,27 @@ spec:
       imagePullSecrets:
         - name: {{ image_pull_secret }}
 {% endif %}
+      initContainers:
+        - name: init-chmod-data
+          image: '{{ postgres_image }}:{{ postgres_image_version }}'
+          imagePullPolicy: '{{ image_pull_policy }}'
+          command:
+            - /bin/sh
+            - -c
+            - |
+              if [ ! -f {{ postgres_data_path }}/PG_VERSION ]; then
+                chown postgres:root {{ postgres_data_path | dirname }}
+              fi
+          volumeMounts:
+            - name: postgres
+              mountPath: '{{ postgres_data_path | dirname }}'
+              subPath: '{{ postgres_data_path | dirname | basename }}'
       containers:
         - image: '{{ postgres_image }}:{{ postgres_image_version }}'
           imagePullPolicy: '{{ image_pull_policy }}'
           name: postgres
+          securityContext:
+            fsGroup: 999
           env:
             # For postgres_image based on rhel8/postgresql-12
             - name: POSTGRESQL_DATABASE

It does result in a working state once the patch is applied:

$ kubectl get pods -w                              00:38:58
NAME                            READY   STATUS    RESTARTS   AGE
awx-operator-5bc776b4d4-d9ww2   1/1     Running   0          4m41s
awx-postgres-0                  1/1     Running   0          4m3s
awx-d67898cd9-k6jrc             4/4     Running   0          3m48s

$ kubectl exec -it awx-postgres-0 -- /bin/bash                                                                                                                   00:57:00
root@awx-postgres-0:/# namei  -xmolv /var/lib/postgresql/data/pgdata/
f: /var/lib/postgresql/data/pgdata/
Drwxr-xr-x root     root     /
drwxr-xr-x root     root     var
drwxr-xr-x root     root     lib
drwxr-xr-x postgres postgres postgresql
Drwx------ postgres root     data
drwx------ postgres root     pgdata

I'll create a PR for it. Thanks for reporting the issue @flisak-robert and @scott-vick
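
For anyone who needs the fix before the PR lands, the patch above renders into the StatefulSet roughly as follows. This is only a sketch with the template variables substituted by hand; the image tag and the data path `/var/lib/postgresql/data/pgdata` are assumed operator defaults, not values confirmed in this thread.

```yaml
# Sketch of the rendered initContainer from the patch above.
# Assumptions: postgres:12 image, postgres_data_path=/var/lib/postgresql/data/pgdata
initContainers:
  - name: init-chmod-data
    image: postgres:12
    command:
      - /bin/sh
      - -c
      - |
        # Only fix ownership on first boot, before initdb has populated the directory
        if [ ! -f /var/lib/postgresql/data/pgdata/PG_VERSION ]; then
          chown postgres:root /var/lib/postgresql/data
        fi
    volumeMounts:
      - name: postgres
        mountPath: /var/lib/postgresql/data
        subPath: data
```

The `PG_VERSION` guard keeps the chown from running again on an already-initialized volume.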

@webees

webees commented Aug 6, 2021

Is there any temporary solution before the update?

@flisak-robert
Author

flisak-robert commented Aug 6, 2021

Is there any temporary solution before the update?

Don't know if it suits your needs, but I just ran postgres in a Docker container and pointed AWX to that postgres instance instead.
Here is my config:

apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: <postgres address>
  port: "5432"
  database: awx
  username: postgres
  password: <postgres password>
  type: unmanaged
type: Opaque

Don't forget to include postgres_configuration_secret: awx-postgres-configuration in your AWX config. If you don't, AWX won't be able to decrypt secrets in your postgres database when you restart your AWX node, for example. Been there, done that :(
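
As a sketch, the corresponding AWX spec would then look something like this (the hostname and namespace are illustrative; only the postgres_configuration_secret line is the point here):

```yaml
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  hostname: localhost
  # Point AWX at the external (unmanaged) postgres instance
  # described by the Secret above
  postgres_configuration_secret: awx-postgres-configuration
```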

@ghost

ghost commented Sep 1, 2021

A workaround that I have found is to create a PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: <className>
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "<path>"

The path must be given 777 permissions via chmod.
You also need to set the postgres_storage_class attribute to the same value as the storageClassName.
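
If it helps, the node-side preparation for such a hostPath PV can be sketched like this. The path /data/awx-postgres is purely an example; run this as root on the node that will host the volume.

```shell
# Create the hostPath directory the PV will point at and open it up.
# 777 is a blunt instrument; it simply sidesteps the ownership mismatch.
mkdir -p /data/awx-postgres
chmod 777 /data/awx-postgres

# Verify the mode
stat -c '%a' /data/awx-postgres
```

Note that the postgres initContainer/initdb will still tighten the permissions on its own subdirectory once it can write there.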

@twobombs

twobombs commented Sep 9, 2021

You need to add the attribute postgres_storage_class with the same value as the storageClassName.

Let's say one would declare
storageClassName: pgdata
and place the matching declaration right underneath:
postgres_storage_class: pgdata
It would be handy to do so for maintainability across the next versions of awx-operator.
On top of that, working with a PV for stateful pgdata would be a good idea anyway, right? :)
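
Concretely, the pairing would look something like this sketch (the names and hostPath are illustrative):

```yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pgdata-volume
spec:
  storageClassName: pgdata          # must match the AWX spec below
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/pgdata              # example path on the node
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  postgres_storage_class: pgdata    # same value as storageClassName above
```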

@cnukwas

cnukwas commented Sep 16, 2021

I created a PV with a storageClassName and used that SC name as above, but I'm getting the following error in the Postgres pod log:
chmod: changing permissions of '/var/run/postgresql': Operation not permitted
Is there a way to fix this without giving 777 permissions to the pg data directory, when the pod runs as a non-root user on OpenShift clusters?

@twobombs

twobombs commented Sep 17, 2021

Funny, I see deltas in behaviour between 0.13 and 0.12: DB working, but other pods not starting.
I got this kind of working with a lot of tinkering on k3s with the help of Rancher 2.6, leveraging some volume-swap magic.
I have done, and will do, some more work on reproducing and working around these issues.

(Screenshot 2021-09-17 at 14 18 22 attached)

^ This is on minikube. k3s with PV hacks gives the same result; this is 0.13.0.

@PaulVerhoeven1
Contributor

Did someone manage to solve this problem?

@twobombs

twobombs commented May 6, 2022

I eventually deployed with the dev branch of awx operator.

@RyuTKC

RyuTKC commented May 11, 2022

I solved this with mountOptions.
It seems the postgres container requires the options below on the PV/PVC/StorageClass:

mountOptions:
  - dir_mode=0750
  - file_mode=0750
  - uid=999
  - gid=999

At first, I tried with only uid=999 and gid=999, but the container failed to start and output the logs below.
(It seems the postgres container runs as 999:999 in this case.)

fixing permissions on existing directory /var/lib/postgresql/data/pgdata ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
2022-05-11 03:03:31.207 UTC [83] FATAL:  data directory "/var/lib/postgresql/data/pgdata" has invalid permissions
2022-05-11 03:03:31.207 UTC [83] DETAIL:  Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
child process exited with exit code 1

But I succeeded by adding dir_mode=0750 and file_mode=0750, as pointed out in the logs.
Just for the record, I used the csi-smb-driver for the PV/PVC/StorageClass on an RKE2 single node.
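
For reference, a StorageClass for the csi-smb-driver carrying those mountOptions might look like this sketch. The share source and secret names are assumptions for illustration, not values taken from this thread.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: smb-postgres
provisioner: smb.csi.k8s.io
parameters:
  source: //smb-server/postgres                              # example share
  csi.storage.k8s.io/node-stage-secret-name: smb-creds       # example secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
reclaimPolicy: Retain
mountOptions:
  # postgres (uid/gid 999 in the official image) must own the data
  # directory, and initdb requires mode 0700 or 0750 on it
  - dir_mode=0750
  - file_mode=0750
  - uid=999
  - gid=999
```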

@tchellomello tchellomello removed their assignment Sep 6, 2023