
WA and WD installation is failing #577

Open
barochiarg opened this issue Nov 9, 2023 · 7 comments

Comments

@barochiarg

Describe the bug
Watson Assistant installation fails.

To Reproduce
While installing watson_assistant using cloud-pak-deployer in an AWS environment, the installation fails with the message below. The deployer also does not create the "openshift-storage" namespace. The same issue likely affects watson_discovery as well.

TASK [cp4d-cartridge-install : Set up Multicloud Object Gateway (MCG) secrets for watson_assistant in CP4D project cpd, logs are in /home/ec2-user/cpd-status/log/cpd-watson_assistant-setup-mcg.log] ***
Thursday 09 November 2023 07:34:20 +0000 (0:00:00.051) 0:26:42.451 *****
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -o pipefail\nsetup-mcg \\n --components=watson_assistant \\n --cpd_instance_ns=cpd \\n --noobaa_account_secret=noobaa-admin \\n --noobaa_cert_secret=noobaa-s3-serving-cert | tee /home/ec2-user/cpd-status/log/cpd-watson_assistant-setup-mcg.log\n", "delta": "0:00:00.147544", "end": "2023-11-09 07:34:21.267867", "msg": "non-zero return code", "rc": 1, "start": "2023-11-09 07:34:21.120323", "stderr": "Error from server (NotFound): secrets \"noobaa-admin\" not found", "stderr_lines": ["Error from server (NotFound): secrets \"noobaa-admin\" not found"], "stdout": "Running the setup for the watson_assistant component using the cpd project.", "stdout_lines": ["Running the setup for the watson_assistant component using the cpd project."]}

PLAY RECAP *********************************************************************
localhost : ok=1235 changed=148 unreachable=0 failed=1 skipped=575 rescued=0 ignored=0

Thursday 09 November 2023 07:34:21 +0000 (0:00:00.411) 0:26:42.862 *****

cp4d-scheduling-service : Run scheduler installation script, output can be found in /home/ec2-user/cpd-status/log/cpd-apply-scheduler.log - 308.72s
cp4d-cluster : Run script to setup instance topology, output can be found in /home/ec2-user/cpd-status/log/cpd-setup-instance-topology.log - 205.80s
cp4d-subscriptions : Run apply-olm command to install cartridge subscriptions, logs are in /home/ec2-user/cpd-status/log/cpd-apply-olm-cartridge-sub.log - 183.72s
cp-fs-cluster-components : Run shell script to apply cluster components, logs are in /home/ec2-user/cpd-status/log/cpd-apply-cluster-components.log - 176.57s
cp4d-catalog-source : Run apply-olm command to create catalog sources, logs are in /home/ec2-user/cpd-status/log/apply-olm-create-catsrc.log - 173.82s
cp4d-catalog-source : Generate preview script to create catalog sources, logs are in /home/ec2-user/cpd-status/log/apply-olm-create-catsrc.log - 102.04s
cp4d-subscriptions : Generate preview script to install cartridge subscriptions, logs are in /home/ec2-user/cpd-status/log/cpd-apply-olm-cartridge-sub.log -- 30.60s
cp4d-cluster : Run apply-cr command to install Cloud Pak for Data platform, logs are in /home/ec2-user/cpd-status/log/cpd-apply-cr-cpd-platform.log -- 24.82s
cp4d-cluster : Run script to authorize instance, output can be found in /home/ec2-user/cpd-status/log/cpd-authorize-instance.log -- 17.93s
cp4d-cluster : Generate preview script to install Cloud Pak for Data platform, logs are in /home/ec2-user/cpd-status/log/cpd-apply-cr-cpd-platform.log -- 15.52s
openshift-download-installer : Unpack OpenShift installer -------------- 15.39s
cpd-cli-download : Unpack cpd-cli from /home/ec2-user/cpd-status/downloads/cpd-cli-linux-amd64.tar.gz -- 12.66s
aws-download-cli : Unpack aws-cli client installer ---------------------- 7.72s
openshift-download-client : Unpack OpenShift client from /home/ec2-user/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 5.20s
openshift-download-client : Unpack OpenShift client from /home/ec2-user/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 3.38s
ibm-pak-download : Extract ibm-pak from /home/ec2-user/cpd-status/downloads/oc-ibm_pak-linux-amd64.tar.gz --- 3.36s
openshift-download-client : Unpack OpenShift client from /home/ec2-user/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 3.26s
cloudctl-download : Unpack cloudctl from /home/ec2-user/cpd-status/downloads/cloudctl-linux-amd64.tar.gz --- 3.03s
cp4d-cluster : Run apply-entitlement command ---------------------------- 2.62s
cp4d-variables : Add versions details from olm-utils -------------------- 2.60s

====================================================================================
Deployer FAILED. Check previous messages. If command line is not returned, press ^C.
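
The missing pieces can be confirmed directly on the cluster (suggested checks, not part of the deployer output):

# Both fail on the broken install: the namespace is never created,
# so the NooBaa admin secret does not exist either
oc get namespace openshift-storage
oc get secret noobaa-admin -n openshift-storage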

Expected behavior
WA should install successfully.

Desktop (please complete the following information):
AWS environment - self-managed and ROSA openshift

@barochiarg
Author

Related issue: #493

@fketelaars
Collaborator

We found that MCG does not get deployed when STS is used for authentication:

time="2023-11-20T08:11:50Z" level=info msg="✅ RPC: system.update_endpoint_group() Response OK: took 0.3ms"
time="2023-11-20T08:11:50Z" level=info msg="✈️  RPC: redirector.register_to_cluster() Request: <nil>"
time="2023-11-20T08:11:50Z" level=info msg="✅ RPC: redirector.register_to_cluster() Response OK: took 0.2ms"
time="2023-11-20T08:11:50Z" level=info msg="❌ Not Found: BackingStore \"noobaa-default-backing-store\"\n"
time="2023-11-20T08:11:50Z" level=info msg="CredentialsRequest \"noobaa-aws-cloud-creds\" created. Creating default backing store on AWS objectstore" func=ReconcileDefaultBackingStore sys=openshift-storage/noobaa
time="2023-11-20T08:11:50Z" level=info msg="❌ Not Found:  \"noobaa-aws-cloud-creds-secret\"\n"
time="2023-11-20T08:11:50Z" level=info msg="Secret \"noobaa-aws-cloud-creds-secret\" was not created yet by cloud-credentials operator. retry on next reconcile.." sys=openshift-storage/noobaa
time="2023-11-20T08:11:50Z" level=info msg="SetPhase: temporary error during phase \"Configuring\"" sys=openshift-storage/noobaa
time="2023-11-20T08:11:50Z" level=warning msg="⏳ Temporary Error: cloud credentials secret \"noobaa-aws-cloud-creds-secret\" is not ready yet" sys=openshift-storage/noobaa
time="2023-11-20T08:11:50Z" level=info msg="UpdateStatus: Done generation 2" sys=openshift-storage/noobaa
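
A quick way to confirm that the cluster runs with manual/STS credentials (suggested checks, not from the original log):

# Manual mode prevents the cloud-credential operator from minting
# the noobaa-aws-cloud-creds-secret automatically
oc get cloudcredential cluster -o jsonpath='{.spec.credentialsMode}{"\n"}'
oc get authentication cluster -o jsonpath='{.spec.serviceAccountIssuer}{"\n"}'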

@fketelaars
Collaborator

fketelaars commented Nov 24, 2023

After some research, this turns out to be the same issue as #310. When trying to provision ODF, the default backing store is not created and the CredentialsRequest does not result in the creation of a secret for the NooBaa operator.
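
The unfulfilled request can be seen on the CredentialsRequest itself (a suggested check, using the resource name from the log above):

# status.provisioned stays false because no secret is minted in Manual mode
oc get credentialsrequest noobaa-aws-cloud-creds -n openshift-storage -o jsonpath='{.status.provisioned}{"\n"}'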

@fketelaars
Collaborator

fketelaars commented Jan 17, 2024

Steps to reproduce the issue:

Manually provisioning an OpenShift cluster on AWS with temporary credentials (STS)

Set environment variables

export AWS_REGION=eu-central-1
export AWS_CFG_DIR=~/aws
export OCP_CLUSTER_NAME=aws-sts
export OCP_DOMAIN_NAME=deployer-demo.eu

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_access_key

Create directories

mkdir -pv $AWS_CFG_DIR

Download installer and client

mkdir -pv $AWS_CFG_DIR/downloads

curl -sLo $AWS_CFG_DIR/downloads/openshift-install-linux.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-4.12/openshift-install-linux.tar.gz
tar xvzf ${AWS_CFG_DIR}/downloads/openshift-install-linux.tar.gz -C ~/bin/

Prepare permanent credentials

In case you want to run the process multiple times, it is best to have a script to reset the AWS credentials to the permanent ones, after which you can generate new temporary credentials.

cat << EOF > $AWS_CFG_DIR/aws-reset-creds.sh
export KUBECONFIG=${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}/auth/kubeconfig
export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
unset AWS_SESSION_TOKEN
EOF

Reset environment

rm -rf  $AWS_CFG_DIR/$OCP_CLUSTER_NAME
mkdir -pv  $AWS_CFG_DIR/$OCP_CLUSTER_NAME
source $AWS_CFG_DIR/aws-reset-creds.sh

Generate AWS STS token

printf "\nexport AWS_ACCESS_KEY_ID=%s\nexport AWS_SECRET_ACCESS_KEY=%s\nexport AWS_SESSION_TOKEN=%s\n" $(aws sts assume-role \
--role-arn arn:aws:iam::872255850422:role/fk-sts-role \
--role-session-name OCPInstall \
--query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
--output text) > /tmp/sts-credentials.sh

source /tmp/sts-credentials.sh
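
Optionally verify that the temporary credentials are active before continuing (a sanity check, not part of the original steps):

# Should report the assumed-role ARN instead of the permanent IAM user
aws sts get-caller-identity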

Create Cloud Credentials Operator resources

RELEASE_IMAGE=$(openshift-install version | awk '/release image/ {print $3}') && echo "Release image: ${RELEASE_IMAGE}"

CCO_IMAGE=$(oc adm release info --image-for='cloud-credential-operator' $RELEASE_IMAGE -a /tmp/ocp_pullsecret.json) && echo $CCO_IMAGE

pushd ~/bin
oc image extract $CCO_IMAGE --file="/usr/bin/ccoctl" -a /tmp/ocp_pullsecret.json
popd
chmod 775 ~/bin/ccoctl

oc adm release extract --credentials-requests --cloud=aws --to=${AWS_CFG_DIR}/credrequests --from=$RELEASE_IMAGE

ccoctl aws create-all --name=${OCP_CLUSTER_NAME} --region=${AWS_REGION} --credentials-requests-dir=${AWS_CFG_DIR}/credrequests --output-dir=${AWS_CFG_DIR}/credoutput
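
ccoctl writes the secret manifests and the OIDC signing material into the output directory; a quick look confirms it ran (optional check):

# Expect one secret manifest per CredentialsRequest, plus the tls keypair
ls ${AWS_CFG_DIR}/credoutput/manifests ${AWS_CFG_DIR}/credoutput/tls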

Prepare OpenShift installation

mkdir -p ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}

cat << EOF > ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}/install-config.yaml
apiVersion: v1
baseDomain: ${OCP_DOMAIN_NAME}
credentialsMode: Manual
metadata:
  name: ${OCP_CLUSTER_NAME}

controlPlane:   
  hyperthreading: Enabled 
  name: master
  platform:
    aws:
      type: m5.xlarge
      zones:
      - ${AWS_REGION}a
  replicas: 3

compute: 
- hyperthreading: Enabled 
  name: worker
  platform:
    aws:
      type: m5.4xlarge
      zones:
      - ${AWS_REGION}a
  replicas: 3

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16

platform:
  aws:
    region: ${AWS_REGION}

fips: false
pullSecret: '$(cat /tmp/ocp_pullsecret.json)'
sshKey: $(cat ~/.ssh/id_rsa.pub)
EOF

pushd ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}
openshift-install create manifests
popd

cp ${AWS_CFG_DIR}/credoutput/manifests/* ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}/manifests
cp -r ${AWS_CFG_DIR}/credoutput/tls ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}

Create OpenShift cluster

openshift-install create cluster --dir=${AWS_CFG_DIR}/${OCP_CLUSTER_NAME} --log-level=debug

Connect to OpenShift

export KUBECONFIG=${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}/auth/kubeconfig
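
A quick login check before installing the storage operator (optional):

# Confirm cluster access with the freshly generated kubeconfig
oc whoami
oc get nodes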

Install OpenShift Storage operator

oc create ns openshift-storage

cat << EOF | oc apply -f -
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-storage
  namespace: openshift-storage
spec:
  targetNamespaces:
  - openshift-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/ocs-operator.openshift-storage: ""
  name: odf-operator
  namespace: openshift-storage
spec:
  channel: stable-4.12
  installPlanApproval: Automatic
  name: odf-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

Now wait until the OpenShift Data Foundation operator is ready.

watch "oc get csv -n openshift-storage -l operators.coreos.com/ocs-operator.openshift-storage --no-headers -o custom-columns='name:metadata.name,phase:status.phase'"

Patch OpenShift console

oc patch console.operator cluster \
    -n openshift-storage \
    --type json \
    -p '[{"op": "add", "path": "/spec/plugins", "value": ["odf-console"]}]'

Create storage cluster

cat << EOF | oc apply -f -
---
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  annotations:
    uninstall.ocs.openshift.io/cleanup-policy: delete
    uninstall.ocs.openshift.io/mode: graceful
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  multiCloudGateway:
    dbStorageClassName: gp3-csi
    reconcileStrategy: standalone
EOF

Wait for the StorageCluster to reconcile. It never completes, because the operator fails to create the default backing store.
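
The stuck state can be observed as follows (suggested commands, not part of the original steps):

# The StorageCluster stays in Progressing and no backingstore appears
oc get storagecluster -n openshift-storage
oc get backingstore -n openshift-storage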

Create backingstore manually

cat << EOF | oc apply -f -
apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  name: noobaa-default-backing-store
  namespace: openshift-storage
spec:
  pvPool:
    numVolumes: 1
    resources:
      requests:
        storage: 100Gi
    secret: {}
    storageClass: gp3-csi
  type: pv-pool
EOF
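
Optionally watch the manually created backingstore until it reports the Ready phase (assumed check):

# Phase should move from Creating to Ready once the PV pool pod is up
oc get backingstore noobaa-default-backing-store -n openshift-storage -w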

Go to OpenShift console

echo "Go to console: https://$(oc get route --no-headers -n openshift-console console -o custom-columns='host:.spec.host')"
echo "Log in as kubeadmin, password $(cat ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}/auth/kubeadmin-password)"

Destroy OpenShift cluster

openshift-install destroy cluster --dir=${AWS_CFG_DIR}/${OCP_CLUSTER_NAME} --log-level=debug

@fketelaars
Collaborator

We found a way to work around the issue by manually creating the backingstore that the StorageCluster expects. The backingstore is based on a PVC instead of AWS S3. This is not ideal, but it allows us to progress with the provisioning of MCG.

@fketelaars
Collaborator

This has been resolved by using OpenShift 4.14.

@fketelaars
Collaborator

Issue reopened. The StorageCluster does reach a Ready state in OpenShift 4.14, but the BackingStore stays in the BackingStorePhaseRejected state and no bucket is created for the cluster, meaning that any attempt to access the bucket fails.
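
The rejected state shows up as follows (a suggested check):

# Phase stays Rejected instead of Ready
oc get backingstore noobaa-default-backing-store -n openshift-storage -o jsonpath='{.status.phase}{"\n"}'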

Need to make the following changes:

  1. Create the namespace with the correct label:
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
  name: openshift-storage
  2. Update the CredentialsRequest to work with the ServiceAccount:
oc get credentialsrequest -n openshift-storage noobaa-aws-cloud-creds -o yaml > nooba-credreq.yaml
NOOBA_BUCKET=$(cat nooba-credreq.yaml | grep arn:aws:s3::: | head -1 | awk -F: '{print $7}')
# edit nooba-credreq.yaml and add the following at the end:
#   serviceAccountName:
#   - noobaa
ccoctl aws create-iam-roles --name="${OCP_CLUSTER_NAME}" --region="${AWS_REGION}" --credentials-requests-dir=. --identity-provider-arn=arn:aws:iam::872255850422:oidc-provider/${OCP_CLUSTER_NAME}-oidc.s3.${AWS_REGION}.amazonaws.com
aws s3api create-bucket --bucket ${NOOBA_BUCKET} --region ${AWS_REGION} --create-bucket-configuration LocationConstraint=${AWS_REGION}
  3. Create the BackingStore:
cat <<EOF | oc apply -f -
apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  finalizers:
  - noobaa.io/finalizer
  labels:
    app: noobaa
  name: noobaa-default-backing-store
  namespace: openshift-storage
spec:
  awsS3:
    awsSTSRoleARN: arn:aws:iam::872255850422:oidc-provider/${OCP_CLUSTER_NAME}-oidc.s3.${AWS_REGION}.amazonaws.com
    targetBucket: ${NOOBA_BUCKET}
    secret:
      name: noobaa-aws-cloud-creds-secret
      namespace: openshift-storage
  pvPool:
    numVolumes: 1
    resources:
      requests:
        storage: 50Gi
    secret: {}
    storageClass: gp3-csi
  type: pv-pool
EOF
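
After applying the three changes, the backingstore and the target bucket can be verified (suggested checks, assuming the names used above):

# BackingStore should now reach the Ready phase
oc get backingstore noobaa-default-backing-store -n openshift-storage
# The bucket created earlier should be reachable
aws s3api head-bucket --bucket ${NOOBA_BUCKET} --region ${AWS_REGION}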
