Upgraded to 2.13.1 - awx-task pod stuck "Waiting for database migrations..." #1777

Open
3 tasks done
dark-vex opened this issue Mar 15, 2024 · 4 comments

Comments

@dark-vex

dark-vex commented Mar 15, 2024

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that the AWX Operator is open source software provided for free and that I might not receive a timely response.

Bug Summary

Hello 👋, I upgraded the AWX Operator to 2.13.1 using the Helm chart, but the awx-task pod is stuck in the "Waiting for database migrations..." phase.

AWX Operator version

2.13.1

AWX version

24.0.0

Kubernetes platform

kubernetes

Kubernetes/Platform version

v1.27.8+k3s2

Modifications

no

Steps to reproduce

I don't have specific steps to reproduce. I only upgraded from AWX 23.9.0 to 24.0.0 using the Helm chart.

Expected results

The migration job completes successfully and the AWX instance is up and running.

Actual results

The awx-task pod is stuck in the Init:0/3 state, and the logs of the init-database container keep looping:

[wait-for-migrations] Waiting 30 seconds before next attempt
[wait-for-migrations] Attempt 3284
[wait-for-migrations] Waiting 30 seconds before next attempt
[wait-for-migrations] Attempt 3285

Additional information

Looking at the pod status, the job for migrating the DB "apparently" ran successfully:

 ➜ k get po -n awx
NAME                                              READY   STATUS      RESTARTS        AGE
awx-backup-28480260-t4szn                         0/1     Completed   0               19d
awx-backup-28490340-nwz6q                         0/1     Completed   0               12d
awx-backup-28500420-cxsx7                         0/1     Completed   0               5d11h
awx-migration-24.0.0-wxwm5                        0/1     Completed   0               37h
awx-operator-controller-manager-67c5f4d45-wsbhn   2/2     Running     2 (5h16m ago)   37h
awx-postgres-15-0                                 1/1     Running     0               37h
awx-task-7ff5947d5c-qkf7s                         0/4     Init:0/3    0               37h
awx-web-8577b8fc55-c4dh2                          3/3     Running     0               37h

But judging by the awx-migration job logs, the migration seems to have somehow finished early without applying all migrations:

  • Logs and describe output of the migration job pod
 ➜ k logs -n awx awx-migration-24.0.0-wxwm5
Operations to perform:
  Apply all migrations: auth, conf, contenttypes, dab_resource_registry, main, oauth2_provider, sessions, sites, social_django, sso
Running migrations:
  Applying dab_resource_registry.0001_initial... OK
  Applying dab_resource_registry.0002_remove_resource_id... OK
  Applying dab_resource_registry.0003_alter_resource_object_id... OK
  Applying main.0190_alter_inventorysource_source_and_more... OK
 ➜

 ➜ k describe po -n awx awx-migration-24.0.0-wxwm5
Name:         awx-migration-24.0.0-wxwm5
Namespace:    awx
Priority:     0
Node:         prod-k3s-worker1/10.20.0.46
Start Time:   Wed, 13 Mar 2024 21:54:44 +0100
Labels:       batch.kubernetes.io/controller-uid=d1c661c8-9381-40be-bbb4-c21aeb0f9a93
              batch.kubernetes.io/job-name=awx-migration-24.0.0
              controller-uid=d1c661c8-9381-40be-bbb4-c21aeb0f9a93
              job-name=awx-migration-24.0.0
Annotations:  <none>
Status:       Succeeded
IP:           10.42.1.106
IPs:
  IP:           10.42.1.106
Controlled By:  Job/awx-migration-24.0.0
Containers:
  migration-job:
    Container ID:  containerd://7be9c37505f5107ea18a31a49989a412e1cb76da4fa52ce77800dab1618ce306
    Image:         quay.io/ansible/awx:24.0.0
    Image ID:      quay.io/ansible/awx@sha256:36cf9784bed082affcfd8d5b5768b313bea18639ec3d2b32470ae151ef2b3b93
    Port:          <none>
    Host Port:     <none>
    Command:
      awx-manage
      migrate
      --noinput
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 13 Mar 2024 21:54:46 +0100
      Finished:     Wed, 13 Mar 2024 21:54:58 +0100
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/tower/SECRET_KEY from custom-awx-secret-key (ro,path="SECRET_KEY")
      /etc/tower/conf.d/credentials.py from awx-application-credentials (ro,path="credentials.py")
      /etc/tower/settings.py from awx-settings (ro,path="settings.py")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9mfpl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  awx-application-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  awx-app-credentials
    Optional:    false
  custom-awx-secret-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  custom-awx-secret-key
    Optional:    false
  awx-settings:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      awx-awx-configmap
    Optional:  false
  kube-api-access-9mfpl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

This is also confirmed by running /bin/bash -c "! awx-manage showmigrations | grep '\[ \]'" inside the awx-task pod (init-database container), which lists every migration as unapplied:

 ➜ k exec -it awx-task-7ff5947d5c-qkf7s -n awx -c init-database -- bash

bash-5.1# /bin/bash -c "! awx-manage showmigrations | grep '\[ \]'"
 [ ] 0001_initial
 [ ] 0002_alter_permission_name_max_length
 [ ] 0003_alter_user_email_max_length
 [ ] 0004_alter_user_username_opts
 [ ] 0005_alter_user_last_login_null
 [ ] 0006_require_contenttypes_0002
 [ ] 0007_alter_validators_add_error_messages
 [ ] 0008_alter_user_username_max_length
 [ ] 0009_alter_user_last_name_max_length
 [ ] 0010_alter_group_name_max_length
 [ ] 0011_update_proxy_permissions
 [ ] 0012_alter_user_first_name_max_length
 [ ] 0001_initial
 [ ] 0002_v310_copy_tower_settings
 [ ] 0003_v310_JSONField_changes
 [ ] 0004_v320_reencrypt
 [ ] 0005_v330_rename_two_session_settings
 [ ] 0006_v331_ldap_group_type
 [ ] 0007_v380_rename_more_settings
 [ ] 0008_subscriptions
 [ ] 0009_rename_proot_settings
 [ ] 0010_change_to_JSONField
 [ ] 0001_initial
 [ ] 0002_remove_content_type_name
 [ ] 0001_initial
 [ ] 0002_remove_resource_id
 [ ] 0003_alter_resource_object_id
 [ ] 0001_initial
 [ ] 0002_squashed_v300_release (18 squashed migrations)
 [ ] 0003_squashed_v300_v303_updates (9 squashed migrations)
 [ ] 0004_squashed_v310_release (6 squashed migrations)
 [ ] 0005_squashed_v310_v313_updates (3 squashed migrations)
 [ ] 0006_v320_release
 [ ] 0007_v320_data_migrations
 [ ] 0008_v320_drop_v1_credential_fields
 [ ] 0009_v322_add_setting_field_for_activity_stream
 [ ] 0010_v322_add_ovirt4_tower_inventory
 [ ] 0011_v322_encrypt_survey_passwords
 [ ] 0012_v322_update_cred_types
 [ ] 0013_v330_multi_credential
 [ ] 0014_v330_saved_launchtime_configs
 [ ] 0015_v330_blank_start_args
 [ ] 0016_v330_non_blank_workflow
 [ ] 0017_v330_move_deprecated_stdout
 [ ] 0018_v330_add_additional_stdout_events
 [ ] 0019_v330_custom_virtualenv
 [ ] 0020_v330_instancegroup_policies
 [ ] 0021_v330_declare_new_rbac_roles
 [ ] 0022_v330_create_new_rbac_roles
 [ ] 0023_v330_inventory_multicred
 [ ] 0024_v330_create_user_session_membership
 [ ] 0025_v330_add_oauth_activity_stream_registrar
 [ ] 0026_v330_delete_authtoken
 [ ] 0027_v330_emitted_events
 [ ] 0028_v330_add_tower_verify
 [ ] 0030_v330_modify_application
 [ ] 0031_v330_encrypt_oauth2_secret
 [ ] 0032_v330_polymorphic_delete
 [ ] 0033_v330_oauth_help_text
 [ ] 0034_v330_delete_user_role
 [ ] 0035_v330_more_oauth2_help_text
 [ ] 0036_v330_credtype_remove_become_methods
 [ ] 0037_v330_remove_legacy_fact_cleanup
 [ ] 0038_v330_add_deleted_activitystream_actor
 [ ] 0039_v330_custom_venv_help_text
 [ ] 0040_v330_unifiedjob_controller_node
 [ ] 0041_v330_update_oauth_refreshtoken
 [ ] 0042_v330_org_member_role_deparent
 [ ] 0043_v330_oauth2accesstoken_modified
 [ ] 0044_v330_add_inventory_update_inventory
 [ ] 0045_v330_instance_managed_by_policy
 [ ] 0046_v330_remove_client_credentials_grant
 [ ] 0047_v330_activitystream_instance
 [ ] 0048_v330_django_created_modified_by_model_name
 [ ] 0049_v330_validate_instance_capacity_adjustment
 [ ] 0050_v340_drop_celery_tables
 [ ] 0051_v340_job_slicing
 [ ] 0052_v340_remove_project_scm_delete_on_next_update
 [ ] 0053_v340_workflow_inventory
 [ ] 0054_v340_workflow_convergence
 [ ] 0055_v340_add_grafana_notification
 [ ] 0056_v350_custom_venv_history
 [ ] 0057_v350_remove_become_method_type
 [ ] 0058_v350_remove_limit_limit
 [ ] 0059_v350_remove_adhoc_limit
 [ ] 0060_v350_update_schedule_uniqueness_constraint
 [ ] 0061_v350_track_native_credentialtype_source
 [ ] 0062_v350_new_playbook_stats
 [ ] 0063_v350_org_host_limits
 [ ] 0064_v350_analytics_state
 [ ] 0065_v350_index_job_status
 [ ] 0066_v350_inventorysource_custom_virtualenv
 [ ] 0067_v350_credential_plugins
 [ ] 0068_v350_index_event_created
 [ ] 0069_v350_generate_unique_install_uuid
 [ ] 0070_v350_gce_instance_id
 [ ] 0071_v350_remove_system_tracking
 [ ] 0072_v350_deprecate_fields
 [ ] 0073_v360_create_instance_group_m2m
 [ ] 0074_v360_migrate_instance_group_relations
 [ ] 0075_v360_remove_old_instance_group_relations
 [ ] 0076_v360_add_new_instance_group_relations
 [ ] 0077_v360_add_default_orderings
 [ ] 0078_v360_clear_sessions_tokens_jt
 [ ] 0079_v360_rm_implicit_oauth2_apps
 [ ] 0080_v360_replace_job_origin
 [ ] 0081_v360_notify_on_start
 [ ] 0082_v360_webhook_http_method
 [ ] 0083_v360_job_branch_override
 [ ] 0084_v360_token_description
 [ ] 0085_v360_add_notificationtemplate_messages
 [ ] 0086_v360_workflow_approval
 [ ] 0087_v360_update_credential_injector_help_text
 [ ] 0088_v360_dashboard_optimizations
 [ ] 0089_v360_new_job_event_types
 [ ] 0090_v360_WFJT_prompts
 [ ] 0091_v360_approval_node_notifications
 [ ] 0092_v360_webhook_mixin
 [ ] 0093_v360_personal_access_tokens
 [ ] 0094_v360_webhook_mixin2
 [ ] 0095_v360_increase_instance_version_length
 [ ] 0096_v360_container_groups
 [ ] 0097_v360_workflowapproval_approved_or_denied_by
 [ ] 0098_v360_rename_cyberark_aim_credential_type
 [ ] 0099_v361_license_cleanup
 [ ] 0100_v370_projectupdate_job_tags
 [ ] 0101_v370_generate_new_uuids_for_iso_nodes
 [ ] 0102_v370_unifiedjob_canceled
 [ ] 0103_v370_remove_computed_fields
 [ ] 0104_v370_cleanup_old_scan_jts
 [ ] 0105_v370_remove_jobevent_parent_and_hosts
 [ ] 0106_v370_remove_inventory_groups_with_active_failures
 [ ] 0107_v370_workflow_convergence_api_toggle
 [ ] 0108_v370_unifiedjob_dependencies_processed
 [ ] 0109_v370_job_template_organization_field
 [ ] 0110_v370_instance_ip_address
 [ ] 0111_v370_delete_channelgroup
 [ ] 0112_v370_workflow_node_identifier
 [ ] 0113_v370_event_bigint
 [ ] 0114_v370_remove_deprecated_manual_inventory_sources
 [ ] 0115_v370_schedule_set_null
 [ ] 0116_v400_remove_hipchat_notifications
 [ ] 0117_v400_remove_cloudforms_inventory
 [ ] 0118_add_remote_archive_scm_type
 [ ] 0119_inventory_plugins
 [ ] 0120_galaxy_credentials
 [ ] 0121_delete_toweranalyticsstate
 [ ] 0122_really_remove_cloudforms_inventory
 [ ] 0123_drop_hg_support
 [ ] 0124_execution_environments
 [ ] 0125_more_ee_modeling_changes
 [ ] 0126_executionenvironment_container_options
 [ ] 0127_reset_pod_spec_override
 [ ] 0128_organiaztion_read_roles_ee_admin
 [ ] 0129_unifiedjob_installed_collections
 [ ] 0130_ee_polymorphic_set_null
 [ ] 0131_undo_org_polymorphic_ee
 [ ] 0132_instancegroup_is_container_group
 [ ] 0133_centrify_vault_credtype
 [ ] 0134_unifiedjob_ansible_version
 [ ] 0135_schedule_sort_fallback_to_id
 [ ] 0136_scm_track_submodules
 [ ] 0137_custom_inventory_scripts_removal_data
 [ ] 0138_custom_inventory_scripts_removal
 [ ] 0139_isolated_removal
 [ ] 0140_rename
 [ ] 0141_remove_isolated_instances
 [ ] 0142_update_ee_image_field_description
 [ ] 0143_hostmetric
 [ ] 0144_event_partitions
 [ ] 0145_deregister_managed_ee_objs
 [ ] 0146_add_insights_inventory
 [ ] 0147_validate_ee_image_field
 [ ] 0148_unifiedjob_receptor_unit_id
 [ ] 0149_remove_inventory_insights_credential
 [ ] 0150_rename_inv_sources_inv_updates
 [ ] 0151_rename_managed_by_tower
 [ ] 0152_instance_node_type
 [ ] 0153_instance_last_seen
 [ ] 0154_set_default_uuid
 [ ] 0155_improved_health_check
 [ ] 0156_capture_mesh_topology
 [ ] 0157_inventory_labels
 [ ] 0158_make_instance_cpu_decimal
 [ ] 0159_deprecate_inventory_source_UoPU_field
 [ ] 0160_alter_schedule_rrule
 [ ] 0161_unifiedjob_host_status_counts
 [ ] 0162_alter_unifiedjob_dependent_jobs
 [ ] 0163_convert_job_tags_to_textfield
 [ ] 0164_remove_inventorysource_update_on_project_update
 [ ] 0165_task_manager_refactor
 [ ] 0166_alter_jobevent_host
 [ ] 0167_project_signature_validation_credential
 [ ] 0168_inventoryupdate_scm_revision
 [ ] 0169_jt_prompt_everything_on_launch
 [ ] 0170_node_and_link_state
 [ ] 0171_add_health_check_started
 [ ] 0172_prevent_instance_fallback
 [ ] 0173_instancegroup_max_limits
 [ ] 0174_ensure_org_ee_admin_roles
 [ ] 0175_workflowjob_is_bulk_job
 [ ] 0176_inventorysource_scm_branch
 [ ] 0177_instance_group_role_addition
 [ ] 0178_instance_group_admin_migration
 [ ] 0179_change_cyberark_plugin_names
 [ ] 0180_add_hostmetric_fields
 [ ] 0181_hostmetricsummarymonthly
 [ ] 0182_constructed_inventory
 [ ] 0183_pre_django_upgrade
 [ ] 0184_django_indexes
 [ ] 0185_move_JSONBlob_to_JSONField
 [ ] 0186_drop_django_taggit
 [ ] 0187_hop_nodes
 [ ] 0188_add_bitbucket_dc_webhook
 [ ] 0189_inbound_hop_nodes
 [ ] 0190_alter_inventorysource_source_and_more
 [ ] 0001_initial
 [ ] 0002_auto_20190406_1805
 [ ] 0003_auto_20201211_1314
 [ ] 0004_auto_20200902_2022
 [ ] 0005_auto_20211222_2352
 [ ] 0001_initial
 [ ] 0001_initial
 [ ] 0002_alter_domain_unique
 [ ] 0001_initial (2 squashed migrations)
 [ ] 0002_add_related_name (2 squashed migrations)
 [ ] 0003_alter_email_max_length (2 squashed migrations)
 [ ] 0004_auto_20160423_0400 (2 squashed migrations)
 [ ] 0005_auto_20160727_2333 (1 squashed migrations)
 [ ] 0006_partial
 [ ] 0007_code_timestamp
 [ ] 0008_partial_timestamp
 [ ] 0009_auto_20191118_0520
 [ ] 0010_uid_db_index
 [ ] 0011_alter_id_fields
 [ ] 0012_usersocialauth_extra_data_new
 [ ] 0013_migrate_extra_data
 [ ] 0014_remove_usersocialauth_extra_data
 [ ] 0015_rename_extra_data_new_usersocialauth_extra_data
 [ ] 0001_initial
 [ ] 0002_expand_provider_options
 [ ] 0003_convert_saml_string_to_list
bash-5.1#
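The grep above drops the app header lines, so it is hard to tell which app each pending migration belongs to. A minimal Python sketch for grouping unapplied migrations by app, assuming the unfiltered `awx-manage showmigrations` list output (an unindented app label followed by indented `[X]` / `[ ]` lines). The function name `pending_migrations` and the sample text are made up for illustration; this is not the check the init container actually runs.

```python
from collections import defaultdict


def pending_migrations(showmigrations_output: str) -> dict[str, list[str]]:
    """Group unapplied ("[ ]") migrations by app label.

    Assumes the list format of Django's showmigrations: an unindented
    app label line, then one indented "[X] name" or "[ ] name" line per
    migration.
    """
    pending: dict[str, list[str]] = defaultdict(list)
    app = "(unknown app)"  # fallback if the header lines were filtered out
    for raw in showmigrations_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[ ]"):
            pending[app].append(line[3:].strip())
        elif line.startswith("[X]"):
            continue  # already applied
        elif not raw.startswith(" "):
            app = line  # an unindented line is an app label
    return dict(pending)


# Hypothetical, shortened sample of unfiltered showmigrations output.
sample = """\
auth
 [X] 0001_initial
 [ ] 0002_alter_permission_name_max_length
main
 [ ] 0190_alter_inventorysource_source_and_more
sessions
 [X] 0001_initial
"""

result = pending_migrations(sample)
for app_label, names in result.items():
    print(f"{app_label}: {len(names)} pending")
```

An empty result would correspond to the condition the init container is waiting for (no `[ ]` lines left).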

Workaround

  1. Copy the existing job to a file:
 ➜ k get job -n awx awx-migration-24.0.0 -oyaml > awx-job.yaml
  2. Remove the whole status block at the bottom of the YAML file, and remove resourceVersion and uid (and any other uids inside the file).

  3. Delete the currently present job:

 ➜ k delete job -n awx awx-migration-24.0.0
  4. Apply the previously saved and modified awx-job.yaml.

A new job is created and runs, which should complete all the migrations this time:

 ➜ k logs awx-migration-24.0.0-dqjnh -n awx -f
Operations to perform:
  Apply all migrations: auth, conf, contenttypes, dab_resource_registry, main, oauth2_provider, sessions, sites, social_django, sso
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0001_initial... OK
  Applying main.0001_initial... OK
  Applying main.0002_squashed_v300_release... OK
  Applying main.0003_squashed_v300_v303_updates... OK
  Applying main.0004_squashed_v310_release... OK
  Applying conf.0001_initial... OK
  Applying conf.0002_v310_copy_tower_settings... OK
  Applying main.0005_squashed_v310_v313_updates... OK
  Applying main.0006_v320_release... OK
  Applying main.0007_v320_data_migrations... OK
  Applying main.0008_v320_drop_v1_credential_fields... OK
  Applying main.0009_v322_add_setting_field_for_activity_stream... OK
  Applying main.0010_v322_add_ovirt4_tower_inventory... OK
  Applying main.0011_v322_encrypt_survey_passwords... OK
  Applying main.0012_v322_update_cred_types... OK
  Applying main.0013_v330_multi_credential... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying auth.0012_alter_user_first_name_max_length... OK
  Applying conf.0003_v310_JSONField_changes... OK
  Applying conf.0004_v320_reencrypt... OK
  Applying conf.0005_v330_rename_two_session_settings... OK
  Applying conf.0006_v331_ldap_group_type... OK
  Applying conf.0007_v380_rename_more_settings... OK
  Applying conf.0008_subscriptions... OK
  Applying conf.0009_rename_proot_settings... OK
  Applying conf.0010_change_to_JSONField... OK
  Applying dab_resource_registry.0001_initial... OK
  Applying dab_resource_registry.0002_remove_resource_id... OK
  Applying dab_resource_registry.0003_alter_resource_object_id... OK
  Applying sessions.0001_initial... OK
  Applying main.0014_v330_saved_launchtime_configs... OK
  Applying main.0015_v330_blank_start_args... OK
  Applying main.0016_v330_non_blank_workflow... OK
  Applying main.0017_v330_move_deprecated_stdout... OK
  Applying main.0018_v330_add_additional_stdout_events... OK
  Applying main.0019_v330_custom_virtualenv... OK
  Applying main.0020_v330_instancegroup_policies... OK
  Applying main.0021_v330_declare_new_rbac_roles... OK
  Applying main.0022_v330_create_new_rbac_roles... OK
  Applying main.0023_v330_inventory_multicred... OK
  Applying main.0024_v330_create_user_session_membership... OK
  Applying main.0025_v330_add_oauth_activity_stream_registrar... OK
  Applying oauth2_provider.0001_initial... OK
  Applying oauth2_provider.0002_auto_20190406_1805... OK
  Applying oauth2_provider.0003_auto_20201211_1314... OK
  Applying oauth2_provider.0004_auto_20200902_2022... OK
  Applying oauth2_provider.0005_auto_20211222_2352... OK
  Applying main.0026_v330_delete_authtoken... OK
  Applying main.0027_v330_emitted_events... OK
  Applying main.0028_v330_add_tower_verify... OK
  Applying main.0030_v330_modify_application... OK
  Applying main.0031_v330_encrypt_oauth2_secret... OK
  Applying main.0032_v330_polymorphic_delete... OK
  Applying main.0033_v330_oauth_help_text... OK
2024-03-15 10:59:21,368 INFO     [-] rbac_migrations Computing role roots..
2024-03-15 10:59:21,371 INFO     [-] rbac_migrations Found 0 roots in 0.000316 seconds, rebuilding ancestry map
2024-03-15 10:59:21,372 INFO     [-] rbac_migrations Rebuild ancestors completed in 0.000009 seconds
2024-03-15 10:59:21,372 INFO     [-] rbac_migrations Done.
  Applying main.0034_v330_delete_user_role... OK
  Applying main.0035_v330_more_oauth2_help_text... OK
  Applying main.0036_v330_credtype_remove_become_methods... OK
  Applying main.0037_v330_remove_legacy_fact_cleanup... OK
  Applying main.0038_v330_add_deleted_activitystream_actor... OK
  Applying main.0039_v330_custom_venv_help_text... OK
  Applying main.0040_v330_unifiedjob_controller_node... OK
  Applying main.0041_v330_update_oauth_refreshtoken... OK
2024-03-15 10:59:24,490 INFO     [-] rbac_migrations Computing role roots..
2024-03-15 10:59:24,494 INFO     [-] rbac_migrations Found 0 roots in 0.000387 seconds, rebuilding ancestry map
2024-03-15 10:59:24,494 INFO     [-] rbac_migrations Rebuild ancestors completed in 0.000009 seconds
2024-03-15 10:59:24,495 INFO     [-] rbac_migrations Done.
  Applying main.0042_v330_org_member_role_deparent... OK
  Applying main.0043_v330_oauth2accesstoken_modified... OK
  Applying main.0044_v330_add_inventory_update_inventory... OK
  Applying main.0045_v330_instance_managed_by_policy... OK
  Applying main.0046_v330_remove_client_credentials_grant... OK
  Applying main.0047_v330_activitystream_instance... OK
  Applying main.0048_v330_django_created_modified_by_model_name... OK
  Applying main.0049_v330_validate_instance_capacity_adjustment... OK
  Applying main.0050_v340_drop_celery_tables... OK
  Applying main.0051_v340_job_slicing... OK
  Applying main.0052_v340_remove_project_scm_delete_on_next_update... OK
  Applying main.0053_v340_workflow_inventory... OK
  Applying main.0054_v340_workflow_convergence... OK
  Applying main.0055_v340_add_grafana_notification... OK
  Applying main.0056_v350_custom_venv_history... OK
  Applying main.0057_v350_remove_become_method_type... OK
  Applying main.0058_v350_remove_limit_limit... OK
  Applying main.0059_v350_remove_adhoc_limit... OK
  Applying main.0060_v350_update_schedule_uniqueness_constraint... OK
  Applying main.0061_v350_track_native_credentialtype_source... OK
  Applying main.0062_v350_new_playbook_stats... OK
  Applying main.0063_v350_org_host_limits... OK
  Applying main.0064_v350_analytics_state... OK
  Applying main.0065_v350_index_job_status... OK
  Applying main.0066_v350_inventorysource_custom_virtualenv... OK
  Applying main.0067_v350_credential_plugins... OK
  Applying main.0068_v350_index_event_created... OK
  Applying main.0069_v350_generate_unique_install_uuid... OK
  Applying main.0070_v350_gce_instance_id... OK
  Applying main.0071_v350_remove_system_tracking... OK
  Applying main.0072_v350_deprecate_fields... OK
  Applying main.0073_v360_create_instance_group_m2m... OK
  Applying main.0074_v360_migrate_instance_group_relations... OK
  Applying main.0075_v360_remove_old_instance_group_relations... OK
  Applying main.0076_v360_add_new_instance_group_relations... OK
  Applying main.0077_v360_add_default_orderings... OK
  Applying main.0078_v360_clear_sessions_tokens_jt... OK
  Applying main.0079_v360_rm_implicit_oauth2_apps... OK
  Applying main.0080_v360_replace_job_origin... OK
  Applying main.0081_v360_notify_on_start... OK
  Applying main.0082_v360_webhook_http_method... OK
  Applying main.0083_v360_job_branch_override... OK
  Applying main.0084_v360_token_description... OK
  Applying main.0085_v360_add_notificationtemplate_messages... OK
  Applying main.0086_v360_workflow_approval... OK
  Applying main.0087_v360_update_credential_injector_help_text... OK
  Applying main.0088_v360_dashboard_optimizations... OK
  Applying main.0089_v360_new_job_event_types... OK
  Applying main.0090_v360_WFJT_prompts... OK
  Applying main.0091_v360_approval_node_notifications... OK
  Applying main.0092_v360_webhook_mixin... OK
  Applying main.0093_v360_personal_access_tokens... OK
  Applying main.0094_v360_webhook_mixin2... OK
  Applying main.0095_v360_increase_instance_version_length... OK
  Applying main.0096_v360_container_groups... OK
  Applying main.0097_v360_workflowapproval_approved_or_denied_by... OK
  Applying main.0098_v360_rename_cyberark_aim_credential_type... OK
  Applying main.0099_v361_license_cleanup... OK
  Applying main.0100_v370_projectupdate_job_tags... OK
  Applying main.0101_v370_generate_new_uuids_for_iso_nodes... OK
  Applying main.0102_v370_unifiedjob_canceled... OK
  Applying main.0103_v370_remove_computed_fields... OK
  Applying main.0104_v370_cleanup_old_scan_jts... OK
  Applying main.0105_v370_remove_jobevent_parent_and_hosts... OK
  Applying main.0106_v370_remove_inventory_groups_with_active_failures... OK
  Applying main.0107_v370_workflow_convergence_api_toggle... OK
  Applying main.0108_v370_unifiedjob_dependencies_processed... OK
2024-03-15 11:00:23,314 INFO     [-] rbac_migrations Unified organization migration completed in 0.0444 seconds
2024-03-15 11:00:23,366 INFO     [-] rbac_migrations Unified organization migration completed in 0.0517 seconds
2024-03-15 11:00:25,717 INFO     [-] rbac_migrations Rebuild parentage completed in 0.007353 seconds
  Applying main.0109_v370_job_template_organization_field... OK
  Applying main.0110_v370_instance_ip_address... OK
  Applying main.0111_v370_delete_channelgroup... OK
  Applying main.0112_v370_workflow_node_identifier... OK
  Applying main.0113_v370_event_bigint... OK
  Applying main.0114_v370_remove_deprecated_manual_inventory_sources... OK
  Applying main.0115_v370_schedule_set_null... OK
  Applying main.0116_v400_remove_hipchat_notifications... OK
  Applying main.0117_v400_remove_cloudforms_inventory... OK
  Applying main.0118_add_remote_archive_scm_type... OK
  Applying main.0119_inventory_plugins... OK
  Applying main.0120_galaxy_credentials... OK
  Applying main.0121_delete_toweranalyticsstate... OK
  Applying main.0122_really_remove_cloudforms_inventory... OK
  Applying main.0123_drop_hg_support... OK
  Applying main.0124_execution_environments... OK
  Applying main.0125_more_ee_modeling_changes... OK
  Applying main.0126_executionenvironment_container_options... OK
  Applying main.0127_reset_pod_spec_override... OK
  Applying main.0128_organiaztion_read_roles_ee_admin... OK
  Applying main.0129_unifiedjob_installed_collections... OK
  Applying main.0130_ee_polymorphic_set_null... OK
  Applying main.0131_undo_org_polymorphic_ee... OK
  Applying main.0132_instancegroup_is_container_group... OK
  Applying main.0133_centrify_vault_credtype... OK
  Applying main.0134_unifiedjob_ansible_version... OK
  Applying main.0135_schedule_sort_fallback_to_id... OK
  Applying main.0136_scm_track_submodules... OK
  Applying main.0137_custom_inventory_scripts_removal_data... OK
  Applying main.0138_custom_inventory_scripts_removal... OK
  Applying main.0139_isolated_removal... OK
  Applying main.0140_rename... OK
  Applying main.0141_remove_isolated_instances... OK
  Applying main.0142_update_ee_image_field_description... OK
  Applying main.0143_hostmetric... OK
  Applying main.0144_event_partitions... OK
  Applying main.0145_deregister_managed_ee_objs... OK
  Applying main.0146_add_insights_inventory... OK
  Applying main.0147_validate_ee_image_field... OK
  Applying main.0148_unifiedjob_receptor_unit_id... OK
  Applying main.0149_remove_inventory_insights_credential... OK
  Applying main.0150_rename_inv_sources_inv_updates... OK
  Applying main.0151_rename_managed_by_tower... OK
  Applying main.0152_instance_node_type... OK
  Applying main.0153_instance_last_seen... OK
  Applying main.0154_set_default_uuid... OK
  Applying main.0155_improved_health_check... OK
  Applying main.0156_capture_mesh_topology... OK
  Applying main.0157_inventory_labels... OK
  Applying main.0158_make_instance_cpu_decimal... OK
  Applying main.0159_deprecate_inventory_source_UoPU_field... OK
  Applying main.0160_alter_schedule_rrule... OK
  Applying main.0161_unifiedjob_host_status_counts... OK
  Applying main.0162_alter_unifiedjob_dependent_jobs... OK
  Applying main.0163_convert_job_tags_to_textfield... OK
  Applying main.0164_remove_inventorysource_update_on_project_update... OK
  Applying main.0165_task_manager_refactor... OK
  Applying main.0166_alter_jobevent_host... OK
  Applying main.0167_project_signature_validation_credential... OK
  Applying main.0168_inventoryupdate_scm_revision... OK
  Applying main.0169_jt_prompt_everything_on_launch... OK
  Applying main.0170_node_and_link_state... OK
  Applying main.0171_add_health_check_started... OK
  Applying main.0172_prevent_instance_fallback... OK
  Applying main.0173_instancegroup_max_limits... OK
  Applying main.0174_ensure_org_ee_admin_roles... OK
  Applying main.0175_workflowjob_is_bulk_job... OK
  Applying main.0176_inventorysource_scm_branch... OK
  Applying main.0177_instance_group_role_addition... OK
2024-03-15 11:01:30,996 INFO     [-] awx.main.migrations Initiated migration from Org admin to use role
  Applying main.0178_instance_group_admin_migration... OK
  Applying main.0179_change_cyberark_plugin_names... OK
  Applying main.0180_add_hostmetric_fields... OK
  Applying main.0181_hostmetricsummarymonthly... OK
  Applying main.0182_constructed_inventory... OK
  Applying main.0183_pre_django_upgrade... OK
  Applying main.0184_django_indexes... OK
  Applying main.0185_move_JSONBlob_to_JSONField... OK
  Applying main.0186_drop_django_taggit... OK
  Applying main.0187_hop_nodes... OK
  Applying main.0188_add_bitbucket_dc_webhook... OK
  Applying main.0189_inbound_hop_nodes... OK
  Applying main.0190_alter_inventorysource_source_and_more... OK
  Applying sites.0001_initial... OK
  Applying sites.0002_alter_domain_unique... OK
  Applying social_django.0001_initial... OK
  Applying social_django.0002_add_related_name... OK
  Applying social_django.0003_alter_email_max_length... OK
  Applying social_django.0004_auto_20160423_0400... OK
  Applying social_django.0005_auto_20160727_2333... OK
  Applying social_django.0006_partial... OK
  Applying social_django.0007_code_timestamp... OK
  Applying social_django.0008_partial_timestamp... OK
  Applying social_django.0009_auto_20191118_0520... OK
  Applying social_django.0010_uid_db_index... OK
  Applying social_django.0011_alter_id_fields... OK
  Applying social_django.0012_usersocialauth_extra_data_new... OK
  Applying social_django.0013_migrate_extra_data... OK
  Applying social_django.0014_remove_usersocialauth_extra_data... OK
  Applying social_django.0015_rename_extra_data_new_usersocialauth_extra_data... OK
  Applying sso.0001_initial... OK
  Applying sso.0002_expand_provider_options... OK
  Applying sso.0003_convert_saml_string_to_list... OK
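
The manual manifest cleanup in the workaround above can be sketched in Python, working on the JSON form of the job (for example from `-o json` instead of `-oyaml`). This is only a sketch under assumptions: the helper name `strip_job_for_recreate` is made up, and the exact set of fields is my reading of "status, resourceVersion, uid and any other uids". In particular, dropping `creationTimestamp`, `spec.selector` and the `controller-uid` labels is an assumption about what else carries the old job's identity. The sample manifest is trimmed and partly made up.

```python
import copy
import json

# Labels that embed the old Job's UID (assumption: these are the
# "other uids" mentioned in the workaround).
UID_LABELS = ("controller-uid", "batch.kubernetes.io/controller-uid")


def strip_job_for_recreate(job: dict) -> dict:
    """Return a copy of a Job manifest without server-populated fields."""
    cleaned = copy.deepcopy(job)
    cleaned.pop("status", None)

    meta = cleaned.get("metadata", {})
    for key in ("resourceVersion", "uid", "creationTimestamp"):
        meta.pop(key, None)

    spec = cleaned.get("spec", {})
    # The selector is generated from the old UID, so let it be regenerated.
    spec.pop("selector", None)

    template_meta = spec.get("template", {}).get("metadata", {})
    for labels in (meta.get("labels", {}), template_meta.get("labels", {})):
        for key in UID_LABELS:
            labels.pop(key, None)
    # ownerReferences, if present, are left untouched.
    return cleaned


# Trimmed, partly made-up sample manifest for illustration only.
old_uid = "d1c661c8-9381-40be-bbb4-c21aeb0f9a93"
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "name": "awx-migration-24.0.0",
        "namespace": "awx",
        "uid": old_uid,
        "resourceVersion": "12345",  # made-up value
    },
    "spec": {
        "selector": {
            "matchLabels": {"batch.kubernetes.io/controller-uid": old_uid}
        },
        "template": {
            "metadata": {
                "labels": {
                    "batch.kubernetes.io/controller-uid": old_uid,
                    "controller-uid": old_uid,
                    "job-name": "awx-migration-24.0.0",
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "migration-job",
                        "image": "quay.io/ansible/awx:24.0.0",
                        "command": ["awx-manage", "migrate", "--noinput"],
                    }
                ]
            },
        },
    },
    "status": {"succeeded": 1},
}

cleaned = strip_job_for_recreate(job)
print(json.dumps(cleaned, indent=2))
```

The printed manifest could then be saved to a file and applied in the last step of the workaround, after the old job has been deleted.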

Operator Logs

No response

@craph
Contributor

craph commented Mar 15, 2024

This issue looks similar to #1775 and #1770

@fosterseth
Member

can you test my patch here?
#1770 (comment)
thanks!

@dark-vex
Author

@fosterseth thanks, I'll try it in another environment. On this one, I ended up reinstalling and restoring the data from a backup.

@LukWe99

LukWe99 commented Mar 19, 2024

I encountered a similar problem: in my case the awx-task pods were stuck in the init container "init-database" with "waiting for migrations".
I therefore checked the logs of the awx-operator and found the following error:

TASK [Verify the resource pod name is populated.] ********************************
fatal: [localhost]: FAILED! => {
    "assertion": "awx_web_pod_name != ''",
    "changed": false,
    "evaluated_to": false,
    "msg": "Could not find the tower pod's name."
}

After checking the awx-operator source code, I think that removing "wait" and "wait_timeout" from the task that applies the web and task deployments ("Apply deployment resources" in resources_configuration.yml) may be causing the problem (Commit ffba1b4, Pull Request #1674).

The deployments are applied without waiting for them to be running. The immediately following task, "Get the new resource pod information after updating resource", queries the web pods, but only those with "status.phase=Running". Because the previous task no longer waits for the pods created by the deployments to be running, the registered _new_pod variable may be empty at that moment. All the following set_fact tasks may then use empty values, so the assertion task "Verify the resource pod name is populated" fails. The playbook ends at this point, and none of the following includes, such as "migrate_schema.yml" and "initialize_django.yml", are executed.
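
To illustrate the race described above, here is a minimal Python sketch of the kind of wait that seems to be missing. It is not the operator's code (the operator is written in Ansible); `wait_for_running_pod`, its parameters and the fake pod lister are all hypothetical. The idea is simply to poll the "Running pods" query until it returns something, instead of reading it once right after the deployments are applied.

```python
import time
from typing import Callable, Sequence


def wait_for_running_pod(
    list_running_pods: Callable[[], Sequence[str]],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until at least one Running pod is listed and return its name.

    list_running_pods stands in for a query filtered on
    status.phase=Running; it is a hypothetical callable, not a real API.
    """
    deadline = clock() + timeout_s
    while True:
        pods = list_running_pods()
        if pods:
            return pods[0]
        if clock() >= deadline:
            raise TimeoutError("no Running pod appeared before the timeout")
        sleep(interval_s)


# Fake lister: the pod only shows up as Running on the third query, which
# mimics querying immediately after the deployment was applied.
responses = iter([[], [], ["awx-web-8577b8fc55-c4dh2"]])
name = wait_for_running_pod(lambda: next(responses), sleep=lambda _s: None)
print(name)
```

A single query without the loop would have returned an empty list here, which matches the empty _new_pod variable and the failing assertion.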
