Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'st2ctl reload' bundled in packaging fails on first st2 install #453

Open
arm4b opened this issue Mar 23, 2017 · 31 comments
Open

'st2ctl reload' bundled in packaging fails on first st2 install #453

arm4b opened this issue Mar 23, 2017 · 31 comments
Labels

Comments

@arm4b
Copy link
Member

arm4b commented Mar 23, 2017

In curl | bash script we configure mongo with specific username/password and permissions.

During the apt/yum install we automatically trigger st2ctl reload --register-runners --register-fail-on-failure via packaging postinst.

That fails, because when installing StackStorm for the first time, st2.conf is shipped with default values and so mongo user doesn't have needed permissions:

  Installing : st2-2.2.1-3.x86_64                                                                                                                                                 15/15 
Registering content...[flags = --register-runners --register-fail-on-failure]
2017-03-23 17:28:52,705 INFO [-] Connecting to database "st2" @ "0.0.0.0:27017" as user "None".
Traceback (most recent call last):
  File "/usr/bin/st2-register-content", line 22, in <module>
    sys.exit(content_loader.main(sys.argv[1:]))
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/content/bootstrap.py", line 384, in main
    setup(argv)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/content/bootstrap.py", line 376, in setup
    common_setup(config=config, setup_db=True, register_mq_exchanges=True)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/script_setup.py", line 86, in setup
    db_setup()
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/database_setup.py", line 43, in db_setup
    ssl_match_hostname=cfg.CONF.database.ssl_match_hostname)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/persistence/db_init.py", line 65, in db_setup_with_retry
    ssl_match_hostname=ssl_match_hostname)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/models/db/__init__.py", line 94, in db_setup
    db_ensure_indexes()
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/models/db/__init__.py", line 131, in db_ensure_indexes
    raise e
pymongo.errors.OperationFailure: not authorized on st2 to execute command { createIndexes: "user_d_b", indexes: [ { unique: true, background: false, sparse: false, key: { name: 1 }, name: "name_1" } ] }
@arm4b arm4b added the bug label Mar 23, 2017
@Kami
Copy link
Member

Kami commented Mar 23, 2017

In theory st2ctl reload should pass correct --config-file to the underlying st2-register-content script.

Or is it some other issue (e.g. this step is called before password is written to the config or similar)?

@arm4b
Copy link
Member Author

arm4b commented Mar 23, 2017

(e.g. this step is called before password is written to the config or similar)

Yes, because st2ctl reload is bundled in packaging postinst and executed before we can edit any values in st2.conf (see full explanation in first message).

@arm4b
Copy link
Member Author

arm4b commented Mar 23, 2017

This also may hang entire apt/yum install st2 installation in next versions (v2.3+) since we return correct exit code if register content failed https://github.com/StackStorm/st2/blob/a826898e52dcda848fcfbe5b0ae1a719d08ca288/st2common/bin/st2ctl#L226-L227

@calvincannon
Copy link

So.... I'm trying to install StackStorm for the first time and I'm new to MongoDB. How do I get past this?

[root@myhost ~]# st2ctl reload
Registering content...[flags = --config-file /etc/st2/st2.conf --register-runners --register-actions --register-aliases --register-sensors --register-triggers --register-configs --register-rules]
2017-05-04 20:57:52,137 INFO [-] Connecting to database "st2" @ "0.0.0.0:27017" as user "None".
2017-05-04 20:57:52,247 INFO [-] =========================================================
2017-05-04 20:57:52,247 INFO [-] ############## Registering triggers #####################
2017-05-04 20:57:52,247 INFO [-] =========================================================
2017-05-04 20:57:52,295 INFO [-] Registered 0 triggers.
2017-05-04 20:57:52,295 INFO [-] =========================================================
2017-05-04 20:57:52,295 INFO [-] ############## Registering sensors ######################
2017-05-04 20:57:52,295 INFO [-] =========================================================

(Just hangs here)

@LindsayHill
Copy link
Contributor

@calvincannon How did you install StackStorm? Using the one-line install, or a manual install? What OS? Have you made any changes to the system since installing, or is this just at the end of the install?

Is the system running OK? Available memory, etc? mongodb & rabbitmq-server running OK? Anything in any of the logs at /var/log/st2/?

@calvincannon
Copy link

calvincannon commented May 5, 2017

Hi @LindsayHill,

Thanks for your quick reply. See output below. I'm basically following the RHEL 6 procedure provided here:
https://docs.stackstorm.com/install/rhel6.html

RAM, CPU & Storage were just fine.

I've had to modify the above procedure to account for the fact that we're already logged in as root and our box has no direct access to the Internet. To account for that, I've pulled down the required packages and build some internal yum repos. The packages installed flawlessly and all services started just fine. I didn't see anything of concern up to where I started below and I don't really know what to look for in the logs. I didn't see anything obvious in the logs though. I suspect this error may be the key to things: (It's also noted, in context, below)

[root@myhost ~]# /opt/stackstorm/mistral/bin/mistral-db-manage --config-file /etc/mistral/mistral.conf populate
No handlers could be found for logger "mistral.actions.openstack.action_generator.base"

We're planning to restore this VM to a clean state and try the install again. Any help you can provide will be greatly appreciated.

Thanks!

  • CLC

[root@myhost ~]# cat /etc/*release*
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
cat: /etc/lsb-release.d: Is a directory
Red Hat Enterprise Linux Server release 6.9 (Santiago)
Red Hat Enterprise Linux Server release 6.9 (Santiago)
cpe:/o:redhat:enterprise_linux:6server:ga:server
[root@myhost ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/sysVG-rootLV
                      1.9G  536M  1.3G  30% /
tmpfs                 1.9G  4.0K  1.9G   1% /dev/shm
/dev/sda1             240M   88M  140M  39% /boot
/dev/mapper/sysVG-homeLV
                      1.9G  3.0M  1.8G   1% /home
/dev/mapper/sysVG-optLV
                      1.9G  835M 1003M  46% /opt
/dev/mapper/sysVG-tmpLV
                      969M   27M  892M   3% /tmp
/dev/mapper/sysVG-usrLV
                      3.8G  1.9G  1.8G  53% /usr
/dev/mapper/sysVG-usrlocalLV
                      969M  1.4M  917M   1% /usr/local
/dev/mapper/sysVG-varLV
                      1.9G  1.2G  676M  64% /var
/dev/mapper/sysVG-fixLV
                      9.8G  379M  8.9G   5% /fix
/dev/mapper/sysVG-optteamquestLV
                      9.7G  748M  8.5G   8% /opt/teamquest
/dev/mapper/sysVG-optctma610LV
                      2.0G  3.0M  1.9G   1% /opt/ctma610
[root@myhost ~]# free -
             total       used       free     shared    buffers     cached
Mem:       4052560    3603152     449408      16436     138296    1946532
-/+ buffers/cache:    1518324    2534236
Swap:      4095996          0    4095996



[root@myhost ~]# # Create Mistral DB in PostgreSQL
[root@myhost ~]# cat << EHD | sudo -u postgres psql
> CREATE ROLE mistral WITH CREATEDB LOGIN ENCRYPTED PASSWORD 'StackStorm';
> CREATE DATABASE mistral OWNER mistral;
> EHD
could not change directory to "/root": Permission denied
CREATE ROLE
CREATE DATABASE

[root@myhost ~]# su - postgres

-bash-4.1$ psql
psql (9.4.11)
Type "help" for help.

postgres=# CREATE ROLE mistral WITH CREATEDB LOGIN ENCRYPTED PASSWORD 'StackStorm';
ERROR:  role "mistral" already exists
postgres=# CREATE DATABASE mistral OWNER mistral;
ERROR:  database "mistral" already exists
postgres-# \q
-bash-4.1$ exit
logout
[root@myhost ~]# /opt/stackstorm/mistral/bin/mistral-db-manage --config-file /etc/mistral/mistral.conf upgrade head
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 001, Kilo release
INFO  [alembic.runtime.migration] Running upgrade 001 -> 002, Kilo
INFO  [alembic.runtime.migration] Running upgrade 002 -> 003, cron_trigger_constraints
INFO  [alembic.runtime.migration] Running upgrade 003 -> 004, add description for execution
INFO  [alembic.runtime.migration] Running upgrade 004 -> 005, Increase executions_v2 column size from JsonDictType to JsonLongDictType
INFO  [alembic.runtime.migration] Running upgrade 005 -> 006, add a Boolean column 'processed' to the table  delayed_calls_v2
INFO  [alembic.runtime.migration] Running upgrade 006 -> 007, Move system flag to base definition
INFO  [alembic.runtime.migration] Running upgrade 007 -> 008, Increase size of state_info column from String to Text
INFO  [alembic.runtime.migration] Running upgrade 008 -> 009, Add database indices
INFO  [alembic.runtime.migration] Running upgrade 009 -> 010, add_resource_members_v2_table
INFO  [alembic.runtime.migration] Running upgrade 010 -> 011, add workflow id for execution
INFO  [alembic.runtime.migration] Running upgrade 011 -> 012, add event triggers table
INFO  [alembic.runtime.migration] Running upgrade 012 -> 013, split_execution_table_increase_names
INFO  [alembic.runtime.migration] Running upgrade 013 -> 014, fix_past_scripts_discrepancies
INFO  [alembic.runtime.migration] Running upgrade 014 -> 015, add_unique_keys_for_non_locking_model
INFO  [alembic.runtime.migration] Running upgrade 015 -> 016, Increase size of task_executions_v2.unique_key
INFO  [alembic.runtime.migration] Running upgrade 016 -> 017, Add named lock table
INFO  [alembic.runtime.migration] Running upgrade 017 -> 018, increate_task_execution_unique_key_size
INFO  [alembic.runtime.migration] Running upgrade 018 -> 019, Change scheduler schema.
INFO  [alembic.runtime.migration] Running upgrade 019 -> 020, add type to task execution

[root@myhost ~]# /opt/stackstorm/mistral/bin/mistral-db-manage --config-file /etc/mistral/mistral.conf populate
No handlers could be found for logger "mistral.actions.openstack.action_generator.base"

[root@myhost ~]# sudo useradd stanley
useradd: user 'stanley' already exists
[root@myhost ~]# sudo mkdir -p /home/stanley/.ssh
[root@myhost ~]# sudo chmod 0700 /home/stanley/.ssh
[root@myhost ~]# # Generate ssh keys
[root@myhost ~]# sudo ssh-keygen -f /home/stanley/.ssh/stanley_rsa -P ""
Generating public/private rsa key pair.
Your identification has been saved in /home/stanley/.ssh/stanley_rsa.
Your public key has been saved in /home/stanley/.ssh/stanley_rsa.pub.
The key fingerprint is:
4f:26:51:ec:f7:3f:dd:83:99:cc:a3:b7:19:9f:da:8e root@wall-e-st2r6-01
The key's randomart image is:
+--[ RSA 2048]----+
|         ..      |
|         ..      |
|        ..       |
|         .. .    |
|        S o. .   |
|         =    .  |
|          . o.+.o|
|             O*o=|
|           .oE==o|
+-----------------+
[root@myhost ~]# # Authorize key-based access
[root@myhost ~]# sudo sh -c 'cat /home/stanley/.ssh/stanley_rsa.pub >> /home/stanley/.ssh/authorized_keys'
[root@myhost ~]# sudo chown -R stanley:stanley /home/stanley/.ssh
[root@myhost ~]# ll /home/stanley/.ssh
total 12
-rw------- 1 stanley stanley  402 May  4 20:55 authorized_keys
-rw------- 1 stanley stanley 1675 May  4 20:55 stanley_rsa
-rw------- 1 stanley stanley  402 May  4 20:55 stanley_rsa.pub
[root@myhost ~]# sh -c 'echo "stanley    ALL=(ALL)       NOPASSWD: SETENV: ALL" >> /etc/sudoers.d/st2'
[root@myhost ~]# chmod 0440 /etc/sudoers.d/st2
[root@myhost  ~]# sed -i -r "s/^Defaults\s+\+?requiretty/# Defaults +requiretty/g" /etc/sudoers
[root@myhost  ~]# 

[root@myhost ~]# st2ctl start
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2api:                                           [  OK  ]
Starting st2stream:                                        [  OK  ]
Starting st2auth:                                          [  OK  ]
Starting st2garbagecollector:                              [  OK  ]
Starting st2notifier:                                      [  OK  ]
Starting st2resultstracker:                                [  OK  ]
Starting st2rulesengine:                                   [  OK  ]
Starting st2sensorcontainer:                               [  OK  ]
st2chatops: unrecognized service
Starting mistral-api:                                      [  OK  ]
Starting mistral-server:                                   [  OK  ]
##### st2 components status #####
st2actionrunner PID: 23206
st2actionrunner PID: 23223
st2actionrunner PID: 23244
st2actionrunner PID: 23265
st2actionrunner PID: 23287
st2actionrunner PID: 23309
st2actionrunner PID: 23330
st2actionrunner PID: 23353
st2actionrunner PID: 23372
st2actionrunner PID: 23391
st2api PID: 23430
st2api PID: 23532
st2stream PID: 23566
st2stream PID: 23589
st2auth PID: 23619
st2auth PID: 23634
st2garbagecollector PID: 23665
st2notifier PID: 23705
st2resultstracker PID: 23737
st2rulesengine PID: 23772
st2sensorcontainer PID: 23804
st2chatops is not running.
mistral-server PID: 23914
mistral-api PID: 23856
mistral-api PID: 23883
mistral-api PID: 23885

[root@myhost ~]# st2ctl reload
Registering content...[flags = --config-file /etc/st2/st2.conf --register-runners --register-actions --register-aliases --register-sensors --register-triggers --register-configs --register-rules]
2017-05-04 20:57:52,137 INFO [-] Connecting to database "st2" @ "0.0.0.0:27017" as user "None".
2017-05-04 20:57:52,247 INFO [-] =========================================================
2017-05-04 20:57:52,247 INFO [-] ############## Registering triggers #####################
2017-05-04 20:57:52,247 INFO [-] =========================================================
2017-05-04 20:57:52,295 INFO [-] Registered 0 triggers.
2017-05-04 20:57:52,295 INFO [-] =========================================================
2017-05-04 20:57:52,295 INFO [-] ############## Registering sensors ######################
2017-05-04 20:57:52,295 INFO [-] =========================================================

<<<<<<<<<<< This is where everything hangs... >>>>>>>>>>

@LindsayHill
Copy link
Contributor

Thanks for the additional info. Looks like you're generally following the right steps for an offline installation.

Normally you'd see this sort of output:

2017-05-05 10:14:24,083 INFO [-] =========================================================
2017-05-05 10:14:24,083 INFO [-] ############## Registering sensors ######################
2017-05-05 10:14:24,083 INFO [-] =========================================================
2017-05-05 10:14:24,212 INFO [-] Registered 6 sensors.
2017-05-05 10:14:24,212 INFO [-] =================================

So it's going wrong somewhere around sensor registration. That should not be in anyway related to that other error you've got around mistral (that could be a red herring right now).

Can you run st2ctl reload --verbose ? That should give us lots more output about what's going on, and hopefully pin it down.

@calvincannon
Copy link

calvincannon commented May 5, 2017

Sure. It appears to be hanging right at this step:


[root@myhost st2]# st2ctl reload --verbose
Registering content...[flags = --config-file /etc/st2/st2.conf --register-runners --register-actions --register-aliases --register-sensors --register-triggers --register-configs --register-rules --verbose]
2017-05-05 12:18:56,787 INFO [-] Connecting to database "st2" @ "0.0.0.0:27017" as user "None".
2017-05-05 12:18:56,791 DEBUG [-] Ensuring database indexes...
2017-05-05 12:18:56,808 DEBUG [-] Ensuring indexes for model "UserDB"...
2017-05-05 12:18:56,812 DEBUG [-] Removing extra indexes for model "UserDB"...
2017-05-05 12:18:56,814 DEBUG [-] Removed "0" extra indexes for model "UserDB"
2017-05-05 12:18:56,814 DEBUG [-] Ensuring indexes for model "TokenDB"...
2017-05-05 12:18:56,816 DEBUG [-] Removing extra indexes for model "TokenDB"...
2017-05-05 12:18:56,818 DEBUG [-] Removed "0" extra indexes for model "TokenDB"
2017-05-05 12:18:56,818 DEBUG [-] Ensuring indexes for model "ApiKeyDB"...
2017-05-05 12:18:56,821 DEBUG [-] Removing extra indexes for model "ApiKeyDB"...
2017-05-05 12:18:56,822 DEBUG [-] Removed "0" extra indexes for model "ApiKeyDB"
2017-05-05 12:18:56,823 DEBUG [-] Ensuring indexes for model "ActionDB"...
2017-05-05 12:18:56,829 DEBUG [-] Removing extra indexes for model "ActionDB"...
2017-05-05 12:18:56,831 DEBUG [-] Removed "0" extra indexes for model "ActionDB"
2017-05-05 12:18:56,832 DEBUG [-] Ensuring indexes for model "ActionExecutionDB"...
2017-05-05 12:18:56,848 DEBUG [-] Removing extra indexes for model "ActionExecutionDB"...
2017-05-05 12:18:56,851 DEBUG [-] Removed "0" extra indexes for model "ActionExecutionDB"
2017-05-05 12:18:56,851 DEBUG [-] Ensuring indexes for model "ActionExecutionStateDB"...
2017-05-05 12:18:56,854 DEBUG [-] Removing extra indexes for model "ActionExecutionStateDB"...
2017-05-05 12:18:56,856 DEBUG [-] Removed "0" extra indexes for model "ActionExecutionStateDB"
2017-05-05 12:18:56,856 DEBUG [-] Ensuring indexes for model "ActionAliasDB"...
2017-05-05 12:18:56,859 DEBUG [-] Removing extra indexes for model "ActionAliasDB"...
2017-05-05 12:18:56,860 DEBUG [-] Removed "0" extra indexes for model "ActionAliasDB"
2017-05-05 12:18:56,860 DEBUG [-] Ensuring indexes for model "LiveActionDB"...
2017-05-05 12:18:56,867 DEBUG [-] Removing extra indexes for model "LiveActionDB"...
2017-05-05 12:18:56,871 DEBUG [-] Removed "0" extra indexes for model "LiveActionDB"
2017-05-05 12:18:56,871 DEBUG [-] Ensuring indexes for model "RunnerTypeDB"...
2017-05-05 12:18:56,875 DEBUG [-] Removing extra indexes for model "RunnerTypeDB"...
2017-05-05 12:18:56,876 DEBUG [-] Removed "0" extra indexes for model "RunnerTypeDB"
2017-05-05 12:18:56,877 DEBUG [-] Ensuring indexes for model "ActionAliasDB"...
2017-05-05 12:18:56,879 DEBUG [-] Removing extra indexes for model "ActionAliasDB"...
2017-05-05 12:18:56,881 DEBUG [-] Removed "0" extra indexes for model "ActionAliasDB"
2017-05-05 12:18:56,881 DEBUG [-] Ensuring indexes for model "KeyValuePairDB"...
2017-05-05 12:18:56,887 DEBUG [-] Removing extra indexes for model "KeyValuePairDB"...
2017-05-05 12:18:56,889 DEBUG [-] Removed "0" extra indexes for model "KeyValuePairDB"
2017-05-05 12:18:56,889 DEBUG [-] Ensuring indexes for model "ActionExecutionDB"...
2017-05-05 12:18:56,899 DEBUG [-] Removing extra indexes for model "ActionExecutionDB"...
2017-05-05 12:18:56,902 DEBUG [-] Removed "0" extra indexes for model "ActionExecutionDB"
2017-05-05 12:18:56,902 DEBUG [-] Ensuring indexes for model "ActionExecutionStateDB"...
2017-05-05 12:18:56,904 DEBUG [-] Removing extra indexes for model "ActionExecutionStateDB"...
2017-05-05 12:18:56,906 DEBUG [-] Removed "0" extra indexes for model "ActionExecutionStateDB"
2017-05-05 12:18:56,906 DEBUG [-] Ensuring indexes for model "LiveActionDB"...
2017-05-05 12:18:56,911 DEBUG [-] Removing extra indexes for model "LiveActionDB"...
2017-05-05 12:18:56,913 DEBUG [-] Removed "0" extra indexes for model "LiveActionDB"
2017-05-05 12:18:56,913 DEBUG [-] Ensuring indexes for model "PackDB"...
2017-05-05 12:18:56,918 DEBUG [-] Removing extra indexes for model "PackDB"...
2017-05-05 12:18:56,920 DEBUG [-] Removed "0" extra indexes for model "PackDB"
2017-05-05 12:18:56,920 DEBUG [-] Ensuring indexes for model "ConfigSchemaDB"...
2017-05-05 12:18:56,922 DEBUG [-] Removing extra indexes for model "ConfigSchemaDB"...
2017-05-05 12:18:56,924 DEBUG [-] Removed "0" extra indexes for model "ConfigSchemaDB"
2017-05-05 12:18:56,924 DEBUG [-] Ensuring indexes for model "ConfigDB"...
2017-05-05 12:18:56,926 DEBUG [-] Removing extra indexes for model "ConfigDB"...
2017-05-05 12:18:56,927 DEBUG [-] Removed "0" extra indexes for model "ConfigDB"
2017-05-05 12:18:56,927 DEBUG [-] Ensuring indexes for model "PolicyTypeDB"...
2017-05-05 12:18:56,931 DEBUG [-] Removing extra indexes for model "PolicyTypeDB"...
2017-05-05 12:18:56,932 DEBUG [-] Removed "0" extra indexes for model "PolicyTypeDB"
2017-05-05 12:18:56,932 DEBUG [-] Ensuring indexes for model "PolicyDB"...
2017-05-05 12:18:56,936 DEBUG [-] Removing extra indexes for model "PolicyDB"...
2017-05-05 12:18:56,937 DEBUG [-] Removed "0" extra indexes for model "PolicyDB"
2017-05-05 12:18:56,937 DEBUG [-] Ensuring indexes for model "RoleDB"...
2017-05-05 12:18:56,940 DEBUG [-] Removing extra indexes for model "RoleDB"...
2017-05-05 12:18:56,942 DEBUG [-] Removed "0" extra indexes for model "RoleDB"
2017-05-05 12:18:56,942 DEBUG [-] Ensuring indexes for model "UserRoleAssignmentDB"...
2017-05-05 12:18:56,945 DEBUG [-] Removing extra indexes for model "UserRoleAssignmentDB"...
2017-05-05 12:18:56,947 DEBUG [-] Removed "0" extra indexes for model "UserRoleAssignmentDB"
2017-05-05 12:18:56,947 DEBUG [-] Ensuring indexes for model "PermissionGrantDB"...
2017-05-05 12:18:56,948 DEBUG [-] Skipping index cleanup for blacklisted model "PermissionGrantDB"...
2017-05-05 12:18:56,948 DEBUG [-] Ensuring indexes for model "RuleDB"...
2017-05-05 12:18:56,959 DEBUG [-] Removing extra indexes for model "RuleDB"...
2017-05-05 12:18:56,962 DEBUG [-] Removed "0" extra indexes for model "RuleDB"
2017-05-05 12:18:56,962 DEBUG [-] Ensuring indexes for model "RuleEnforcementDB"...
2017-05-05 12:18:56,964 DEBUG [-] Removing extra indexes for model "RuleEnforcementDB"...
2017-05-05 12:18:56,965 DEBUG [-] Removed "0" extra indexes for model "RuleEnforcementDB"
2017-05-05 12:18:56,965 DEBUG [-] Ensuring indexes for model "RunnerTypeDB"...
2017-05-05 12:18:56,967 DEBUG [-] Removing extra indexes for model "RunnerTypeDB"...
2017-05-05 12:18:56,969 DEBUG [-] Removed "0" extra indexes for model "RunnerTypeDB"
2017-05-05 12:18:56,969 DEBUG [-] Ensuring indexes for model "SensorTypeDB"...
2017-05-05 12:18:56,972 DEBUG [-] Removing extra indexes for model "SensorTypeDB"...
2017-05-05 12:18:56,973 DEBUG [-] Removed "0" extra indexes for model "SensorTypeDB"
2017-05-05 12:18:56,974 DEBUG [-] Ensuring indexes for model "TraceDB"...
2017-05-05 12:18:56,981 DEBUG [-] Removing extra indexes for model "TraceDB"...
2017-05-05 12:18:56,983 DEBUG [-] Removed "0" extra indexes for model "TraceDB"
2017-05-05 12:18:56,983 DEBUG [-] Ensuring indexes for model "TriggerTypeDB"...
2017-05-05 12:18:56,990 DEBUG [-] Removing extra indexes for model "TriggerTypeDB"...
2017-05-05 12:18:56,991 DEBUG [-] Removed "0" extra indexes for model "TriggerTypeDB"
2017-05-05 12:18:56,992 DEBUG [-] Ensuring indexes for model "TriggerDB"...
2017-05-05 12:18:56,994 DEBUG [-] Removing extra indexes for model "TriggerDB"...
2017-05-05 12:18:56,995 DEBUG [-] Removed "0" extra indexes for model "TriggerDB"
2017-05-05 12:18:56,995 DEBUG [-] Ensuring indexes for model "TriggerInstanceDB"...
2017-05-05 12:18:57,003 DEBUG [-] Removing extra indexes for model "TriggerInstanceDB"...
2017-05-05 12:18:57,005 DEBUG [-] Removed "0" extra indexes for model "TriggerInstanceDB"
2017-05-05 12:18:57,006 DEBUG [-] Registering exchanges...

<<<<<< Hanging.... >>>>>>>>

@LindsayHill
Copy link
Contributor

From a quick skim through the code, that seems to be related to connecting to RabbitMQ.

https://github.com/StackStorm/st2/blob/master/st2common/st2common/transport/bootstrap_utils.py#L63

Is RabbitMQ happy, running, healthy? I think the logs are under /var/log/rabbitmq. Can maybe also check service rabbitmq-server status

@calvincannon
Copy link

calvincannon commented May 5, 2017

Looks good to me, but I'm a RabbitMQ newbie. Here ya go:


[root@myhost ~]# service rabbitmq-server status
Status of node 'rabbit@wall-e-st2r6-01' ...
[{pid,22152},
 {running_applications,[{rabbit,"RabbitMQ","3.1.5"},
                        {mnesia,"MNESIA  CXC 138 12","4.5"},
                        {os_mon,"CPO  CXC 138 46","2.2.7"},
                        {xmerl,"XML parser","1.2.10"},
                        {sasl,"SASL  CXC 138 11","2.1.10"},
                        {stdlib,"ERTS  CXC 138 10","1.17.5"},
                        {kernel,"ERTS  CXC 138 10","2.14.5"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:30] [kernel-poll:true]\n"},
 {memory,[{total,63960344},
          {connection_procs,31980504},
          {queue_procs,241872},
          {plugins,0},
          {other_proc,9155488},
          {mnesia,119128},
          {mgmt_db,0},
          {msg_index,43920},
          {other_ets,1700168},
          {binary,1745152},
          {code,14419209},
          {atom,1354465},
          {other_system,3200438}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,1659928576},
 {disk_free_limit,1000000000},
 {disk_free,707936256},
 {file_descriptors,[{total_limit,924},
                    {total_used,831},
                    {sockets_limit,829},
                    {sockets_used,829}]},
 {processes,[{limit,1048576},{used,7603}]},
 {run_queue,0},
 {uptime,58960}]
...done.
[root@myhost ~]# cd /var/log/rabbitmq
[root@myhost rabbitmq]# ll
total 292
-rw-r--r-- 1 rabbitmq rabbitmq 289504 May  5 03:52 rabbit@myhost.log
-rw-r--r-- 1 rabbitmq rabbitmq      0 May  4 20:31 rabbit@myhost-sasl.log
-rw-r--r-- 1 root     root          0 May  4 20:31 startup_err
-rw-r--r-- 1 root     root        355 May  4 20:31 startup_log
[root@myhost rabbitmq]# cat startup_log

              RabbitMQ 3.1.5. Copyright (C) 2007-2013 GoPivotal, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: /var/log/rabbitmq/rabbit@myhost.log
  ######  ##        /var/log/rabbitmq/rabbit@myhost-sasl.log
  ##########
              Starting broker... completed with 0 plugins.
[root@myhost rabbitmq]# 

@LindsayHill
Copy link
Contributor

Anything in that rabbit@myhost.log file?

Could also restart rabbitmq.

Is name resolution OK for the local hostname? I have seen some issues in the past related to that with RabbitMQ. It doesn't like it when hostnames get changed, and/or the local system can't resolve its own hostname

@calvincannon
Copy link

calvincannon commented May 5, 2017

Hmmmmm..... I should have caught this earlier. Here's the end of that log file. I'll wait for your reply before restarting the service.


=INFO REPORT==== 5-May-2017::03:51:13 ===
closing AMQP connection <0.17894.0> (127.0.0.1:57792 -> 127.0.0.1:5672)

=INFO REPORT==== 5-May-2017::03:51:13 ===
accepting AMQP connection <0.17903.0> (127.0.0.1:57794 -> 127.0.0.1:5672)

=INFO REPORT==== 5-May-2017::03:51:44 ===
accepting AMQP connection <0.17915.0> (127.0.0.1:57800 -> 127.0.0.1:5672)

=WARNING REPORT==== 5-May-2017::03:51:44 ===
file descriptor limit alarm set.

********************************************************************
*** New connections will not be accepted until this alarm clears ***
********************************************************************

=INFO REPORT==== 5-May-2017::03:51:44 ===
closing AMQP connection <0.17915.0> (127.0.0.1:57800 -> 127.0.0.1:5672)

=WARNING REPORT==== 5-May-2017::03:51:44 ===
file descriptor limit alarm cleared

=INFO REPORT==== 5-May-2017::03:51:44 ===
accepting AMQP connection <0.17925.0> (127.0.0.1:57802 -> 127.0.0.1:5672)

=WARNING REPORT==== 5-May-2017::03:51:44 ===
file descriptor limit alarm set.

********************************************************************
*** New connections will not be accepted until this alarm clears ***
********************************************************************

=INFO REPORT==== 5-May-2017::03:52:15 ===
accepting AMQP connection <0.17938.0> (127.0.0.1:57808 -> 127.0.0.1:5672)

=INFO REPORT==== 5-May-2017::03:52:15 ===
closing AMQP connection <0.17938.0> (127.0.0.1:57808 -> 127.0.0.1:5672)

=INFO REPORT==== 5-May-2017::03:52:15 ===
accepting AMQP connection <0.17947.0> (127.0.0.1:57810 -> 127.0.0.1:5672)
(END) 

@LindsayHill
Copy link
Contributor

Something screwy is going on with your rabbitmq setup.

Look at this section in your rabbitmqctl status output:

 {file_descriptors,[{total_limit,924},
                    {total_used,831},
                    {sockets_limit,829},
                    {sockets_used,829}]},

On my local CentOS 6 system, I see this:

 {file_descriptors,[{total_limit,924},
                    {total_used,49},
                    {sockets_limit,829},
                    {sockets_used,41}]},

Restarting rabbitmq won't make things any worse than they are right now. But something is very odd for your system to be using so many sockets and FDs. I would be inclined to stop st2, restart rabbitmq, and check those numbers with rabbitmqctl status. If the numbers look OK, start st2, and see what happens.

If you hunt around with Google you can fairly quickly find docs for how to increase the file & socket descriptor limits. You could increase the limits so that st2 can start, but it looks like you've got something else going on with that system for it to be using so many.

PS it might be faster to join our Slack community (https://stackstorm.com/community-signup), for more real-time discussions of this. Other users & devs might be able to offer some solutions.

@calvincannon
Copy link

calvincannon commented May 5, 2017

Cool. I'll get signed up shortly. I think I was able to follow your direction above and the numbers to appear quite a bit closer to what you were showing.


[root@myhost rabbitmq]# st2ctl stop
Stopping st2actionrunner: [  OK  ]
Stopping st2actionrunner: [  OK  ]
Stopping st2actionrunner: [  OK  ]
Stopping st2actionrunner: [  OK  ]
Stopping st2actionrunner: [  OK  ]
Stopping st2actionrunner: [  OK  ]
Stopping st2actionrunner: [  OK  ]
Stopping st2actionrunner: [  OK  ]
Stopping st2actionrunner: [  OK  ]
Stopping st2actionrunner: [  OK  ]
Stopping st2stream:                                        [  OK  ]
Stopping st2auth:                                          [  OK  ]
Stopping st2garbagecollector:                              [  OK  ]
Stopping st2notifier:                                      [  OK  ]
Stopping st2resultstracker:                                [  OK  ]
Stopping st2rulesengine:                                   [  OK  ]
Stopping st2sensorcontainer:                               [  OK  ]
st2chatops: unrecognized service
Stopping mistral-api:                                      [  OK  ]
Stopping mistral-server:                                   [  OK  ]
[root@myhost rabbitmq]# service rabbitmq restart
rabbitmq: unrecognized service
[root@myhost rabbitmq]# service rabbitmq-server restart
Restarting rabbitmq-server: SUCCESS
rabbitmq-server.
[root@myhost rabbitmq]# service rabbitmq-server status
Status of node 'rabbit@myhost' ...
[{pid,24007},
 {running_applications,[{rabbit,"RabbitMQ","3.1.5"},
                        {os_mon,"CPO  CXC 138 46","2.2.7"},
                        {xmerl,"XML parser","1.2.10"},
                        {mnesia,"MNESIA  CXC 138 12","4.5"},
                        {sasl,"SASL  CXC 138 11","2.1.10"},
                        {stdlib,"ERTS  CXC 138 10","1.17.5"},
                        {kernel,"ERTS  CXC 138 10","2.14.5"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:30] [kernel-poll:true]\n"},
 {memory,[{total,28497880},
          {connection_procs,70264},
          {queue_procs,93104},
          {plugins,0},
          {other_proc,9201976},
          {mnesia,94552},
          {mgmt_db,0},
          {msg_index,27792},
          {other_ets,797728},
          {binary,665848},
          {code,14419209},
          {atom,1354465},
          {other_system,1772942}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,1659928576},
 {disk_free_limit,1000000000},
 {disk_free,707403776},
 {file_descriptors,[{total_limit,924},
                    {total_used,11},
                    {sockets_limit,829},
                    {sockets_used,2}]},
 {processes,[{limit,1048576},{used,140}]},
 {run_queue,0},
 {uptime,9}]
...done.
[root@myhost rabbitmq]# rabbitmqctl status
Status of node 'rabbit@myhost' ...
[{pid,24007},
 {running_applications,[{rabbit,"RabbitMQ","3.1.5"},
                        {os_mon,"CPO  CXC 138 46","2.2.7"},
                        {xmerl,"XML parser","1.2.10"},
                        {mnesia,"MNESIA  CXC 138 12","4.5"},
                        {sasl,"SASL  CXC 138 11","2.1.10"},
                        {stdlib,"ERTS  CXC 138 10","1.17.5"},
                        {kernel,"ERTS  CXC 138 10","2.14.5"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:30] [kernel-poll:true]\n"},
 {memory,[{total,28540728},
          {connection_procs,68720},
          {queue_procs,93104},
          {plugins,0},
          {other_proc,9246280},
          {mnesia,94552},
          {mgmt_db,0},
          {msg_index,27792},
          {other_ets,797824},
          {binary,665848},
          {code,14419209},
          {atom,1354465},
          {other_system,1772934}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,1659928576},
 {disk_free_limit,1000000000},
 {disk_free,707502080},
 {file_descriptors,[{total_limit,924},
                    {total_used,11},
                    {sockets_limit,829},
                    {sockets_used,2}]},
 {processes,[{limit,1048576},{used,140}]},
 {run_queue,0},
 {uptime,38}]
...done.
[root@myhost rabbitmq]# st2ctl start
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2actionrunner: [  OK  ]
Starting st2api:                                           [  OK  ]
Starting st2stream:                                        [  OK  ]
Starting st2auth:                                          [  OK  ]
Starting st2garbagecollector:                              [  OK  ]
Starting st2notifier:                                      [  OK  ]
Starting st2resultstracker:                                [  OK  ]
Starting st2rulesengine:                                   [  OK  ]
Starting st2sensorcontainer:                               [  OK  ]
st2chatops: unrecognized service
Starting mistral-api:                                      [  OK  ]
Starting mistral-server:                                   [  OK  ]
##### st2 components status #####
st2actionrunner PID: 24232
st2actionrunner PID: 24249
st2actionrunner PID: 24270
st2actionrunner PID: 24291
st2actionrunner PID: 24311
st2actionrunner PID: 24333
st2actionrunner PID: 24354
st2actionrunner PID: 24375
st2actionrunner PID: 24398
st2actionrunner PID: 24419
st2api PID: 24454
st2api PID: 24556
st2stream PID: 24590
st2stream PID: 24613
st2auth PID: 24643
st2auth PID: 24658
st2garbagecollector PID: 24693
st2notifier PID: 24729
st2resultstracker PID: 24761
st2rulesengine PID: 24795
st2sensorcontainer PID: 24829
st2chatops is not running.
mistral-server PID: 24934
mistral-api PID: 24881
mistral-api PID: 24903
mistral-api PID: 24906
[root@myhost rabbitmq]# rabbitmqctl status
Status of node 'rabbit@myhost' ...
[{pid,24007},
 {running_applications,[{rabbit,"RabbitMQ","3.1.5"},
                        {os_mon,"CPO  CXC 138 46","2.2.7"},
                        {xmerl,"XML parser","1.2.10"},
                        {mnesia,"MNESIA  CXC 138 12","4.5"},
                        {sasl,"SASL  CXC 138 11","2.1.10"},
                        {stdlib,"ERTS  CXC 138 10","1.17.5"},
                        {kernel,"ERTS  CXC 138 10","2.14.5"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:30] [kernel-poll:true]\n"},
 {memory,[{total,30863752},
          {connection_procs,1368920},
          {queue_procs,218136},
          {plugins,0},
          {other_proc,9353288},
          {mnesia,121720},
          {mgmt_db,0},
          {msg_index,30768},
          {other_ets,839616},
          {binary,1334824},
          {code,14419209},
          {atom,1354465},
          {other_system,1822806}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,1659928576},
 {disk_free_limit,1000000000},
 {disk_free,707022848},
 {file_descriptors,[{total_limit,924},
                    {total_used,40},
                    {sockets_limit,829},
                    {sockets_used,31}]},
 {processes,[{limit,1048576},{used,413}]},
 {run_queue,0},
 {uptime,89}]
...done.
[root@myhost rabbitmq]#

@LindsayHill
Copy link
Contributor

Cool, those numbers look a bit healthier. Hopefully now running st2ctl reload --register-all works?

@calvincannon
Copy link

calvincannon commented May 5, 2017

Ha! No.... Here's the verbose output. Would you like to see the output after I Ctrl-C'd out of it too? (There's a lot!)


[root@myhost rabbitmq]# st2ctl reload --register-all --verbose
Registering content...[flags = --config-file /etc/st2/st2.conf --register-all --verbose]
2017-05-05 14:33:03,461 INFO [-] Connecting to database "st2" @ "0.0.0.0:27017" as user "None".
2017-05-05 14:33:03,463 DEBUG [-] Ensuring database indexes...
2017-05-05 14:33:03,470 DEBUG [-] Ensuring indexes for model "UserDB"...
2017-05-05 14:33:03,471 DEBUG [-] Removing extra indexes for model "UserDB"...
2017-05-05 14:33:03,472 DEBUG [-] Removed "0" extra indexes for model "UserDB"
2017-05-05 14:33:03,472 DEBUG [-] Ensuring indexes for model "TokenDB"...
2017-05-05 14:33:03,473 DEBUG [-] Removing extra indexes for model "TokenDB"...
2017-05-05 14:33:03,473 DEBUG [-] Removed "0" extra indexes for model "TokenDB"
2017-05-05 14:33:03,474 DEBUG [-] Ensuring indexes for model "ApiKeyDB"...
2017-05-05 14:33:03,475 DEBUG [-] Removing extra indexes for model "ApiKeyDB"...
2017-05-05 14:33:03,475 DEBUG [-] Removed "0" extra indexes for model "ApiKeyDB"
2017-05-05 14:33:03,475 DEBUG [-] Ensuring indexes for model "ActionDB"...
2017-05-05 14:33:03,478 DEBUG [-] Removing extra indexes for model "ActionDB"...
2017-05-05 14:33:03,479 DEBUG [-] Removed "0" extra indexes for model "ActionDB"
2017-05-05 14:33:03,479 DEBUG [-] Ensuring indexes for model "ActionExecutionDB"...
2017-05-05 14:33:03,487 DEBUG [-] Removing extra indexes for model "ActionExecutionDB"...
2017-05-05 14:33:03,489 DEBUG [-] Removed "0" extra indexes for model "ActionExecutionDB"
2017-05-05 14:33:03,489 DEBUG [-] Ensuring indexes for model "ActionExecutionStateDB"...
2017-05-05 14:33:03,490 DEBUG [-] Removing extra indexes for model "ActionExecutionStateDB"...
2017-05-05 14:33:03,491 DEBUG [-] Removed "0" extra indexes for model "ActionExecutionStateDB"
2017-05-05 14:33:03,491 DEBUG [-] Ensuring indexes for model "ActionAliasDB"...
2017-05-05 14:33:03,495 DEBUG [-] Removing extra indexes for model "ActionAliasDB"...
2017-05-05 14:33:03,496 DEBUG [-] Removed "0" extra indexes for model "ActionAliasDB"
2017-05-05 14:33:03,496 DEBUG [-] Ensuring indexes for model "LiveActionDB"...
2017-05-05 14:33:03,499 DEBUG [-] Removing extra indexes for model "LiveActionDB"...
2017-05-05 14:33:03,501 DEBUG [-] Removed "0" extra indexes for model "LiveActionDB"
2017-05-05 14:33:03,501 DEBUG [-] Ensuring indexes for model "RunnerTypeDB"...
2017-05-05 14:33:03,502 DEBUG [-] Removing extra indexes for model "RunnerTypeDB"...
2017-05-05 14:33:03,503 DEBUG [-] Removed "0" extra indexes for model "RunnerTypeDB"
2017-05-05 14:33:03,503 DEBUG [-] Ensuring indexes for model "ActionAliasDB"...
2017-05-05 14:33:03,504 DEBUG [-] Removing extra indexes for model "ActionAliasDB"...
2017-05-05 14:33:03,504 DEBUG [-] Removed "0" extra indexes for model "ActionAliasDB"
2017-05-05 14:33:03,504 DEBUG [-] Ensuring indexes for model "KeyValuePairDB"...
2017-05-05 14:33:03,506 DEBUG [-] Removing extra indexes for model "KeyValuePairDB"...
2017-05-05 14:33:03,507 DEBUG [-] Removed "0" extra indexes for model "KeyValuePairDB"
2017-05-05 14:33:03,507 DEBUG [-] Ensuring indexes for model "ActionExecutionDB"...
2017-05-05 14:33:03,510 DEBUG [-] Removing extra indexes for model "ActionExecutionDB"...
2017-05-05 14:33:03,511 DEBUG [-] Removed "0" extra indexes for model "ActionExecutionDB"
2017-05-05 14:33:03,511 DEBUG [-] Ensuring indexes for model "ActionExecutionStateDB"...
2017-05-05 14:33:03,512 DEBUG [-] Removing extra indexes for model "ActionExecutionStateDB"...
2017-05-05 14:33:03,512 DEBUG [-] Removed "0" extra indexes for model "ActionExecutionStateDB"
2017-05-05 14:33:03,512 DEBUG [-] Ensuring indexes for model "LiveActionDB"...
2017-05-05 14:33:03,514 DEBUG [-] Removing extra indexes for model "LiveActionDB"...
2017-05-05 14:33:03,514 DEBUG [-] Removed "0" extra indexes for model "LiveActionDB"
2017-05-05 14:33:03,514 DEBUG [-] Ensuring indexes for model "PackDB"...
2017-05-05 14:33:03,516 DEBUG [-] Removing extra indexes for model "PackDB"...
2017-05-05 14:33:03,517 DEBUG [-] Removed "0" extra indexes for model "PackDB"
2017-05-05 14:33:03,517 DEBUG [-] Ensuring indexes for model "ConfigSchemaDB"...
2017-05-05 14:33:03,518 DEBUG [-] Removing extra indexes for model "ConfigSchemaDB"...
2017-05-05 14:33:03,518 DEBUG [-] Removed "0" extra indexes for model "ConfigSchemaDB"
2017-05-05 14:33:03,518 DEBUG [-] Ensuring indexes for model "ConfigDB"...
2017-05-05 14:33:03,519 DEBUG [-] Removing extra indexes for model "ConfigDB"...
2017-05-05 14:33:03,519 DEBUG [-] Removed "0" extra indexes for model "ConfigDB"
2017-05-05 14:33:03,519 DEBUG [-] Ensuring indexes for model "PolicyTypeDB"...
2017-05-05 14:33:03,521 DEBUG [-] Removing extra indexes for model "PolicyTypeDB"...
2017-05-05 14:33:03,521 DEBUG [-] Removed "0" extra indexes for model "PolicyTypeDB"
2017-05-05 14:33:03,521 DEBUG [-] Ensuring indexes for model "PolicyDB"...
2017-05-05 14:33:03,523 DEBUG [-] Removing extra indexes for model "PolicyDB"...
2017-05-05 14:33:03,523 DEBUG [-] Removed "0" extra indexes for model "PolicyDB"
2017-05-05 14:33:03,523 DEBUG [-] Ensuring indexes for model "RoleDB"...
2017-05-05 14:33:03,524 DEBUG [-] Removing extra indexes for model "RoleDB"...
2017-05-05 14:33:03,524 DEBUG [-] Removed "0" extra indexes for model "RoleDB"
2017-05-05 14:33:03,524 DEBUG [-] Ensuring indexes for model "UserRoleAssignmentDB"...
2017-05-05 14:33:03,525 DEBUG [-] Removing extra indexes for model "UserRoleAssignmentDB"...
2017-05-05 14:33:03,526 DEBUG [-] Removed "0" extra indexes for model "UserRoleAssignmentDB"
2017-05-05 14:33:03,526 DEBUG [-] Ensuring indexes for model "PermissionGrantDB"...
2017-05-05 14:33:03,526 DEBUG [-] Skipping index cleanup for blacklisted model "PermissionGrantDB"...
2017-05-05 14:33:03,526 DEBUG [-] Ensuring indexes for model "RuleDB"...
2017-05-05 14:33:03,529 DEBUG [-] Removing extra indexes for model "RuleDB"...
2017-05-05 14:33:03,530 DEBUG [-] Removed "0" extra indexes for model "RuleDB"
2017-05-05 14:33:03,530 DEBUG [-] Ensuring indexes for model "RuleEnforcementDB"...
2017-05-05 14:33:03,531 DEBUG [-] Removing extra indexes for model "RuleEnforcementDB"...
2017-05-05 14:33:03,532 DEBUG [-] Removed "0" extra indexes for model "RuleEnforcementDB"
2017-05-05 14:33:03,532 DEBUG [-] Ensuring indexes for model "RunnerTypeDB"...
2017-05-05 14:33:03,533 DEBUG [-] Removing extra indexes for model "RunnerTypeDB"...
2017-05-05 14:33:03,533 DEBUG [-] Removed "0" extra indexes for model "RunnerTypeDB"
2017-05-05 14:33:03,533 DEBUG [-] Ensuring indexes for model "SensorTypeDB"...
2017-05-05 14:33:03,535 DEBUG [-] Removing extra indexes for model "SensorTypeDB"...
2017-05-05 14:33:03,535 DEBUG [-] Removed "0" extra indexes for model "SensorTypeDB"
2017-05-05 14:33:03,535 DEBUG [-] Ensuring indexes for model "TraceDB"...
2017-05-05 14:33:03,538 DEBUG [-] Removing extra indexes for model "TraceDB"...
2017-05-05 14:33:03,539 DEBUG [-] Removed "0" extra indexes for model "TraceDB"
2017-05-05 14:33:03,539 DEBUG [-] Ensuring indexes for model "TriggerTypeDB"...
2017-05-05 14:33:03,541 DEBUG [-] Removing extra indexes for model "TriggerTypeDB"...
2017-05-05 14:33:03,542 DEBUG [-] Removed "0" extra indexes for model "TriggerTypeDB"
2017-05-05 14:33:03,542 DEBUG [-] Ensuring indexes for model "TriggerDB"...
2017-05-05 14:33:03,543 DEBUG [-] Removing extra indexes for model "TriggerDB"...
2017-05-05 14:33:03,543 DEBUG [-] Removed "0" extra indexes for model "TriggerDB"
2017-05-05 14:33:03,543 DEBUG [-] Ensuring indexes for model "TriggerInstanceDB"...
2017-05-05 14:33:03,545 DEBUG [-] Removing extra indexes for model "TriggerInstanceDB"...
2017-05-05 14:33:03,546 DEBUG [-] Removed "0" extra indexes for model "TriggerInstanceDB"
2017-05-05 14:33:03,546 DEBUG [-] Registering exchanges...
2017-05-05 14:33:03,558 DEBUG [-] Start from server, version: 0.9, properties: {u'information': u'Licensed under the MPL.  See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright (C) 2007-2013 GoPivotal, Inc.', u'capabilities': {u'exchange_exchange_bindings': True, u'consumer_cancel_notify': True, u'publisher_confirms': True, u'basic.nack': True}, u'platform': u'Erlang/OTP', u'version': u'3.1.5'}, mechanisms: [u'PLAIN', u'AMQPLAIN'], locales: [u'en_US']
2017-05-05 14:33:03,558 DEBUG [-] Open OK!
2017-05-05 14:33:03,558 DEBUG [-] using channel_id: 1
2017-05-05 14:33:03,559 DEBUG [-] Channel open
2017-05-05 14:33:03,560 DEBUG [-] registered exchange st2.actionexecutionstate.
2017-05-05 14:33:03,560 DEBUG [-] registered exchange st2.announcement.
2017-05-05 14:33:03,560 DEBUG [-] registered exchange st2.execution.
2017-05-05 14:33:03,561 DEBUG [-] registered exchange st2.liveaction.
2017-05-05 14:33:03,561 DEBUG [-] registered exchange st2.liveaction.status.
2017-05-05 14:33:03,561 DEBUG [-] registered exchange st2.trigger.
2017-05-05 14:33:03,562 DEBUG [-] registered exchange st2.trigger_instances_dispatch.
2017-05-05 14:33:03,562 DEBUG [-] registered exchange st2.sensor.
2017-05-05 14:33:03,562 DEBUG [-] Closed channel #1
2017-05-05 14:33:03,563 INFO [-] =========================================================
2017-05-05 14:33:03,563 INFO [-] ############## Registering triggers #####################
2017-05-05 14:33:03,563 INFO [-] =========================================================
2017-05-05 14:33:03,563 DEBUG [-] Registering pack: default
2017-05-05 14:33:03,575 DEBUG [-] Pack default registered.
2017-05-05 14:33:03,576 DEBUG [-] Registering pack: core
2017-05-05 14:33:03,587 DEBUG [-] Pack core registered.
2017-05-05 14:33:03,587 DEBUG [-] Registering pack: linux
2017-05-05 14:33:03,600 DEBUG [-] Pack linux registered.
2017-05-05 14:33:03,600 DEBUG [-] Registering pack: chatops
2017-05-05 14:33:03,611 DEBUG [-] Pack chatops registered.
2017-05-05 14:33:03,611 DEBUG [-] Registering pack: packs
2017-05-05 14:33:03,623 DEBUG [-] Pack packs registered.
2017-05-05 14:33:03,624 DEBUG [-] Pack default does not contain triggers.
2017-05-05 14:33:03,624 DEBUG [-] Pack core does not contain triggers.
2017-05-05 14:33:03,624 DEBUG [-] Pack linux does not contain triggers.
2017-05-05 14:33:03,624 DEBUG [-] Pack chatops does not contain triggers.
2017-05-05 14:33:03,624 DEBUG [-] Pack packs does not contain triggers.
2017-05-05 14:33:03,624 INFO [-] Registered 0 triggers.
2017-05-05 14:33:03,624 INFO [-] =========================================================
2017-05-05 14:33:03,624 INFO [-] ############## Registering sensors ######################
2017-05-05 14:33:03,624 INFO [-] =========================================================
2017-05-05 14:33:03,625 DEBUG [-] Registering sensors from pack default:, dir: /opt/stackstorm/packs/default/sensors
2017-05-05 14:33:03,625 DEBUG [-] Registering sensors from pack core:, dir: /opt/stackstorm/packs/core/sensors
2017-05-05 14:33:03,625 DEBUG [-] Registering sensors from pack linux:, dir: /opt/stackstorm/packs/linux/sensors
2017-05-05 14:33:03,625 DEBUG [-] Loading sensor from /opt/stackstorm/packs/linux/sensors/file_watch_sensor.yaml.
2017-05-05 14:33:03,635 AUDIT [-] TriggerType updated. TriggerType.id=590bdc204bc0945dd7ab8f4d
2017-05-05 14:33:03,650 DEBUG [-] Start from server, version: 0.9, properties: {u'information': u'Licensed under the MPL.  See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright (C) 2007-2013 GoPivotal, Inc.', u'capabilities': {u'exchange_exchange_bindings': True, u'consumer_cancel_notify': True, u'publisher_confirms': True, u'basic.nack': True}, u'platform': u'Erlang/OTP', u'version': u'3.1.5'}, mechanisms: [u'PLAIN', u'AMQPLAIN'], locales: [u'en_US']
2017-05-05 14:33:03,651 DEBUG [-] Open OK!
2017-05-05 14:33:03,651 DEBUG [-] using channel_id: 1
2017-05-05 14:33:03,652 DEBUG [-] Channel open

@calvincannon
Copy link

sigh Unfortunately, slack.com is blocked by our corporate proxy. I'll have to see if I can access it over my smartphone.

@LindsayHill
Copy link
Contributor

Sucks about Slack. Will be a pain for you typing via smartphone.

I'm guessing that rabbitmq is showing a high number of FDs and sockets in use again. May be worth bumping those up super-high, just to see if it will actually start. There's something messed up in there, and I'm not sure what it is, since I know that the combo of RHEL6 + rabbitMQ 3.1.5 + ST2 can & does work.

@calvincannon
Copy link

Not sure what to think there. This was a fresh RHEL 6 VM, so it shouldn't deviate too much from a fresh build of a DVD. I'm a RabbitMQ newbie. How do I go about doing that?

@LindsayHill
Copy link
Contributor

replied to you on Slack

@calvincannon
Copy link

calvincannon commented May 6, 2017

Okay, I'm really at a loss now. I've built internal yum repos, so here are the commands I've run to perform a StackStorm installation. I've performed this procedure multiple times now and it consistently hangs at exactly the same spot. The commands come from this procedure:
https://docs.stackstorm.com/install/rhel6.html

I have also attempted to bump up the file descriptors following this procedure:
https://jsosic.wordpress.com/2014/09/10/increase-rabbitmq-file-descriptor-limit-without-restart/


cd /etc/yum.repos.d
ll
yum install libffi-devel
yum -y install mongodb-org
yum -y install rabbitmq-server
service mongod start
service rabbitmq-server start
chkconfig mongod on
chkconfig rabbitmq-server on
yum -y install postgresql94-server postgresql94-contrib postgresql94-devel
service postgresql-9.4 initdb
sed -i "s/\(host.*all.*all.*127.0.0.1\/32.*\)ident/\1md5/" /var/lib/pgsql/9.4/data/pg_hba.conf
sed -i "s/\(host.*all.*all.*::1\/128.*\)ident/\1md5/" /var/lib/pgsql/9.4/data/pg_hba.conf
service postgresql-9.4 start
chkconfig postgresql-9.4 on
yum install -y st2 st2mistral

su - postgres 
	# Following commands are executed as postgres user	
	psql
	CREATE ROLE mistral WITH CREATEDB LOGIN ENCRYPTED PASSWORD 'StackStorm';
	CREATE DATABASE mistral OWNER mistral;
	\q
	exit

Following commands are once again run as root user.

/opt/stackstorm/mistral/bin/mistral-db-manage --config-file /etc/mistral/mistral.conf upgrade head
/opt/stackstorm/mistral/bin/mistral-db-manage --config-file /etc/mistral/mistral.conf populate
useradd stanley
mkdir -p /home/stanley/.ssh
chmod 0700 /home/stanley/.ssh
ssh-keygen -f /home/stanley/.ssh/stanley_rsa -P ""
sh -c 'cat /home/stanley/.ssh/stanley_rsa.pub >> /home/stanley/.ssh/authorized_keys'
chown -R stanley:stanley /home/stanley/.ssh
sh -c 'echo "stanley    ALL=(ALL)       NOPASSWD: SETENV: ALL" >> /etc/sudoers.d/st2'
chmod 0440 /etc/sudoers.d/st2
sed -i -r "s/^Defaults\s+\+?requiretty/# Defaults +requiretty/g" /etc/sudoers
st2ctl start
st2ctl reload --register-all --verbose  # <<<<<<  This is where it hangs, as noted above.

@LindsayHill
Copy link
Contributor

What happened when you increased file descriptor and socket limits? Did rabbitmq hit those new limits? Check rabbitmqctl status

Check local hostname resolution.

We know that rabbitmq generally works with RHEL6, and we know that ST2 works on a 'vanilla' RHEL system.

Presumably your organization has some customized templates you deploy RHEL from. What changes do you make? Changes to limits? Changes to SELinux?

Have you checked things like your syslogs, and audit logs (e.g. check for avc denied messages)

@calvincannon
Copy link

calvincannon commented May 6, 2017

Hey, thanks for getting back to me.

Hostname resolution is working fine and SELinux is disabled. I also built this VM with system hardening disabled, so it should be very close to a vanilla RHEL 6 VM. I'm not aware of any changes to default limits in this template. I've provided output below:


[root@myhost limits.d]# rabbitmqctl status
Status of node 'rabbit@myhost' ...
[{pid,1982},
 {running_applications,[{rabbit,"RabbitMQ","3.1.5"},
                        {os_mon,"CPO  CXC 138 46","2.2.7"},
                        {xmerl,"XML parser","1.2.10"},
                        {mnesia,"MNESIA  CXC 138 12","4.5"},
                        {sasl,"SASL  CXC 138 11","2.1.10"},
                        {stdlib,"ERTS  CXC 138 10","1.17.5"},
                        {kernel,"ERTS  CXC 138 10","2.14.5"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:30] [kernel-poll:true]\n"},
 {memory,[{total,31876216},
          {connection_procs,1283016},
          {queue_procs,287536},
          {plugins,0},
          {other_proc,9231032},
          {mnesia,121720},
          {mgmt_db,0},
          {msg_index,44040},
          {other_ets,842776},
          {binary,673496},
          {code,14419209},
          {atom,1355273},
          {other_system,3618118}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,1659928576},
 {disk_free_limit,1000000000},
 {disk_free,715145216},
 {file_descriptors,[{total_limit,3996},
                    {total_used,41},
                    {sockets_limit,3594},
                    {sockets_used,32}]},
 {processes,[{limit,1048576},{used,422}]},
 {run_queue,0},
 {uptime,2021}]
...done.


[root@myhost limits.d]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15215
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 15215
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[root@myhost limits.d]# 

@LindsayHill
Copy link
Contributor

What do you see in your rabbitmq logs now? Same errors as before? Different?

@calvincannon
Copy link

calvincannon commented May 6, 2017

Here's a sample of the rabbit@myhost.log


=ERROR REPORT==== 5-May-2017::18:57:20 ===
AMQP connection <0.2485.0> (blocked), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=ERROR REPORT==== 5-May-2017::18:57:20 ===
AMQP connection <0.2549.0> (blocked), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=ERROR REPORT==== 5-May-2017::18:57:20 ===
AMQP connection <0.2077.0> (blocked), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=ERROR REPORT==== 5-May-2017::18:57:20 ===
AMQP connection <0.2269.0> (blocked), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=ERROR REPORT==== 5-May-2017::18:57:20 ===
AMQP connection <0.2355.0> (blocked), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=ERROR REPORT==== 5-May-2017::18:57:20 ===
AMQP connection <0.3080.0> (blocked), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=ERROR REPORT==== 5-May-2017::18:57:20 ===
AMQP connection <0.3208.0> (blocked), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=ERROR REPORT==== 5-May-2017::18:57:20 ===
AMQP connection <0.792.0> (blocked), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=ERROR REPORT==== 5-May-2017::18:57:20 ===
AMQP connection <0.1972.0> (blocked), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=INFO REPORT==== 5-May-2017::18:57:20 ===
Halting Erlang VM

=INFO REPORT==== 5-May-2017::18:57:24 ===
Starting RabbitMQ 3.1.5 on Erlang R14B04
Copyright (C) 2007-2013 GoPivotal, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

=INFO REPORT==== 5-May-2017::18:57:24 ===
node           : rabbit@myhost
home dir       : /var/lib/rabbitmq
config file(s) : (none)
cookie hash    : lZcgTSucv7wHQyoNeJEzEA==
log            : /var/log/rabbitmq/rabbit@myhost.log
sasl log       : /var/log/rabbitmq/rabbit@myhost-sasl.log
database dir   : /var/lib/rabbitmq/mnesia/rabbit@myhost

=INFO REPORT==== 5-May-2017::18:57:24 ===
Limiting to approx 3996 file handles (3594 sockets)

=INFO REPORT==== 5-May-2017::18:57:24 ===
Memory limit set to 1583MB of 3957MB total.

=INFO REPORT==== 5-May-2017::18:57:24 ===
Disk free limit set to 1000MB

=INFO REPORT==== 5-May-2017::18:57:24 ===
Disk free space insufficient. Free bytes:715403264 Limit:1000000000

=WARNING REPORT==== 5-May-2017::18:57:24 ===
disk resource limit alarm set on node 'rabbit@myhost'.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

=INFO REPORT==== 5-May-2017::18:57:24 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 5-May-2017::18:57:24 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 5-May-2017::18:57:25 ===
started TCP Listener on 0.0.0.0:5672

=INFO REPORT==== 5-May-2017::18:57:25 ===
Server startup complete; 0 plugins started.

=INFO REPORT==== 5-May-2017::18:57:30 ===
accepting AMQP connection <0.230.0> (127.0.0.1:51898 -> 127.0.0.1:5672)

=INFO REPORT==== 5-May-2017::18:58:44 ===
accepting AMQP connection <0.250.0> (127.0.0.1:51908 -> 127.0.0.1:5672)

=INFO REPORT==== 5-May-2017::18:58:44 ===
closing AMQP connection <0.250.0> (127.0.0.1:51908 -> 127.0.0.1:5672)

=INFO REPORT==== 5-May-2017::18:58:44 ===
accepting AMQP connection <0.260.0> (127.0.0.1:51910 -> 127.0.0.1:5672)

@LindsayHill
Copy link
Contributor

Logs seem to say you don't have enough free disk space

@calvincannon
Copy link

Hmmm.... Nothing is full. Which filesystem should I grow, /var?

@calvincannon
Copy link

BTW, thank you very much for your help today. I need to step away for dinner. I'll pick this up again a bit later. Thanks!

@LindsayHill
Copy link
Contributor

And looking back at that earlier df -h output, you have a...complicated partitioning layout. Seems less than optimal. Looks like this is not a vanilla system, but something re-purposed? Double check that you don't have any other services running and claiming ports. Probably not a problem now, but could cause issues later

@LindsayHill
Copy link
Contributor

Yes, it's using /var/lib for rabbitmq.

You can see from the logs that it wants to ensure a certain minimum amount of free space. You can probably change that limit, but it would not be a good idea. You're going to use up that small free space in /var quickly

MongoDB does something similar.

@calvincannon
Copy link

calvincannon commented May 6, 2017

SUCCESS! Thank you. Much appreciated. Have a great weekend!


[root@myhost rabbitmq]# st2ctl reload --register-all
Registering content...[flags = --config-file /etc/st2/st2.conf --register-all]
2017-05-05 23:53:15,057 INFO [-] Connecting to database "st2" @ "0.0.0.0:27017" as user "None".
2017-05-05 23:53:15,147 INFO [-] =========================================================
2017-05-05 23:53:15,147 INFO [-] ############## Registering triggers #####################
2017-05-05 23:53:15,147 INFO [-] =========================================================
2017-05-05 23:53:15,203 INFO [-] Registered 0 triggers.
2017-05-05 23:53:15,203 INFO [-] =========================================================
2017-05-05 23:53:15,203 INFO [-] ############## Registering sensors ######################
2017-05-05 23:53:15,203 INFO [-] =========================================================
2017-05-05 23:53:15,238 INFO [-] Registered 1 sensors.
2017-05-05 23:53:15,238 INFO [-] =========================================================
2017-05-05 23:53:15,239 INFO [-] ############## Registering runners ######################
2017-05-05 23:53:15,239 INFO [-] =========================================================
2017-05-05 23:53:15,927 INFO [-] Registered 13 runners.
2017-05-05 23:53:15,927 INFO [-] =========================================================
2017-05-05 23:53:15,927 INFO [-] ############## Registering actions ######################
2017-05-05 23:53:15,927 INFO [-] =========================================================
2017-05-05 23:53:17,553 INFO [-] Registered 46 actions.
2017-05-05 23:53:17,553 INFO [-] =========================================================
2017-05-05 23:53:17,554 INFO [-] ############## Registering rules ########################
2017-05-05 23:53:17,554 INFO [-] =========================================================
2017-05-05 23:53:17,631 INFO [-] Registered 1 rules.
2017-05-05 23:53:17,632 INFO [-] =========================================================
2017-05-05 23:53:17,632 INFO [-] ############## Registering aliases ######################
2017-05-05 23:53:17,632 INFO [-] =========================================================
2017-05-05 23:53:17,764 INFO [-] Registered 4 aliases.
2017-05-05 23:53:17,764 INFO [-] =========================================================
2017-05-05 23:53:17,764 INFO [-] ############## Registering policy types #################
2017-05-05 23:53:17,765 INFO [-] =========================================================
2017-05-05 23:53:17,842 INFO [-] Registered 3 policy types.
2017-05-05 23:53:17,843 INFO [-] =========================================================
2017-05-05 23:53:17,843 INFO [-] ############## Registering policies #####################
2017-05-05 23:53:17,843 INFO [-] =========================================================
2017-05-05 23:53:17,844 INFO [-] Registered 0 policies.
2017-05-05 23:53:17,844 INFO [-] =========================================================
2017-05-05 23:53:17,844 INFO [-] ############## Registering configs ######################
2017-05-05 23:53:17,845 INFO [-] =========================================================
2017-05-05 23:53:17,846 INFO [-] Registered 0 configs.
##### st2 components status #####
st2actionrunner PID: 2190
st2actionrunner PID: 2207
st2actionrunner PID: 2228
st2actionrunner PID: 2251
st2actionrunner PID: 2268
st2actionrunner PID: 2289
st2actionrunner PID: 2313
st2actionrunner PID: 2335
st2actionrunner PID: 2357
st2actionrunner PID: 2378
st2api is not running.
st2stream PID: 2557
st2stream PID: 2577
st2auth PID: 2606
st2auth PID: 2622
st2garbagecollector PID: 2657
st2notifier PID: 2693
st2resultstracker PID: 2725
st2rulesengine PID: 2759
st2sensorcontainer PID: 2791
st2chatops is not running.
mistral-server PID: 2904
mistral-api PID: 2843
mistral-api PID: 2866
mistral-api PID: 2869
[root@myhost rabbitmq]# 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants