
ejabberd does not start after update from 23.10 to 24.02 #4199

Open
giostark opened this issue Apr 13, 2024 · 5 comments

Comments

giostark commented Apr 13, 2024

Hi all :-)
A problem similar to this one: #3150
Unfortunately that one was never resolved (the memsup there was different from my epam).
If I understood correctly, the database is updated automatically; I'm using the included database.
After the update mentioned in the title, ejabberd no longer starts, with the errors below.
I don't know what to do... really...
Please ask me for any info needed to investigate this issue (and how to gather it).
(I tried removing the firewall entirely, but the problem persists.)
The server was working fine; I ran the new installer (24.02), rebooted, and then the issue appeared.
I'm an Arch user...
Thanks to any good soul who finds the time to help me ;-)

2024-04-13 13:47:50.742784+02:00 [info] <0.190.0>@ejabberd_config:load/1:82 Loading configuration from /opt/ejabberd/conf/ejabberd.yml
2024-04-13 13:47:51.785172+02:00 [info] <0.190.0>@ejabberd_config:load/1:89 Configuration loaded successfully
2024-04-13 13:52:49.244505+02:00 [notice] <0.70.0> SIGTERM received - shutting down


2024-04-13 14:01:26.057330+02:00 [error] <0.214.0> External eimp process (pid=3826) has terminated unexpectedly, restarting in a few seconds
2024-04-13 14:01:26.058274+02:00 [error] <0.216.0> External eimp process (pid=3827) has terminated unexpectedly, restarting in a few seconds
2024-04-13 14:01:26.058452+02:00 [error] <0.212.0> External eimp process (pid=3825) has terminated unexpectedly, restarting in a few seconds
2024-04-13 14:01:26.058905+02:00 [error] <0.210.0> External eimp process (pid=3824) has terminated unexpectedly, restarting in a few seconds
2024-04-13 14:01:26.064626+02:00 [error] <0.222.0>@gen_server:error_info/8:1391 ** Generic server epam terminating 
** Last message in was {#Port<0.10>,{exit_status,143}}
** When Server state == {state,#Port<0.10>}
** Reason for termination ==
** port_died

2024-04-13 14:01:26.065731+02:00 [error] <0.222.0>@proc_lib:crash_report/4:584 CRASH REPORT:
  crasher:
    initial call: epam:init/1
    pid: <0.222.0>
    registered_name: epam
    exception exit: port_died
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1226)
    ancestors: [epam_sup,<0.220.0>]
    message_queue_len: 0
    messages: []
    links: [<0.221.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 4185
    stack_size: 28
    reductions: 7829
  neighbours:

2024-04-13 14:01:26.080804+02:00 [error] <0.221.0>@supervisor:do_restart/3:744 SUPERVISOR REPORT:
    supervisor: {local,epam_sup}
    errorContext: child_terminated
    reason: port_died
    offender: [{pid,<0.222.0>},
               {id,epam},
               {mfargs,{epam,start_link,[]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,brutal_kill},
               {child_type,worker}]
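A note on the epam crash above: epam is ejabberd's PAM authentication helper, and it typically needs to run setuid root (or with equivalent capabilities) to query PAM and read /etc/shadow. An install or update can reset those permissions. A minimal sketch for checking them, using the path from the logs (the ownership/mode values below are assumptions for a typical PAM setup, not confirmed for this system):

```shell
# Check ownership and mode of the epam helper shipped with 24.02
# (path taken from the logs above; adjust to your install).
ls -l /opt/ejabberd-24.02/lib/epam-1.0.14/priv/bin/epam

# epam usually needs the setuid bit to talk to PAM/shadow. If it is
# missing, restoring it could look like this (assumption: your setup
# authenticates via PAM and ran this way under 23.10; the group name
# "ejabberd" is a guess, use whatever group your service runs as):
sudo chown root:ejabberd /opt/ejabberd-24.02/lib/epam-1.0.14/priv/bin/epam
sudo chmod 4750 /opt/ejabberd-24.02/lib/epam-1.0.14/priv/bin/epam
```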
giostark commented Apr 13, 2024

This is the status of the service. It seems to stay stuck while activating.
I don't know how to investigate the "epam" process that may be causing this hang.
It doesn't seem related to the other "epam port died" reports I found on the net.

[arch-nas@arch-nas ~]$ sudo systemctl status ejabberd
● ejabberd.service - XMPP Server
     Loaded: loaded (/etc/systemd/system/ejabberd.service; enabled; preset: disabled)
     Active: activating (start) since Sat 2024-04-13 20:48:18 CEST; 4min 24s ago
   Main PID: 2353 (ejabberdctl)
      Tasks: 34 (limit: 9378)
     Memory: 739.5M (peak: 907.0M)
        CPU: 5min 2.304s
     CGroup: /system.slice/ejabberd.service
             ├─2353 /bin/sh /opt/ejabberd-24.02/bin/ejabberdctl foreground
             ├─2366 /opt/ejabberd-24.02/erts-14.2.2/bin/beam.smp -K true -P 250000 -- -root /opt/ejabberd-24.02 -bindir /opt/ejabberd-24.02/erts-14.2.2/bin -progname erl -- -h>
             ├─2378 /opt/ejabberd-24.02/erts-14.2.2/bin/epmd -daemon
             ├─2382 erl_child_setup 65536
             ├─2407 /opt/ejabberd-24.02/lib/eimp-1.0.22/priv/bin/eimp
             ├─2408 /opt/ejabberd-24.02/lib/eimp-1.0.22/priv/bin/eimp
             ├─2409 /opt/ejabberd-24.02/lib/eimp-1.0.22/priv/bin/eimp
             ├─2410 /opt/ejabberd-24.02/lib/eimp-1.0.22/priv/bin/eimp
             ├─2411 /opt/ejabberd-24.02/lib/epam-1.0.14/priv/bin/epam
             ├─2412 /opt/ejabberd-24.02/erts-14.2.2/bin/inet_gethost 4
             └─2413 /opt/ejabberd-24.02/erts-14.2.2/bin/inet_gethost 4

Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {child_type,supervisor}]
Apr 13 20:48:21 arch-nas ejabberdctl[2366]: 2024-04-13 20:48:21.216010+02:00 [debug] PROGRESS REPORT:
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:     supervisor: {local,kernel_safe_sup}
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:     started: [{pid,<0.404.0>},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {id,disk_log_server},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {mfargs,{disk_log_server,start_link,[]}},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {restart_type,permanent},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {significant,false},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {shutdown,2000},
Apr 13 20:48:21 arch-nas ejabberdctl[2366]:               {child_type,worker}]

giostark (Author)

Running "sudo journalctl -xeu ejabberd.service" gives this output:

[arch-nas@arch-nas ~]$ sudo journalctl -xeu ejabberd.service
[sudo] password for arch-nas: 
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,3000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,worker}]
Apr 13 20:58:34 arch-nas ejabberdctl[2870]: 2024-04-13 20:58:34.647473+02:00 [debug] PROGRESS REPORT:
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     supervisor: {local,mnesia_kernel_sup}
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     started: [{pid,<0.396.0>},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {id,mnesia_subscr},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {mfargs,{mnesia_subscr,start,[]}},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {restart_type,permanent},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,3000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,worker}]
Apr 13 20:58:34 arch-nas ejabberdctl[2870]: 2024-04-13 20:58:34.661363+02:00 [debug] PROGRESS REPORT:
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     supervisor: {local,mnesia_kernel_sup}
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     started: [{pid,<0.397.0>},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {id,mnesia_locker},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {mfargs,{mnesia_locker,start,[]}},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {restart_type,permanent},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,3000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,worker}]
Apr 13 20:58:34 arch-nas ejabberdctl[2870]: 2024-04-13 20:58:34.670528+02:00 [debug] PROGRESS REPORT:
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     supervisor: {local,mnesia_kernel_sup}
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     started: [{pid,<0.398.0>},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {id,mnesia_recover},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {mfargs,{mnesia_recover,start,[]}},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {restart_type,permanent},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,180000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,worker}]
Apr 13 20:58:34 arch-nas ejabberdctl[2870]: 2024-04-13 20:58:34.752914+02:00 [debug] PROGRESS REPORT:
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     supervisor: {local,kernel_safe_sup}
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     started: [{pid,<0.403.0>},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {id,disk_log_sup},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {mfargs,{disk_log_sup,start_link,[]}},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {restart_type,permanent},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,1000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,supervisor}]
Apr 13 20:58:34 arch-nas ejabberdctl[2870]: 2024-04-13 20:58:34.753550+02:00 [debug] PROGRESS REPORT:
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     supervisor: {local,kernel_safe_sup}
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:     started: [{pid,<0.404.0>},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {id,disk_log_server},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {mfargs,{disk_log_server,start_link,[]}},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {restart_type,permanent},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {significant,false},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {shutdown,2000},
Apr 13 20:58:34 arch-nas ejabberdctl[2870]:               {child_type,worker}]

giostark (Author)

I tried substituting "/opt/ejabberd-24.02/lib/epam-1.0.14/priv/bin/epam" with the old 1.0.12 binary from the 23.10 installation, but of course it did not work...

badlop commented Apr 15, 2024

Since you use the internal (mnesia) database, I assume you made a backup of its files before updating the program.

What happens if you uninstall ejabberd 24.02, restore the old mnesia directory, install ejabberd 23.10 and start it: does it start correctly?
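The rollback suggested above could look roughly like this. This is only a sketch: the install path and the location of the mnesia database directory are assumptions based on the logs in this thread, so check ejabberdctl.cfg (SPOOL_DIR) for the real paths before running anything:

```shell
# Stop the service and set the current 24.02 state aside.
sudo systemctl stop ejabberd

# Preserve the 24.02 mnesia database directory (path is an assumption;
# check SPOOL_DIR in your ejabberdctl.cfg for the real location).
sudo mv /opt/ejabberd/database /opt/ejabberd/database.24.02

# Restore the pre-update backup of the database directory.
sudo cp -a /path/to/backup/database /opt/ejabberd/database

# Reinstall 23.10 (e.g. re-run the old .run installer), then:
sudo systemctl start ejabberd
sudo systemctl status ejabberd
```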

giostark (Author)

Hi Lop!
Thanks for the patience to read all this :-)
Let me explain how I operate with my hardware; maybe it's useful.
I have an Asustor NAS on which I installed Arch (instead of using the stock software). It has two MX500 1TB disks (plus two other disks that are not relevant here).
Only the first disk is active; the second is passive (no auto-cloning software or similar).
From time to time I update the system, reboot the machine to check that everything works, and then update ejabberd using the .run file.
I install it as root and it places all the files in /opt.
If the ejabberd update finishes fine and it works without errors of any kind, I power off the machine and, booting from an Arch USB installation stick, clone the entire first disk to the second with dd. This way I have a working backup that is not subject to these kinds of errors.
Right now I'm running that backup, with 23.10 installed; it was made nearly one month ago.
On the first disk I also tried installing and configuring prosody; after some hours of learning curve (I'm a noob) it ended up working fine, though with its limitations due to some defective modules.

I tried what you suggested and the errors became the following:


2024-04-17 22:46:15.105809+02:00 [error] <0.457.0>@proc_lib:crash_report/4:584 CRASH REPORT:
  crasher:
    initial call: ejabberd_commands:init/1
    pid: <0.457.0>
    registered_name: []
    exception exit: {aborted,{node_not_running,ejabberd@localhost}}
      in function  mnesia:abort/1 (mnesia.erl, line 360)
      in call from ejabberd_commands:init/1 (src/ejabberd_commands.erl, line 111)
      in call from gen_server:init_it/2 (gen_server.erl, line 962)
      in call from gen_server:init_it/6 (gen_server.erl, line 917)
    ancestors: [ejabberd_sup,<0.179.0>]
    message_queue_len: 0
    messages: []
    links: [<0.384.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 206
  neighbours:

2024-04-17 22:46:15.107700+02:00 [error] <0.384.0>@supervisor:start_children/2:398 SUPERVISOR REPORT:
    supervisor: {local,ejabberd_sup}
    errorContext: start_error
    reason: {aborted,{node_not_running,ejabberd@localhost}}
    offender: [{pid,undefined},
               {id,ejabberd_commands},
               {mfargs,{ejabberd_commands,start_link,[]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,60000},
               {child_type,worker}]

2024-04-17 22:46:15.120800+02:00 [debug] <0.385.0>@ejabberd_systemd:terminate/2:133 Terminating ejabberd_systemd (shutdown)
2024-04-17 22:46:15.121057+02:00 [debug] <0.385.0>@ejabberd_systemd:cancel_timer/1:177 Cancelling watchdog timer
2024-04-17 22:46:15.121569+02:00 [critical] <0.179.0>@ejabberd_app:start/2:68 Failed to start ejabberd application: {error,
                                       {shutdown,
                                        {failed_to_start_child,
                                         ejabberd_commands,
                                         {aborted,
                                          {node_not_running,
                                           ejabberd@localhost}}}}}
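The {node_not_running,ejabberd@localhost} abort above is the kind of error that can appear when the mnesia files on disk were created under a different Erlang node name than the one the server starts with, since mnesia schemas are tied to the node name. A hedged way to cross-check the two, assuming the default layout of the .run installer (paths are assumptions; check your ejabberdctl.cfg):

```shell
# The node name the server will start with (usually set via ERLANG_NODE):
grep -i 'ERLANG_NODE' /opt/ejabberd/conf/ejabberdctl.cfg

# Where the mnesia files live; with the .run installer the database
# directory is often named after the node it was created for
# (directory path is an assumption, check SPOOL_DIR):
ls /opt/ejabberd/database/
```

If the node name in the config and the one the database was created for differ, that mismatch would explain the abort independently of any file corruption.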

I also tried replacing the entire /opt directory, but it still fails to start.
Boh, I don't know if something got corrupted during the last update and damaged some libraries.
Everything appeared flawless and the system seems to work fine. (I switched back to the first disk for all the tests and I'm writing from it.)
I also tried reinstalling all the apps and libraries mentioned in ejabberd's dependencies (as if I were installing it from source).
I tried installing it from source as well, and it fails again.
I'm afraid to touch the disk that works. But I can proceed as follows: use dd to back up the old working disk again.
Despite the regression I'll then have two working disks, and I can retry the whole update.
But this time with the option to dump a copy of the database (to a mysql backend, for example).
I'll proceed like this and report the result :-)
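Before retrying the update, dumping the mnesia database to a portable format is a reasonable safety net on top of the dd clone. A sketch of the commands that could be used while the 23.10 backup disk is running (ejabberdctl provides backup/restore and dump/load commands; verify the exact names against `ejabberdctl help` on your version before relying on them):

```shell
# Binary mnesia backup (restorable later with: ejabberdctl restore FILE)
sudo ejabberdctl backup /tmp/ejabberd-pre-update.backup

# Plain-text dump of the mnesia tables, readable and diffable
sudo ejabberdctl dump /tmp/ejabberd-pre-update.dump
```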
