Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autoloading playlists failing randomly #2670

Open
matias-tecnosoul opened this issue Aug 23, 2023 · 37 comments
Open

Autoloading playlists failing randomly #2670

matias-tecnosoul opened this issue Aug 23, 2023 · 37 comments
Labels
api This is affecting the api is: bug

Comments

@matias-tecnosoul
Copy link

Describe the bug

We had random failures with “autoloading playlists” not being loaded, leaving the shows without tracks.

The shows are working fine with the lists being autoloaded correctly but they've failed a couple of time in the last week.
We tried recreating the blocks and lists but the issue happened again.
I've asked in the forum and one other person had the issue as well.
Thanks!!

To reproduce

We couldn't find how to reproduce the issue. The issue happened in 2 LT installs on different machines. One install is fresh and the other is an upgrade from older versions. On the other hand it didn´t happen in 2 other installs, both upgraded.

Expected behavior

Shows being autoloaded from the configured autoload playlists.

Relevant log output or error messages

The error ocurred in the 20:00hs show.

Liquidsoap logs

2023/08/21 19:56:45 [s3:3] Prepared "/var/lib/libretime/playout/scheduler/418.mp3" (RID 0).
2023/08/21 19:56:45 [lang:3] timeout --signal=KILL 45 libretime-playout-notify media '14899' &
2023/08/21 19:56:45 [lang:3] Using message format 0
2023/08/21 19:56:45 [lang:3] Using message format 0
2023-08-21 19:56:45,713 | INFO     | libretime_playout.notify.main:media:72 - Sending currently playing media id '14899'
2023/08/21 19:59:59 [cue_cut_9618:3] Cueing out...
2023/08/21 19:59:59 [dummy:3] Source failed (no more tracks) stopping output...
2023/08/21 19:59:59 [dummy(dot)2:3] Source failed (no more tracks) stopping output...
2023/08/21 19:59:59 [switch:blank+schedule:3] Switch to map_metadata:offline with forgetful transition.
2023/08/21 19:59:59 [lang:3] transition called...
2023/08/21 20:00:00 [server:3] New client: 127.0.0.1.
2023/08/21 20:00:00 [server:3] Client 127.0.0.1 disconnected.
2023/08/21 20:02:11 [server:3] New client: 127.0.0.1.
2023/08/21 20:02:11 [lang:3] web_stream.get_id
2023/08/21 20:02:11 [server:3] Client 127.0.0.1 disconnected.
2023/08/21 20:08:51 [server:3] New client: 127.0.0.1.

Playout logs

2023-08-21 18:55:27,794 | INFO     | libretime_playout.player.fetch:main:311 - Loop #1223
2023-08-21 18:55:27,797 | INFO     | libretime_playout.player.queue:main:57 - New schedule received
2023-08-21 18:55:27,797 | INFO     | libretime_playout.player.queue:main:39 - waiting 28.202465s until next scheduled item
2023-08-21 18:55:56,010 | INFO     | libretime_playout.player.queue:main:39 - waiting 296.989672s until next scheduled item
2023-08-21 19:00:53,003 | INFO     | libretime_playout.player.queue:main:39 - waiting 342.996558s until next scheduled item
2023-08-21 19:02:07,794 | INFO     | libretime_playout.player.fetch:main:332 - Queue timeout. Fetching schedule manually
2023-08-21 19:02:08,434 | INFO     | libretime_playout.player.queue:main:57 - New schedule received
2023-08-21 19:02:08,434 | INFO     | libretime_playout.player.queue:main:39 - waiting 267.565633s until next scheduled item
2023-08-21 19:02:08,465 | INFO     | libretime_playout.player.fetch:cache_cleanup:255 - File '/var/lib/libretime/playout/scheduler/430.mp3' removed
2023-08-21 19:02:08,481 | INFO     | libretime_playout.player.fetch:cache_cleanup:255 - File '/var/lib/libretime/playout/scheduler/406.mp3' removed
2023-08-21 19:02:08,481 | INFO     | libretime_playout.player.fetch:main:311 - Loop #1224
2023-08-21 19:06:36,004 | INFO     | libretime_playout.player.queue:main:39 - waiting 1.995088s until next scheduled item
2023-08-21 19:06:38,012 | INFO     | libretime_playout.player.queue:main:39 - waiting 366.987057s until next scheduled item
2023-08-21 19:08:48,482 | INFO     | libretime_playout.player.fetch:main:332 - Queue timeout. Fetching schedule manually
...
2023-08-21 20:08:51,729 | INFO     | libretime_playout.player.fetch:main:311 - Loop #1234
2023-08-21 20:15:31,729 | INFO     | libretime_playout.player.fetch:main:332 - Queue timeout. Fetching schedule manually
2023-08-21 20:15:31,762 | INFO     | libretime_playout.player.fetch:main:311 - Loop #1235
2023-08-21 20:15:31,764 | INFO     | libretime_playout.player.queue:main:57 - New schedule received
2023-08-21 20:15:31,765 | INFO     | libretime_playout.player.queue:main:36 - waiting indefinitely for schedule
2023-08-21 20:22:11,762 | INFO     | libretime_playout.player.fetch:main:332 - Queue timeout. Fetching schedule manually

Let me know if you need more logs.

LibreTime version

3.1.0

Installation method and OS / Environment

Operating system: Debian Bullseye
Method: Installer script

Installation details

No response

Client Environment

Operating system:
Browser:

Screenshots

No response

@esbobkov
Copy link

esbobkov commented Sep 6, 2023

I'm that person with same problem.

I assumed the problem was after downgrade RAM from 2Gb to 1Gb on server.
I returned 2RB RAM and the problem remains

I plan to enable debug level logs to see what's happens

@esbobkov
Copy link

Can't find any special in logs (

@matias-tecnosoul
Copy link
Author

matias-tecnosoul commented Sep 18, 2023

I'm trying out a couple of dockerized installs and the issue happened again today.
I'm enclosing the logs of one of them. The logs are 10 min before and after the show change (in this case at 12:00) and in the same time range the 2 previous hours. Libretime is configured with the default cache_ahead_hours=1.

libretime-logs-autoload-error-2023-09-18T09_50_00.log
libretime-logs-autoload-error-2023-09-18T10_50_00.log
libretime-logs-autoload-error-2023-09-18T11_50_00.log

@matias-tecnosoul
Copy link
Author

Regarding the cache_ahead_hours parameter ("How many hours ahead Playout should cache scheduled media files"), does it mean that the show will get auto filled with the tracks one hour before? I mean, will I be able to see the tracks loaded in the show? Or it justs "caches" them and when the show begins they will show up?

@Robbt
Copy link
Member

Robbt commented Sep 21, 2023

So they are added to the show an hour ahead. I'm not sure where that specific parameter fits in (sorry it's been a while since I touched the codebase). But essentially the autoloading playlists is a hack where an hour before the show is set to air it schedules the tracks. Unless something has changed the autoloading playlists happens via PHP and so while the specific tracks being added maybe in the pypo log if there is an error it would probably be in the zendphp.log file.

@eleontie-csm
Copy link

eleontie-csm commented Oct 17, 2023

Is there a way to trigger through an api call , manual call the autoloading of the playlist ? I am having the same issue of the autoload failing occasionally and would like to try to debug the issue , but I don't understand the mechanism through with it is scheduled and run, hoping I can take it step by step and backtrack how it is supposed to be working

@Robbt
Copy link
Member

Robbt commented Oct 17, 2023

There is no manual way to trigger the autoloading playlist that I'm aware of. I also don't think it is added to the API although this would be a good idea for a feature request.

@esbobkov
Copy link

There is no manual way to trigger the autoloading playlist that I'm aware of. I also don't think it is added to the API although this would be a good idea for a feature request.

Maybe we could add console command for regenerate autoloading playlist?

@matias-tecnosoul
Copy link
Author

matias-tecnosoul commented Nov 13, 2023

We've noticed a weird behaviour when a show crosses the midnight border of sundays to mondays, or when the shows starts on mondays at 00:00hs: We can see the red "!" on the show even though it contains a "Repeat Playlist Until Show is Full " autoloading playlist. We don´t see that issue for the rest of the week although it's the same show repeating everyday.
image

If I edit the show to start at 00:01 the indicator dissapears:
image
I don´t know if it can be related but it seems that on mondays we have the problem more frecuently (I can´t give you stats about this sadly).

@Robbt
Copy link
Member

Robbt commented Nov 13, 2023

The red! is due to a bug in the calendar javascript. I don't think it's related to the failing autoplaylist. Again the zendphp.log generally has information related to when autoplaylists are triggered and is probably a good place to find if there are any hints as to why a particular playlist fails to run. See if you can see the logs from successful runs and then see if there is anything there for the missing ones.

@matias-tecnosoul
Copy link
Author

matias-tecnosoul commented Nov 20, 2023

Again the zendphp.log generally has information related to when autoplaylists are triggered

Hi @Robbt , I can´t find zendphp.log!
I've checked in a 3.2.0 "script installed" instance and a 3.1.0 docker instance.
What am I missing? thanks!

@paddatrapper
Copy link
Contributor

/var/log/libretime/zendphp.log

@esbobkov
Copy link

esbobkov commented Nov 21, 2023

The Legacy log file moved from /var/log/airtime/zendphp.log to /var/log/libretime/legacy.log

3.0.0-alpha.11.md

I don't see anything related to this issue in legacy.log

@matias-tecnosoul
Copy link
Author

matias-tecnosoul commented Dec 6, 2023

I don't see anything related to this issue in legacy.log

me neither...

@matias-tecnosoul
Copy link
Author

I'm that person with same problem.

I assumed the problem was after downgrade RAM from 2Gb to 1Gb on server. I returned 2RB RAM and the problem remains

I'm trying to monitor the server (using Netdata) to see if there's a high load or similar at the time of the failures. Until now I couln't see anything.
But, could this be related to this issue already solved in 3.2.0?
Or was it related to docker instances only? (the autoload failures issue were happening in both docker and in "script install" inside LXCs)
I'm testing a couple of 3.2.0 instances now.

@matias-tecnosoul
Copy link
Author

It happened in 3.2.0 as well...

@matias-tecnosoul
Copy link
Author

I had to reinstall a v3.0.2 for the time being. But I will keep testing in another instances from git directly.
Let me know if I can help debugging somehow.
thanks a lot for your efforts!!

@matias-tecnosoul
Copy link
Author

matias-tecnosoul commented Jan 2, 2024

Just out of curiosity, is anyone having this issue using Ubuntu? I'm using debian 11 in all our instances.
(edit: I've just realized that we're also having the issue with dockerized instances, so I guess the distro doesn´t have much influence)

@matias-tecnosoul
Copy link
Author

I'm that person with same problem.

I assumed the problem was after downgrade RAM from 2Gb to 1Gb on server. I returned 2RB RAM and the problem remains

I plan to enable debug level logs to see what's happens

I'm still wondering if it could be related to system resources... I'm testing in dedicated servers and vps... but couldnt find anything yet... so I guess this is not the issue either...

@matias-tecnosoul
Copy link
Author

matias-tecnosoul commented Jan 10, 2024

As I mentioned in Discourse I can provide access to one of our containers having the issue. I have a couple that we kept for debbuging.
cheers!!

@matias-tecnosoul
Copy link
Author

matias-tecnosoul commented Feb 1, 2024

Hi, I'm testing with git version... I happened again. And this time the autoload stopped working for all shows after that failure. I mean, it usually fails on 1 show and after that it resumes autoloading the next show. But sometimes it fails on 1 show and all the following. Like here, it failed on monday and from then the playout kept empty until today.

image

I add the relevant logs I could find on the day of the first failure. The failing autoload happened at 22:00hs:

playout
2024-01-29 21:57:53,427 | INFO     | libretime_playout.player.fetch:main:332 - Queue timeout. Fetching schedule manually
2024-01-29 21:57:53,761 | INFO     | libretime_playout.player.queue:main:57 - New schedule received
2024-01-29 21:57:53,761 | INFO     | libretime_playout.player.queue:main:39 - waiting 94.238394s until next scheduled item
2024-01-29 21:57:53,794 | INFO     | libretime_playout.player.fetch:cache_cleanup:255 - File '/var/lib/libretime/playout/scheduler/1963.mp3' removed
2024-01-29 21:57:53,794 | INFO     | libretime_playout.player.fetch:main:311 - Loop #2839
2024-01-29 21:59:28,031 | INFO     | libretime_playout.player.queue:main:39 - waiting 31.968477s until next scheduled item
2024-01-29 22:00:00,015 | INFO     | libretime_playout.player.queue:main:36 - waiting indefinitely for schedule
2024-01-29 22:04:33,795 | INFO     | libretime_playout.player.fetch:main:332 - Queue timeout. Fetching schedule manually
2024-01-29 22:04:33,959 | INFO     | libretime_playout.player.queue:main:57 - New schedule received
2024-01-29 22:04:33,959 | INFO     | libretime_playout.player.queue:main:36 - waiting indefinitely for schedule


liquidsoap.log
024/01/29 21:53:44 [lang:3] Using message format 0
2024-01-29 21:53:45,742 | INFO     | libretime_playout.notify.main:media:72 - Sending currently playing media id '81180'
2024/01/29 21:57:53 [server:3] New client: 127.0.0.1.
2024/01/29 21:57:53 [lang:3] web_stream.get_id
2024/01/29 21:57:53 [server:3] Client 127.0.0.1 disconnected.
2024/01/29 21:59:28 [server:3] New client: 127.0.0.1.
2024/01/29 21:59:28 [server:3] Client 127.0.0.1 disconnected.
2024/01/29 21:59:28 [decoder:3] Method "FFMPEG" accepted "/var/lib/libretime/playout/scheduler/2045.mp3".
2024/01/29 21:59:28 [ffmpeg:3] [mp3 @ 0x7fd11812e740] Estimating duration from bitrate, this may be inaccurate
2024/01/29 21:59:28 [cue_cut_9615:3] End of track before cue-out point.
2024/01/29 21:59:28 [ffmpeg:3] [mp3 @ 0x7fd0fdbf65c0] Estimating duration from bitrate, this may be inaccurate
2024/01/29 21:59:28 [ffmpeg:3] [mp3 @ 0x7fd0fdbf65c0] Estimating duration from bitrate, this may be inaccurate
2024/01/29 21:59:28 [s3:3] Prepared "/var/lib/libretime/playout/scheduler/2045.mp3" (RID 2).
2024/01/29 21:59:28 [lang:3] timeout --signal=KILL 45 libretime-playout-notify media '81181' &
2024/01/29 21:59:28 [lang:3] Using message format 0
2024/01/29 21:59:28 [lang:3] Using message format 0
2024-01-29 21:59:29,547 | INFO     | libretime_playout.notify.main:media:72 - Sending currently playing media id '81181'
2024/01/29 21:59:59 [cue_cut_9615:3] Cueing out...
2024/01/29 21:59:59 [switch:blank+schedule:3] Switch to map_metadata:offline with forgetful transition.
2024/01/29 21:59:59 [lang:3] transition called...
2024/01/29 22:00:00 [server:3] New client: 127.0.0.1.
2024/01/29 22:00:00 [server:3] Client 127.0.0.1 disconnected.
2024/01/29 22:00:00 [dummy(dot)2:3] Source failed (no more tracks) stopping output...
2024/01/29 22:00:00 [dummy:3] Source failed (no more tracks) stopping output...

legacy.log and worker.log stopped logging a few days earlier. I don´t know if this is normal in v4 ?
thanks!

@paddatrapper
Copy link
Contributor

legacy.log and worker.log stopped logging a few days earlier. I don´t know if this is normal in v4 ?

They shouldn't...

As far as I remember, the population of the autoplaylist is either triggered by legacy and schedules a job in worker or is done directly by legacy. So the logs from those should help shed some light

@matias-tecnosoul
Copy link
Author

Guys, so this is not happening to most of you right?? is it because you don´t rely so much on autoloading playlists? or because there's something wrong with my instances??
I find it so weird that this is not happening to more people. I have tried like 20 different installs including docker and none worked correctly, so I don´t know what else to think and try... :/

@matias-tecnosoul
Copy link
Author

The only "different" setup I'm using is external icecast servers. I'm not using the default icecast installed with the standard install.
I mention it just in case... cheers

@esbobkov
Copy link

My station uses autoloading playlists only and now it's impossibly to use LibreTime with this issue
@jooola Hello, Sorry for tag. Could You Help us?

@jooola
Copy link
Contributor

jooola commented Feb 29, 2024

Could you folks share us the content of your cc_pref table please? Feel free to remove any sensitive information.

sudo -u libretime libretime-api dbshell -- -t -c "select * from cc_pref where keystr not in ('logoImage', 'library_datatable', 'library_screen', 'calendar_time_scale', 'live_stream_master_password', 'user_timezone', 'user_locale');"

@jooola
Copy link
Contributor

jooola commented Feb 29, 2024

We need the logs from legacy, when the failure occurs.

I had a similar behaviour some months ago, not sure if it is related. Here is maybe an idea why it happened on my side: the cc_pref table holds a timestamp of when the autoplaylist last run. The cc_pref table does not have a constraint on keystr and subjid=null so you might end up with multiple timestamps for when the last autoplaylist generation ran.

If you folks can share the content of the cc_pref table, can could maybe already rule out the behaviour I described.

@caveman99
Copy link
Contributor

caveman99 commented Mar 14, 2024

i had the same happen now on 3 of 4 instances i am running (native, ubuntu), and i noticed something new. apart from the lack of legacy.log updating (looks like something is wrong with log rotation and logs go to the nether realms after that) there were no shows at all in the dashboard view to update. So it's not the autoloading thats failing but the actual show instance creation from the repeated schedule calendar. As soon as i accessed the WebUi and refreshed, the shows bgan to reappear in the right pane planning view and a few minutes later the next hour was populated again. (autoloading works if a show starts in 'under an hour from now', it usually is triggered at the earliest point but thats not a requirement)

EDIT: Other stations with this problem had logging still running, BUT nothing conspicuous in the logs at all, in a window +/-2h of it stopping working.

@caveman99
Copy link
Contributor

To continue the investigation, i think i know whats going on. Repeated Show Instances are not scheduled beyond the initial schedule period casuse somehow createAndFillShowInstancesPastPopulatedUntilDate() is not firing. Quote from source:

"This can occur when a user is viewing the next day/week/month in the calendar or when pypo is requesting the schedule after the shows_populated_until date and time."

Could it be this periodic rescheduling of show instances (and then subsequently filling them with autoload playlists) is not triggered any more but only by visiting the calendar view? Cause that's what made it work for me again.

@caveman99
Copy link
Contributor

caveman99 commented Mar 30, 2024

Okay, here's the actual bug. Pypo (libretime-playout) is using the v2 api to retrieve a schedule, never calling the legacy functions (v1 api) but accessing the database directly. This means the refills are never triggered when nobody logs into the UI and browses the schedule calendar. Use regular calls (once a day) to http://<your station ip>/api/schedule as a workaround

@caveman99 caveman99 added the api This is affecting the api label Mar 30, 2024
@paddatrapper
Copy link
Contributor

And this is why moving away from the legacy php code is so difficult...

@caveman99
Copy link
Contributor

@paddatrapper @jooola would a quick fix adding the missing call to the top of the legacy autoloader be accepted? or would you rather add this to the v2 api?

@paddatrapper
Copy link
Contributor

@caveman99 yes, I think at is an appropriate fix for now. Long term we should fix the v2 api, but considering the lack of time I have available at the moment, I can't say when that would be

@jooola
Copy link
Contributor

jooola commented Apr 13, 2024

Awesome that you went down this rabbit hole @caveman99 !

The magic happens around the TaskManager and the PageLayoutInitPlugin, that is triggered on every API call here:

$taskManager = TaskManager::getInstance();
// Piggyback the TaskManager onto API calls. This provides guaranteed consistency
// (there is at least one API call made from pypo to Airtime every 7 minutes) and
// greatly reduces the chances of lock contention on cc_pref while the TaskManager runs
if ($controller == 'api') {
$taskManager->runTasks();
}

I'll implement a quick fix, by adding a periodic task in the celery worker that will call the API every 5 minutes.

jooola added a commit that referenced this issue Apr 13, 2024
### Description

This ensures that when the LibreTime interface is not used for a long
time, we still have the task manager run essential tasks for the
schedule.

Related to #2670
@matias-tecnosoul
Copy link
Author

I'll implement a quick fix, by adding a periodic task in the celery worker that will call the API every 5 minutes.

Hi, is it possible that the problem is still happening?
How could I check if the 5 min API call is working?
thanks!

@jooola
Copy link
Contributor

jooola commented May 17, 2024

You can check the nginx logs, and the worker logs.

@matias-tecnosoul
Copy link
Author

I confirm the issue is still happening to us on at least 2 instances running latest git.
I'm trying to find any relevant logs in the mentioned nginx and worker. Are there any particular events I should search for?
thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api This is affecting the api is: bug
Projects
None yet
Development

No branches or pull requests

7 participants