Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote runner stuck #6282

Closed
emansom opened this issue Mar 21, 2024 · 17 comments · May be fixed by #6318
Closed

Remote runner stuck #6282

emansom opened this issue Mar 21, 2024 · 17 comments · May be fixed by #6318
Labels
Component: Runners Type: Bug 🐛 Confirmed bug, at least replicated once by another contributor

Comments

@emansom
Copy link
Contributor

emansom commented Mar 21, 2024

Describe the current behavior

My PeerTube instance has been configured to remotely transcode everything, however, it gets stuck.

Steps to reproduce

  1. Configure a PeerTube runner and enable remote transcoding in settings
  2. Upload/import a video
  3. ???
  4. Observe it getting stuck

Describe the expected behavior

The runner not getting stuck.

Additional information

  • PeerTube instance:

    • URL: https://pcbu.nl
    • Version:
      • Web: latest release
      • Runner: develop branch
    • NodeJS version:
    • Ffmpeg version:
  • Browser name, version and platforms on which you could reproduce the bug:

  • Link to browser console log if relevant:

  • Link to server log if relevant (journalctl or /var/www/peertube/storage/logs/):

@emansom
Copy link
Contributor Author

emansom commented Mar 21, 2024

I've modified the code of the runner to enable VAAPI hardware acceleration, this may be causing it. Ignore for now until I've isolated the bug further.

@emansom
Copy link
Contributor Author

emansom commented Mar 21, 2024

Let it run overnight when it was stuck. Ctrl-Cing after 8+ hours it spat out an Job is not in "processing" state error miraculously. Then when executing again it resumed. Here are the logs:

[01:37:02.442] INFO (1729443): Checking available jobs on https://pcbu.nl
[01:37:02.443] DEBUG (1729443): Requesting jobs on https://pcbu.nl for runner z97x
[01:37:02.635] INFO (1729443): [https://pcbu.nl] Processing job of type vod-hls-transcoding: 9529dd81-e4af-4c5f-8893-d2bcce2fa194
[01:37:02.637] INFO (1729443): Downloading input file https://pcbu.nl/api/v1/runners/jobs/9529dd81-e4af-4c5f-8893-d2bcce2fa194/files/videos/05fa3c9a-cd71-4e7f-bbff-2fab27b9d96b/max-quality for HLS transcoding job ptrjt-0dcc43c4-c422-46b3-8b87-aab030592c63
[01:37:02.637] DEBUG (1729443): Downloading file https://pcbu.nl/api/v1/runners/jobs/9529dd81-e4af-4c5f-8893-d2bcce2fa194/files/videos/05fa3c9a-cd71-4e7f-bbff-2fab27b9d96b/max-quality
[01:47:59.891] INFO (1729443): Downloaded input file https://pcbu.nl/api/v1/runners/jobs/9529dd81-e4af-4c5f-8893-d2bcce2fa194/files/videos/05fa3c9a-cd71-4e7f-bbff-2fab27b9d96b/max-quality for job ptrjt-0dcc43c4-c422-46b3-8b87-aab030592c63. Running HLS transcoding.
[01:47:59.892] DEBUG (1729443): Will run transcode.
[01:48:00.755] DEBUG (1729443): Apply ffmpeg params from libfdk_aac for audio stream of input /home/ewout/.cache/peertube-runner-nodejs/default/transcoding/8d82c27b-d6b9-4d97-b776-570dc4b4f81b using default profile.
[01:48:00.756] DEBUG (1729443): Apply ffmpeg params from h264_vaapi for video stream of input /home/ewout/.cache/peertube-runner-nodejs/default/transcoding/8d82c27b-d6b9-4d97-b776-570dc4b4f81b using default profile.
[01:48:00.761] DEBUG (1729443): ffmpeg command: ffmpeg -n 20 /usr/bin/ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device /dev/dri/renderD128 -i /home/ewout/.cache/peertube-runner-nodejs/default/transcoding/8d82c27b-d6b9-4d97-b776-570dc4b4f81b -y -acodec libfdk_aac -vcodec h264_vaapi -threads 2 -f mp4 -movflags faststart -max_muxing_queue_size 1024 -map_metadata -1 -vbr 5 -filter:a loudnorm=I=-13:LRA=20:TP=-2 -vf hwupload,scale_vaapi=w=-2:h=1080:format=nv12 -maxrate:v 6220877 -bufsize:v 12441754 -b_strategy 1 -bf 16 -qp 18 -r 30 -g:v 60 -hls_time 4 -hls_list_size 0 -hls_playlist_type vod -hls_segment_filename /home/ewout/.cache/peertube-runner-nodejs/default/transcoding/f68005ce-9112-4622-9533-f7b7913c943d-1080-fragmented.mp4 -hls_segment_type fmp4 -f hls -hls_flags single_file /home/ewout/.cache/peertube-runner-nodejs/default/transcoding/f68005ce-9112-4622-9533-f7b7913c943d-1080.m3u8
[02:18:12.175] DEBUG (1729443): FFmpeg command ended.
^C[13:24:37.518] INFO (1729443): Cleaning up after program exit
[13:24:37.580] ERROR (1729443): Expected status 204, got 400. 
The server responded: "Job is not in "processing" state".
You may take a closer look at the logs. To see how to do so, check out this page: https://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs
    err: {
      "type": "Error",
      "message": "Expected status 204, got 400. \nThe server responded: \"Job is not in \"processing\" state\".\nYou may take a closer look at the logs. To see how to do so, check out this page: https://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs",
      "stack":
          Error: Expected status 204, got 400. 
          The server responded: "Job is not in "processing" state".
          You may take a closer look at the logs. To see how to do so, check out this page: https://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs
              at buildRequest (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:75874:14)
              at makePostBodyRequest (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:75816:10)
              at RunnerJobsCommand.postBodyRequest (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:84785:12)
              at RunnerJobsCommand.abort (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:85268:17)
              at RunnerServer.onExit (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:90635:33)
              at process.<anonymous> (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:90471:20)
              at process.emit (node:events:519:28)
          ----
              at file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:75877:19
              at file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50823:17
              at Test._assertFunction (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50809:17)
              at Test.assert (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50699:27)
              at localAssert (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50667:18)
              at file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50671:11
              at Request3.callback (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50240:7)
              at file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50392:22
              at IncomingMessage.<anonymous> (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:49643:11)
              at IncomingMessage.emit (node:events:531:35)
      "res": {
        "req": {
          "method": "POST",
          "url": "https://pcbu.nl/api/v1/runners/jobs/281aeb0f-a03e-46c4-85e5-80a68d2b4f9e/abort",
          "data": {
            "reason": "Runner stopped",
            "jobToken": "ptrjt-713f7a1f-c7d4-4ed1-b151-bda5acd11c42",
            "runnerToken": "ptrt-5cabc452-26e8-4c8f-9c03-13d81a1da642"
          },
          "headers": {
            "content-type": "application/json"
          }
        },
        "header": {
          "server": "nginx",
          "date": "Thu, 21 Mar 2024 12:24:37 GMT",
          "content-type": "application/problem+json; charset=utf-8",
          "content-length": "262",
          "connection": "close",
          "x-powered-by": "PeerTube",
          "tk": "N",
          "access-control-allow-origin": "*",
          "access-control-allow-credentials": "true",
          "access-control-expose-headers": "Retry-After",
          "x-ratelimit-limit": "50",
          "x-ratelimit-remaining": "48",
          "x-ratelimit-reset": "1711023883",
          "etag": "W/\"106-IgZHWE3xH2agTCXd1tuwbcgGDlQ\"",
          "x-protocol": "HTTP/1.1"
        },
        "status": 400,
        "text": "{\"type\":\"https://docs.joinpeertube.org/api-rest-reference.html#section/Errors/runner_job_not_in_processing_state\",\"detail\":\"Job is not in \\\"processing\\\" state\",\"status\":400,\"code\":\"runner_job_not_in_processing_state\",\"error\":\"Job is not in \\\"processing\\\" state\"}"
      }
    }
13:24 ewout@z97x ~/workspace/personal/PeerTube%                                                                                                                                                                                                       (git)-[feature/ffmpeg-hwaccel]
13:25 ewout@z97x ~/workspace/personal/PeerTube% DRI_PRIME=1 LIBVA_DRIVER_NAME=i965 node apps/peertube-runner/dist/peertube-runner.js --verbose server                                                                                                 (git)-[feature/ffmpeg-hwaccel]
[13:25:15.512] INFO (2098655): Running PeerTube runner in server mode
[13:25:15.516] INFO (2098655): Using /home/ewout/.config/peertube-runner-nodejs/default/config.toml as configuration file
[13:25:15.536] INFO (2098655): Loading registered instance https://pcbu.nl
[13:25:15.540] INFO (2098655): IPC socket created on /home/ewout/.local/share/peertube-runner-nodejs/default/peertube-runner.sock
[13:25:15.640] INFO (2098655): Connected to https://pcbu.nl/runners socket
[13:25:17.205] INFO (2098655): Using /home/ewout/.cache/peertube-runner-nodejs/default/transcoding for transcoding directory
[13:25:17.205] INFO (2098655): Checking available jobs on https://pcbu.nl
[13:25:17.205] DEBUG (2098655): Requesting jobs on https://pcbu.nl for runner z97x
[13:25:17.364] INFO (2098655): [https://pcbu.nl] Processing job of type vod-hls-transcoding: f8daf235-a0ae-4635-9c41-ee6e3799e8dd
[13:25:17.367] INFO (2098655): Downloading input file https://pcbu.nl/api/v1/runners/jobs/f8daf235-a0ae-4635-9c41-ee6e3799e8dd/files/videos/05fa3c9a-cd71-4e7f-bbff-2fab27b9d96b/max-quality for HLS transcoding job ptrjt-354a2278-924d-4866-b03f-eac2d663d149
[13:25:17.368] DEBUG (2098655): Downloading file https://pcbu.nl/api/v1/runners/jobs/f8daf235-a0ae-4635-9c41-ee6e3799e8dd/files/videos/05fa3c9a-cd71-4e7f-bbff-2fab27b9d96b/max-quality
[13:25:19.870] INFO (2098655): Checking available jobs on https://pcbu.nl
[13:25:19.871] DEBUG (2098655): Requesting jobs on https://pcbu.nl for runner z97x
[13:25:20.183] INFO (2098655): [https://pcbu.nl] Processing job of type vod-hls-transcoding: 0c9cfddf-2c33-43d0-8323-da6d48db1002
[13:25:20.184] INFO (2098655): Downloading input file https://pcbu.nl/api/v1/runners/jobs/0c9cfddf-2c33-43d0-8323-da6d48db1002/files/videos/05fa3c9a-cd71-4e7f-bbff-2fab27b9d96b/max-quality for HLS transcoding job ptrjt-fd572096-4f97-410c-8c63-f970ce213419
[13:25:20.184] DEBUG (2098655): Downloading file https://pcbu.nl/api/v1/runners/jobs/0c9cfddf-2c33-43d0-8323-da6d48db1002/files/videos/05fa3c9a-cd71-4e7f-bbff-2fab27b9d96b/max-quality

Screenshot of remote runner jobs list:

Schermafdruk van 2024-03-21 13-32-38

Schermafdruk van 2024-03-21 13-33-52

@emansom
Copy link
Contributor Author

emansom commented Mar 21, 2024

After restarting it now processed the audio-only HLS transcode without a hitch. Clearly it didn't get stuck on the audio-only part. The investigation as to why it got stuck continues.

Schermafdruk van 2024-03-21 13-50-57

@emansom emansom changed the title Remote runner stuck on audio-only HLS transcode Remote runner stuck Mar 21, 2024
@emansom
Copy link
Contributor Author

emansom commented Mar 22, 2024

Okay, after a day debugging it was caused by a Intel VAAPI bug and ffmpeg silently crashing. Switching to Mesa VAAPI (AMD) instantly fixed it. Apologies for false positive.

@emansom emansom closed this as completed Mar 22, 2024
@emansom
Copy link
Contributor Author

emansom commented Mar 25, 2024

Nope, still running into "204 expected, 400 returned" issues and the runner being stuck unable to fetch new remote jobs. The error only shows up when Ctrl-C-ing. Re-executing the runner resolves it and everything continues normally after that.
Schermafdruk van 2024-03-25 04-40-51
Schermafdruk van 2024-03-25 04-38-07

When re-executing again everything resumes as expected:

Schermafdruk van 2024-03-25 04-44-41

@emansom emansom reopened this Mar 25, 2024
@Chocobozzz
Copy link
Owner

Hello,

Can you try with peertube runner 0.0.12?

@Chocobozzz Chocobozzz added the Status: Waiting for answer Waiting issue author answer label Mar 25, 2024
@emansom
Copy link
Contributor Author

emansom commented Mar 25, 2024

Hello,

Can you try with peertube runner 0.0.12?

Instead of develop branch?

@emansom
Copy link
Contributor Author

emansom commented Mar 25, 2024

04:42 ewout@z97x ~/workspace/personal/PeerTube% LIBVA_DRIVER_NAME=radeonsi node apps/peertube-runner/dist/peertube-runner.js --verbose server                                                                                                         (git)-[feature/ffmpeg-hwaccel]
[04:42:50.281] INFO (320428): Running PeerTube runner in server mode
[04:42:50.287] INFO (320428): Using /home/ewout/.config/peertube-runner-nodejs/default/config.toml as configuration file
[04:42:50.309] INFO (320428): Loading registered instance https://pcbu.nl
[04:42:50.314] INFO (320428): IPC socket created on /home/ewout/.local/share/peertube-runner-nodejs/default/peertube-runner.sock
[04:42:50.401] INFO (320428): Connected to https://pcbu.nl/runners socket
[04:42:50.633] INFO (320428): Using /home/ewout/.cache/peertube-runner-nodejs/default/transcoding for transcoding directory
[04:42:50.633] INFO (320428): Checking available jobs on https://pcbu.nl
[04:42:50.634] DEBUG (320428): Requesting jobs on https://pcbu.nl for runner z97x
[04:42:50.784] INFO (320428): [https://pcbu.nl] Processing job of type vod-hls-transcoding: 6255d3d1-fbc1-49da-a5ee-16879de0eb8d
[04:42:50.787] INFO (320428): Downloading input file https://pcbu.nl/api/v1/runners/jobs/6255d3d1-fbc1-49da-a5ee-16879de0eb8d/files/videos/402cb23f-0671-4506-8f00-63b0cc47ea4f/max-quality for HLS transcoding job ptrjt-de1ee6a6-ceee-4b9f-8844-4d16315849b0
[04:42:50.788] DEBUG (320428): Downloading file https://pcbu.nl/api/v1/runners/jobs/6255d3d1-fbc1-49da-a5ee-16879de0eb8d/files/videos/402cb23f-0671-4506-8f00-63b0cc47ea4f/max-quality
[04:48:54.820] INFO (320428): Downloaded input file https://pcbu.nl/api/v1/runners/jobs/6255d3d1-fbc1-49da-a5ee-16879de0eb8d/files/videos/402cb23f-0671-4506-8f00-63b0cc47ea4f/max-quality for job ptrjt-de1ee6a6-ceee-4b9f-8844-4d16315849b0. Running HLS transcoding.
[04:48:54.821] DEBUG (320428): Will run transcode.
[04:48:56.154] INFO (320428): Built supported ffmpeg encoders.
[04:48:56.155] DEBUG (320428): Apply ffmpeg params from libfdk_aac for audio stream of input /home/ewout/.cache/peertube-runner-nodejs/default/transcoding/e5bd5811-afc8-48ff-9da8-bab4cece61be using default profile.
[04:48:56.155] DEBUG (320428): Apply ffmpeg params from h264_vaapi for video stream of input /home/ewout/.cache/peertube-runner-nodejs/default/transcoding/e5bd5811-afc8-48ff-9da8-bab4cece61be using default profile.
[04:48:56.512] DEBUG (320428): ffmpeg command: ffmpeg -n 20 /usr/bin/ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device /dev/dri/renderD129 -i /home/ewout/.cache/peertube-runner-nodejs/default/transcoding/e5bd5811-afc8-48ff-9da8-bab4cece61be -y -acodec libfdk_aac -vcodec h264_vaapi -threads 2 -f mp4 -movflags +faststart -max_muxing_queue_size 1024 -map_metadata -1 -vbr 5 -af loudnorm=I=-13:LRA=20:TP=-2 -vf hwupload,scale_vaapi=w=1920:h=1080:mode=hq:format=nv12 -qp 18 -r 30 -g:v 60 -hls_time 4 -hls_list_size 0 -hls_playlist_type vod -hls_segment_filename /home/ewout/.cache/peertube-runner-nodejs/default/transcoding/155fbea0-4c04-4478-ba25-33676a791962-1080-fragmented.mp4 -hls_segment_type fmp4 -f hls -hls_flags single_file /home/ewout/.cache/peertube-runner-nodejs/default/transcoding/155fbea0-4c04-4478-ba25-33676a791962-1080.m3u8
[05:51:46.234] DEBUG (320428): FFmpeg command ended.
^C[14:56:21.026] INFO (320428): Cleaning up after program exit
[14:56:21.111] ERROR (320428): Expected status 204, got 400. 
The server responded: "Job is not in "processing" state".
You may take a closer look at the logs. To see how to do so, check out this page: https://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs
    err: {
      "type": "Error",
      "message": "Expected status 204, got 400. \nThe server responded: \"Job is not in \"processing\" state\".\nYou may take a closer look at the logs. To see how to do so, check out this page: https://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs",
      "stack":
          Error: Expected status 204, got 400. 
          The server responded: "Job is not in "processing" state".
          You may take a closer look at the logs. To see how to do so, check out this page: https://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs
              at buildRequest (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:75874:14)
              at makePostBodyRequest (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:75816:10)
              at RunnerJobsCommand.postBodyRequest (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:84785:12)
              at RunnerJobsCommand.abort (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:85268:17)
              at RunnerServer.onExit (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:90650:33)
              at process.<anonymous> (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:90486:20)
              at process.emit (node:events:519:28)
          ----
              at file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:75877:19
              at file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50823:17
              at Test._assertFunction (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50809:17)
              at Test.assert (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50699:27)
              at localAssert (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50667:18)
              at file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50671:11
              at Request3.callback (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50240:7)
              at file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:50392:22
              at IncomingMessage.<anonymous> (file:///home/ewout/workspace/personal/PeerTube/apps/peertube-runner/dist/peertube-runner.js:49643:11)
              at IncomingMessage.emit (node:events:531:35)
      "res": {
        "req": {
          "method": "POST",
          "url": "https://pcbu.nl/api/v1/runners/jobs/6255d3d1-fbc1-49da-a5ee-16879de0eb8d/abort",
          "data": {
            "reason": "Runner stopped",
            "jobToken": "ptrjt-de1ee6a6-ceee-4b9f-8844-4d16315849b0",
            "runnerToken": "ptrt-5cabc452-26e8-4c8f-9c03-13d81a1da642"
          },
          "headers": {
            "content-type": "application/json"
          }
        },
        "header": {
          "server": "nginx",
          "date": "Mon, 25 Mar 2024 13:56:21 GMT",
          "content-type": "application/problem+json; charset=utf-8",
          "content-length": "262",
          "connection": "close",
          "x-powered-by": "PeerTube",
          "tk": "N",
          "access-control-allow-origin": "*",
          "access-control-allow-credentials": "true",
          "access-control-expose-headers": "Retry-After",
          "x-ratelimit-limit": "50",
          "x-ratelimit-remaining": "25",
          "x-ratelimit-reset": "1711374985",
          "etag": "W/\"106-IgZHWE3xH2agTCXd1tuwbcgGDlQ\"",
          "x-protocol": "HTTP/1.1"
        },
        "status": 400,
        "text": "{\"type\":\"https://docs.joinpeertube.org/api-rest-reference.html#section/Errors/runner_job_not_in_processing_state\",\"detail\":\"Job is not in \\\"processing\\\" state\",\"status\":400,\"code\":\"runner_job_not_in_processing_state\",\"error\":\"Job is not in \\\"processing\\\" state\"}"
      }
    }
14:56 ewout@z97x ~/workspace/personal/PeerTube%

@Chocobozzz
Copy link
Owner

Can you try with the official runner release, and so without modified ffmpeg command?

@emansom
Copy link
Contributor Author

emansom commented Mar 27, 2024

Can you try with the official runner release, and so without modified ffmpeg command?

I don't see how that'd solve the issue, given the ffmpeg command ends without issue.

From the logs above:

[05:51:46.234] DEBUG (320428): FFmpeg command ended.

I'll try to debug the server more using the instructions specified here.

As this leads me to believe the fault is there and it has something to do with the job scheduling being out of sync:

[14:20:38.894] ERROR (27914): Expected status 204, got 400. 
The server responded: "Job is not in "processing" state".
You may take a closer look at the logs. To see how to do so, check out this page: https://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs
   err: {
     "type": "Error",
     "message": "Expected status 204, got 400. \nThe server responded: \"Job is not in \"processing\" state\".\nYou may take a closer look at the logs. To see how to do so, check out this page: https://github.com/Chocobozzz/PeerTube/blob/develop/support/doc/development/tests.md#debug-server-logs",

@Chocobozzz
Copy link
Owner

I don't see how that'd solve the issue, given the ffmpeg command ends without issue.

I don't know modifications you made so I prefer to try to debug your issue with the official runner first. After the FFmpeg command ended. log the runner should send a request to the server (path ends with /success). Do you see it?

@Chocobozzz
Copy link
Owner

Issue should be fixed by 93b09bf included in runner v0.0.16 :)

@Chocobozzz Chocobozzz added Type: Bug 🐛 Confirmed bug, at least replicated once by another contributor Component: Runners and removed Status: Waiting for answer Waiting issue author answer labels Apr 5, 2024
@vincib
Copy link

vincib commented Apr 7, 2024

I had something that looks a lot like this bug, on my standard (no ffmpeg change) remote-runner running on 0.0.13.

I have 6 runners connected to ~10 peertubes, and the runners get stuck pretty quickly, at least once a day, even when they have nothing to do.

Will look at 0.0.16 and if it's still stuck, i'll open a new bug, thanks Chocobozzz!

@vincib
Copy link

vincib commented Apr 7, 2024

I just updated 3 of my 6 runners to 0.0.16, will tell you if they get stucked again or not ;)

@vincib
Copy link

vincib commented Apr 8, 2024

1 of the 3 non-updated has been stucked :/ looks fixed to me

@emansom
Copy link
Contributor Author

emansom commented Apr 9, 2024

Not fixed, ran into the same behavior as previously posted here (the 204 expected, 400 received error on Ctrl-C and it not fetching new jobs for over 8+ hours).

Tracked the bug down I think in PR #6318.

Hasn't gotten stuck processing a backlog of 18 remotely fetched jobs so far, where as before it got stuck after the first or second remotely fetched job.

Edit: nevermind, got stuck after the second again..

It may be a relevant detail that i'm running the runner with a concurrency of 1 btw.

@emansom
Copy link
Contributor Author

emansom commented Apr 9, 2024

I think I narrowed it down: during debugging I observed the large upload (>3GB) (after transcode/encode) to PeerTube never resolving the result promise. It seems to be a upstream issue as others reported the same thing ladjs/supertest#727.

Now rewriting parts of the code to utilize node-fetch instead to see if that fixes the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Runners Type: Bug 🐛 Confirmed bug, at least replicated once by another contributor
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants