Ignoring duplicate input URLs, keeping deleted YouTube comments, checking video availability #9976

Closed
6 of 9 tasks
cow1337killer3 opened this issue May 20, 2024 · 1 comment
Labels
question Question

I have 3 questions and figured it'd be easier to post them all in 1 thread.

For context, I wrote a Node program for archiving music from YouTube, which has saved about 30K songs so far. I have a few maxed-out 5K playlists that I regularly download to pick up any newly added videos and to retry any that were previously privated/blocked/deleted. I also check the availability/status of all the archived videos to see which ones have been deleted and could possibly be re-uploaded to YouTube.

  1. To check the availability of videos, I currently make HEAD requests for the thumbnails and check whether they return a 404. This takes about 10 to 20 minutes for 30K videos. The only problem is that I can't tell whether a video that returns 404 is privated, blocked, or deleted. The yt-dlp --simulate option seems to be slower, and having to parse the console output doesn't seem ideal. Is there any other way to do this using yt-dlp?
  2. Is there any way to make yt-dlp ignore duplicate input URLs? My playlists contain a lot of privated/blocked/deleted videos, and my program records these from the playlists, along with any that were queued as individual videos. As a result, a lot of those videos are both in the playlists and queued individually, so yt-dlp will attempt to download the same video multiple times in the same session.
  3. Is there any built-in way to update the metadata JSON files without completely overwriting them? Specifically, I don't want comments deleted from YouTube to be erased from my previously downloaded JSON files. This wouldn't be too difficult to write in Node, but figured I'd ask first just in case there's a built-in way that I don't know about. I'm still not sure what exactly --load-info-json does.
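
For question 2, one option outside yt-dlp is to collapse the individually queued URLs by video ID before handing them over. The following is a minimal Python sketch, not a yt-dlp feature: `video_id` and `dedupe_urls` are hypothetical helper names, and the sketch assumes only the common `watch?v=`, `youtu.be/`, and `shorts`/`embed`/`live` URL shapes. It cannot see videos that yt-dlp reaches by expanding a playlist, so it only helps with URLs that are queued explicitly.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube in this sketch (an assumption, not an exhaustive list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}


def video_id(url: str) -> Optional[str]:
    """Return the video ID for common YouTube URL shapes, or None if unrecognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        parts = parsed.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] in {"shorts", "embed", "live"}:
            return parts[1]
    return None


def dedupe_urls(urls: List[str]) -> List[str]:
    """Keep the first URL seen for each video ID; unrecognized URLs are compared verbatim."""
    seen = set()
    unique = []
    for url in urls:
        key = video_id(url) or url
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Playlist URLs fall through to the verbatim comparison, so they are passed along unchanged unless the exact same playlist URL appears twice.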

cow1337killer3 added the question label May 20, 2024
cow1337killer3 changed the title from "Ignoring duplicate input URLs, keeing deleted YouTube comments, checking video availability" to "Ignoring duplicate input URLs, keeping deleted YouTube comments, checking video availability" May 20, 2024
Member

bashonly commented May 22, 2024

I don't quite grasp exactly what your program is doing, so here are some ballpark suggestions/answers:

  1. If they are playlists, then use --flat-playlist?

  2. You could try using --download-archive. Perhaps --force-write-archive or mock writing to the archive file with --print-to-file "youtube %(id)s" archive.txt would be of use to you depending on exactly what you're doing

  3. No
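
Since there is no built-in option for question 3, the merge has to happen outside yt-dlp. Below is a minimal Python sketch of one way to do it, under a stated assumption: the old and new info JSON files each hold a top-level `comments` list whose entries carry a unique `id` field. `merge_comments` is a hypothetical helper, not part of yt-dlp. All other metadata is taken from the new file, and comments that no longer appear in the new fetch are appended from the old one.

```python
def merge_comments(old_info: dict, new_info: dict) -> dict:
    """Return new_info with any comments that exist only in old_info appended.

    Assumes each comment is a dict with a unique "id" key (an assumption
    about the info JSON layout, stated in the text above).
    """
    merged = dict(new_info)  # shallow copy; the inputs are left untouched
    new_comments = new_info.get("comments") or []
    new_ids = {comment.get("id") for comment in new_comments}
    # Comments missing from the new fetch were presumably deleted upstream; keep them.
    retained = [
        comment
        for comment in (old_info.get("comments") or [])
        if comment.get("id") not in new_ids
    ]
    merged["comments"] = new_comments + retained
    return merged
```

In practice this would be wrapped with `json.load` on the previously saved file and the freshly written one, followed by `json.dump` of the merged result. If the new download was made without comments, the old comments are carried over unchanged.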
