[BUG]: Thumbnails not downloaded, curl error 404, when scraping subscriptions from a file #669

Open
kevenwyld opened this issue Apr 4, 2023 · 7 comments
Labels
bug Something isn't working

@kevenwyld
Contributor

Describe the bug

When running ytfzf -t -f -c SI, thumbnails are not downloaded and curl prints a series of 404 errors to stderr:

] > ytfzf --thumbnail-log=log.txt -t -f -c SI
Scraping subscriptions with instance: https://invidious.esmailelbob.xyz
DL% UL%  Dled  Uled  Xfers  Live Total     Current  Left    Speed
--  --  1150k     0    18     0  --:--:--  0:00:05 --:--:--  212k
Fetching thumbnails...
DL% UL%  Dled  Uled  Xfers  Live Total     Current  Left    Speed
--  --      0     0    36    36  --:--:--  0:00:03 --:--:--     0      curl: (22) The requested URL returned error: 404
--  --      0     0    36    35  --:--:--  0:00:06 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36    32  --:--:--  0:00:07 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36    17  --:--:--  0:00:09 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36    15  --:--:--  0:00:11 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36    11  --:--:--  0:00:11 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36     6  --:--:--  0:00:12 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36     4  --:--:--  0:00:13 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
  0 --      0     0    36     0  --:--:--  0:00:14 --:--:--     0

To Reproduce

Run ytfzf -t -f -c SI with the following subscriptions file:

https://www.youtube.com/channel/UC4PIO2pZaFKzI97uumFTNSg/videos # OficineRobotica
https://www.youtube.com/channel/UCeKpbMimEGgLM_0tnghfoVw/videos # Clough42
https://www.youtube.com/channel/UC7pokUsRb6q2B0FOzSqQLlw/videos # Adventures in creation
https://www.youtube.com/channel/UCw3UZn1tcVe7pH3R6C3Gcng/videos # Abom79
https://www.youtube.com/channel/UCCkSr3M8GXbS4txqPY7OMxQ/videos # Edge Precision
https://www.youtube.com/channel/UC7Jf7t6BL4e74O53dL6arSw/videos # Blondihacks
https://www.youtube.com/channel/UCY8gSLTqvs38bR9X061jFWw/videos # Stefan Gotteswinter
https://www.youtube.com/channel/UC-CubOaooNwC-3RBKUoAOQQ/videos # Joko Engineeringhelp
https://www.youtube.com/channel/UC7aAyIrjeH2RKciAXzdOaJA/videos # Artisan Makes
https://www.youtube.com/channel/UCKLIIdKEpjAnn8E76KP7sQg/videos # mrpete222
https://www.youtube.com/channel/UCyjwQ6oz4cqqtEcWGboSU3g/videos # Keith Rucker - VintageMachinery.org
https://www.youtube.com/channel/UChIs72whgZI9w6d6FhwGGHA/videos # Gamers Nexus
https://www.youtube.com/channel/UCVI8Mfisni3GaobL1e2JOIQ/videos # Inheritance Machining
https://www.youtube.com/channel/UC2wdo5vU7bPBNzyC2nnwmNQ/videos # Cutting Edge Engineering Australia
https://www.youtube.com/channel/UCworsKCR-Sx6R6-BnIjS2MA/videos # Clickspring
https://www.youtube.com/channel/UC9UjDtkpr2I-5G51vMJZvnA/videos # ClickspringClips
https://www.youtube.com/channel/UCiDJtJKMICpb9B1qf7qjEOA/videos # Adam Savage’s Tested
https://www.youtube.com/channel/UCB0wPMJJ2FKqdB-gx7YVsDg/videos # Matty’s Workshop

Expected behavior

Thumbnails similar to those displayed when using the invidious-channel feature

Screenshots

Screenshot_2023-04-04_10-51-34

Information

  • OS: Archlinux
  • Terminal: alacritty
  • Ytfzf version: ytfzf: 2.5.5 (from aur ytfzf-git r1963.ac4cc79-1)
  • Output of ls -l "$(which sh)" (if you're using fish: ls -l (which sh)): lrwxrwxrwx 1 root root 4 Jan 8 2022 /usr/sbin/sh -> bash*
  • (if is a thumbnail issue) run ytfzf --thumbnail-log=log.txt and post the file: The file is empty

Additional context

I did some testing using bash -x to get debug output. Here's download log output from a working invidious-channel scrape:

+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/i6WIRWdGUPg/hqdefault.jpg i6WIRWdGUPg
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/J2zZhThFurg/hqdefault.jpg J2zZhThFurg
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/B4u8MpH9db8/hqdefault.jpg B4u8MpH9db8
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/nIDQzpBLqFo/hqdefault.jpg nIDQzpBLqFo
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/0LbDxvvA8Ww/hqdefault.jpg 0LbDxvvA8Ww
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/kQOF9cB7Gjw/hqdefault.jpg kQOF9cB7Gjw
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/RlHteM78lDo/hqdefault.jpg RlHteM78lDo

and here's one from a non-working -cSI scrape against my subscriptions file:

+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/SCRAPE-SI-1100859/thumbnails/%s.jpg"\n' https://invidious.baczek.me/vi/eCDW3Xm_voE/high.jpg eCDW3Xm_voE
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/SCRAPE-SI-1100859/thumbnails/%s.jpg"\n' https://invidious.baczek.me/vi/x6LUpi6W3YA/high.jpg x6LUpi6W3YA
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/SCRAPE-SI-1100859/thumbnails/%s.jpg"\n' https://invidious.baczek.me/vi/jX9jzSfVrUA/high.jpg jX9jzSfVrUA
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/SCRAPE-SI-1100859/thumbnails/%s.jpg"\n' https://invidious.baczek.me/vi/1qtg1z5V1ss/high.jpg 1qtg1z5V1ss
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/SCRAPE-SI-1100859/thumbnails/%s.jpg"\n' https://invidious.baczek.me/vi/AXdFQga0i88/high.jpg AXdFQga0i88
+ curl -fLZ -K /tmp/ytfzf-1000/SCRAPE-SI-1100859/tmp/curl_config

I tried downloading the image from both. https://invidious.baczek.me/vi/1qtg1z5V1ss/hqdefault.jpg contains an image while https://invidious.baczek.me/vi/1qtg1z5V1ss/high.jpg does not, though I can't figure out why the two types of scrapes request different quality names. I think this may not be related, since neither URL is a 404 for me. I hope this is helpful anyway.
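A quick way to compare the two paths is to ask curl for the HTTP status code only. This is a minimal sketch: the helper names `thumb_url` and `thumb_status` are made up for illustration and are not part of ytfzf, and the URL layout is simply the one visible in the traces above.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of ytfzf.

# Build an Invidious thumbnail URL: <instance>/vi/<video id>/<quality>.jpg
thumb_url() {
    printf '%s/vi/%s/%s.jpg\n' "$1" "$2" "$3"
}

# Print only the HTTP status code for a URL, discarding the body.
thumb_status() {
    curl -s -o /dev/null -w '%{http_code}\n' "$1"
}

# Example (needs network access, so it is left commented out):
#   for q in high hqdefault; do
#       url=$(thumb_url https://invidious.baczek.me 1qtg1z5V1ss "$q")
#       printf '%s %s\n' "$(thumb_status "$url")" "$url"
#   done
```

Note that the trace shows ytfzf invoking curl with -f, which makes curl exit with code 22 on HTTP responses of 400 or above. That matches the "curl: (22)" lines in the log.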

This is the only place thumbnails are broken for me. They work with all other searches and scrapes.

Thanks!

@kevenwyld kevenwyld added the bug Something isn't working label Apr 4, 2023
@Euro20179
Collaborator

You could try using --thumbnail-quality=hqdefault, however using high works for me.

@kevenwyld
Contributor Author

Could it be that this was a misunderstanding of the supported thumbnail types in invidious? There is no high url, but the name of the high thumbnail is hqdefault here: https://github.com/iv-org/invidious/blob/6837e4292829ee0891c73108096b806b63ab1506/src/invidious/videos.cr#L425

I've tried every instance I can find and none of them return anything for https://<instance_url>/vi/AKZRuNZDkGU/high.jpg but they all return an image for https://<instance_url>/vi/AKZRuNZDkGU/hqdefault.jpg

This makes me think the default quality should be hqdefault instead of high. But I'll gladly admit that I don't have a complete understanding of this codebase and could be completely wrong =] .

Also I can reproduce this very consistently with ytfzf --thumbnail-quality=high -t -f -c SI

@Euro20179
Collaborator

tbh, high works for me 99% of the time. If this becomes a bigger issue I'll change the default to hqdefault. In the meantime, I'd suggest adding thumbnail_quality=hqdefault to your config file.
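As a sketch of that workaround, assuming the config file is the usual ytfzf 2.x shell file at `~/.config/ytfzf/conf.sh` (the path is an assumption here, so check your own setup), the line would look like this:

```sh
# ~/.config/ytfzf/conf.sh (assumed location)
# Use the Invidious-native name directly so no translation is needed.
thumbnail_quality=hqdefault
```

For a single run, the --thumbnail-quality=hqdefault option mentioned earlier in the thread should have the same effect.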

edit:
I'm kinda dumb; I didn't realize this only really affects subscriptions for some reason, and when scraping SI this bug appears a lot more often for me.

@kevenwyld
Contributor Author

kevenwyld commented Apr 4, 2023

I think it's because scrape_SI... or maybe scrape_subscriptions doesn't call _get_invidious_thumb_quality_name but the other functions like scrape_invidious_playlist do?

You are converting high to hqdefault in that function, but without it the thumbnail_quality variable is just high, which is what's being passed to invidious in the URL as far as I can tell.

_get_invidious_thumb_quality_name () {
    case "$thumbnail_quality" in
        high) thumbnail_quality="hqdefault" ;;
        medium) thumbnail_quality="mqdefault" ;;
        start) thumbnail_quality="1" ;;
        middle) thumbnail_quality="2" ;;
        end) thumbnail_quality="3" ;;
    esac
}

PS. I have no idea how you stay organized in a 3565 line long file.... And also sorry if I'm way off here.

EDIT: I tried adding _get_invidious_thumb_quality_name to the scrape_SI function and it seems to have fixed it, though I'm not sure if that's the best solution.
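To make the effect of that one call concrete, here is a small standalone sketch. The mapping function is the one quoted above; `demo_scrape` is a hypothetical stand-in, not the real scrape_SI, and only mimics how the quality name ends up in the thumbnail URL.

```sh
#!/bin/sh
# The mapping function quoted above, unchanged. It rewrites the global
# thumbnail_quality in place.
_get_invidious_thumb_quality_name () {
    case "$thumbnail_quality" in
        high) thumbnail_quality="hqdefault" ;;
        medium) thumbnail_quality="mqdefault" ;;
        start) thumbnail_quality="1" ;;
        middle) thumbnail_quality="2" ;;
        end) thumbnail_quality="3" ;;
    esac
}

# Hypothetical stand-in for a scraper; the real scrape_SI does much more.
demo_scrape () {
    _get_invidious_thumb_quality_name   # the call that appeared to be missing
    printf '%s/vi/%s/%s.jpg\n' "$1" "$2" "$thumbnail_quality"
}

thumbnail_quality=high
demo_scrape https://invidious.baczek.me 1qtg1z5V1ss
# prints https://invidious.baczek.me/vi/1qtg1z5V1ss/hqdefault.jpg
```

Without the call inside `demo_scrape`, the same invocation would print a URL ending in high.jpg, which is the path that fails in the log above.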

@Euro20179
Collaborator

> I have no idea how you stay organized in a 3565 line long file

It's hard lol.

> I think it's because scrape_SI... or maybe scrape_subscriptions doesn't call _get_invidious_thumb_quality_name

I think you're right. I will add this patch when I get home. I believe adding the function call is the best solution, but it might be better if it gets called automatically somewhere.
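One possible shape for "called automatically somewhere" is to do the translation inside a single URL-building helper, so individual scrapers cannot forget it. This is only a sketch under that assumption; `invidious_quality_name` and `invidious_thumb_url` are hypothetical names and do not exist in ytfzf.

```sh
#!/bin/sh
# Hypothetical names; a sketch of one way to centralise the translation.

# Print the Invidious name for a ytfzf quality name. Unknown values,
# including ones that are already Invidious names, pass through unchanged.
invidious_quality_name () {
    case "$1" in
        high)   printf '%s\n' hqdefault ;;
        medium) printf '%s\n' mqdefault ;;
        start)  printf '%s\n' 1 ;;
        middle) printf '%s\n' 2 ;;
        end)    printf '%s\n' 3 ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

# Every scraper would build thumbnail URLs through this one function.
invidious_thumb_url () {
    printf '%s/vi/%s/%s.jpg\n' "$1" "$2" "$(invidious_quality_name "$3")"
}

invidious_thumb_url https://invidious.baczek.me eCDW3Xm_voE high
# prints https://invidious.baczek.me/vi/eCDW3Xm_voE/hqdefault.jpg
```

Because the helper prints the translated name instead of overwriting a global, calling it more than once is harmless.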

@Euro20179
Collaborator

This should now be fixed in the development branch.

@kevenwyld
Contributor Author

Thanks! Just tested and it's working great now.
