
[Bug] Index out of bounds (IndexError) when importing playlist #4069

Open
Primemeow opened this issue Aug 24, 2023 · 23 comments
Labels
bug Something isn't working module:imports Account data import/export

Comments

@Primemeow

Primemeow commented Aug 24, 2023

Describe the bug
I am unable to import any playlists, and this is the given error. I have tried on multiple instances and received the same error.

Steps to Reproduce

  1. Export playlist from Google Takeout
  2. Select the file in /data_control
  3. Click import

Logs
Title: Index out of bounds (IndexError)
Date: 2023-08-24T20:18:37Z
Route: /data_control?referer=%2Fsubscription_manager
Version: 2023.08.07-3450896 @ master

Backtrace

Index out of bounds (IndexError)
  from /usr/share/crystal/src/json/parser.cr:117:5 in 'update_data_control'
  from lib/kemal/src/kemal/route.cr:12:9 in '->'
  from src/invidious/helpers/handlers.cr:30:37 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/filter_handler.cr:21:7 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from src/ext/kemal_static_file_handler.cr:112:11 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call'
  from /usr/share/crystal/src/http/server/handler.cr:28:7 in 'call_next'
  from lib/kemal/src/kemal/init_handler.cr:12:7 in 'process'
  from /usr/share/crystal/src/http/server.cr:500:5 in '->'
  from /usr/share/crystal/src/fiber.cr:146:11 in 'run'
  from ???

Screenshots
(screenshot attached)

Additional context

  • Browser (if applicable): Firefox 116.0.3
  • OS (if applicable): Windows 11
@Primemeow Primemeow added the bug Something isn't working label Aug 24, 2023
@mattiaudisio

This comment was marked as duplicate.

@unixfox
Member

unixfox commented Aug 26, 2023

Check if this solves your issue: #4048 (comment)

@Primemeow
Author

Check if this solves your issue: #4048 (comment)

That seems to be an unrelated issue: they were trying to import subscriptions using the playlist dialog and got the error. I'm trying to import playlists through the playlist dialog and I'm getting that error.

@cedriclocqueneux

This comment was marked as duplicate.

@andreschoppe

This comment was marked as duplicate.

@gil-roboute

This comment was marked as duplicate.

@RadioDarrenFM

This comment was marked as duplicate.

@gptlang

This comment was marked as duplicate.

@c13mn14k

c13mn14k commented Oct 19, 2023

Same issue. Maybe YouTube has changed the export CSV? I don't have a way to verify it, but I looked at the code. In

def parse_playlist_export_csv(user : User, raw_input : String)
  # Split the input into head and body content
  raw_head, raw_body = raw_input.strip('\n').split("\n\n", limit: 2, remove_empty: true)

  # Create the playlist from the head content
  csv_head = CSV.new(raw_head.strip('\n'), headers: true)
  csv_head.next
  title = csv_head[4]
  description = csv_head[5]
  visibility = csv_head[6]

it seems that this function expects these data points to be in the CSV:

  • title,
  • description,
  • visibility,

which are not inside the CSV. The title is in the filename, but the other fields are not present anywhere.
My playlist CSV is named +-filmy.csv and its two rows are:

Identyfikator filmu,Sygnatura czasowa utworzenia filmu z playlisty
j-qzjKaIZnc,2019-10-11T00:13:53+00:00

(the headers translate roughly to "Video ID" and "Playlist video creation timestamp")

This signifies a discrepancy and a likely export-schema change.
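To see why this mismatch produces the reported error, here is a quick illustration (Python rather than Crystal, purely for demonstration; the column layouts are taken from the formats discussed above): the importer reads columns 4–6 of the head row, while the new export has only two columns, so indexing past the end raises an IndexError.

```python
import csv

# Old-format head the importer expects: 7 columns, with
# title/description/visibility at indices 4-6.
old_head = "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility\n,,,,My Playlist,,Public"
# New-format export: only two columns (the Polish example above, translated).
new_head = "Video ID,Playlist video creation timestamp\nj-qzjKaIZnc,2019-10-11T00:13:53+00:00"

def read_meta(raw_head):
    """Mimic the Crystal code: skip the header row, then read
    columns 4-6 of the first data row."""
    row = next(csv.reader([raw_head.splitlines()[1]]))
    return row[4], row[5], row[6]  # title, description, visibility

print(read_meta(old_head))  # ('My Playlist', '', 'Public')
try:
    read_meta(new_head)
except IndexError:
    print("IndexError, as in the bug report")
```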

Update: YouTube definitely changed the schema. The exported CSVs are now as follows.

For n playlists, YouTube gives n+1 files:

  • n CSV files named {playlist-title}-videos.csv with schema video id,timestamp when added to playlist
  • 1 CSV file named playlist(s).csv with schema id, ... some other fields ... ,original playlist title,original playlist title language,timestamp created,timestamp updated,playlist videos order,visibility.

Do note that I am translating titles and schema from Polish.

I'll try later today to write a PR for this bug, but I only learned of Crystal today, so I may give up during environment setup :/

Additionally, since the schema changed, a PR that changes the import function is necessary anyway. I think it would be very useful to implement importing every playlist at once as a directory: since the data is split across two types of files, users will need to upload at least two files even when importing a single playlist, and it would be convenient not to re-upload them for each playlist. Two problems remain:

Since the files with video IDs do not contain a playlist ID, it may be difficult to join the data between the two types of CSVs YouTube provides. The only identifiers are the playlist title, which is obviously unreliable and may also be escaped (since the title forms part of the filename), and the playlist creation timestamp, which can be used to find the videos CSV containing an entry whose "added to playlist" timestamp matches the playlist's creation timestamp; such an entry should exist and work reliably in most cases, I think.

The second problem is that users may not want to upload all playlists, but then they can simply delete the CSV files of the unwanted playlists.
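The creation-timestamp join described above could be sketched roughly like this (the column names are hypothetical translations for illustration, not the exact export headers, which vary by account language):

```python
import csv
import io

# Hypothetical, translated column names; the real headers vary by
# account language, as noted elsewhere in this thread.
PLAYLISTS_CSV = """Playlist ID,Playlist title (original),Playlist create timestamp,Playlist visibility
PL1,Chill,2019-10-11T00:13:53+00:00,Public
"""
VIDEO_FILES = {
    "Chill-videos.csv": """Video ID,Playlist video creation timestamp
j-qzjKaIZnc,2019-10-11T00:13:53+00:00
""",
}

def match_playlist_file(playlist_row, video_files):
    """Join heuristic: the videos CSV whose earliest 'added' timestamp
    equals the playlist's creation timestamp belongs to that playlist."""
    created = playlist_row["Playlist create timestamp"]
    for name, raw in video_files.items():
        stamps = [r["Playlist video creation timestamp"]
                  for r in csv.DictReader(io.StringIO(raw))]
        if stamps and min(stamps) == created:
            return name
    return None

row = next(csv.DictReader(io.StringIO(PLAYLISTS_CSV)))
print(match_playlist_file(row, VIDEO_FILES))  # Chill-videos.csv
```

Note that this assumes the first video was added at the moment of playlist creation, which, as discussed above, may not hold for every playlist.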

@DOOMMARINE117

This comment was marked as spam.

@raxod502

raxod502 commented Nov 24, 2023

I wrote a small script that can be used as a starting point for transforming the new CSV format into the old format, as a temporary workaround until we can patch Invidious to parse the new format directly. It's a bit of a hack and you'll probably have to fix some things. Also, some of the videos are still missed when importing into Invidious. I'm not sure why. (the latter part was user error)

You pass as arguments the path of the playlists subdirectory in Google Takeout, and the path to a new directory that will be completely deleted and replaced with a fixed version of the playlists. As with any other unvetted code from the internet, you should review the script before running it.

Script text
#!/usr/bin/env python3

import argparse
import csv
import os
import pathlib
import shutil

parser = argparse.ArgumentParser()
parser.add_argument("input_dir")
parser.add_argument("output_dir")
args = parser.parse_args()

input_dir = pathlib.Path(args.input_dir).resolve()
output_dir = pathlib.Path(args.output_dir).resolve()

os.chdir(input_dir)

playlists = {}

with open("playlists.csv") as f:
    for record in csv.DictReader(f):
        playlist_name = record["Playlist Title (Original)"]
        playlists[playlist_name] = record

videos_by_playlist = {}

mangled_names = {
    "Winter _23": "Winter '23",
    "New Year_s Day": "New Year's Day",
}

for fname in os.listdir():
    if not fname.endswith(".csv"):
        continue
    if fname == "playlists.csv":
        continue
    assert fname.endswith("-videos.csv"), fname
    playlist_name = fname.removesuffix("-videos.csv")
    playlist_name = mangled_names.get(playlist_name, playlist_name)
    videos = []
    with open(fname) as f:
        for record in csv.DictReader(f):
            videos.append(record)
    playlists[playlist_name]["videos"] = videos

try:
    shutil.rmtree(output_dir)
except FileNotFoundError:
    pass

output_dir.mkdir()

for playlist_name, playlist in playlists.items():
    visibility = playlist["Playlist Visibility"]
    header = "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility"
    lines = []
    lines.append(header)
    lines.append(f",,,,{playlist_name},,{visibility}")
    lines.append("")
    lines.append(header)
    for video in playlist["videos"]:
        video_id = video["Video ID"].strip()
        lines.append(f"{video_id},,,,,,")
    with open(output_dir / f"{playlist_name}.csv", "w") as f:
        f.writelines(line + "\n" for line in lines)

@DOOMMARINE117
How do I add this in? On the webpage?

@raxod502

The script is for transforming the Google Takeout CSV files before import.

@DOOMMARINE117

The script is for transforming the Google Takeout CSV files before import.

I see, so I save this as a text file? Then place it in the Takeout folder?

@raxod502

raxod502 commented Dec 2, 2023

You'll need some Python knowledge to properly use and adapt the script to your use case. I would recommend seeking advice elsewhere; the issue tracker is primarily for technical discussion rather than general support.

@Ajimaru

Ajimaru commented Dec 28, 2023

This code only works if the data was exported from Google while the account's main language is set to English, and Google has again made some changes to the naming etc. I have adjusted the Python code:

#!/usr/bin/env python3

import argparse
import csv
import os
import pathlib
import shutil

parser = argparse.ArgumentParser()
parser.add_argument("input_dir")
parser.add_argument("output_dir")
args = parser.parse_args()

input_dir = pathlib.Path(args.input_dir).resolve()
output_dir = pathlib.Path(args.output_dir).resolve()

os.chdir(input_dir)

playlists = {}

with open("playlists.csv") as f:
    for record in csv.DictReader(f):
        playlist_name = record["Playlist title (original)"] # title with lower case t
        playlists[playlist_name] = record

videos_by_playlist = {}

mangled_names = {
    "Winter _23": "Winter '23",
    "New Year_s Day": "New Year's Day",
}

for fname in os.listdir():
    if not fname.endswith(".csv"):
        continue
    if fname == "playlists.csv":
        continue
    assert fname.endswith(" videos.csv"), fname # suffix now uses a space instead of a hyphen
    playlist_name = fname.removesuffix(" videos.csv") # hyphen replaced by a space
    playlist_name = mangled_names.get(playlist_name, playlist_name)
    videos = []
    with open(fname) as f:
        for record in csv.DictReader(f):
            videos.append(record)
    playlists[playlist_name]["videos"] = videos

try:
    shutil.rmtree(output_dir)
except FileNotFoundError:
    pass

output_dir.mkdir()

for playlist_name, playlist in playlists.items():
    visibility = playlist["Playlist visibility"] # visibility with lower case v
    header = "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility"
    lines = []
    lines.append(header)
    lines.append(f",,,,{playlist_name},,{visibility}")
    lines.append("")
    lines.append(header)
    for video in playlist["videos"]:
        video_id = video["Video ID"].strip()
        lines.append(f"{video_id},,,,,,")
    with open(output_dir / f"{playlist_name}.csv", "w") as f:
        f.writelines(line + "\n" for line in lines)
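Since the header names differ only by account language (English "Playlist Title (Original)" vs. German "Playlist-Titel (Original)" above), one could sidestep the per-language edits by detecting the title column instead of hardcoding it. A minimal sketch; the assumption that "(original)" appears in every localization is based only on the variants seen in this thread:

```python
import csv
import io

def find_title_column(fieldnames):
    """Return the header containing '(original)', which marks the
    original-title column in both the English and German exports seen
    in this thread. Other localizations are an assumption."""
    for h in fieldnames:
        if "(original)" in h.lower():
            return h
    raise KeyError("no title column found in: " + ", ".join(fieldnames))

for sample in (
    "Playlist ID,Playlist Title (Original),Playlist Visibility\n",    # English
    "Playlist-ID,Playlist-Titel (Original),Playlist-Sichtbarkeit\n",  # German
):
    reader = csv.DictReader(io.StringIO(sample + "x,Chill,Public\n"))
    print(find_title_column(reader.fieldnames))
```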

@fkrueger

fkrueger commented Jan 7, 2024

Just a quick workaround so people don't need to find this bug report in order to import their playlists again, since it's been 2+ months now.

#4379

edit: Recreated the whole shebang to add a pull request with a feature, as per documentation :-)

@d4g

d4g commented Jan 17, 2024

Modified the script for the German export and also fixed the newline character on Windows. If you only need the newline fix:
Replace

    with open(output_dir / f"{playlist_name}.csv", "w") as f:

with

    with open(output_dir / f"{playlist_name}.csv", "w", newline='\n') as f:

Whole script:

#!/usr/bin/env python3

import argparse
import csv
import os
import pathlib
import shutil

parser = argparse.ArgumentParser()
parser.add_argument("input_dir")
parser.add_argument("output_dir")
args = parser.parse_args()

input_dir = pathlib.Path(args.input_dir).resolve()
output_dir = pathlib.Path(args.output_dir).resolve()

os.chdir(input_dir)

playlists = {}

with open("playlists.csv") as f:
    for record in csv.DictReader(f):
        # playlist_name = record["Playlist title (original)"] # English column name
        playlist_name = record["Playlist-Titel (Original)"] # German column name
        playlists[playlist_name] = record

videos_by_playlist = {}

mangled_names = {
    "Winter _23": "Winter '23",
    "New Year_s Day": "New Year's Day",
}

for fname in os.listdir():
    if not fname.endswith(".csv"):
        continue
    if fname.lower() == "playlists.csv":  # match either capitalization of the index file
        continue
    assert fname.endswith("-Videos.csv"), fname # German suffix, capital V
    playlist_name = fname.removesuffix("-Videos.csv")
    playlist_name = mangled_names.get(playlist_name, playlist_name)
    videos = []
    with open(fname) as f:
        for record in csv.DictReader(f):
            videos.append(record)
    playlists[playlist_name]["videos"] = videos

try:
    shutil.rmtree(output_dir)
except FileNotFoundError:
    pass

output_dir.mkdir()

for playlist_name, playlist in playlists.items():
    visibility = playlist["Playlist-Sichtbarkeit"] # German column name
    header = "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility"
    lines = []
    lines.append(header)
    lines.append(f",,,,{playlist_name},,{visibility}")
    lines.append("")
    lines.append(header)
    for video in playlist["videos"]:
        video_id = video["Video-ID"].strip()
        lines.append(f"{video_id},,,,,,")
    with open(output_dir / f"{playlist_name}.csv", "w", newline='\n') as f:
        f.writelines(line + "\n" for line in lines)
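For context on the newline='\n' fix in the script above: in Python's text mode, writing with the default newline=None translates '\n' into os.linesep ('\r\n' on Windows), which the importer may then choke on; passing newline='\n' suppresses the translation. A quick self-contained check:

```python
import os
import tempfile

def write_and_peek(newline):
    """Write two LF-terminated lines in text mode with the given
    newline setting, then read the raw bytes back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "w", newline=newline) as f:
        f.write("a\nb\n")
    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    return data

print(write_and_peek("\n"))  # b'a\nb\n' on every platform
print(write_and_peek(None))  # b'a\r\nb\r\n' on Windows, b'a\nb\n' elsewhere
```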

@jrmain

jrmain commented Feb 27, 2024

With a few modifications, I was able to get the script provided above to work with my playlists.

Well, mostly. I found that importing a long list causes the web server to time out with "504 Gateway Time-out". The timeout (on the inv.n8pjl.ca instance) occurs 60 seconds after the import begins. The playlist does get partially imported, creating an Invidious playlist with 256 entries out of the 267 in the source list. Update: I tried a few more, and it's definitely a hard time limit of 60 seconds, and nothing to do with the number of entries in the playlist.

Here's my version (used with Python 3 on Windows):

#!/usr/bin/env python3

import argparse
import csv
import os
import pathlib
import shutil

parser = argparse.ArgumentParser()
parser.add_argument("input_dir")
parser.add_argument("output_dir")
args = parser.parse_args()

input_dir = pathlib.Path(args.input_dir).resolve()
output_dir = pathlib.Path(args.output_dir).resolve()
print(f"Input directory: {input_dir}")
print(f"Output directory: {output_dir}")

os.chdir(input_dir)

playlists = {}

with open("playlists.csv") as f:
    for record in csv.DictReader(f):
        playlist_name = record["Playlist title (original)"]
        playlists[playlist_name] = record

videos_by_playlist = {}

mangled_names = {
    "Winter _23": "Winter '23",
    "New Year_s Day": "New Year's Day",
    "Music - trippy, chill, ambient, and otherwise m":"Music - trippy, chill, ambient, and otherwise mellow",
}

for fname in os.listdir():
    if not fname.endswith(".csv"):
        continue
    if fname == "playlists.csv":
        continue
    playlist_name = fname.removesuffix(" videos.csv")
    playlist_name = playlist_name.removesuffix(".csv")
    print(f"Playlist name: {playlist_name}")
    playlist_name = mangled_names.get(playlist_name, playlist_name)
    videos = []
    with open(fname) as f:
        for record in csv.DictReader(f):
            videos.append(record)
    playlists[playlist_name]["videos"] = videos

try:
    shutil.rmtree(output_dir)
except FileNotFoundError:
    pass

output_dir.mkdir()

for playlist_name, playlist in playlists.items():
    visibility = playlist["Playlist visibility"]
    header = "Video Id,Unused 1,Unused 2,Unused 3,Title,Description,Visibility"
    lines = []
    lines.append(header)
    lines.append(f""",,,,"{playlist_name}",,{visibility}""")
    lines.append("")
    lines.append(header)
    for video in playlist["videos"]:
        video_id = video["Video ID"].strip()
        lines.append(f"""{video_id},,,,"{playlist_name}",,{visibility}""")
    with open(output_dir / f"{playlist_name}.csv", "w", newline='\n') as f:
        f.writelines(line + "\n" for line in lines)

@ghost

ghost commented Mar 29, 2024

I wrote a small script that can be used as a starting point for transforming the new CSV format into the old format […]

Serious warning: do not set output_dir to ~ or you will lose (almost) everything in your home folder on Linux. I made that mistake.
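One way to harden any of the scripts above against that accident (a hypothetical guard, not part of the original script; the '-fixed' naming convention is my own assumption):

```python
import pathlib
import shutil

def safe_rmtree(path):
    """Refuse to delete the home directory or a filesystem root, and
    only delete directories whose name marks them as script output
    (the '-fixed' suffix is an arbitrary convention for this sketch)."""
    p = pathlib.Path(path).resolve()
    if p in (pathlib.Path.home().resolve(), pathlib.Path(p.anchor)):
        raise ValueError(f"refusing to delete {p}")
    if not p.name.endswith("-fixed"):
        raise ValueError(f"output dir should end with '-fixed': {p}")
    shutil.rmtree(p, ignore_errors=True)
```

Used in place of the bare shutil.rmtree(output_dir) call, this turns the "~ as output_dir" mistake into an error instead of a disaster.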

@fkrueger

My workaround from the beginning of this year works for the most part.
No "~" problem there ;-)

@Huddeij

Huddeij commented Apr 18, 2024

How about one of the devs adjusts the playlist-import CSV function, after over 8 months of this? The Python script is great and all, but I think it is well past the time where we should still need a hack to import playlists, isn't it?

Please update Invidious' import-playlist-from-CSV function to match the latest CSV schema of Google's export automation.

@fkrueger

Since the workaround I submitted at the beginning of January 2024 was marked as "uncompleted", and is now "stale" while commenting on the pull request is unavailable, I wonder...

Just what is the problem with the patch?

I just recompiled and patched it from scratch, used the current Google Takeout format (in English!), and it still works as beautifully as before.

Can any of the more knowledgeable people please point me to why the patch is "uncompleted", as the tag says, and more importantly, what I can do about it?

Thanks! :-)
