Broken entries in CSV file #2

Open
Mo-Gul opened this issue Aug 17, 2021 · 1 comment

Comments

@Mo-Gul
Contributor

Mo-Gul commented Aug 17, 2021

As the title states, something seems to be "broken" in the CSV file, for example near the end of the file (see screenshot): there are two empty names followed by the name "nas", which is most likely also wrong or incomplete. Do you know which (full) names were listed there, or should these entries be deleted?

[Screenshot: the last few entries of the CSV file]

There are more entries like these; see lines 6325, 9458-9459, 9733, 9735, 13957 and 34840.
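Entries like these can be located with a small script rather than by scrolling. The following is a minimal sketch that assumes the name sits in the first CSV column (`name_column=0`, a hypothetical default since the file layout is not described here). It flags rows whose name is empty or contains U+FFFD, the replacement character Python inserts for undecodable bytes. Truncated names such as "nas" are not caught and would still need manual review.

```python
import csv
import io


def find_suspicious_rows(text: str, name_column: int = 0) -> list[int]:
    """Return 1-based line numbers whose name field is empty or contains U+FFFD.

    Assumption: one record per line and the name in column `name_column`.
    """
    suspicious = []
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        # reader.line_num counts physical lines read so far, which equals the
        # line number as long as no quoted field spans several lines.
        name = row[name_column].strip() if len(row) > name_column else ""
        if not name or "\ufffd" in name:
            suspicious.append(reader.line_num)
    return suspicious


sample = "Anna,F\n,M\nnas,F\nJ\ufffdrg,M\n"
print(find_suspicious_rows(sample))  # lines 2 (empty name) and 4 (U+FFFD)
```

To run it on the real file, one option is to read it with `open(path, encoding="utf-8", errors="replace")` (where `path` is a placeholder), so that bytes which are not valid UTF-8 show up as U+FFFD and get flagged.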

Mo-Gul changed the title "Almost at the end of the CSV file there seems to be something broken" to "Broken entries near the end of CSV file" on Aug 17, 2021
Mo-Gul changed the title "Broken entries near the end of CSV file" to "Broken entries in CSV file" on Aug 17, 2021
@MatthiasWinkelmann
Owner

There were some encoding issues in the file I started with: it was created in pre-Unicode times, and different lines used different encodings. I believe I fixed most of them at some point, but I likely missed some, especially for character sets with few names.
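Because the encoding varies from line to line, the file cannot be decoded in one pass. One way to approach this is to decode each line on its own, trying a list of candidate encodings in order. The sketch below assumes that list is UTF-8, then cp1252, then Latin-1; this choice is hypothetical, not a description of how the file was actually produced or repaired. UTF-8 goes first because it is strict and rarely decodes non-UTF-8 bytes by accident. Latin-1 accepts every byte, so it never fails. For the same reason it can silently turn text from other legacy code pages (for example Cyrillic in cp1251) into wrong characters. That is exactly the kind of error that is easy to miss for character sets with few names.

```python
def decode_mixed_lines(raw: bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Decode each line of `raw` separately.

    Returns a list of (text, encoding_used) pairs. The candidate encodings are
    an assumption; adjust them to the code pages the original file used.
    """
    decoded = []
    for raw_line in raw.splitlines():
        for enc in encodings:
            try:
                decoded.append((raw_line.decode(enc), enc))
                break
            except UnicodeDecodeError:
                continue  # try the next candidate encoding
        else:
            # Only reachable if no candidate accepts the line (e.g. when
            # latin-1 is removed from the list); keep it, marked as unknown.
            decoded.append((raw_line.decode("utf-8", errors="replace"), None))
    return decoded


raw = "Jörg\n".encode("utf-8") + "Jörg\n".encode("cp1252")
for text, enc in decode_mixed_lines(raw):
    print(enc, text)
```

Recording which encoding each line was decoded with also gives a useful review list: lines that only Latin-1 accepted are good candidates for the kind of archaeology mentioned below.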

I’ll try to do some archeology. Since the dataset is seeing some use, I might even try to find more recent data.
