Main page parser (+ "Creators" are missing) #508

Open
Susexe opened this issue Apr 2, 2024 · 10 comments
Comments

Susexe commented Apr 2, 2024

Issue description

It seems that the "Creators" information is missing from the majority of /reference pages; some pages that previously had it no longer do. Even if it is available (for instance, on "Black Mirror"), the same people are duplicated. The "Writers" entries are also duplicated. The "Stars" section often differs from the same section on the main page, which is more accurate (see "Kill Bill: Vol. 1"). "Directors" are mostly fine.

Would it be possible to create a main page parser for these pieces of information?
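Until the parser is fixed, the duplicated "Creators" and "Writers" entries can be collapsed on the caller's side. A minimal sketch, assuming each credit is a plain dict with hypothetical `id` and `name` keys (an illustration only, not cinemagoer's actual data model):

```python
def dedupe_credits(credits, key="id"):
    """Return credits with repeated entries removed, keeping first-seen order."""
    seen = set()
    unique = []
    for credit in credits:
        marker = credit[key]
        if marker not in seen:
            seen.add(marker)
            unique.append(credit)
    return unique


# Hypothetical sample data: the same writer listed twice.
writers = [
    {"id": "p1", "name": "Writer A"},
    {"id": "p2", "name": "Writer B"},
    {"id": "p1", "name": "Writer A"},
]
unique_writers = dedupe_credits(writers)
print([w["name"] for w in unique_writers])
```

Keying on an identifier rather than the display name avoids merging two different people who happen to share a name.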

Collaborator

uyar commented Apr 2, 2024

The main page only lists a few of the credits. Just three cast members, three directors, and so on. That makes it pretty useless to parse the page. Unless there's some data that is only on that page. Parsing the full cast and crew page might make more sense in your example. But I have to check.

Author

Susexe commented Apr 2, 2024

I only need a few; I'm creating a local movie database with only the most important data, so I don't need a full cast. The main page is convenient as it prioritizes the main cast members (stars) better and more accurately and doesn't have duplicates.

If you take a closer look, the order of cast members on the main page is different from the order on the /reference page, so the "Stars" section on both pages differs too. To use "Kill Bill: Vol. 1" as an example, David Carradine (Bill) and Daryl Hannah (Elle Driver) from the main page "Stars" section are definitely more prominent than Lucy Liu (O-Ren Ishii) or Vivica A. Fox (Vernita Green) from the same section on the /reference page.

Also, as I mentioned, "Creators" are now mostly missing from the /reference page.

Collaborator

uyar commented Apr 2, 2024

Yes, I noticed that the cast order is different. The new cinemagoerng package already has a parser for the main page but it uses the JSON data in the page and is not ready to port to cinemagoer yet.
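To illustrate the general idea of reading JSON embedded in a page instead of scraping its markup, here is a minimal sketch using only the standard library. The script `id` (`page-data`), the sample HTML, and the JSON keys are all made up for illustration; they are not the actual IMDb page structure or the cinemagoerng implementation.

```python
import json
from html.parser import HTMLParser


class JsonScriptExtractor(HTMLParser):
    """Collect the text of the <script> element with a given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def payload(self):
        # Raises json.JSONDecodeError if the script was not found or is not JSON.
        return json.loads("".join(self._chunks))


# Hypothetical page: the structure and keys are assumptions.
SAMPLE_PAGE = """
<html><body>
<script id="page-data" type="application/json">
{"title": "Example Series", "creators": ["Person A", "Person B"]}
</script>
</body></html>
"""

extractor = JsonScriptExtractor("page-data")
extractor.feed(SAMPLE_PAGE)
data = extractor.payload()
print(data["creators"])
```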

Author

Susexe commented Apr 2, 2024

Wow, very cool! It seems the "Stars" section is always three items long (regardless of the plot, e.g. "Cast Away" vs "Friends"), so it's enough to use cast and strip it. Thanks for your fork!
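Taking the first three cast entries could look like the sketch below. It assumes the cast list is already in the main page's billing order; the function name `top_billed` and the sample data are hypothetical.

```python
def top_billed(cast, limit=3):
    """Return at most `limit` leading cast entries, preserving their order."""
    return list(cast[:limit])


# Hypothetical cast list, already in main page order.
cast = ["Actor A", "Actor B", "Actor C", "Actor D", "Actor E"]
stars = top_billed(cast)
print(stars)
```

Slicing is safe for titles with fewer than three cast members: it returns whatever is available.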

Author

Susexe commented Apr 3, 2024

The only thing missing is "Creators" on TV series main pages. I see that you already parse a partial list of writers for movies, so this would be a great addition. Also, additional crew information is scraped only for movies.

Collaborator

uyar commented Apr 4, 2024

I'll look into parsing the creators. It shouldn't be too hard. Additional crew should be parsed for any type if the data is available. If you have an example where the title has additional crew data which doesn't get parsed, let me know.

Author

Susexe commented Apr 4, 2024

I tried to get writers and directors for the first ten TV series from this list, and none of them work. I checked the movie object, but all properties related to additional crew members are empty. I used your sample code from the README file. The same code works just fine for movies.

Collaborator

uyar commented Apr 10, 2024

Could this be the difference between the episode additional crew and the series additional crew?

Collaborator

uyar commented Apr 10, 2024

> I'll look into parsing the creators. It shouldn't be too hard.

This part is done.

Author

Susexe commented Apr 11, 2024

> Could this be the difference between the episode additional crew and the series additional crew?

Indeed so. Metadata of episodes is fetched just fine.
