Main page parser (+ "Creators" are missing) #508
Comments
The main page lists only a few of the credits: just three cast members, three directors, and so on. That makes it pretty useless to parse, unless there's some data that appears only on that page. Parsing the full cast and crew page might make more sense in your example, but I have to check.
I only need a few; I'm creating a local movie database with only the most important data, so I don't need a full cast. The main page is convenient because it prioritizes the main cast members (stars) more accurately and doesn't have duplicates. If you take a closer look, the order of cast members on the main page is different from the order on the Also, as I mentioned, "Creators" are now mostly missing from the
Yes, I noticed that the cast order is different. The new cinemagoerng package already has a parser for the main page, but it uses the JSON data in the page and is not ready to port to cinemagoer yet.
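For readers unfamiliar with the approach, here is a minimal sketch of reading JSON embedded in a page using only the Python standard library. It is not the cinemagoerng parser. The `application/json` script type, the `SAMPLE_HTML` string, and the `creators` and `stars` keys are assumptions made up for illustration; the real page layout may differ.

```python
import json
from html.parser import HTMLParser


class JsonScriptExtractor(HTMLParser):
    """Collect the text of the first <script type="application/json"> tag."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._done = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Start capturing only on the first JSON script tag.
        if tag == "script" and not self._done:
            if dict(attrs).get("type") == "application/json":
                self._capturing = True

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self._capturing = False
            self._done = True

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def payload(self):
        """Decode the captured text as JSON."""
        return json.loads("".join(self._chunks))


# Hypothetical page content; the keys are invented for this example.
SAMPLE_HTML = (
    '<html><body><script type="application/json">'
    '{"title": "Example Series", "creators": ["Creator A"], '
    '"stars": ["Star A", "Star B", "Star C"]}'
    "</script></body></html>"
)

extractor = JsonScriptExtractor()
extractor.feed(SAMPLE_HTML)
data = extractor.payload()
print(data["creators"], data["stars"])
```

Reading structured data this way avoids depending on the visual markup, which tends to change more often than the embedded data.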
Wow, very cool! It seems the "Stars" section is always three items long (regardless of the plot, e.g. "Cast Away" vs "Friends"), so it's enough to use
The only thing missing is "Creators" on TV series main pages. I see that you already parse a partial list of writers for movies, so this would be a great addition. Also, additional crew information is scraped only for movies.
I'll look into parsing the creators. It shouldn't be too hard. Additional crew should be parsed for any title type if the data is available. If you have an example where the title has additional crew data that doesn't get parsed, let me know.
I tried to get writers and directors for the first ten TV series from this list, and none of them work. I checked the
Could this be the difference between the episode additional crew and the series additional crew?
This part is done.
Indeed so. Metadata of episodes is fetched just fine.
Issue description
It seems that the "Creators" information is missing from the majority of /reference pages; some pages that previously had it no longer do. Even if it is available (for instance, on "Black Mirror"), the same people are duplicated. The "Writers" entries are also duplicated. The "Stars" section often differs from the same section on the main page, which is more accurate (see "Kill Bill: Vol. 1"). "Directors" are mostly fine.
Would it be possible to create a main page parser for these pieces of information?
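Until the duplicated entries are fixed at the source, one possible workaround on the caller's side is to drop repeats while keeping the original order. This is a minimal sketch, not part of cinemagoer: the `dedupe_people` helper and the `(person_id, name)` tuples are hypothetical stand-ins for whatever the parser actually returns.

```python
def dedupe_people(people):
    """Return people with repeated IDs removed, keeping first-seen order."""
    seen = set()
    unique = []
    for person_id, name in people:
        if person_id not in seen:
            seen.add(person_id)
            unique.append((person_id, name))
    return unique


# Placeholder data; real entries would come from the parsed page.
writers = [("p1", "Writer A"), ("p2", "Writer B"), ("p1", "Writer A")]
unique_writers = dedupe_people(writers)
print(unique_writers)
```

Keying on an ID rather than on the name avoids merging two different people who share a name.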