Movie datasets quotes, goofs, and faqs return as empty #495

Open
jfadams1963 opened this issue Dec 10, 2023 · 6 comments

Comments

@jfadams1963

jfadams1963 commented Dec 10, 2023

Issue description

Movie datasets quotes, goofs, and faqs return as empty

Version of Cinemagoer, Python and OS

OS:
Ubuntu 22.04.3, Linux 6.1.57, and
FreeBSD 14
Python version:
3.12.0 and 3.9.18
Cinemagoer version:
2023.10.22

Steps to reproduce the issue

In [1]: from imdb import (Cinemagoer,
...: IMDbError,
...: IMDbDataAccessError)

In [2]: ia = Cinemagoer()

In [3]: movies = ia.search_movie('the matrix')

In [4]: mat = movies[0]

In [5]: mat['title']
Out[5]: 'The Matrix'

In [6]: mid = mat.movieID

In [7]: mv = ia.get_movie(mid, info=['main', 'quotes', 'goofs', 'faqs'])

In [9]: mv.get('quotes', [])
Out[9]: []

In [10]: mv.get('goofs', [])
Out[10]: []

In [11]: mv.get('faqs', [])
Out[11]: []

What's the expected result?

  • Expected result was populated data sets.

What's the actual result?

  • Actual result was empty data sets, even though the data can be seen on the website.

@robertobalestri

Same here, can't get it working

@jfadams1963
Author

OK, so it's not just me! I hadn't checked in a ~month.
Also, to clarify my last statement above: the information that is missing from the returned datasets is viewable on imdb.com. So, it's not that the information isn't in their database, but rather, Cinemagoer can't get to it as currently implemented.

@robertobalestri

robertobalestri commented Feb 21, 2024 via email

@jfadams1963
Author

Just upgraded to the latest snapshot and there is no change. It would be nice to see some action on this; this problem must be common to all Cinemagoer users.

@robertobalestri

anyway, just for info this is the lil script for scraping:

import requests
from bs4 import BeautifulSoup
from imdb import Cinemagoer


def find_quotes():
    ia = Cinemagoer()

    # IMDb movie ID to look up
    movie_id = "11111"

    movie = ia.get_movie(movie_id)
    url = ia.get_imdbURL(movie)

    # URL to be scraped
    url = f'{url}quotes/'
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    html_content = response.text

    # Parse the HTML content
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find elements by the class name 'ipc-list-card--border-line'
    quote_elements = soup.find_all(class_='ipc-list-card--border-line')

    quotes_list = []

    for element in quote_elements:
        full_quote = ""
        # Each 'li' element holds one line of the quote
        quotes = element.find_all('li')
        for quote in quotes:
            # Remove 'a' tags to exclude speaker names
            for a in quote.find_all('a'):
                a.decompose()
            # Concatenate the text, preserving dialogue only
            full_quote += " " + quote.get_text(strip=True)

        # Strip every colon (including the speaker separator) and add the quote to the list
        full_quote = full_quote.replace(":", "").strip()
        quotes_list.append(full_quote)

    # Return the collected quotes, one string per quote card
    return quotes_list

@jfadams1963
Author

Hi Roberto. I just tried your function out and it works fine. Thanks @robertobalestri ! (Too bad we've had to resort to scraping.) Just for fun this weekend, I'm going to modify this to get "Goofs" and "FAQs" as well.
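
For anyone attempting the same extension, here is a rough sketch of how the parsing step might be separated from the download so it can be reused and checked offline. It is not part of Cinemagoer and not the script above: `CardTextParser` and `extract_cards` are hypothetical names, it uses the standard library's `html.parser` rather than BeautifulSoup, and it keeps speaker names instead of dropping them. Whether the Goofs and FAQ pages use the same `ipc-list-card--border-line` class as the quotes page is an assumption that has not been verified here, which is why the class name is a parameter. It also assumes every non-void tag inside a card is explicitly closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CardTextParser(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, card_class):
        super().__init__()
        self.card_class = card_class
        self.cards = []     # finished card texts
        self._depth = 0     # nesting depth inside the current card (0 = outside)
        self._chunks = []   # text fragments of the current card

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            # Already inside a card: just track nesting.
            self._depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.card_class in classes:
                self._depth = 1
                self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # The card's own closing tag: store its text.
            self.cards.append(" ".join(self._chunks))

    def handle_data(self, data):
        text = data.strip()
        if self._depth and text:
            self._chunks.append(text)


def extract_cards(html, card_class="ipc-list-card--border-line"):
    """Return one string per element with `card_class` (hypothetical helper)."""
    parser = CardTextParser(card_class)
    parser.feed(html)
    parser.close()
    return parser.cards


if __name__ == "__main__":
    # Made-up HTML for illustration only, not copied from imdb.com.
    sample = ('<div class="ipc-list-card--border-line x"><ul>'
              '<li><a>Neo</a>: Whoa.</li><li>Second<br>line</li>'
              '</ul></div><div class="other">skip</div>')
    print(extract_cards(sample))
```

The HTML fetched with `requests` in the script above could be passed to `extract_cards` as `response.text`; the page paths and class names for Goofs and FAQs would still need to be checked against the live site.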
