Movie datasets quotes, goofs, and faqs return as empty #495

jfadams1963 · 2023-12-10T18:24:09Z

Issue description

Movie datasets quotes, goofs, and faqs return as empty

Version of Cinemagoer, Python and OS

OS:
Ubuntu 22.04.3, Linux 6.1.57, and
FreeBSD 14
Python version:
3.12.0 and 3.9.18
Cinemagoer version:
2023.10.22

Steps to reproduce the issue

In [1]: from imdb import (Cinemagoer,
...: IMDbError,
...: IMDbDataAccessError)

In [2]: ia = Cinemagoer()

In [3]: movies = ia.search_movie('the matrix')

In [4]: mat = movies[0]

In [5]: mat['title']
Out[5]: 'The Matrix'

In [6]: mid = mat.movieID

In [7]: mv = ia.get_movie(mid, info=['main', 'quotes', 'goofs', 'faqs'])

In [9]: mv.get('quotes', [])
Out[9]: []

In [10]: mv.get('goofs', [])
Out[10]: []

In [11]: mv.get('faqs', [])
Out[11]: []

What's the expected result?

Expected result was populated data sets.

What's the actual result?

Actual result was empty data sets, even though the data can be seen on the website.


robertobalestri · 2024-02-21T10:00:10Z

Same here, can't get it working

jfadams1963 · 2024-02-21T17:44:09Z

OK, so it's not just me! I hadn't checked in about a month.
Also, to clarify my last statement above: the information that is missing from the returned datasets is viewable on imdb.com. So, it's not that the information isn't in their database, but rather, Cinemagoer can't get to it as currently implemented.
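For anyone wanting to confirm the same symptom on their own install, here is a minimal sketch of a check that lists which requested info sets came back empty. The helper name `empty_infosets` is hypothetical (not part of Cinemagoer); it only relies on the `.get()` lookup already used in the steps above, so a plain dict stands in for the Movie object here.

```python
def empty_infosets(movie, keys=("quotes", "goofs", "faqs")):
    """Return the requested keys whose value is missing or empty."""
    return [key for key in keys if not movie.get(key)]


# Stand-in for a Movie object; a real one would come from
# ia.get_movie(mid, info=['main', 'quotes', 'goofs', 'faqs']).
sample = {"title": "The Matrix", "quotes": [], "goofs": []}
print(empty_infosets(sample))  # ['quotes', 'goofs', 'faqs']
```

With a real result, `empty_infosets(mv)` would be expected to return all three keys while this issue persists.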

robertobalestri · 2024-02-21T17:53:20Z

I made a little scraping script that fetches the first 50 quotes. It can't get more than that without Selenium, because IMDb now loads the rest dynamically and dynamic content can't be loaded through Beautiful Soup.


jfadams1963 · 2024-02-21T17:55:30Z

Just upgraded to the latest snapshot and there is no change. It would be nice to see some action on this; this problem must be common to all Cinemagoer users.

robertobalestri · 2024-02-21T18:14:46Z

Anyway, just for info, this is the little script for scraping:

import requests
from bs4 import BeautifulSoup
from imdb import Cinemagoer


def find_quotes(movie_id="11111"):
    # movie_id is a placeholder; pass the numeric IMDb ID of the movie you want
    ia = Cinemagoer()
    movie = ia.get_movie(movie_id)

    # URL to be scraped: the movie's main IMDb page plus 'quotes/'
    url = f'{ia.get_imdbURL(movie)}quotes/'
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    html_content = response.text

    # Parse the HTML content
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find elements by the class name 'ipc-list-card--border-line'
    quote_elements = soup.find_all(class_='ipc-list-card--border-line')

    quotes_list = []

    for element in quote_elements:
        full_quote = ""
        # Each 'li' element holds one line of dialogue
        quotes = element.find_all('li')
        for quote in quotes:
            # Remove 'a' tags to exclude speaker names
            for a in quote.find_all('a'):
                a.decompose()
            # Concatenate the text, preserving dialogue only
            full_quote += " " + quote.get_text(strip=True)

        # Strip colons (this also removes any colons inside the dialogue)
        full_quote = full_quote.replace(":", "").strip()
        quotes_list.append(full_quote)

    return quotes_list

jfadams1963 · 2024-02-23T15:13:53Z

Hi Roberto. I just tried your function out and it works fine. Thanks @robertobalestri ! (Too bad we've had to resort to scraping.) Just for fun this weekend, I'm going to modify this to get "Goofs" and "FAQs" as well.
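One possible starting point for that modification is to build the page URL per section instead of hard-coding 'quotes/'. This is only a sketch: `SECTION_PATHS` and `section_url` are hypothetical names, and the 'goofs/' and 'faq/' path segments are assumptions to verify in a browser. Only 'quotes/' comes from the script above, and the CSS class used for quotes may well differ on the other pages.

```python
# Assumed URL path segment for each section. 'quotes/' is taken from the
# scraping script in this thread; the other two are unverified guesses.
SECTION_PATHS = {"quotes": "quotes/", "goofs": "goofs/", "faqs": "faq/"}


def section_url(movie_url, section):
    """Append the section's path segment to a movie's main IMDb URL."""
    if not movie_url.endswith("/"):
        movie_url += "/"
    return movie_url + SECTION_PATHS[section]


print(section_url("https://www.imdb.com/title/tt0133093/", "goofs"))
# https://www.imdb.com/title/tt0133093/goofs/
```

The result could then be passed to `requests.get` the same way the quotes URL is, with the parsing step adjusted to whatever markup each page turns out to use.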
