captcha error 405 Not Allowed #481

toppk · 2023-09-02T00:42:11Z

Issue description

Getting 405 captcha failure

Version of Cinemagoer, Python and OS

NOTICE: please always try the latest version from the repository before submitting a bug.

Python: Python 3.11.5
Cinemagoer: 2023.05.01
OS: uname_result(system='Linux', release='6.3.12-200.fc38.x86_64', version='#1 SMP PREEMPT_DYNAMIC Fri Jul 7 01:00:51 EDT 2023', machine='x86_64')

Steps to reproduce the issue

Additional details

I'm getting captcha failures. Oddly, about 90% of my Cinemagoer queries work, but sometimes I get a 405. When I open the same URL in a private browser window, it walks me through a JavaScript-heavy captcha.

{'errcode': None, 'errmsg': 'None', 'url': 'https://www.imdb.com/find/?q=The+Purge+%282013%29+++++&s=tt', 'proxy': '', 'exception type': 'IOError', 'original exception': <HTTPError 405: 'Not Allowed'>}

I think I need to use the browser cookie extraction code from yt-dlp and patch that into parser.http.

I did try setting the User-Agent header to see if that would avoid triggering the captcha, but that didn't work.

For reference, this is the monkey patch I tried:

import imdb.parser.http

# Wrap retrieve_unicode so a browser-like User-Agent is set before every request.
def replace(old_ret):
    def new_ret(self, *args, **kwargs):
        self.set_header("User-Agent", "Mozilla/5.0 (X11; Linux i686; rv:107.0) Gecko/20100101 Firefox/107.0")
        return old_ret(self, *args, **kwargs)
    return new_ret

# Swap the patched method in at class level so all opener instances use it.
old_ret = imdb.parser.http.IMDbURLopener.retrieve_unicode
imdb.parser.http.IMDbURLopener.retrieve_unicode = replace(old_ret)

alberanid · 2023-10-22T11:27:53Z

Thanks for the report.

Were you trying multiple requests in a short amount of time? This seems like a restriction to keep bots at bay.

Honestly, I don't think we can do much to fix it.
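If the 405 really is tied to request rate, spacing calls out may reduce how often it triggers. The following is only a minimal sketch under that assumption: `throttled` and its `min_interval` parameter are hypothetical names, not part of Cinemagoer, and nothing in this thread confirms that a delay is enough to avoid the captcha.

```python
import time


def throttled(func, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so consecutive calls start at least min_interval seconds apart.

    Hypothetical helper for illustration; clock and sleep are injectable
    so the behaviour can be checked without real waiting.
    """
    last_call = [None]  # time of the previous call, None before the first one

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_interval - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper


# Possible usage (assumes `ia` is an existing Cinemagoer instance):
# search = throttled(ia.search_movie, min_interval=2.0)
# movies = search("The Purge (2013)")
```

The interval of two seconds is an arbitrary starting point; a suitable value would have to be found by experiment.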

jfadams1963 · 2023-10-22T12:40:26Z

If this is an anti-bot measure (which seems likely), then one option is to use a scraper proxy. I solved a similar problem this way, and I find it works quite well.

You can get a free personal use account at https://scrapeops.io/. Using the proxy, your requests seem to come from different IPs, each with a randomly chosen user-agent.

An example of how that may look using the requests library:

import requests

BASE_URL = "https://www.imdb.com/title/tt0133093/mediaindex/"

# Send the request to the proxy endpoint; the page to fetch goes in the 'url' parameter.
data = requests.get(
    url='https://proxy.scrapeops.io/v1/',
    params={
        'api_key': 'your-api-key',
        'url': BASE_URL,
    },
    timeout=10,
)

I hope this has been helpful. :-)

stbo2 · 2024-02-06T04:03:26Z

I got around this issue. When I ran a massive set of searches, seven were rejected with error 405. When I retried the same searches, even one at a time, I got the error again. I got around this by slightly changing the search: title instead of title (year). The altered searches worked. From now on I will use smaller batches to (hopefully) avoid the problem.

movies = list(ia.search_movie(title + '(' + str(year) + ')'))

changed to

movies = list(ia.search_movie(title))

Kludgy, but it got me around the problem.
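The workaround above could be automated as a fallback: try the year-qualified query first and retry with the bare title only if it fails. This is a minimal sketch, not a tested fix. `search_with_fallback` is a hypothetical helper, and catching every `Exception` by default is an assumption made to keep the example self-contained; in practice you would probably pass a narrower exception type.

```python
def search_with_fallback(search, title, year, errors=(Exception,)):
    """Try 'title(year)' first; if that raises, retry with the bare title.

    `search` is any callable that takes a query string and returns an
    iterable of results, e.g. ia.search_movie (hypothetical helper).
    """
    try:
        return list(search(title + '(' + str(year) + ')'))
    except errors:
        # The year-qualified query was rejected; fall back to the title alone.
        return list(search(title))


# Possible usage (assumes `ia` is an existing Cinemagoer instance):
# movies = search_with_fallback(ia.search_movie, "The Purge", 2013)
```

Note that the bare-title search can return more candidates, so the results may need to be filtered by year afterwards.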

alberanid added bug http parsers of IMDb web pages labels Oct 22, 2023
