captcha error 405 Not Allowed #481

Open
toppk opened this issue Sep 2, 2023 · 3 comments
Labels
bug http parsers of IMDb web pages

Comments

@toppk
Contributor

toppk commented Sep 2, 2023

Issue description

Getting 405 captcha failure

Version of Cinemagoer, Python and OS

NOTICE: please always try the latest version from the repository before submitting a bug.

  • Python: Python 3.11.5
  • Cinemagoer: 2023.05.01
  • OS: uname_result(system='Linux', release='6.3.12-200.fc38.x86_64', version='#1 SMP PREEMPT_DYNAMIC Fri Jul 7 01:00:51 EDT 2023', machine='x86_64')

Steps to reproduce the issue

Additional details

I'm getting captcha failures. Oddly, about 90% of my Cinemagoer queries work, but sometimes I get a 405. When I try the same URL in a private browser window, it walks me through a JavaScript-heavy captcha.

{'errcode': None, 'errmsg': 'None', 'url': 'https://www.imdb.com/find/?q=The+Purge+%282013%29+++++&s=tt', 'proxy': '', 'exception type': 'IOError', 'original exception': <HTTPError 405: 'Not Allowed'>}

I think I need to use the browser cookie extraction code from yt-dlp and patch that into parser.http.

I did try setting the User-Agent header to see if that would avoid triggering the captcha test, but that didn't work.

For reference, this was the monkey patch I tried:

import imdb.parser.http


def replace(old_ret):
    # Wrap retrieve_unicode so a browser-like User-Agent is set before each request.
    def new_ret(self, *args, **kwargs):
        self.set_header("User-Agent", "Mozilla/5.0 (X11; Linux i686; rv:107.0) Gecko/20100101 Firefox/107.0")
        return old_ret(self, *args, **kwargs)
    return new_ret


# Swap the wrapped method in on the class so every opener instance uses it.
old_ret = imdb.parser.http.IMDbURLopener.retrieve_unicode
imdb.parser.http.IMDbURLopener.retrieve_unicode = replace(old_ret)
@alberanid
Collaborator

Thanks for the report.

Were you trying multiple requests in a short amount of time? This seems like a restriction to keep bots at bay.

Honestly, I don't think we can do much to fix it.
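
If rate limiting is indeed the trigger, one client-side mitigation is to retry failed calls with growing pauses between attempts. The following is only a minimal sketch under that assumption: `call_with_backoff` is a hypothetical helper, not part of Cinemagoer, and nothing here is confirmed to get past IMDb's captcha check.

```python
import random
import time


def call_with_backoff(func, *args, retries=3, base_delay=2.0,
                      exceptions=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff.

    Hypothetical helper for illustration; the retry count and delays are
    arbitrary defaults, not values recommended by IMDb or Cinemagoer.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Wait base_delay * 2**attempt seconds, plus up to 1 s of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

In real use, `func` might be `ia.search_movie` and `exceptions` might be narrowed to the error type your code actually sees (for example `imdb.IMDbDataAccessError`).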

@alberanid alberanid added bug http parsers of IMDb web pages labels Oct 22, 2023
@jfadams1963

jfadams1963 commented Oct 22, 2023

If this is an anti-bot measure (which seems likely), then one option is to use a scraper proxy. I solved a similar problem this way, and I find it works quite well.

You can get a free personal use account at https://scrapeops.io/. Using the proxy, your requests seem to come from different IPs, each with a randomly chosen user-agent.

An example of how that may look using the requests library:

import requests

BASE_URL = "https://www.imdb.com/title/tt0133093/mediaindex/"

# Ask the proxy to fetch BASE_URL on our behalf.
data = requests.get(
    url='https://proxy.scrapeops.io/v1/',
    params={
        'api_key': 'your-api-key',
        'url': BASE_URL,
    },
    timeout=10,
)

I hope this has been helpful. :-)

@stbo2

stbo2 commented Feb 6, 2024

I got around this issue. When I ran a massive set of searches, seven were rejected with error 405. When I retried the same searches, even one at a time, I got the error again. I got around this by slightly changing the search: title instead of title (year). The altered searches worked. From now on, I will use smaller batches to (hopefully) avoid the problem.

movies = list(ia.search_movie(title + '(' + str(year) + ')'))

changed to

movies = list(ia.search_movie(title))

Kludgy, but it got me around the problem.
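
The workaround above could be wrapped in a small helper that tries the title with the year first and falls back to the bare title if that raises. This is a rough sketch only: `search_with_fallback` is a hypothetical name, the `search` argument stands in for something like `ia.search_movie`, and the optional pause between attempts is an assumption rather than something tested against IMDb.

```python
import time


def search_with_fallback(search, title, year, pause=0.0, sleep=time.sleep):
    """Try title + '(year)' first; if that raises, retry with the bare title.

    `search` is any callable that takes a query string and returns an
    iterable of results (hypothetical stand-in for ia.search_movie).
    """
    queries = [title + '(' + str(year) + ')', title]
    last_error = None
    for index, query in enumerate(queries):
        if index:
            sleep(pause)  # brief pause before the fallback attempt
        try:
            return list(search(query))
        except Exception as exc:  # in real use, narrow this to the error you see
            last_error = exc
    raise last_error
```

With Cinemagoer this might be called as `search_with_fallback(ia.search_movie, title, year, pause=5)`, where `ia` is the same object used in the snippets above.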


4 participants