[GUIDE] Bypass amazon detection each 5000 tries with change user agent method #11

Open
pnthai88 opened this issue Oct 2, 2019 · 3 comments

pnthai88 commented Oct 2, 2019

Hi everyone,

Thanks to the author, philipperemy, for sharing this code. It's helpful for my data science hobby at the moment.
Here is how to bypass Amazon's bot detection by changing the user agent.

### In core_utils.py:
Import fake_useragent and replace get_soup_retry:

# Other names (logging, sleep, requests, BeautifulSoup, AMAZON_BASE_URL)
# are already available in core_utils.py.
from fake_useragent import UserAgent


def get_soup_retry(url):
    ua = UserAgent()
    if AMAZON_BASE_URL not in url:
        url = AMAZON_BASE_URL + url
    nap_time_sec = 1
    logging.debug('Script is going to sleep for {} (Amazon throttling). ZZZzzzZZZzz.'.format(nap_time_sec))
    sleep(nap_time_sec)
    logging.debug('-> to Amazon : {}'.format(url))

    user_agent = ua.random
    while True:
        # Rebuild the header on every attempt. Building it once before the
        # loop would resend the first user agent on each retry.
        out = requests.get(url, headers={'User-Agent': user_agent})
        assert out.status_code == 200
        soup = BeautifulSoup(out.content, 'lxml')
        if 'captcha' not in str(soup):
            print('Bot bypassed')
            return soup
        # Captcha page: pick a new random user agent and try again.
        user_agent = ua.random
        print('Bot has been detected... retrying ... use new identity: ', user_agent)


def get_soup(url):
    return get_soup_retry(url)

It simply keeps retrying with a new user agent until a request gets through :)
Good luck!
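
The "keep retrying" idea can also be written with a cap on the number of tries and a growing pause between them, so the loop cannot spin forever. This is only a minimal sketch, not part of the original patch: `fetch_until_clear`, `fetch`, `next_user_agent`, `max_tries` and `base_delay` are hypothetical names, and the case-insensitive check for the word "captcha" is an assumption about what the blocked page looks like.

```python
import time


def fetch_until_clear(fetch, next_user_agent, max_tries=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call fetch(user_agent) until the page has no captcha marker.

    fetch and next_user_agent are hypothetical callables supplied by the
    caller: fetch returns the page HTML as a string, next_user_agent
    returns a User-Agent string.
    """
    for attempt in range(max_tries):
        user_agent = next_user_agent()
        html = fetch(user_agent)
        if 'captcha' not in html.lower():
            return html
        # Wait a little longer after each detection (1x, 2x, 4x, ...).
        sleep(base_delay * (2 ** attempt))
    raise RuntimeError('Still blocked after {} tries'.format(max_tries))
```

In core_utils.py, `fetch` could wrap `requests.get(url, headers={'User-Agent': user_agent}).text` and `next_user_agent` could be `lambda: ua.random`.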

@pnthai88 pnthai88 changed the title Bypass amazon detection each 5000 tries [GUIDE] Bypass amazon detection each 5000 tries with change user agent method Oct 2, 2019
@philipperemy
Owner

Excellent! Happy it worked out well for you. I use ExpressVPN when this happens, but it requires a subscription. Nice trick!

@stefantrinh1

I have tried this, but I keep getting the following errors:

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108)

and

urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108)>

and

raise FakeUserAgentError('Maximum amount of retries reached')

fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached

@philipperemy
Owner

@stefantrinh1 Hmm, that does not sound good. Check your internet connection and that everything is working properly. Run on Python 3 and reinstall your dependencies.
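
The traceback above suggests that fake_useragent failed while fetching its browser data over HTTPS, so no user agent was ever produced. One way to keep the scraper running in that case is to fall back to a small local list. This is a rough sketch under that assumption: `pick_user_agent`, `_from_fake_useragent` and `FALLBACK_USER_AGENTS` are hypothetical names, and the strings in the list are illustrative only.

```python
import random

# Hypothetical fallback pool; these strings are examples, not from the thread.
FALLBACK_USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/13.0.1 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:69.0) Gecko/20100101 Firefox/69.0',
]


def _from_fake_useragent():
    # Imported lazily so a missing or broken install also hits the fallback.
    from fake_useragent import UserAgent
    return UserAgent().random


def pick_user_agent(provider=_from_fake_useragent, fallback=FALLBACK_USER_AGENTS):
    """Return provider()'s user agent, or a random fallback entry if it fails."""
    try:
        return provider()
    except Exception:
        # Covers ImportError, FakeUserAgentError and URLError alike.
        return random.choice(fallback)
```

With this in place, `ua.random` in get_soup_retry could be swapped for `pick_user_agent()`. It does not fix the certificate problem itself, which still needs the local CA certificates to be installed correctly.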
