Cloudflare protection handling can recurse infinitely with some user-agents #1490

Closed
YiIdirim opened this issue Dec 21, 2020 · 12 comments · Fixed by #1505

Comments

@YiIdirim
Contributor

YiIdirim commented Dec 21, 2020

Expected Behavior

Stores with Cloudflare protection should wait for the challenge to clear and then load the page correctly so that the stock information can be captured.

Current Behavior

Stores such as CCL (UK) do not work with the current implementation of DDoS protection handling because of the top-user-agents module. The bot stays on the same product and shows CLOUDFLARE, WAITING.

I did some digging and found that the randomiser sometimes picks unconventional user agents, and with those, stores behind Cloudflare never progress to the intended page.
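
To illustrate the kind of filtering this points at, here is a minimal sketch that only picks user agents resembling a mainstream desktop browser. It is not the project's code: `isConventional`, `pickUserAgent`, and the regular expression are hypothetical, and the pattern is only an assumption about what "conventional" means here.

```typescript
// Hypothetical helpers for illustration; not taken from the project's source.
// Assumption: a "conventional" user agent looks like a desktop Chrome, Firefox, or Safari string.
const CONVENTIONAL_UA =
  /^Mozilla\/5\.0 \((Windows NT|Macintosh|X11)[^)]*\).*(Chrome|Firefox|Safari)\/[\d.]+/;

function isConventional(userAgent: string): boolean {
  return CONVENTIONAL_UA.test(userAgent);
}

// Pick a random user agent from the conventional candidates only.
// Falls back to a fixed string when no candidate passes the filter.
function pickUserAgent(
  candidates: readonly string[],
  fallback: string,
  random: () => number = Math.random,
): string {
  const usable = candidates.filter(isConventional);
  if (usable.length === 0) {
    return fallback;
  }
  const index = Math.min(usable.length - 1, Math.floor(random() * usable.length));
  return usable[index];
}
```

Passing `random` as a parameter keeps the selection deterministic in tests; in normal use the default `Math.random` applies.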

Steps to Reproduce

  1. Scan for any product, such as an RTX 3060 Ti, at the CCL (UK) store
  2. Notice that it stays stuck on one card and shows CLOUDFLARE, WAITING endlessly
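
One way to avoid the endless wait described in the steps above is to cap the number of checks. The sketch below is a generic bounded wait, not the fix that was merged: `waitForChallenge`, its parameters, and the default values are all hypothetical, and the challenge check and sleep are injected so the logic does not depend on any particular browser API.

```typescript
// Hypothetical sketch of a bounded wait; not the project's implementation.
// isChallengePage: resolves to true while the protection page is still showing.
// sleep: resolves after the given number of milliseconds.
async function waitForChallenge(
  isChallengePage: () => Promise<boolean>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 5, // assumed default, chosen for illustration
  delayMs = 5000, // assumed default, chosen for illustration
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (!(await isChallengePage())) {
      return true; // challenge cleared, the real page is available
    }
    if (attempt < maxAttempts) {
      await sleep(delayMs); // wait before checking again
    }
  }
  return false; // still blocked after maxAttempts checks, so give up
}
```

A caller can treat a `false` result as "skip this store for now" and move on to the next product instead of waiting on the same page.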
@neatchee
Contributor

The title is misleading. @jef, I recommend renaming it to "Cloudflare protection handling can recurse infinitely with some user-agents".

jef closed this as completed in #1505 Dec 24, 2020
jef changed the title from "Cloudfare protection handling does not work with top-user-agents" to "Cloudflare protection handling can recurse infinitely with some user-agents" Dec 24, 2020
@DeeJayhX

@jef We should re-open this issue. It still exists:
image

Config, not that it should matter:

PLAY_SOUND=ohyea.mp3
SHOW_ONLY_SERIES=3080
MAX_PRICE_SERIES_3080=1100
STORES=amazon,bandh,bestbuy,newegg,zotac
SHOW_ONLY_MODELS=founders edition, strix, rog strix, tuf, xlr8 revel, ftw3, ftw3 ultra, trinity, xc3 black, xc3, tuf oc, amp holo, ventus 3x oc, vision oc, gaming x trio, strix white, trinity oc, suprim x, eagle, xc3 ultra, aorus, aorus xtreme, ventus 3x, eagle oc, strix oc, aorus master,
DISCORD_WEB_HOOK=redacted

@DeeJayhX

To clarify, I guess the recursion issue has "technically" been resolved, so maybe we don't need to re-open this issue after all. However, there is absolutely zero chance of ever properly scraping Zotac, because it says cloudflare, waiting every single time.

@neatchee
Contributor

neatchee commented Jan 31, 2021 via email

This means you've been identified as a bot and are being blocked. There is nothing wrong with the code; you are just hitting Zotac too frequently and getting blocked. I am still successfully running this code and getting results from Zotac.

@DeeJayhX

DeeJayhX commented Jan 31, 2021

> you are just hitting zotac too frequently and getting blocked.

How frequently are you hitting them?

Also, all my Amazon hits are coming up as CAPTCHA. Same problem?

I have them all set to the defaults, by the way.

@DeeJayhX

PAGE_BACKOFF_MIN=60000
PAGE_SLEEP_MIN=10000
PAGE_SLEEP_MAX=12022

Even with more relaxed settings, I am still getting Cloudflare timeouts for Zotac. The Amazon CAPTCHA seems to have cleared up. What settings do you use?
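
For readers unfamiliar with settings like these, the sketch below shows one plausible way a pair of min/max sleep bounds could be turned into a randomised delay between page loads. It assumes the values are milliseconds and that the delay is drawn uniformly between the two bounds; `nextSleepMs` is a hypothetical name, and the project's actual behaviour may differ.

```typescript
// Hypothetical sketch; assumes the bounds are in milliseconds and the delay is uniform.
// Returns an integer delay in the inclusive range [minMs, maxMs].
function nextSleepMs(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  if (minMs > maxMs) {
    throw new RangeError("minMs must not exceed maxMs");
  }
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}

// Using the values quoted above (PAGE_SLEEP_MIN=10000, PAGE_SLEEP_MAX=12022),
// every delay falls between 10000 and 12022 inclusive.
const delay = nextSleepMs(10000, 12022);
```

Under these assumptions the quoted settings would space requests to a store roughly 10 to 12 seconds apart.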

@neatchee
Contributor

neatchee commented Feb 1, 2021 via email

@DeeJayhX

DeeJayhX commented Feb 2, 2021

> If you've already been flagged and blocked then you will not be able to hit their site again without cycling your IP address. Have you changed IP address since you were first sent into the infinite cloudflare loop?

No. I'm able to access their site just fine and view the stock myself from the very same IP address.

@neatchee
Contributor

neatchee commented Feb 2, 2021 via email

@DeeJayhX

DeeJayhX commented Feb 2, 2021

> That's not the same. Puppeteer (which we use for automating the browser) leaves some "fingerprints" so they can and will block those access attempts while allowing your normal browser. Please try getting a new IP address and see if you have better results.
> -- Brian "Neatchee" Resnik

  1. I got these errors the moment I fired up the bot. I never got a single scrape from Zotac.
  2. The same happens on tens of different IP addresses when accessing it over private, paid VPNs.

@DeeJayhX

DeeJayhX commented Feb 2, 2021

Here's an attempt with multiple IP address changes (pretty much a new IP between each failure). I guess Zotac, of all places, must just have the best bot detection on the planet.
image

@neatchee
Contributor

neatchee commented Feb 2, 2021 via email

3 participants