Wiley journals #442

Open
wanghaosjtu opened this issue Apr 12, 2022 · 1 comment

@wanghaosjtu

Please confirm the following statements and check the boxes before creating an issue:

  • [x] I've upgraded cfscrape with pip install -U cfscrape
  • [x] I'm using Node version 10 or higher
  • [x] The site protection I'm having issues with is from Cloudflare
  • [x] I'm not using Tor, a VPN, or an anonymizing proxy

Python version number

Run python --version and paste the output below:

Python 3.8.5

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: c:\miniconda3\envs\py3\lib\site-packages
Requires: requests

Code snippet involved with the issue

    import cfscrape

    url = 'https://onlinelibrary.wiley.com/doi/10.1111/jpim.12613'

    # Fetch the page through a cfscrape session.
    scraper = cfscrape.create_scraper()
    print(scraper.get(url))

    # Also request the Cloudflare cookies and the matching User-Agent.
    tokens, user_agent = cfscrape.get_tokens(url)
    cookie_value, user_agent = cfscrape.get_cookie_string(url)

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): onlinelibrary.wiley.com:443
DEBUG:urllib3.connectionpool:https://onlinelibrary.wiley.com:443 "GET /doi/10.1111/jpim.12613 HTTP/1.1" 503 None
Traceback (most recent call last):
  File "c:\miniconda3\envs\py3\lib\site-packages\cfscrape\__init__.py", line 251, in solve_challenge
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "c:/Users/nuc-002/workspace/calibre/seleniumbrowser.py", line 416, in <module>
  File "c:\miniconda3\envs\py3\lib\site-packages\requests\sessions.py", line 542, in get
  File "c:\miniconda3\envs\py3\lib\site-packages\cfscrape\__init__.py", line 129, in request
  File "c:\miniconda3\envs\py3\lib\site-packages\cfscrape\__init__.py", line 204, in solve_cf_challenge
  File "c:\miniconda3\envs\py3\lib\site-packages\cfscrape\__init__.py", line 290, in solve_challenge
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
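
The 503 followed by `'NoneType' object has no attribute 'groups'` suggests the page returned was not the JavaScript challenge that cfscrape's regexes expect. A minimal sketch for checking which kind of response came back is below. The function name `classify_response` and its labels are hypothetical and not part of cfscrape. The marker strings `jschl_vc` and `jschl_answer` are assumed to identify the older "I'm Under Attack Mode" page. Anything else served by Cloudflare with a 403/429/503 status is simply labelled as some other challenge.

```python
def classify_response(status_code, headers, body):
    """Roughly label a response from a possibly Cloudflare-fronted site.

    Hypothetical helper for diagnosis only; the markers are assumptions.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {str(k).lower(): str(v) for k, v in headers.items()}
    server = lowered.get("server", "").lower()

    if not server.startswith("cloudflare"):
        return "not-cloudflare"
    if status_code not in (403, 429, 503):
        return "ok"
    # Assumed markers of the legacy IUAM JavaScript challenge form.
    if "jschl_vc" in body and "jschl_answer" in body:
        return "legacy-iuam"
    return "other-challenge"
```

With plain `requests`, this could be called as `classify_response(resp.status_code, resp.headers, resp.text)` on the response for the URL above. A result of `other-challenge` would be consistent with the traceback, though it does not say which challenge type Wiley is serving.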

URL of the Cloudflare-protected page

[From ego-systems to open innovation ecosystems: A process model of inter-firm openness]

URL of Pastebin/Gist with HTML source of protected page

[LINK GOES HERE]

@wanghaosjtu
Author

Wiley's Cloudflare check seems to pop up even when I open the page with a Selenium driver.
Sometimes it redirects to the right page; other times, with no luck, it stays stuck on the Cloudflare page.
