[multiple sites] Error solving the challenge. Timeout after X seconds - challenge loop #1036

Open
4 tasks done
MrTyton opened this issue Jan 12, 2024 · 142 comments

@MrTyton

MrTyton commented Jan 12, 2024

Have you checked our README?

  • I have checked the README

Have you followed our Troubleshooting?

  • I have followed your Troubleshooting

Is there already an issue for your problem?

  • I have checked older issues, open and closed

Have you checked the discussions?

  • I have read the Discussions

Environment

- FlareSolverr version: 3.3.13
- Last working FlareSolverr version: Unsure, but was working on Monday (2024/01/08)
- Operating system: Docker Unraid
- Are you using Docker: yes
- FlareSolverr User-Agent (see log traces or / endpoint): Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
- Are you using a VPN: no
- Are you using a Proxy: no
- Are you using Captcha Solver: No
- If using captcha solver, which one:
- URL to test this issue: https://www.fanfiction.net/s/14145272/1/In-Your-Wildest-Dreams

Description

Using FanFicFare to scrape from fanfiction.net. Nothing's changed with my config, but it stopped working this week.

Logged Error Messages

2024-01-12 00:35:52 INFO     FlareSolverr 3.3.13
2024-01-12 00:35:52 INFO     Testing web browser installation...
2024-01-12 00:35:52 INFO     Platform: Linux-5.19.17-Unraid-x86_64-with-glibc2.31
2024-01-12 00:35:52 INFO     Chrome / Chromium path: /usr/bin/chromium
2024-01-12 00:35:52 INFO     Chrome / Chromium major version: 120
2024-01-12 00:35:52 INFO     Launching web browser...
version_main cannot be converted to an integer
2024-01-12 00:35:53 INFO     FlareSolverr User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
2024-01-12 00:35:53 INFO     Test successful!
2024-01-12 00:35:53 INFO     Serving on http://0.0.0.0:8191
2024-01-12 00:35:59 INFO     Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.fanfiction.net/s/14145272/1/In-Your-Wildest-Dreams', 'maxTimeout': 65000, 'cookies': [], 'postData': None}
version_main cannot be converted to an integer
2024-01-12 00:35:59 INFO     Challenge detected. Title found: Just a moment...
2024-01-12 00:37:04 ERROR    Error: Error solving the challenge. Timeout after 65.0 seconds.
2024-01-12 00:37:04 INFO     Response in 65.723 s
2024-01-12 00:37:04 INFO     172.17.0.1 POST http://192.168.1.161:8191/v1 500 Internal Server Error

Screenshots

No response

@ilike2burnthing
Contributor

Debug logs and headless=false both confirm that the challenge is found, box ticked, page refreshed, but the challenge just returns. Tested on both Windows and Docker.

This was the same behaviour seen with yggtorrent, which was resolved by adding the ENV LANG and using an English language code. However, I've tried several language codes without success.

If anyone has any ideas, or it's working for anyone, let me know.

@nilsherzig

I have the same issue on multiple other sites, so it doesn't look like a site-specific thing.

@rebootder

Versions 3.3.9 through 3.3.13:
I am also getting an infinite loop.

@rscm

rscm commented Jan 14, 2024

I have the same issue on a totally different site. I had to remove the call from my script because there was no challenge, so the script runs cleanly for now. I'll try an older version later.

I'm running it on a VM in Proxmox alongside other docker apps like sonarr, radarr, etc

2024-01-13 21:12:50 INFO     ReqId 139902543808320 FlareSolverr 3.3.13
2024-01-13 21:12:50 DEBUG    ReqId 139902543808320 Debug log enabled
2024-01-13 21:12:50 INFO     ReqId 139902543808320 Testing web browser installation...
2024-01-13 21:12:50 INFO     ReqId 139902543808320 Platform: Linux-6.1.0-17-amd64-x86_64-with-glibc2.31
2024-01-13 21:12:50 INFO     ReqId 139902543808320 Chrome / Chromium path: /usr/bin/chromium
2024-01-13 21:12:50 INFO     ReqId 139902543808320 Chrome / Chromium major version: 120
2024-01-13 21:12:50 INFO     ReqId 139902543808320 Launching web browser...
2024-01-13 21:12:50 DEBUG    ReqId 139902543808320 Launching web browser...
version_main cannot be converted to an integer
2024-01-13 21:12:50 DEBUG    ReqId 139902543808320 Started executable: `/app/chromedriver` in a child process with pid: 31
2024-01-13 21:12:51 INFO     ReqId 139902543808320 FlareSolverr User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
2024-01-13 21:12:51 INFO     ReqId 139902543808320 Test successful!
2024-01-13 21:12:51 INFO     ReqId 139902543808320 Serving on http://0.0.0.0:8191
2024-01-13 21:13:01 INFO     ReqId 139902511249152 Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://xxx.yyy', 'maxTimeout': 60000}
2024-01-13 21:13:01 DEBUG    ReqId 139902511249152 Launching web browser...
version_main cannot be converted to an integer
2024-01-13 21:13:02 DEBUG    ReqId 139902511249152 Started executable: `/app/chromedriver` in a child process with pid: 163
2024-01-13 21:13:02 DEBUG    ReqId 139902511249152 New instance of webdriver has been created to perform the request
2024-01-13 21:13:02 DEBUG    ReqId 139902477678336 Navigating to... https://xxx.yyy
2024-01-13 21:14:02 DEBUG    ReqId 139902511249152 A used instance of webdriver has been destroyed
2024-01-13 21:14:02 ERROR    ReqId 139902511249152 Error: Error solving the challenge. Timeout after 60.0 seconds.
2024-01-13 21:14:02 DEBUG    ReqId 139902511249152 Response => POST /v1 body: {'status': 'error', 'message': 'Error: Error solving the challenge. Timeout after 60.0 seconds.', 'startTimestamp': 1705191181995, 'endTimestamp': 1705191242739, 'version': '3.3.13'}
2024-01-13 21:14:02 INFO     ReqId 139902511249152 Response in 60.744 s
2024-01-13 21:14:02 INFO     ReqId 139902511249152 172.19.0.1 POST http://docker.lan:8191/v1 500 Internal Server Error
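
The `startTimestamp` and `endTimestamp` values in the error response above appear to be epoch milliseconds, and their difference matches the "Response in 60.744 s" line. A minimal sketch of that arithmetic, using only the fields shown in the log (the helper name `elapsed_seconds` is hypothetical, not part of FlareSolverr):

```python
# Response body copied from the DEBUG log line above.
response_body = {
    "status": "error",
    "message": "Error: Error solving the challenge. Timeout after 60.0 seconds.",
    "startTimestamp": 1705191181995,
    "endTimestamp": 1705191242739,
    "version": "3.3.13",
}


def elapsed_seconds(body):
    """Return the request duration in seconds, assuming millisecond timestamps."""
    return (body["endTimestamp"] - body["startTimestamp"]) / 1000


if response_body["status"] != "ok":
    # The request used almost exactly the 60000 ms maxTimeout before failing.
    print(f"failed after {elapsed_seconds(response_body)} s: {response_body['message']}")
```

A duration this close to `maxTimeout` suggests the solver ran out of time rather than failing early.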

@jaaywags

This comment was marked as duplicate.

@ilike2burnthing ilike2burnthing changed the title Failing on requests to fanfiction.net [multiple sites] challenge loop Jan 17, 2024
@ilike2burnthing ilike2burnthing pinned this issue Jan 17, 2024
@DHuckaby

I think the issue might be related to using sessions. I was previously using them and in general it worked, but for some sites it would fail with a timeout after a few requests. Switching to a standard cache of cookies and returning them in the get request solved it for me. This is probably very situational, and I imagine it adds processing time since I am spinning up more headless instances, but it worked for me.

@jaaywags

Switching to a standard cache of cookies and returning them in the get request solved it for me.

How do you do this? Sorry if that is a dumb question.

@DHuckaby

DHuckaby commented Jan 24, 2024

Switching to a standard cache of cookies and returning them in the get request solved it for me.

How do you do this? Sorry if that is a dumb question.

Cache the cookies from FlareSolverr and then send them back in your new requests.

@rubenni

rubenni commented Jan 25, 2024

Switching to a standard cache of cookies and returning them in the get request solved it for me.

How do you do this? Sorry if that is a dumb question.

Cache the cookies from FlareSolverr and then send them back in your new requests.

Hi @DHuckaby, would you mind sharing an example on how to do this?

@rubenni

rubenni commented Jan 25, 2024

Hi @ilike2burnthing, I just found out that it can take a few seconds for the "verify I am a human" box to load, even in a regular browser. I guess it's checking the IP address validity before showing the challenge. In my case, it only finds the challenge very occasionally. Would it therefore be possible to add a (configurable) timeout that waits for the challenge to appear on the page? Or maybe let it check multiple times whether the button is displayed on the page (referring to this line in the code)?
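
The kind of configurable wait described above can be sketched as a generic polling loop. This is only an illustration of the idea, not FlareSolverr's actual code; the function name `wait_for` and its parameters are hypothetical, and the predicate stands in for whatever check detects the challenge element.

```python
import time


def wait_for(predicate, timeout=10.0, interval=0.5,
             clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage: a stand-in check that only succeeds on the third attempt.
attempts = {"count": 0}


def challenge_box_present():
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_for(challenge_box_present, timeout=5, interval=0.01))
```

The predicate is always evaluated at least once, so a zero timeout still performs a single check.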

@ilike2burnthing
Contributor

FlareSolverr already does this. Enable debug logging and you'll see it cycling through the check multiple times.

@DHuckaby

DHuckaby commented Jan 25, 2024

Hi @DHuckaby, would you mind sharing an example on how to do this?

# Copy of the existing Python example from the README
import requests

url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}
data = {
    "cmd": "request.get",
    "url": "http://www.google.com/",
    "maxTimeout": 60000
}
response = requests.post(url, headers=headers, json=data)
print(response.text)

# Extract cookies from the solution if the request succeeded
cookies = []
json_response = response.json()
if json_response["status"] == "ok":
    cookies = json_response["solution"]["cookies"]

# New request that passes the previous request's cookies in the "cookies"
# field of the JSON body, so the headless browser uses them. Passing the list
# through requests' cookies= argument would only affect the HTTP call to
# FlareSolverr itself, and requests does not accept a list of dicts there.
data_with_cookies = {**data, "cookies": cookies}
response2 = requests.post(url, headers=headers, json=data_with_cookies)
print(response2.text)
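
The "standard cache of cookies" idea discussed in this thread could look roughly like the sketch below: an in-memory cache keyed by hostname that drops expired cookies and builds the next request body. `CookieCache` and `build_payload` are hypothetical names. The expiry key (`expiry` or `expires`, in epoch seconds) is an assumption about the cookie format, and the `cookies` body field follows the shape seen in the "Incoming request" log lines earlier in the thread.

```python
import time
from urllib.parse import urlparse


class CookieCache:
    """In-memory cache of solved cookies, keyed by hostname (hypothetical helper)."""

    def __init__(self):
        self._by_host = {}

    def store(self, url, cookies):
        """Remember the cookies returned in a successful solution for this host."""
        self._by_host[urlparse(url).hostname] = list(cookies)

    def load(self, url, now=None):
        """Return cached cookies for the URL's host that have not expired."""
        now = time.time() if now is None else now
        fresh = []
        for cookie in self._by_host.get(urlparse(url).hostname, []):
            # Assumption: expiry, when present, is epoch seconds; a missing or
            # negative value is treated as a session cookie and kept.
            expiry = cookie.get("expiry", cookie.get("expires"))
            if expiry is None or expiry < 0 or expiry > now:
                fresh.append(cookie)
        return fresh


def build_payload(url, cache, max_timeout=60000):
    """Build a request.get body that replays cached cookies for the URL's host."""
    return {
        "cmd": "request.get",
        "url": url,
        "maxTimeout": max_timeout,
        "cookies": [{"name": c["name"], "value": c["value"]} for c in cache.load(url)],
    }
```

After each successful response, calling `cache.store(url, solution["cookies"])` would keep the cache current, and `build_payload(url, cache)` would then produce the body for the next POST to `/v1`.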

@MrTyton
Author

MrTyton commented Jan 30, 2024

That doesn't seem to be working for me -

2024-01-30 22:41:12 INFO     Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.fanfiction.net/s/14316251/2/Xia', 'maxTimeout': 65000, 'cookies': [], 'postData': None}
version_main cannot be converted to an integer
2024-01-30 22:41:15 INFO     Challenge detected. Title found: Just a moment...
2024-01-30 22:42:12 INFO     Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.fanfiction.net/s/14316251/2/Xia', 'maxTimeout': 65000, 'cookies': [], 'postData': None}
version_main cannot be converted to an integer
2024-01-30 22:42:17 INFO     Challenge detected. Title found: Just a moment...
2024-01-30 22:42:18 ERROR    Error: Error solving the challenge. Timeout after 65.0 seconds.
2024-01-30 22:42:18 INFO     Response in 65.662 s
2024-01-30 22:42:18 INFO     xxx.xxx.x.xxx POST http://xxx.xxx.x.xxx/v1 500 Internal Server Error

At least for fanfiction.net, when I'm just trying to do the initial request to get a cookie.

@mintertale

Agreed, I have the same problem.

@Gallardo26

Gallardo26 commented Jan 31, 2024

I'm not sure if this issue is related, but I have faced similar issues elsewhere...

In Tachiyomi, the Android app for reading manga (development has stopped, but there are many forks including Mihon, SY, J2K, etc.), I often hit Cloudflare issues with the source I'm reading, and have to open the built-in browser and solve the Cloudflare challenge manually.

Some sources can be solved manually with the built-in browser. However, sources like Happymh have very strict Cloudflare protection, and we have to change the user-agent in the app so that Cloudflare does not put us in the challenge loop. Perhaps playing with a different user-agent could help? Currently I've set mine to:

Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36

Also, another project, Tachidesk, runs as a server and so has no "native browser", which means we cannot solve the Cloudflare challenge manually. They recently added FlareSolverr, but folks over there said FlareSolverr doesn't have a function to change its user-agent (I'm not sure...), so the challenge loop also occurs.

I wish I could code (I only understand very basic coding) to help. I hope this helps the community if it does solve the issue everyone is facing here.

@ilike2burnthing
Contributor

The user-agent header isn't supported, and hasn't been since v2, over 2 years ago.

@Gallardo26

So could the user-agent be the issue for the cloudflare challenge loop?

@ilike2burnthing
Contributor

Possibly, but I can't check.

@mintertale

mintertale commented Feb 1, 2024

Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36

I confirm: I added the user-agent and it worked again.

options.add_argument('--ignore-ssl-errors')

Just add this after the line above:
options.add_argument('--user-agent=Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36')

@Gallardo26

Sweet... So shall we add this feature back, and also allow a variable in the config to change the user-agent?

@ilike2burnthing
Contributor

ilike2burnthing commented Feb 1, 2024

While the ability to use an ENV to achieve this could be added, previously it was part of both FlareSolverr and FlareSolverrSharp, and could be used by indexers which required cookie and UA login. I'll have a look later, but I doubt I'll be able to recreate this. PRs welcome.

@Gallardo26

I'm currently using the Unraid version. Is it possible to just add an ENV VAR and set the value in it? What should the VAR be?

@ilike2burnthing
Contributor

Edited comment above to clarify. No, an ENV cannot currently be used.
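
Purely as an illustration of what an ENV-based user-agent override might look like if it were ever added, here is a minimal sketch. It is not part of FlareSolverr: the variable name `CUSTOM_USER_AGENT` and the function `build_chrome_args` are hypothetical. The flags themselves are taken from this thread, and each resulting string would be passed to `options.add_argument(...)`.

```python
import os

# Flags quoted elsewhere in this thread; the real option list is longer.
DEFAULT_ARGS = ["--ignore-certificate-errors", "--ignore-ssl-errors"]


def build_chrome_args(env=None):
    """Return Chrome arguments, appending --user-agent when the env var is set.

    CUSTOM_USER_AGENT is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    args = list(DEFAULT_ARGS)
    user_agent = env.get("CUSTOM_USER_AGENT", "").strip()
    if user_agent:
        args.append(f"--user-agent={user_agent}")
    return args


# Usage with an explicit mapping instead of the real process environment.
print(build_chrome_args({"CUSTOM_USER_AGENT": "Mozilla/5.0 (example)"}))
```

Leaving the variable unset or empty keeps the default argument list unchanged, so the browser's own user-agent would still be used.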

@ptmplop

ptmplop commented May 2, 2024

Started working for me this morning with no changes, must be magic :)

Latest release.

@MrTyton
Author

MrTyton commented May 2, 2024 via email

@nathnathn

nathnathn commented May 2, 2024

I don't know the code, so I don't know if it's possible, but could FlareSolverr pop up a full browser window on failure so you can respond manually and generate working cookies as a workaround?
That might at least get it working for as long as the cookies last.
Obviously it's only a temporary fix, but if it worked it might be a good failover option.
Edit: just tested with FF to be safe, and it's still not working for me either.

@jaaywags

This comment was marked as off-topic.

@nathnathn

This comment was marked as off-topic.

@CyberPoison

CyberPoison commented May 2, 2024

This Docker image works on my end, on a k8s setup:
#1163 (comment) 👀

Setting LANG=fr-FR worked nicely for yggtorrent :)
No VPN or proxy at this time.

@kaithar

kaithar commented May 2, 2024

So... take this with a grain of salt, cause it's not exactly the most scientific test I've ever done...

I was annoyed to the point where I made a couple of tweaks to my docker instance:

  1. I forced headless mode off... for some reason the headless param to UC was being set True and adjusting the earlier call to force it resulted in weirdness (for some reason beyond my comprehension, start_xvfb_display() is only called if get_config_headless() returns True, which then gets passed to uc.Chrome() resulting only headless being possible?) thus I get:
    utils.py: windows_headless=windows_headless, headless=False)
  2. I commented out the option options.add_argument("--auto-open-devtools-for-tabs") because some sites will freak out if you have devtools open and you don't really need this argument.
  3. For the sake of passing, I commented these out since none of them were really an issue for me and I wanted to eliminate them:
# this option removes the zygote sandbox (it seems that the resolution is a bit faster)
#options.add_argument('--no-zygote')
# attempt to fix Docker ARM32 build
#options.add_argument('--disable-gpu-sandbox')
#options.add_argument('--disable-software-rasterizer')
#options.add_argument('--ignore-certificate-errors')
#options.add_argument('--ignore-ssl-errors')
# fix GL errors in ASUSTOR NAS
# https://github.com/FlareSolverr/FlareSolverr/issues/782
# https://github.com/microsoft/vscode/issues/127800#issuecomment-873342069
# https://peter.sh/experiments/chromium-command-line-switches/#use-gl
#options.add_argument('--use-gl=swiftshader')
  4. I set lang to en-GB and forced a Chrome v124 on Linux UA... I did not fix my timezone issue... at the time I was VPNing via France but my timezone was UTC. I also disabled the driver.start_session() # required to bypass Cloudflare lines, since I suspected they weren't helping.
  5. Still not enough though... I added correct params to xvfb: XVFB_DISPLAY = Xvfb(width=1920, height=1080, colordepth=16) and installed x11vnc so I could see what it was actually doing...

Getting this far, the browser bot checkers only really flag the timezone discrepancy, my real browser seems to have more sus results.

At which point I made an odd discovery... if I requested the page, it spun up the Chrome instance, navigated to the page and promptly went into the auth loop (where it clicked the cf button and the page that loads is back to the cf page instead of the site)... if, in the same window, I quickly opened a new tab, manually navigated to the same url, and clicked the checkbox... yeah, that succeeded first time, with the other tab then picking up the cookies and the rest of the solverr sequence completing normally.

This should, logically at least, mean the environment or browser itself isn't the reason for detection, which leaves me suspecting one or more of three potential reasons:

  1. They have tweaked the code in such a way that the checkbox that needs to be clicked is added to the page as a hidden element and there's a delay before it becomes visible. If the box is clicked in that time, seeing the invisible object outs us.
  2. They've added or adjusted timing heuristics that are detecting that the box is being clicked too quickly, faster than a human would normally do so.
  3. They've added a sort of Soft-Fail where, when the heuristics aren't sure or note a minor issue, the page navigates to retest you but uses the navigation History api to check if it's in a loop and fails regardless if you've been looped too many times.
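
If the first two hypotheses above hold (both are untested guesses), one way to probe them would be to wait until the checkbox is actually visible and then pause for a short, randomized interval before clicking. The sketch below is generic and does not use FlareSolverr's or Selenium's API: `click_when_visible` is a hypothetical name, `is_visible` and `click` are caller-supplied callables, and the pause bounds are arbitrary.

```python
import random
import time


def click_when_visible(is_visible, click, timeout=15.0, poll=0.25,
                       min_pause=0.8, max_pause=2.5,
                       sleep=time.sleep, rng=random):
    """Wait for is_visible() to be True, pause briefly, then call click().

    Returns False without clicking if the element never becomes visible
    within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while not is_visible():
        if time.monotonic() >= deadline:
            return False
        sleep(poll)
    # Pause for a randomized interval rather than clicking immediately.
    sleep(rng.uniform(min_pause, max_pause))
    click()
    return True
```

Comparing runs with and without the pause would show whether click timing matters at all; it would not say anything about the third hypothesis.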

One other thing I notice, though it might be a Herring of the Red variety, is that while trying to coax the loop into giving me the protected page I was getting the notification for blocked Third-Party Cookies around when the page navigated. I didn't dig into finding which third party though, so it could be CF, the site behind, or one of the many external bloat sorry, "entirely useful services that don't want to invade my privacy"... so yeah, fishy? No idea, line snapped before it broke the surface.

I can't do a proper dig around right now, not got free slots on my todo list and have resorted to just baby sitting the browser process via vnc for now.

@Diudid

This comment was marked as duplicate.

@Jojont54

Jojont54 commented May 3, 2024

Hello,
If this version solves the problem and doesn't impact other websites, would it be possible to release it as 3.3.18?
I don't know whether this process takes long.
Thank you everybody working on this topic :)

@forkyyy

This comment was marked as spam.

@ilike2burnthing
Contributor

More work needs to be done with the current PR and/or the fork by 21hsmw. Read my comments on the PR for more info.

@Diudid

Diudid commented May 3, 2024

It half works; I still got an error during the workday. I will pipe the log output to a file tomorrow and try to capture the errors. I also increased my timeout from 60 to 180.

@DAMIOSKIDEV

This comment was marked as duplicate.

@TayZ3r TayZ3r mentioned this issue May 4, 2024
4 tasks
@Jonathall

This comment was marked as duplicate.

@PrzemekSkw

This comment was marked as duplicate.

@tifo71

This comment was marked as duplicate.

@tifo71

tifo71 commented May 5, 2024

Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36

I confirm, I added the user-agent and it worked again

options.add_argument('--ignore-ssl-errors')

Just add after line before: options.add_argument('--user-agent=Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36')

Great, how do I make the change? Can you make a fix?

@gaetanb49

This comment was marked as duplicate.

@ilike2burnthing ilike2burnthing mentioned this issue May 5, 2024
4 tasks
@tifo71

This comment was marked as duplicate.

@ilike2burnthing
Contributor

ilike2burnthing commented May 5, 2024

Due to 'same here' and 'when will this be fixed' spam, this issue is now locked. Read back on previous comments (you should be doing that anyway) if you want to know more and about the current PR.

New PRs or constructive contributions to current PRs are always welcomed.

Opening new issues to try to circumvent this will result in a ban. Commenting on other issues or PRs to try to circumvent this will result in a ban.

Opening new issues has now been restricted to what GH deems 'existing users', due to the number of new accounts (particularly those using YGGtorrent) refusing to read old issues before posting, despite ticking a box saying they definitely did so.

@FlareSolverr FlareSolverr locked as spam and limited conversation to collaborators May 5, 2024