VBA-Web Stripping ending slash "/" and not following redirects #466

Open
rsd opened this issue Oct 23, 2021 · 11 comments

Comments

rsd commented Oct 23, 2021

A very simple client:

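    ' C_URL is a constant holding the target URL (per this report, it ends in "/")
    Dim Client As WebClient, Request As WebRequest, Response As WebResponse

    Set Client = New WebClient
    Client.BaseUrl = C_URL
    Client.FollowRedirects = True

    Set Request = New WebRequest
    Set Response = Client.Execute(Request)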

Response.StatusCode returns 302.

VBA-Web strips the trailing "/" from the URL, so the server answers with a redirect to the correct URL, which ends in "/".

I confirmed this in the log output with WebHelpers.EnableLogging = True.

Client.FollowRedirects = True had no effect. That could be a failsafe against an infinite loop of requests to the same URL (supposing the "/" would be stripped again from the Location header).

One important note: setting Request.Resource = " " works around the bug.

Setting Request.Resource = "/" does not; the problem continues.

zgrose commented Oct 24, 2021

What server is expecting trailing slashes for resources? Usually endpoints are set up as http://foo.com/posts to GET the posts collection.

zgrose commented Oct 25, 2021

I tried to reproduce your issue and I'm not seeing the trailing slash removed. In PrepareHttpRequest, the call Me.GetFullUrl(request) seems to send back whatever I put in.

Public Sub test()

    Dim client As WebClient, request As WebRequest, response As New WebResponse
    
    Set client = New WebClient
    client.BaseUrl = "https://jsonplaceholder.typicode.com"
    client.FollowRedirects = True
    
    WebHelpers.EnableLogging = True
    
    Set request = New WebRequest
    request.Resource = "todos/1/"
    
    Set response = client.Execute(request)
    
End Sub

--> Request - 9:11:58 AM
GET https://jsonplaceholder.typicode.com/todos/1/
User-Agent: VBA-Web v4.1.1 (https://github.com/VBA-tools/VBA-Web)
Content-Type: application/json
Accept: application/json
Content-Length: 0

<-- Response - 9:11:58 AM
200 OK
Cache-Control: max-age=43200
Connection: keep-alive
Date: Mon, 25 Oct 2021 14:11:57 GMT
Pragma: no-cache
Via: 1.1 vegur
Content-Length: 83
Content-Type: application/json; charset=utf-8
Expires: -1
Accept-Ranges: bytes
ETag: W/"53-hfEnumeNh6YirfjyjaujcOPPT+s"
Server: cloudflare
Vary: Origin, Accept-Encoding
x-powered-by: Express
x-ratelimit-limit: 1000
x-ratelimit-remaining: 999
x-ratelimit-reset: 1635171154
access-control-allow-credentials: true
x-content-type-options: nosniff
CF-Cache-Status: MISS
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=gOcVxu9aFeHLlp4rKdI%2BHJ8BevREIAdN1R34RJVlSsazN52sM1WbRnmruEFcfBdlVvRAc3aB0%2BXw1fn3Uu%2FsoJRjnHng3X7uSk7Dr9yYuAIrtMR%2BvnkcTjiSzfXV6nQU%2FRaX0ED8O1HF3kf45H3W"}],"group":"cf-nel","max_age":604800}
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
CF-RAY: 6a3c097c6b92675b-DFW
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400, h3-28=":443"; ma=86400, h3-27=":443"; ma=86400

{
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}

rsd commented Oct 30, 2021

Did you try the URL I posted?

Note that the server redirects to the correct URL, but even with Client.FollowRedirects = True the redirect is not followed (maybe this is a curl-only config?).

--> Request - 11:11:45
GET https://www.b3.com.br/en_us/market-data-and-indices/data-services/market-data/reports/equities/options/authorized-series
User-Agent: VBA-Web v4.1.6 (https://github.com/VBA-tools/VBA-Web)
Accept: */*
Accept-Encoding: identity

<-- Response - 11:11:46
302 Found
Cache-Control: max-age=2592000
Connection: Keep-Alive
Date: Sat, 30 Oct 2021 14:11:46 GMT
Keep-Alive: timeout=100, max=500
Content-Length: 0
Content-Language: en-US
Expires: Mon, 29 Nov 2021 14:11:46 GMT
Location: http://www.b3.com.br/en_us/market-data-and-indices/data-services/market-data/reports/equities/options/authorized-series/
Set-Cookie: lumClientId=8AE490C97CBFE996017CD18ACC696D55; Expires=Fri, 30-Oct-2071 14:11:46 GMT; Path=/
Set-Cookie: JSESSIONID=74B082B598840AB36527EB3E53C0D7FF.lumcor00202p; Path=/; HttpOnly
Set-Cookie: lumUserSessionId=M0VkErlP0aYXBb-SQvnMo4T8IElxru5I; Path=/; HttpOnly
Set-Cookie: lumUserName=Guest; Path=/
Set-Cookie: lumIsLoggedUser=false; Path=/
Set-Cookie: lumUserLocale=pt_BR; Path=/
Set-Cookie: lumUserLocale=en_US; Path=/
Set-Cookie: dtCookie=v_4_srv_27_sn_6E18FAA34C667B8ABC3374E807F9EC20_perc_100000_ol_0_mul_1_app-3Afd69ce40c52bd20e_0_rcs-3Acss_0; Path=/; Domain=.b3.com.br
Set-Cookie: BIGipServerpool_www.b3.com.br=1192518666.20480.0000; path=/; Httponly
Set-Cookie: TS0171d45d=011d592ce1bcdf3f8566a07a03459e47718f4dfdd0bf60943a759e4202bae2abf1dfc16c88b6083ca2e7ba248cb56698ebcc7a041f; Path=/; Domain=.b3.com.br; HTTPOnly
Vary: User-Agent
X-OneAgent-JS-Injection: true
X-XSS-Protection: 1; mode=block
X-Frame-Options: ALLOW-FROM http://bvmf.bmfbovespa.com.br, http://www2.bmfbovespa.com.br, http://www.bmf.com.br, http://www2.bmf.com.br, https://www2.cetip.com.br, http://estatisticas.cetip.com.br, https://sistemasweb.b3.com.br/
X-Content-Type-Options: nosniff
Server-Timing: dtSInfo;desc="0", dtRpid;desc="161401642"
Cookie: lumClientId=8AE490C97CBFE996017CD18ACC696D55
Cookie: JSESSIONID=74B082B598840AB36527EB3E53C0D7FF.lumcor00202p
Cookie: lumUserSessionId=M0VkErlP0aYXBb-SQvnMo4T8IElxru5I
Cookie: lumUserName=Guest
Cookie: lumIsLoggedUser=false
Cookie: lumUserLocale=pt_BR
Cookie: lumUserLocale=en_US
Cookie: dtCookie=v_4_srv_27_sn_6E18FAA34C667B8ABC3374E807F9EC20_perc_100000_ol_0_mul_1_app-3Afd69ce40c52bd20e_0_rcs-3Acss_0
Cookie: BIGipServerpool_www.b3.com.br=1192518666.20480.0000
Cookie: TS0171d45d=011d592ce1bcdf3f8566a07a03459e47718f4dfdd0bf60943a759e4202bae2abf1dfc16c88b6083ca2e7ba248cb56698ebcc7a041f

zgrose commented Oct 30, 2021

I didn't try your URL, but I looked at GetFullUrl(request) and it doesn't strip any slashes (at least not on Windows). I couldn't see anywhere in the code where slashes are removed, but maybe it's a side effect of the fact that you're doing some kind of HTML scraping instead of a JSON request?

rsd commented Oct 30, 2021

> I didn't try your URL, but I looked at GetFullUrl(request) and it doesn't strip any slashes (at least not on Windows). I couldn't see anywhere in the code where slashes are removed, but maybe it's a side effect of the fact that you're doing some kind of HTML scraping instead of a JSON request?

Possible.
But how is that different in the GET phase? (Why does it work for one URL and not the other?)
And why is it not following redirects?

zgrose commented Oct 30, 2021

I think you'll just have to set breakpoints and look. I don't think the maintainer is very active these days. AFAIK, this project was meant for RESTful APIs and not web scraping so you may be better served with an alternate project? Maybe someone else can jump in with some more actionable advice, but I think breakpoints will answer 97.5% of your questions. :)

davebray131 commented

This is an old post, but I came across a similar problem. When the base URL was joined with a resource that was just a query string, a slash was added: instead of https://abc.com?something=1 it would create https://abc.com/?something=1, which resulted in a redirect that was not followed and therefore an error. I looked at the code and found that WebHelpers.JoinUrl removes the trailing slash from the left side and the leading slash from the right side, then joins them with a slash if both sides are non-blank. I added a test there to leave out the slash if the right side starts with a question mark, indicating that it is only a query string and not a resource path.
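
For anyone landing here, this is roughly what that joining rule and the extra test look like. It is a minimal sketch based only on the description above, not the library's actual source: the name JoinUrlSketch is made up for illustration, and the "?" branch is the hypothetical added test.

```vba
' Sketch of the joining rule described above (hypothetical helper, not VBA-Web's own code)
Public Function JoinUrlSketch(ByVal LeftSide As String, ByVal RightSide As String) As String
    ' Drop one leading slash from the right side and one trailing slash from the left side
    If Left$(RightSide, 1) = "/" Then RightSide = Mid$(RightSide, 2)
    If Right$(LeftSide, 1) = "/" Then LeftSide = Left$(LeftSide, Len(LeftSide) - 1)

    If LeftSide = "" Or RightSide = "" Then
        ' One side is blank: nothing to separate, so no slash is added
        JoinUrlSketch = LeftSide & RightSide
    ElseIf Left$(RightSide, 1) = "?" Then
        ' Assumed extra test: the right side is only a query string, so leave out the slash
        JoinUrlSketch = LeftSide & RightSide
    Else
        JoinUrlSketch = LeftSide & "/" & RightSide
    End If
End Function
```

Under this sketch, joining "https://abc.com" with "?something=1" gives "https://abc.com?something=1". It would also be consistent with the original report: a base URL ending in "/" joined with an empty resource (or with "/") loses its trailing slash, while a resource of " " counts as non-blank and so keeps a slash after the base URL.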

huche6 commented Apr 24, 2023

Thank you for your comment. I had the same problem: my API call worked, but I was annoyed to be redirected every time.

zgrose commented Apr 24, 2023

FWIW, if you type in https://yahoo.com?foo=1 in the Edge browser (112.0.1722.48), you'll see it flip to https://yahoo.com/?foo=1 as well. Dunno if there is some RFC at play, but it seems the days of https://yahoo.com?foo=1 are behind us.

engolm commented Apr 24, 2023 via email
