web.archive.org links cause exceptions, or links are transformed #91

Open
gitressa opened this issue Jun 9, 2019 · 0 comments
gitressa commented Jun 9, 2019

The following two links, for example, result in two different exceptions:

{"distance":1,"exception":"The base path \"\/web\/20050309042332\/http:\/\/www.fsk.dk\/fsk\/div\/interception\" is not an absolute path.","referrer":"https:\/\/example.org\/test","referrer_title":"http:\/\/web.archive.org\/web\/20050309042332\/http:\/\/www.fsk.dk\/fsk\/div\/interception\/aflytningcampbellsrapportpaadanskversion2.htm","referrer_xpath":"\/html\/body\/div[2]\/div\/section\/div\/section\/article\/div\/div\/div\/p\/a[2]","request_time":320473,"status":200,"url":"http:\/\/web.archive.org\/web\/20050309042332\/http:\/\/www.fsk.dk\/fsk\/div\/interception\/aflytningcampbellsrapportpaadanskversion2.htm","timestamp":"2019-06-09T20:01:07+02:00"}

{"distance":1,"exception":"The base path must be a non-empty string. Got: \"\"","referrer":"https:\/\/example.org\/test","referrer_title":"https:\/\/web.archive.org\/web\/20190525092213\/https:\/\/www.fsk.dk\/","referrer_xpath":"\/html\/body\/div[2]\/div\/section\/div\/section\/article\/div\/div\/div\/p\/a[3]","request_time":818032,"status":200,"url":"https:\/\/web.archive.org\/web\/20190525092213\/https:\/\/www.fsk.dk","timestamp":"2019-06-09T20:01:07+02:00"}

Formatted for easier reading:

{
  "link": "http://web.archive.org/web/20050309042332/http://www.fsk.dk/fsk/div/interception/aflytningcampbellsrapportpaadanskversion2.htm",
  "status": 200,
  "exception": "The base path \"/web/20050309042332/http://www.fsk.dk/fsk/div/interception\" is not an absolute path."
}
{
  "link": "https://web.archive.org/web/20190525092213/https://www.fsk.dk/",
  "status": 200,
  "exception": "The base path must be a non-empty string. Got: \"\""
}
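As a rough way to see where the first exception's base path comes from, the sketch below splits the first link with Python's standard library and takes the directory part of its path. This is not Fink's code (Fink is not written in Python), and `directory_of` is a hypothetical helper used only for illustration. The assumption is that the crawler derives a base path in a similar way. The second exception ("Got: \"\"") is not explained by this sketch.

```python
from posixpath import dirname
from urllib.parse import urlsplit

# First link from the log records above.
LINK = (
    "http://web.archive.org/web/20050309042332/"
    "http://www.fsk.dk/fsk/div/interception/"
    "aflytningcampbellsrapportpaadanskversion2.htm"
)


def directory_of(url: str) -> str:
    """Return the directory part of the URL's path component (hypothetical helper)."""
    # urlsplit keeps the embedded "http://www.fsk.dk/..." inside the path,
    # because only the first scheme and host are treated as scheme and netloc.
    return dirname(urlsplit(url).path)


if __name__ == "__main__":
    # Prints the same string that is quoted in the first exception message.
    print(directory_of(LINK))
```

The printed directory contains an embedded `http://`, which may be why a path library rejects it as "not an absolute path".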

Another oddity is that the same link is checked unchanged on one server but transformed on another. On one server, the link is checked like this:
http://web.archive.org/web/20050309042332/http://www.fsk.dk/fsk/div/interception/aflytningcampbellsrapportpaadanskversion2.htm

But on another server, the link is somehow transformed by Fink and gets checked like this:
http://web.archive.org/titlelist/eche//web/20050309042332/http://www.fsk.dk/fsk/div/interception/aflytningcampbellsrapportpaadanskversion2.htm
... where titlelist/eche/ has been inserted into the URL.

This is the result from the server where the link is transformed:

{"distance":3,"exception":null,"referrer":"https:\/\/example.org\/titlelist\/eche","referrer_title":"Listening:","referrer_xpath":"\/html\/body\/div\/div\/div\/section\/div[2]\/section[2]\/div\/div\/div\/div[1]\/span[2]\/div\/p[12]\/a","request_time":1562045,"status":404,"url":"http:\/\/web.archive.org\/titlelist\/\/web\/20050309042332\/http:\/\/www.fsk.dk\/fsk\/div\/interception\/aflytningcampbellsrapportpaadanskversion2.htm","timestamp":"2019-06-04T01:15:01+02:00"}
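To pin down exactly what was added to the URL, the sketch below compares the path of the original link with the path of the `url` field logged on the transforming server. The helper `inserted_prefix` is hypothetical and only illustrates the comparison. The observation that the inserted prefix matches the directory of the referrer's path is a hint at relative-URL resolution, not a confirmed diagnosis.

```python
from posixpath import dirname
from urllib.parse import urlsplit

TAIL = (
    "/web/20050309042332/http://www.fsk.dk/fsk/div/interception/"
    "aflytningcampbellsrapportpaadanskversion2.htm"
)
ORIGINAL = "http://web.archive.org" + TAIL
# "url" field of the log record from the transforming server, unescaped.
LOGGED = "http://web.archive.org/titlelist/" + TAIL
# "referrer" field of the same record.
REFERRER = "https://example.org/titlelist/eche"


def inserted_prefix(original: str, logged: str):
    """Return what was put in front of the original path, or None if the paths do not line up."""
    original_path = urlsplit(original).path
    logged_path = urlsplit(logged).path
    if not logged_path.endswith(original_path):
        return None
    return logged_path[: len(logged_path) - len(original_path)]


if __name__ == "__main__":
    prefix = inserted_prefix(ORIGINAL, LOGGED)
    referrer_dir = dirname(urlsplit(REFERRER).path)
    print("inserted prefix:", prefix)
    print("referrer directory:", referrer_dir)
```

In the logged record the inserted prefix is `/titlelist/`, which equals the referrer's directory plus a slash. That would be consistent with the link being resolved relative to the referring page instead of the archive host's root.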
