
Improve Wayback Machine snapshot selection #51

Open
dessant opened this issue May 12, 2022 · 3 comments

Comments

@dessant
Owner

dessant commented May 12, 2022

No description provided.

@dessant
Owner Author

dessant commented May 12, 2022

I'll quote a review here because it contains useful information.

https://addons.mozilla.org/en-US/firefox/addon/view-page-archive/reviews/1836542/

It works to some extent. I like its ability to open all the different web archives with one click.

There is a bit of an issue with some archives. When I click a URL to an outdated Microsoft.com page, the Wayback Machine takes me to Microsoft's Error404 landing page.

This URL for example:
https://www.microsoft.com/en-us/download/details.aspx?id=45885

When this URL is passed to the add-on, the Wayback Machine redirects it to this page:
https://web.archive.org/web/20220328035922/https://www.microsoft.com/en-us/download/404Error.aspx

*It's landing on a Wayback Redirect page. After 5 seconds, the page gets redirected to another page.


We can see that the date is 2022-03-28 03:59:22, which is one of the newest snapshots created by the Archive. Unfortunately, the Wayback Machine continues to create snapshots of these 404 pages.
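For reference, the capture time can be read straight from the snapshot URL: it is the 14-digit `YYYYMMDDhhmmss` field that follows `/web/`. A minimal Python sketch, assuming that URL layout (the `parse_snapshot_url` helper and its regex are hypothetical, not part of the add-on):

```python
import re
from datetime import datetime

# Assumed layout: ".../web/<14-digit timestamp>[optional modifier]/<archived URL>"
SNAPSHOT_RE = re.compile(r"/web/(\d{14})[^/]*/(.+)$")


def parse_snapshot_url(url: str) -> tuple[datetime, str]:
    """Split a Wayback snapshot URL into its capture time and the archived URL."""
    match = SNAPSHOT_RE.search(url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {url}")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured, match.group(2)


snapshot = ("https://web.archive.org/web/20220328035922/"
            "https://www.microsoft.com/en-us/download/404Error.aspx")
captured, original = parse_snapshot_url(snapshot)
print(captured.isoformat(sep=" "))  # 2022-03-28 03:59:22
print(original)
```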

Someone might ask: why not just use the Wayback date toolbar to go back to an older date? The problem is that because your tool requests the newest snapshot, it lands on these 404 pages, and the redirect changes the URL we're searching for.

The API docs for the Wayback Machine say: "timestamp is the timestamp to look up in Wayback. If not specified, the most recenty available capture in Wayback is returned."

The correct way to use the API is to create a link like this:
http://archive.org/wayback/available?url=https://www.microsoft.com/en-us/download/details.aspx?id=45885&timestamp=20010101
*This returns a JSON response that contains a working "closest" snapshot URL you can click on.
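A minimal Python sketch of building such a request, assuming the `archive.org/wayback/available` endpoint from the link above (queried over HTTPS here); `availability_query` is a hypothetical helper name. Percent-encoding the page URL keeps any `?` or `&` inside it from being read as a parameter of the API request itself:

```python
from urllib.parse import urlencode

# Endpoint taken from the link above; HTTPS is an assumption of this sketch.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str, timestamp: str = "20010101") -> str:
    """Build an Availability API request for the capture closest to `timestamp`.

    urlencode percent-encodes the page URL, so its own "?" and "&"
    cannot leak into the query string of the API request.
    """
    params = {"url": page_url, "timestamp": timestamp}
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


print(availability_query("https://www.microsoft.com/en-us/download/details.aspx?id=45885"))
```

The unencoded link in the review happens to work because the target page has a single query parameter; encoding matters once the archived URL itself contains `&`.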


It appears that this add-on is not using the API but is manipulating URLs instead. This won't work well.

If you add the "&timestamp=20010101" parameter, the API returns the snapshot closest to 2001-01-01 rather than the newest available snapshot. The downside is that you'll need to write something to handle the JSON data the API returns (which shouldn't be very hard).

Doing it that way will ALWAYS return a website, not those Error404 landing pages.
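Handling the JSON return data could look roughly like the Python sketch below. It assumes the response shape shown in the Availability API documentation, a nested `archived_snapshots.closest` object with `available` and `url` fields; `closest_snapshot` is a hypothetical helper, and the two response bodies are made-up inputs in that shape:

```python
import json
from typing import Optional


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the "closest" snapshot URL from an Availability API response.

    Assumed shape:
    {"archived_snapshots": {"closest": {"available": true, "url": ...}}}
    Returns None when the response reports no usable capture.
    """
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


# Hypothetical response bodies, shaped like the example in the API docs.
found = '''{"archived_snapshots": {"closest": {"available": true,
  "url": "http://web.archive.org/web/20130919044612/http://example.com/",
  "timestamp": "20130919044612", "status": "200"}}}'''
missing = '{"archived_snapshots": {}}'

print(closest_snapshot(found))    # http://web.archive.org/web/20130919044612/http://example.com/
print(closest_snapshot(missing))  # None
```

The `closest` object also carries a `status` field; skipping captures whose status is not `"200"` could be an extra safeguard against landing on a recorded redirect, though this sketch does not verify that it avoids the 404 pages described above.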

@nyanpasu64

Can you redirect to * (a date picker) rather than a particular date?

@dessant
Owner Author

dessant commented Jan 2, 2023

@nyanpasu64, that is already possible using the Wayback Machine (all) engine, visit the extension's options to enable it.
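For context, a Wayback Machine URL with `*` in place of the timestamp opens the capture calendar for a page instead of one particular snapshot. A minimal sketch, assuming the "Wayback Machine (all)" engine builds a URL of this form (not confirmed from the extension's source); `all_captures_url` is a hypothetical helper:

```python
WAYBACK_BASE = "https://web.archive.org/web/"


def all_captures_url(page_url: str) -> str:
    """Build a Wayback Machine URL that lists every capture of `page_url`.

    The "*" stands in for a timestamp, so the Wayback Machine shows its
    date picker rather than redirecting to the newest snapshot.
    """
    return f"{WAYBACK_BASE}*/{page_url}"


print(all_captures_url("https://www.microsoft.com/en-us/download/details.aspx?id=45885"))
```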
