This repository has been archived by the owner on Jul 14, 2023. It is now read-only.

Treat URLs ending with .html or a slash, or starting with www., as indistinguishable #154

Open
wants to merge 4 commits into base: master


fiatjaf (Contributor) commented Jul 21, 2017

For all newly registered forms, remove a leading 'www.', trailing slashes, and a trailing '.html' or '.htm'.
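A minimal sketch of that cleanup step, assuming the stored value is a scheme-less `host/path` string such as `www.example.com/contact.html` (the exact format produced by `referrer_to_path` is an assumption here). The name `host_cleanup` appears later in this PR's commit messages, but this body is a hypothetical reconstruction, not the PR's code:

```python
def host_cleanup(host: str) -> str:
    """Hypothetical sketch: normalize a scheme-less 'host/path' string.

    Drops a leading 'www.', all trailing slashes, and one trailing
    '.html' or '.htm' extension, in that order.
    """
    if host.startswith("www."):
        host = host[len("www."):]
    host = host.rstrip("/")
    for ext in (".html", ".htm"):
        if host.endswith(ext):
            host = host[: -len(ext)]
            break
    return host
```

For example, `www.example.com/contact.html`, `example.com/contact/` and `example.com/contact` all reduce to `example.com/contact`.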

Then, for all forms, try to look up not only the URL that comes in the Referer header (after referrer_to_path), but also the same URL without 'www.', without '.html' or '.htm', and without the trailing slash.
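One reading of this lookup step is that every combination of the three strips is tried, which would include the "intermediate cases" that a later commit in this PR drops. A hypothetical sketch of that reading; the helper names (`strip_www`, `strip_slash`, `strip_html`, `lookup_candidates`) are illustrative, not the PR's code:

```python
from itertools import product


def strip_www(path: str) -> str:
    # Hypothetical helper: drop a leading 'www.' if present.
    return path[len("www."):] if path.startswith("www.") else path


def strip_slash(path: str) -> str:
    # Hypothetical helper: drop all trailing slashes.
    return path.rstrip("/")


def strip_html(path: str) -> str:
    # Hypothetical helper: drop one trailing '.html' or '.htm'.
    for ext in (".html", ".htm"):
        if path.endswith(ext):
            return path[: -len(ext)]
    return path


def lookup_candidates(path: str) -> list:
    """Apply every subset of the three strips; raw path first, no duplicates."""
    steps = (strip_www, strip_slash, strip_html)
    candidates = []
    for mask in product((False, True), repeat=len(steps)):
        candidate = path
        for use, step in zip(mask, steps):
            if use:
                candidate = step(candidate)
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates
```

With three independent strips this can yield up to eight lookups per request, which is one plausible motivation for narrowing it to the raw and fully cleaned values.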

All existing forms will match as before. All new forms will be cleaned up and matched against their cleaned version, so it doesn't matter if their URL later changes to add or remove 'www.', '.html' or '/'.

fiatjaf (Contributor Author) commented Jul 21, 2017

I don't understand why these tests aren't passing. I can't debug httpretty decently. Everything is awful. I'll try again later.

fiatjaf changed the title from "indistinguish urls with ending .html and slashes or starting www." to "Treat URLs ending with .html and slashes or starting with www. as indistinguishable." on Jul 21, 2017
fiatjaf (Contributor Author) commented Sep 15, 2017

I need help. I looked at this again and still can't understand the problem. If someone could run the tests locally on this branch and identify what is wrong with the failing tests, I would be delighted.

rohitdatta (Member)

Can you rebase this onto master? I'd like to look at the tests with the changes we made to the CI.

rohitdatta (Member)

Personally, I'm not a fan of removing the .html and .htm, however. I think if people have an issue with /contact vs. /contact.html, they should just upgrade to Gold and get proper sitewide forms. I do like the www subdomain change, though.

When searching for the form by its hash, use just the raw host and the host_cleanup(host) versions; don't check for bizarre intermediate cases.

When checking the form host for non-hash forms, use host_cleanup once instead of doing multiple checks with rstrip, remove_www, etc.
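The two commits above can be sketched together as follows. This is a hypothetical illustration, not the PR's code: forms are modeled as a plain dict keyed by `(email, host)` as a stand-in for the real hashed lookup, and the `host_cleanup` body is an assumption based on the PR description (drop a leading `www.`, trailing slashes, and a trailing `.html`/`.htm`).

```python
from typing import Dict, Optional, Tuple


def host_cleanup(host: str) -> str:
    """Hypothetical: drop a leading 'www.', trailing slashes, and one .html/.htm."""
    if host.startswith("www."):
        host = host[len("www."):]
    host = host.rstrip("/")
    for ext in (".html", ".htm"):
        if host.endswith(ext):
            host = host[: -len(ext)]
            break
    return host


# Hypothetical stand-in for the hashed lookup: forms keyed by (email, host).
FormIndex = Dict[Tuple[str, str], str]


def find_form(index: FormIndex, email: str, referrer_path: str) -> Optional[str]:
    """Try the raw referrer path, then its fully cleaned form; nothing in between."""
    for candidate in (referrer_path, host_cleanup(referrer_path)):
        form = index.get((email, candidate))
        if form is not None:
            return form
    return None


def host_matches(form_host: str, referrer_path: str) -> bool:
    """One normalized comparison instead of separate rstrip / www checks."""
    return host_cleanup(form_host) == host_cleanup(referrer_path)
```

In this sketch, a form stored under its raw host before the change is still found by the first candidate, while a form registered after the change is stored cleaned and is found by the second candidate whether or not the page later gains or loses `www.`, `.html` or a trailing slash.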

fiatjaf (Contributor Author) commented Feb 28, 2018

The mystery of the failing tests.

colevscode (Member)

@fiatjaf Can you remove the .html stuff, and just go with the www fix?
Also, some of those reset() calls may be needed for the tests to pass!
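If the PR were narrowed as both reviewers suggest, the normalization would shrink to the `www.` prefix alone. A minimal sketch with hypothetical names (`strip_www_prefix`, `same_site`); these are not the project's helpers, and the existing `remove_www` mentioned in the commits may behave differently:

```python
def strip_www_prefix(host: str) -> str:
    """Hypothetical: normalize only a leading 'www.'; keep paths, .html and slashes."""
    prefix = "www."
    return host[len(prefix):] if host.startswith(prefix) else host


def same_site(form_host: str, referrer_path: str) -> bool:
    """Treat www.example.com and example.com as the same, and nothing else."""
    return strip_www_prefix(form_host) == strip_www_prefix(referrer_path)
```

Under this narrower rule `/contact` and `/contact.html` stay distinct, which matches rohitdatta's point that matching across pages belongs to sitewide forms.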

3 participants