This repository has been archived by the owner on Oct 3, 2021. It is now read-only.

What about mobile domains? #18

Open
ThirtySomething opened this issue Dec 21, 2019 · 12 comments

@ThirtySomething

First of all: thank you for your list. It makes my network much cleaner than it would be without it. Very good work!

I have found that the domain "ladies.de" is always blocked, but "m.ladies.de" is never blocked. Is there a way to block all subdomains of a domain?

I am not sure whether this is a mistake in how Pi-hole interprets your list or an error in the list itself. If your script had to look for every possible subdomain, its runtime would grow enormously, so I also don't know whether this is an easy task to solve.

@chadmayfield
Owner

Thanks @ThirtySomething, I'll take a look at it in the next week or two and let you know what I find.

@MdBruin

MdBruin commented Dec 24, 2019

It's not only mobile domains that cause issues; some domains have a www/www8/w1 prefix in front of them and those aren't blocked by Pi-hole either. To solve it, either all of these variants need to be added to the block list, or the list needs to be converted to a regex blacklist with wildcards, for example (^|\.)ladies\.de$, which blocks them all (www.ladies.de, m.ladies.de and ladies.de). I started checking the list manually but forgot about the mobile sites. I don't know which is faster, a plain block list or a regex blacklist. The blacklist has the advantage that entries which only differ in their subdomain can be collapsed into one pattern. It is a big job to change the list, though: not so much adding the (^|\.) prefix and the escape backslashes, but rather merging the duplicate items that can be covered by a single regex.
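A quick way to sanity-check such a pattern (just a sketch for testing on the command line; Pi-hole's regex filters use POSIX extended syntax, so grep -E should behave comparably):

$ printf 'ladies.de\nm.ladies.de\nwww.ladies.de\nexample.com\n' | grep -E '(^|\.)ladies\.de$'
ladies.de
m.ladies.de
www.ladies.de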

@chadmayfield
Owner

@MdBruin, exactly my thoughts with the regex. At one point I was going down the path of writing a DNS enumerator to query all possible DNS entries of a domain... but it grew very quickly and took forever to run against 2 million domains, so I killed the idea.
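For the record, the core of such an enumerator is trivial; the scale is what kills it (a handful of candidate labels per domain is fine, every imaginable label across 2 million domains is not). A minimal sketch with a made-up label list:

# try a few common subdomain labels against one domain (illustration only)
for sub in www www2 w1 m mobile; do
    answer=$(dig +short "$sub.ladies.de" A | head -n 1)
    [ -n "$answer" ] && echo "$sub.ladies.de -> $answer"
done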

@MdBruin

MdBruin commented Dec 27, 2019

@chadmayfield, I can understand that. The first step I have taken is to filter out all non-existing pages and domains, i.e. checking for a 404 or for no HTTP response code at all; it takes ages to check them. Unfortunately I cannot feed in the whole list in one go: it causes a buffer overflow on my virtual machine and the HTTP status codes of the sites are never requested. So I'm running 10k domains at a time, and around 10% no longer exist (after checking 15% of the total list).
After that I need to decide how to handle the next step: switch to regex, or write a script that checks which subdomain variants actually exist. I don't know whether the second option is a good idea, because the list is already big and it would become a lot longer after adding all the variants.
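For illustration, the per-batch check boils down to something like this (a rough sketch of the idea rather than the actual script; batch_10k.txt is just a placeholder name):

# flag domains that return 404 or no HTTP response at all
while read -r domain; do
    code=$(curl -o /dev/null -s --max-time 10 -w '%{http_code}' "http://$domain")
    if [ "$code" = "000" ] || [ "$code" = "404" ]; then
        echo "$domain"   # dead or unreachable: candidate for removal
    fi
done < batch_10k.txt > dead_domains.txt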

@ThirtySomething
Author

ThirtySomething commented Dec 31, 2019

I don't know how the script works internally, but thinking about it, here is a possible solution in a few steps:

$ curl -I --user-agent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" ladies.de
HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://ladies.de/
Connection: close

Check the bare domain first. The user-agent is a desktop one. The domain is redirected to "https://ladies.de/", so let's check that one:

$ curl -I --user-agent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" https://ladies.de
HTTP/1.1 301 Moved Permanently
Date: Tue, 31 Dec 2019 16:18:55 GMT
Server: Apache
X-content-age: 18
Location: https://www.ladies.de/
Cache-Control: max-age=0
Expires: Tue, 31 Dec 2019 16:18:55 GMT
Vary: User-Agent
Content-Type: text/html; charset=UTF-8

This is again redirected to "https://www.ladies.de/" - let's check this:

$ curl -I --user-agent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" https://www.ladies.de
HTTP/1.1 200 OK
Date: Tue, 31 Dec 2019 16:20:12 GMT
Server: Apache
X-content-age: 18
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: PHPSESSID=tieohs55ldcniubg95i5tmdg11; path=/
Set-Cookie: anzeigenmarktPage=1; expires=Wed, 01-Jan-2020 16:20:12 GMT; path=/
Set-Cookie: aktuellesPage=1; expires=Wed, 01-Jan-2020 16:20:12 GMT; path=/
Vary: User-Agent
Content-Type: text/html; charset=UTF-8

This seems to be the valid domain. Now let's request this domain with a mobile user-agent:

$ curl -I --user-agent "Mozilla/5.0 (Linux; U; Android 4.4.2; en-us; SCH-I535 Build/KOT49H)" https://www.ladies.de
HTTP/1.1 307 Temporary Redirect
Date: Tue, 31 Dec 2019 16:21:47 GMT
Server: Apache
X-content-age: 18
Cache-Control: no-cache, max-age=1, must-revalidate, no-store
Pragma: no-cache
Location: https://m.ladies.de/
Cache-Control: max-age=0
Expires: Tue, 31 Dec 2019 16:21:47 GMT
Vary: User-Agent
Content-Type: text/html; charset=UTF-8

This is now redirected to "https://m.ladies.de/" - let's check this one, too:

$ curl -I --user-agent "Mozilla/5.0 (Linux; U; Android 4.4.2; en-us; SCH-I535 Build/KOT49H)" https://m.ladies.de
HTTP/1.1 200 OK
Date: Tue, 31 Dec 2019 16:22:50 GMT
Server: Apache
X-content-age: 18
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: mobileversion=1; expires=Tue, 14-Jan-2020 16:22:50 GMT; path=/; domain=.ladies.de
Set-Cookie: PHPSESSID=2qbu4h9d1lnhifqk60kuodk8k2; path=/
Vary: User-Agent
Content-Type: text/html; charset=UTF-8

Seems to be the default mobile domain.

This way you can find out the full URL of the website and the full URL of the mobile version. But... this only works if the server handles the redirections; otherwise it won't work. The first line of the response contains the HTTP status code: 200 means OK, so the URL is a valid URL on that server. If it is a redirection (301), the "Location" header refers to the new location, so some parsing is necessary here. Keep in mind that other HTTP status codes are possible, see the list here.
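Sketched as a loop (only an illustration of the parsing described above, with a hop limit so a redirect loop cannot run forever):

# follow Location headers until a non-redirect status is returned
url="http://ladies.de"
ua="Mozilla/5.0 (Linux; U; Android 4.4.2; en-us; SCH-I535 Build/KOT49H)"
for hop in 1 2 3 4 5; do
    headers=$(curl -sI --user-agent "$ua" "$url")
    code=$(printf '%s\n' "$headers" | head -n 1 | awk '{print $2}')
    case "$code" in
        200)             echo "final URL: $url"; break ;;
        301|302|307|308) url=$(printf '%s\n' "$headers" | awk 'tolower($1)=="location:" {print $2}' | tr -d '\r') ;;
        *)               echo "HTTP $code for $url"; break ;;
    esac
done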

@ThirtySomething
Author

By the way, the proposed solution only works while Pi-hole is not active; otherwise the domains are, of course, already blocked.

@MdBruin

MdBruin commented Jan 3, 2020

@ThirtySomething You can look at the code in the test/check_domains.sh script.
First of all, I'm not a programmer, but computers are my hobby. I did some PHP programming, but that's it, so it took me some time and effort to understand his code (I found the script after I had already written some checking code of my own for testing). The script checks the Pi-hole's response using curl and grep -c for the value X-Pi-hole. If that header is present, the domain doesn't need to be checked again and can be added to the list; if there is no hit, it still needs to be checked.

I knew about the -A / --user-agent option, and it is used in my script for checking mobile domains. The script is still a work in progress and not yet functional; I just found out that bash supports functions, which makes the code more readable and avoids repeating the same code.
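Roughly, the check described above amounts to the following (a sketch of the idea, not the actual test/check_domains.sh code):

# count X-Pi-hole headers in the response (0 means the domain is not blocked yet)
is_blocked() {
    curl -sI --max-time 5 "http://$1" | grep -c -i 'x-pi-hole'
}

if [ "$(is_blocked m.ladies.de)" -gt 0 ]; then
    echo "m.ladies.de is already blocked"
else
    echo "m.ladies.de still needs to be checked"
fi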

@chadmayfield
Owner

chadmayfield commented Jan 3, 2020

@MdBruin the script is a very rough PoC, so if you extend it, great! I abandoned it early on. I had gone down that path at one point and thought it would be good, but it's kind of a rabbit hole. With CDNs and the many different domains that sites and content are served from, I think it would almost be better to run sub-domain/DNS enumeration on the sites found in the main list and compile a list of the current A/AAAA/CNAME records in each domain's DNS (using something like SubBrute; you can see something similar by entering a domain at https://dnsdumpster.com).
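For a single name, the record types mentioned above can be pulled with dig (illustration only):

$ for t in A AAAA CNAME; do echo "== $t =="; dig +short m.ladies.de $t; done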

Rapid7 has a dataset of all forward DNS responses they collect through their Project Sonar. At some point it would be nice to use that dataset to extend my list... but again, I have other personal priorities at the moment and hope to get to it soon.

Ultimately a regex for each domain would be the best option.

@etienne-85

Sorry if this is a bit off topic, but I was curious whether using OpenDNS wouldn't be simpler instead?
They already have customizable categories to block content on specific subjects.
I tested with the mobile URL above and it is correctly blocked.
By the way, it is possible to combine Pi-hole and OpenDNS to improve blocking, simply by pointing Pi-hole's upstream DNS at OpenDNS.
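A sketch of that setup, assuming the usual setupVars.conf keys (the same change can be made in the web UI under Settings > DNS; the addresses shown are the OpenDNS FamilyShield resolvers):

# /etc/pihole/setupVars.conf
PIHOLE_DNS_1=208.67.222.123
PIHOLE_DNS_2=208.67.220.123
# afterwards reload the resolver with: pihole restartdns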

@MdBruin

MdBruin commented Jan 6, 2020

@etienne1911, it depends on your requirements. I run my Pi-hole with unbound and an NSD DNS server,
so for my situation using that DNS service is not an option, partly because I can and partly for privacy reasons.

@chadmayfield I have tried the regex option. Using a filter I took 100k domains with a single dot in them (like ladies.de) and made a script to add the regex wrapping (for example (^|\.)ladies\.de$, including the escape backslashes for the dots, which get filtered out by GitHub). It slowed my Pi-hole down somewhat, but mine runs in a virtual machine on a gen 8 Intel NUC i3. After that I tried it on a first-generation Raspberry Pi and the impact was even bigger: too big to be useful, and that's not even the complete list. It looks to me as if Pi-hole first builds each possible first part (www/m/etc.), prepends it to the full web address and then checks it. The complete list is around 19x bigger, so I don't think this is the way to go. We could make the list smaller by filtering on the .com/.net/etc. TLDs, but even that won't shrink it much. Beyond that we could write more complex regexes, but that would mean a bigger chance of false positives and the need for a whitelist, which would itself become large.
Somehow I expected this result already; from what I know of password cracking, even there a plain wordlist is much faster than a smaller one with mutation rules applied.
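The conversion script itself is simple enough; it's the size of the resulting list that hurts. A sketch of the wrapping step (domains.list is a placeholder for the plain one-dot list):

# ladies.de  ->  (^|\.)ladies\.de$
sed -e 's/\./\\./g' -e 's/^/(^|\\.)/' -e 's/$/$/' domains.list > regex_blacklist.list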

@ThirtySomething
Author

Hello again,

Sorry for the delay. @MdBruin I am not so familiar with bash scripting. Instead I created a small Python script that you can find here.

From my point of view there is no need for such sophisticated approaches as enumerating DNS records or the like.

Webmasters want their pages to be reached, so I assume the web servers are configured correctly. This means that if a wrong URL is requested, the server sends a redirect to the correct URL. This can be exploited by following the redirects until no further redirect is returned.

This determination can be made for the domain once with a desktop user-agent and once with a mobile user-agent. From my point of view, no other URLs are needed to block the site with Pi-hole.
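For example, curl can follow the whole redirect chain itself and print only the final URL, once per user-agent (a sketch; based on the redirects shown earlier in this thread it should end up at the www. and m. hosts respectively):

desktop_ua="Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
mobile_ua="Mozilla/5.0 (Linux; U; Android 4.4.2; en-us; SCH-I535 Build/KOT49H)"

# -L follows redirects, -w '%{url_effective}' prints where the chain ended
curl -sILo /dev/null -w '%{url_effective}\n' --user-agent "$desktop_ua" http://ladies.de
curl -sILo /dev/null -w '%{url_effective}\n' --user-agent "$mobile_ua"  http://ladies.de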

Maybe I'm seeing this too simply. But on the one hand it is quite pragmatic, and on the other hand it is also quite easy to implement. Everything else would no doubt be a more reliable solution, but from my point of view it would also be quite complex.

What do you think about the matter?

Greetings

ThirtySomething

@ThirtySomething
Author

Hi @chadmayfield ,

I've found an interesting post on this topic here.

Regards

ThirtySomething
