Banned_words
We found that the crawler has a "bug" related to banned words. At first, we thought that the crawler blocked every website containing a word from the banned-word list. Instead, we found that it blocks an onion domain only if the title of the website contains a banned word; it does not look at the content of the page. As a result, unwanted links can still be saved. So far, the infrastructure offers two lists: a whitelist and a blacklist. The whitelist contains all websites with no banned words in the title, and the blacklist contains all websites with banned words in the title. We could change the rule so that the content of the page is also scanned. However, this could trigger a lot of false positives, because the match ignores context. The banned list includes terms like "child porn" and "child pornography", so a website that states "no child pornography" in its content would still have its domain banned.
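The difference between the title-only rule and the content-scanning alternative can be illustrated with a minimal sketch. This is not the crawler's actual code: the function names, the `scan_content` flag, the case-insensitive substring matching, and the placeholder terms in `BANNED_TERMS` are all assumptions made for illustration.

```python
# Hypothetical placeholder list; the real banned-word list is not reproduced here.
BANNED_TERMS = ["forbidden term", "other forbidden term"]


def contains_banned(text, terms=BANNED_TERMS):
    """Return True if any banned term occurs in text.

    Assumption: matching is a case-insensitive substring test.
    """
    lowered = text.lower()
    return any(term in lowered for term in terms)


def classify(title, content, scan_content=False):
    """Return "blacklist" or "whitelist" for a page.

    scan_content=False checks only the title (the behaviour we observed).
    scan_content=True also checks the page content (the proposed change).
    """
    if contains_banned(title):
        return "blacklist"
    if scan_content and contains_banned(content):
        return "blacklist"
    return "whitelist"


# Title-only rule: a page with a clean title but unwanted content is kept.
missed = classify("Welcome", "We sell forbidden term here")

# Content scanning: a page that merely disclaims the term gets banned.
false_positive = classify(
    "Welcome", "This site hosts no forbidden term", scan_content=True
)
```

Here `missed` is `"whitelist"` even though the content is unwanted, while `false_positive` is `"blacklist"` even though the page only disclaims the term. A plain substring match cannot tell these two cases apart, which is why scanning the content would need some handling of context to be useful.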