Less results than real Google #225

Open
WAZAAAAA0 opened this issue May 16, 2023 · 14 comments

@WAZAAAAA0

(cross-posted in searxng/searxng#2438, benbusby/whoogle-search#1004)

It's been years. Google-searching through a "privacy search engine frontend" will rarely find as many results as the real Google.

Here's a simple test to verify that: come up with a unique Google query that will find as few results as possible, preferably not in English. For example in my test I used "sfendazi" but you might need your own unique query since the results come and go. Perform the same search on every public instance, and observe how many find the same results (if the results contain garbage unrelated stuff, consider it a failure). This was the outcome yesterday as of 2023-05-15:

LibreX instances: 20 tested, 0 work

lmao

Whoogle instances: 17 tested, 3 work

https://s.tokhmi.xyz
https://whoogle.dcs0.hu
https://whoogle.privacydev.net

SearX/SearXNG instances: 92 tested, 19 work if you tweak a setting, only 1 works with defaults

(the only one that works with defaults is https://opnxng.com)
https://priv.au
https://xo.wtf
https://offtheradar.info
https://searx.oakleycord.dev
https://searx.cthd.icu
https://ooglester.com
https://search.bus-hit.me
https://myprivatesrx.us
https://coppedge.info
https://search.neet.works
https://search.zzls.xyz
https://search.us.projectsegfau.lt
https://s.frlt.one
https://searx.sev.monster
https://stalk.antelope.day
https://searx.esmailelbob.xyz
https://search.serginho.dev
https://search.cronobox.one
https://searx.mxchange.org

Those 19 instances I listed think they're "smart" and have set their Search language to [auto], which auto-selects it based on your browser headers... or they're simply set to something arbitrary, like [en-US]. Choosing [all] fixes the problem for them.
Meanwhile, the rest of the instances somehow do not find the correct results even when set to [all]. From what I've tested with a local SearXNG instance, adding the search query parameter nfpr=1 (along with the pre-existing safe=off and filter=0) to searxng/searx/engines/google.py fixed it. Here's what they do:

  • nfpr=1 -> "Search instead for YYY" ON (skips the "Showing results for XXX" auto-correction)
  • safe=off -> SafeSearch OFF
  • filter=0 -> Include omitted results ON
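For illustration, here is a minimal Python sketch of a Google search URL carrying those three parameters. The `build_google_url` helper and the base URL are assumptions made up for this example; this is not the actual code in searx/engines/google.py.

```python
from urllib.parse import urlencode

# Hypothetical helper for illustration only, not taken from SearXNG.
def build_google_url(query: str) -> str:
    params = {
        "q": query,
        "safe": "off",   # SafeSearch off
        "nfpr": "1",     # search the query as typed, no "Showing results for ..."
        "filter": "0",   # include omitted results
    }
    return "https://www.google.com/search?" + urlencode(params)

print(build_google_url("sfendazi"))
# https://www.google.com/search?q=sfendazi&safe=off&nfpr=1&filter=0
```

How a given frontend assembles its request will differ; the point is only that all three parameters travel together in the query string.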

Changing the Interface language is fine. Actually, I'd argue language auto-detection should happen to the interface, not to the search results filter, which would be consistent with how major search engines work.

Honestly, just take the Search language option away, it does more harm than good. Or at least make [all] the default and lock the option behind huge warning signs with flaming skulls that searching will be seriously degraded for everyone if anything other than [all] is selected. People don't understand this is the equivalent setting they're touching (taken from Google's official advanced search page):

TL;DR

Here's a picture to sum up the problem most search frontends are facing:

Proposed fixes:

  1. remove Search language and default it to [all]
  2. give [auto] to the Interface language instead
  3. add these 3 parameters to unlock all the possible Google results: ?safe=off&nfpr=1&filter=0
@WAZAAAAA0
Author

Your first suggestion would make it unusable for a lot of queries in most languages

I'll quote what has been said already on a SearX issue about it:

After testing a bit with whoogle, you are right the "default (none)" setting is much better than having to select a language. More over this doesn't require to select a language because Google will automatically recognize the language based on the entered words and display the results in the correct language. So much more convenient than having to force a language in the Searx settings every time you make searches in multiple languages.

@WAZAAAAA0
Author

WAZAAAAA0 commented May 23, 2023

I'm aware words are shared in many languages and that if you search, for example, the word rape from an Italian IP/instance you'll probably get the vegetable, in French the cheese grater, and in English the assault.
My personal solution to that is to simply do better searches with more context like "rape vegetale".

Normally I'd be all for giving more choice to the user (given that the default should ALWAYS be [all]) but history has shown us it's not happening. Some instances block access to settings.php entirely. Some reset back to en when you try to set the language to empty. Some hide the "Google settings" section. And the rest, even when set to empty, still don't match the number of real Google results.

I'm tired boss. I just want to proxy my Google searches without being crippled.

@WAZAAAAA0
Author

WAZAAAAA0 commented May 24, 2023

I see where you're coming from but that's not the "type of garbage" I meant. The garbage most likely got triggered by this Google message

Showing results for AAA
Search instead for BBB

which occurs more often when you narrow your search to only 1 language afaik. You're getting the results for AAA (aka GARBAGE, wildly unrelated) rather than for BBB, the query you were actually searching for.

your absolute best fix would be to find a public instance hosted in your country, set to find results in all languages, but good luck with it

@WAZAAAAA0
Author

If a user wants to willingly exclude the vast majority of results AND trusts Google's language filter accuracy as well, that's fine by me. Then we go back to the initial, less radical suggestion of

at least make [all] the default and lock the option behind huge warning signs with flaming skulls that searching will be seriously degraded for everyone if anything other than [all] is selected

@kubo6472

I'm sorry for necroposting a bit. But I'm tired of always switching the language in the search setting between the two that give me correct results: first one being English, second one my native Slovak. Is this fixable the same way in LibreX? (e.g. put "all" in the results language?) Or can I do arrays, like [en,sk]?

@entrider

In addition to the word input, google also seems to use location, IP probably, something librex or whoogle are unable to do, hence the need for that language selector.

But Google is able to. As a result you shouldn't expect to get the exact same results between instances hosted in different countries (and, I believe, between all instances in general).

Is this fixable the same way in LibreX? (e.g. put "all" in the results language?) Or can I do arrays, like [en,sk]?

It's technically possible to make 2 search queries and combine the results of each.
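A rough sketch of what that combining step could look like, assuming each query has already returned a ranked list of result URLs. The `merge_results` function and the example URLs are hypothetical; nothing like this exists in LibreX as far as this thread shows.

```python
from itertools import zip_longest

# Hypothetical helper: interleave two ranked result lists and drop duplicates.
def merge_results(first: list[str], second: list[str]) -> list[str]:
    merged: list[str] = []
    seen: set[str] = set()
    # zip_longest pads the shorter list with None so no result is lost.
    for pair in zip_longest(first, second):
        for url in pair:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Placeholder data standing in for an English and a Slovak query.
en_results = ["https://a.example", "https://b.example"]
sk_results = ["https://b.example", "https://c.example", "https://d.example"]
print(merge_results(en_results, sk_results))
# ['https://a.example', 'https://b.example', 'https://c.example', 'https://d.example']
```

Interleaving keeps the top hit of each language near the top; a real implementation would also have to pay for two upstream requests per search.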

@kubo6472

It's technically possible to make 2 search queries and combine the results of each.

Using "all" now. Getting a lot of Russian results at the top. Putting a language code in the search itself helps much more.

@entrider

LibreY has it, but only in the settings.

4 participants
@WAZAAAAA0 @kubo6472 @entrider and others