Search does not work as it should #8

Open
trondtynnol opened this issue Feb 4, 2022 · 11 comments

@trondtynnol
Contributor

When searching the documentation using the Google search bar, I often do not find anything, although what I am looking for exists in the documentation.

For example, searching for `environment variables site:https://giellalt.github.io/` returns nothing. The same search works against the old site: `environment variables site:https://giellalt.uit.no/` yields the results I was looking for.

@snomos
Member

snomos commented Feb 4, 2022

I am aware. The GH Pages theme we are using should be preconfigured for this, and I have checked that it is set up according to the advice, but the search results have not improved.

Among other things, I checked this: https://github.blog/2016-05-10-better-discoverability-for-github-pages-sites/ (and its links).

There are probably still things we can do: https://stackoverflow.com/questions/63720160/github-page-can-only-be-found-on-google-when-typing-username-and-github

All help welcome 😄

@trondtynnol
Contributor Author

I will look into it a little :)

It seems DuckDuckGo does index the site correctly, so we might consider switching from Google to DDG if we cannot get Google to index it. The best solution would of course be for all search engines to index the site.

@snomos
Member

snomos commented Apr 1, 2022

Here are some more tips on how to improve and optimize search engine performance: https://backlinko.com/hub/seo/sitemaps

It seems sitemaps are central to getting the pages indexed, and we should probably automate the process of updating ours.

@snomos
Member

snomos commented Apr 1, 2022

See also https://developers.google.com/search/docs/advanced/sitemaps/overview and follow the link at the bottom

@trondtynnol
Contributor Author

trondtynnol commented Apr 1, 2022

Yes, I agree we should build a sitemap automatically, as it probably will improve search results somewhat.

However, Google does seem to take a very long time to actually index anything, even after the sitemap has been submitted. Almost two months have passed since I added the simple txt sitemap, and still only 153 of the roughly 640 pages listed are indexed on Google.

@trondtynnol
Contributor Author

I guess this plugin should do the trick: https://github.com/jekyll/jekyll-sitemap

@snomos
Member

snomos commented Apr 1, 2022

That is one option. When I checked the Google Search Console, one thing that stood out was the lack of entries for sub-site documents: many files in keyboard-XXX/docs/* and lang-XXX/docs/* were not indexed because they never appeared in the sitemap (72 pages were not indexed, partly because of this). The easiest would be to create sitemaps for all of these separately as part of the build process.

How did you create the html sitemap file in the root directory?

@snomos
Member

snomos commented Apr 1, 2022

(I am switching to Norwegian. Foreign readers: use Google Translate for the remainder of the issue if you want to follow 🙂)

Here is an example of the most frequent error type:

[Screenshot, 2022-04-01 16:23:47]

As I understand the error message, Google claims that no other pages link to this page. I am a bit surprised in this particular case, but it may well be true for many pages: in the old Forrest system there was a menu on the left that was maintained independently of the links inside the pages, and there are certainly a number of pages that were only ever linked from there. Those pages are left without any incoming link now that we have moved to GH/Markdown.

@snomos
Member

snomos commented Apr 1, 2022

Except that this is not true:

grep -r HowToAddANewLanguage * 
AboutGiellaLT.md:[a ready-made setup](infra/infraremake/HowToAddANewLanguage.html) for adding more languages.
infra/infraremake/HowToMoveALanguageFromTheOldInfraToTheNew.md:* create [a new language directory](HowToAddANewLanguage.html)
infra/TechnicalMaintenance.md:* [How to add a new language to the infrastructure](infraremake/HowToAddANewLanguage.html)
sitemap.txt:https://giellalt.github.io/infra/infraremake/HowToAddANewLanguage.html
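
The check above can be repeated for every page rather than one at a time. The following is only a rough sketch under some assumptions that do not come from this thread: the function name `find_orphans` is made up, links are matched by file name alone (as in the grep above), and links that only exist in a theme or menu template are not seen. It prints each Markdown page whose `.html` name is not mentioned in any other Markdown file.

```shell
#!/bin/sh
# Sketch: list Markdown pages that no other Markdown page links to.
# find_orphans is a hypothetical helper, not part of the GiellaLT build.
find_orphans() {
    root=$1   # directory holding the Markdown sources
    (
        cd "$root" || exit 1
        find . -type f -name '*.md' | sed 's|^\./||' | LC_ALL=C sort |
        while IFS= read -r page; do
            name=$(basename "$page" .md)
            # Search every other Markdown file for a fixed-string mention
            # of "<name>.html"; the page itself is excluded with ! -path.
            # Caveat: glob characters in a file name would be read as a pattern.
            linked_from=$(find . -type f -name '*.md' ! -path "./$page" \
                -exec grep -lF -- "$name.html" {} +)
            # No other file mentions it: report the page as an orphan candidate.
            [ -n "$linked_from" ] || printf '%s\n' "$page"
        done
    )
}
```

Run as `find_orphans .` from the documentation root. Pages reported here are only candidates: two pages with the same file name in different directories, or pages linked by directory URL, would need a manual look.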

@trondtynnol
Contributor Author

On the whole it seems that Google struggles to crawl the site, and I do not quite understand why.

I made the sitemap.txt file rather quickly to test whether it would help, so if I remember correctly I used a variant of ls -R, then filtered out a few things and prepended the URL to each line. Generated pages from other repositories were most likely not included that way.
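
For reference, a comparable list can be produced with `find` instead of `ls -R`. This is a sketch of the general idea, not the command that was actually run: the function name `make_sitemap` is hypothetical, and it assumes every `*.md` source is published as an `.html` page at the same relative path (which matches the sitemap.txt entry in the grep output earlier in the thread) and that the base URL contains no `|` or `&` characters.

```shell
#!/bin/sh
# Sketch: print one URL per Markdown source file, suitable for a sitemap.txt.
# make_sitemap is a hypothetical helper; both arguments are assumptions.
make_sitemap() {
    root=$1       # directory holding the Markdown sources
    base_url=$2   # site URL without a trailing slash
    (
        cd "$root" || exit 1
        # Keep only Markdown files, strip the leading "./", swap the
        # extension to .html, and prepend the base URL to every line.
        find . -type f -name '*.md' |
            sed -e 's|^\./||' -e 's|\.md$|.html|' -e "s|^|$base_url/|" |
            LC_ALL=C sort
    )
}
```

Usage would be along the lines of `make_sitemap . https://giellalt.github.io > sitemap.txt`, run from the root of the documentation checkout.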

@snomos
Member

snomos commented Apr 1, 2022

> Generated pages from other repositories were most likely not included that way.

No, probably not, and they do not need to be either. They should get their own sitemap files, autogenerated during the build. That way the sitemap file will always be up to date.
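
One way such a build step might look is sketched below. Everything specific in it is an assumption rather than something stated in this thread: the function name `write_subsite_sitemaps` is made up, the repositories are assumed to be checked out side by side in one directory, each with its documentation in `docs/`, and each sub-site is assumed to be published under `<base URL>/<repository name>/`.

```shell
#!/bin/sh
# Sketch: write docs/sitemap.txt for every lang-* and keyboard-* checkout.
# write_subsite_sitemaps is a hypothetical helper; the directory layout and
# the URL scheme are assumptions.
write_subsite_sitemaps() {
    checkout_dir=$1   # directory containing the repository checkouts
    base_url=$2       # site URL without a trailing slash
    for docs in "$checkout_dir"/lang-*/docs "$checkout_dir"/keyboard-*/docs; do
        [ -d "$docs" ] || continue   # also skips a glob that matched nothing
        repo=$(basename "$(dirname "$docs")")
        (
            cd "$docs" || exit 1
            # One URL per Markdown source, prefixed with the sub-site path.
            find . -type f -name '*.md' |
                sed -e 's|^\./||' -e 's|\.md$|.html|' \
                    -e "s|^|$base_url/$repo/|" |
                LC_ALL=C sort > sitemap.txt
        )
    done
}
```

Because the file is rewritten on every build, it cannot drift out of date the way a hand-made list can. Each generated sitemap would still have to be made known to the search engines, for example by submitting it in the Google Search Console.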
