Improve Search #191

Open
dfreilich opened this issue Oct 29, 2020 · 13 comments · Fixed by #220
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@dfreilich

Expected Behavior

The search bar should display information from the site, and perhaps relevant tasks on Tekton Hub.

Actual Behavior

The search displayed ads, logs, and a few somewhat relevant links, but not the documentation I was looking for.

Steps to Reproduce the Problem

  1. Go to https://tekton.dev/
  2. Search for "Pipeline"
  3. Look at the results

Additional Info

I was hoping that search would let me surface documentation for concepts I wanted to define precisely (e.g. Pipelines, PersistentVolumeClaim, ...), but I haven't seen any links to the documentation in the search results.

@afrittoli afrittoli added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Nov 11, 2020
@bobcatfish
Collaborator

@afrittoli I have added you and the rest of the governing board so that you have access to https://programmablesearchengine.google.com/cse/setup/basic?cx=013756393218025596041:6eajntqsa6c (which I think is a link to the tekton.dev custom search?)

At a glance I can't tell what's wrong:

image

image

It doesn't seem like it should be searching the entire web 🤔

I'm not sure if we can remove the ads but we can try. Another approach might be to use something to index the site ourselves.

@afrittoli
Member

I changed the included URL from anything containing tekton.dev to the URL pattern tekton.dev/*, which should hopefully remove all unwanted subdomains.

I also found a way to submit a sitemap, which Hugo already produces at https://tekton.dev/sitemap.xml. However, the URLs in the sitemap are relative, which is not correct: when I try to submit the sitemap, Google Search tells me that it finds 170+ errors:

image
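
For anyone who wants to reproduce the sitemap problem locally, here is a minimal sketch that lists `<loc>` entries which are not fully qualified URLs. The function name `relative_locations` is made up for illustration, and the check (an entry needs both a scheme and a host) is an assumption about what the validator objects to, not a description of Google's actual validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by the sitemaps.org sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def relative_locations(sitemap_xml):
    """Return every <loc> value that lacks a scheme or a host (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    bad = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parsed = urlparse(url)
        # A fully qualified URL has both a scheme (https) and a network location.
        if not (parsed.scheme and parsed.netloc):
            bad.append(url)
    return bad
```

Running this against the downloaded sitemap.xml should return an empty list once the base URL is set correctly.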

@afrittoli
Member

It looks like we have to change the baseUrl to a valid one?
https://discourse.gohugo.io/t/google-search-console-reports-sitemap-xml-as-invalid/10474/12

@afrittoli
Member

About ads, I found the following answer:

There is no paid version for Google Custom Search Engine (CSE). Please review the document below to understand current CSE offerings:
https://support.google.com/customsearch//answer/9069107

By default CSE will display Ads in its results. In order to have the option to disable Ads, you can consider following the below options:

i) If your organisation is a non-profit, then you can have Ads disabled:
https://support.google.com/customsearch/answer/4542102

ii) You can use the API (Key enabled from Cloud Developer Console) with your CSE engine and retrieve results using JSON API without Ads:
https://developers.google.com/custom-search/v1/overview

iii) You can create an Adsense account and have it integrated with your CSE engine and can control to configure to not show competitors' ads on your website. Please review the document:
https://support.google.com/customsearch/answer/4542011
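
Option ii) could look roughly like the sketch below, which queries the Custom Search JSON API and keeps only the title and link of each result. The helper names are made up for illustration, the API key and engine ID are placeholders to be supplied by the caller, and the sketch ignores paging, quotas and error handling.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, query):
    """Build the request URL from the API key, the engine ID (cx) and the query."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_results(payload):
    """Reduce a decoded JSON response to a list of (title, link) pairs."""
    return [
        (item.get("title", ""), item.get("link", ""))
        for item in payload.get("items", [])
    ]


def search(api_key, engine_id, query):
    """Run one query over the network and return (title, link) pairs."""
    with urlopen(build_search_url(api_key, engine_id, query)) as response:
        return extract_results(json.load(response))
```

The site would then have to render these results itself instead of using the embedded CSE widget.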

@afrittoli
Member

Related issue on knative side: knative/website#23

@afrittoli
Member

The sitemap is now generated correctly. If the Search Console and the CSE configurations match, the search content should come from the sitemap.

@afrittoli
Member

However, there is still no indexing, because we set <meta name="ROBOTS" content="NOINDEX, NOFOLLOW"> on our homepage.
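
A quick way to check a rendered page for this is sketched below: it parses the HTML and reports whether a robots meta tag contains `noindex`. The class and function names are made up for illustration, and the sketch only looks at the meta tag, not at robots.txt or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # The attribute value keeps its original case, so compare case-insensitively.
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_indexable(html):
    """Return False if the page carries a robots meta tag with NOINDEX."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives
```

Feeding it the HTML of the homepage should return True once the production build emits INDEX, FOLLOW.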

afrittoli added a commit to afrittoli/tektoncd-website that referenced this issue Nov 26, 2020
The website header partial sets the meta index, follow if the
HUGO_ENV is set to production, and no index no follow otherwise.

Change the HUGO_ENV to be production for the production environment.

Related to tektoncd#191

Signed-off-by: Andrea Frittoli <andrea.frittoli@gmail.com>
tekton-robot pushed a commit that referenced this issue Nov 26, 2020
The website header partial sets the meta index, follow if the
HUGO_ENV is set to production, and no index no follow otherwise.

Change the HUGO_ENV to be production for the production environment.

Related to #191

Signed-off-by: Andrea Frittoli <andrea.frittoli@gmail.com>
@afrittoli afrittoli self-assigned this Nov 26, 2020
@afrittoli
Member

OK, to summarise:

  • In the CSE (custom search engine), I changed the site to be all pages that contain https://tekton.dev/
  • In the CSE, I added an exclusion URL pattern *.tekton.dev/* to exclude all subdomains (hub-preview, logs, prow etc)
  • In the tekton.dev website, I changed the sitemap to use fully qualified URLs
  • In the tekton.dev website, I changed the homepage meta to INDEX, FOLLOW (instead of NOINDEX, NOFOLLOW)
  • In the Google Search console, I added a sitemap for https://tekton.dev/. The base URL matches that configured in the CSE, so, according to the docs the content in the CSE should come from the index built through the sitemap

Current status:

  • because of the exclusion list, all previous unrelated search results are now gone
  • it is currently not possible to force a re-index via the Google Search console, so we need to wait for the crawler to do its job
  • the sitemap is parsed correctly
  • ads are still enabled. Since Tekton is owned by the CDF, which is a non-profit, we should be entitled to apply for ads-free search results - see related work on knative side

@afrittoli
Member

afrittoli commented Dec 3, 2020

Crawling by Google has started. It is not yet complete, but results are starting to show.
The filter on the CSE was too eager: it looks like *.tekton.dev matches tekton.dev too, which is surprising, since the latter does not start with a ".". In any case, I changed the filtering to match each of the subdomains that we do not want.

image
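
To illustrate the difference, here is a small sketch. Under ordinary shell-style globbing, `*.tekton.dev` does not match the bare domain, which is the behaviour I expected; the CSE evidently uses different matching rules. Listing the unwanted hosts explicitly avoids the ambiguity. The host names below are assumptions built from the subdomain names mentioned earlier (hub-preview, logs, prow), and `is_excluded` is a made-up helper, not anything the CSE exposes.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Assumed full host names for the subdomains we do not want in the results.
EXCLUDED_HOSTS = {"hub-preview.tekton.dev", "logs.tekton.dev", "prow.tekton.dev"}


def glob_matches(host, pattern="*.tekton.dev"):
    """Shell-style glob match; the bare domain does not match '*.tekton.dev'."""
    return fnmatchcase(host, pattern)


def is_excluded(url):
    """Exclude a URL only if its host is one of the explicitly listed subdomains."""
    host = urlparse(url).hostname or ""
    return host in EXCLUDED_HOSTS
```

With the explicit list, pages on tekton.dev itself are never excluded, whatever the pattern semantics of the search engine are.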

It may be possible to give search results from older versions less relevance, which is something to look into as well.

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2021
@afrittoli
Member

/remove-lifecycle stale

@tekton-robot tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2021
@afrittoli
Member

/lifecycle frozen

@tekton-robot tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Mar 3, 2021
@afrittoli afrittoli added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Nov 3, 2021
@afrittoli
Member

Search now works, so I decreased the criticality of this item, but there are still a few issues to be solved:

  • search results should produce links to the latest release by default
  • search includes advertisements

Sorting results by date instead of relevance brings up the most recent pages first, but it may not be an ideal solution.
'Relevance' is the default sorting method.

image
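
An alternative to sorting by date, if results were fetched and rendered by the site itself, would be to keep the relevance order but push pages from older releases to the end. The sketch below assumes that older documentation lives under a path containing `/vault/`; that marker and the function name are assumptions for illustration only.

```python
def prefer_current_docs(urls, old_marker="/vault/"):
    """Move URLs of older releases after the others, keeping relevance order otherwise.

    sorted() is stable, and False sorts before True, so URLs without the
    marker keep their relative order and come first.
    """
    return sorted(urls, key=lambda url: old_marker in url)
```

This keeps 'Relevance' as the primary ordering and only demotes archived pages, instead of ranking everything by date.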
