Issues with indexing by Google Dataset Search #67

Open
andrewsu opened this issue Oct 8, 2020 · 5 comments

Comments

@andrewsu
Collaborator

andrewsu commented Oct 8, 2020

Just starting a thread to track notes on whether CViSB datasets are being indexed on Google Dataset Search.

Currently, there are five datasets on data.cvisb.org (all listed in https://data.cvisb.org/assets/sitemap.xml):

Two datasets are indexed (SARS-CoV-2, HLA) (https://datasetsearch.research.google.com/search?query=site%3Adata.cvisb.org)

image

Google Search Console reports 1 error, 0 "valid with warning" and 0 "valid" (https://search.google.com/search-console/datasets?resource_id=https%3A%2F%2Fdata.cvisb.org%2F). Oddly, the one error is for the HLA dataset (one of the successfully-indexed datasets). The error is caused by an object of type Organization appearing under the dataset's citation property.
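For reference, schema.org defines the expected values of `citation` as CreativeWork or Text, which is presumably why an Organization object there gets flagged. Below is a minimal sketch of how the dataset JSON-LD could be checked for this locally. The function name and the example record are hypothetical and are not taken from the CViSB codebase or metadata.

```python
from typing import Any


def find_organization_citations(dataset: dict[str, Any]) -> list[dict[str, Any]]:
    """Return citation entries whose @type includes "Organization".

    Assumption: `dataset` is an already-parsed schema.org Dataset JSON-LD
    object. `citation` may be a single value or a list, and each entry may
    be a plain string or an object.
    """
    citations = dataset.get("citation", [])
    if not isinstance(citations, list):
        citations = [citations]

    flagged = []
    for entry in citations:
        if not isinstance(entry, dict):
            continue  # plain-text citations are allowed, skip them
        types = entry.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "Organization" in types:
            flagged.append(entry)
    return flagged


# Hypothetical record for illustration only.
example = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Hypothetical example dataset",
    "citation": [
        {"@type": "ScholarlyArticle", "name": "A hypothetical paper"},
        {"@type": "Organization", "name": "A hypothetical consortium"},
        "A plain-text citation string",
    ],
}

for entry in find_organization_citations(example):
    print(entry["name"])
```

Running this over the JSON-LD of each dataset page would show which citation entries are likely to trigger the error, without waiting for Search Console to refresh.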

image

Using the Rich Results Testing tool, that error shows up for 3 datasets (Ebola, Lassa, HLA) -- of those three, HLA is successfully indexed in Google Dataset Search. Two datasets (SARS-CoV-2 and systems serology) show up as "Page is eligible for rich results", but only systems serology is successfully indexed. The URL inspection tool on Google Search Console confirms that the datasets are successfully detected -- I just requested re-indexing in the hopes that those datasets will show up in Google Dataset Search (but I seem to recall doing this before).

image

One last note: at different times I have seen all five datasets successfully indexed, and at other times only three. As far as I know, we have not changed anything on our end that would explain those changes. From now on, I will try to track that here.
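To make the tracking less manual, here is a minimal sketch that compares the URLs listed in a sitemap against a list of URLs observed in Dataset Search on a given day. It assumes the sitemap uses the standard sitemaps.org namespace and that the indexed URLs are recorded by hand from the search results page; no programmatic Dataset Search query is assumed. The function names and sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace (assumed to be what the site's sitemap.xml uses).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Extract every <loc> URL from a sitemap document, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def missing_from_index(sitemap_xml: str, indexed_urls: list[str]) -> list[str]:
    """Return sitemap URLs that do not appear in the observed indexed list."""
    indexed = set(indexed_urls)
    return [url for url in sitemap_urls(sitemap_xml) if url not in indexed]


# Hypothetical sitemap content and observation, for illustration only.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/dataset/a</loc></url>
  <url><loc>https://example.org/dataset/b</loc></url>
</urlset>
"""

observed_today = ["https://example.org/dataset/a"]
print(missing_from_index(SAMPLE_SITEMAP, observed_today))
```

Logging the output alongside the date would give a simple record of which datasets drop in and out of the index.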

@andrewsu
Collaborator Author

andrewsu commented Oct 9, 2020

Possibly triggered by the reindexing I requested yesterday, all five datasets are currently being returned (https://datasetsearch.research.google.com/search?query=site%3Adata.cvisb.org). Keeping this issue open for the moment just to track any changes...

image

@andrewsu
Collaborator Author

The same search as above (https://datasetsearch.research.google.com/search?query=site%3Adata.cvisb.org) is now returning only 3 datasets.

image

@andrewsu
Collaborator Author

Note that https://datasetsearch.research.google.com/search?query=site%3Adata.cvisb.org is currently returning all 5 datasets on data.cvisb.org. (Possibly due to the change in #68 implemented last week?) Will continue monitoring to assess stability.

Also noting that the search console (https://search.google.com/search-console/datasets?resource_id=https%3A%2F%2Fdata.cvisb.org%2F) is still not showing that those datasets are indexed (unchanged from screenshot above).

@andrewsu
Collaborator Author

Unfortunately, https://datasetsearch.research.google.com/search?query=site%3Adata.cvisb.org is back down to 3 datasets (the same three as in #67 (comment)). Searching on the top-level domain https://datasetsearch.research.google.com/search?query=site%3Acvisb.org also returns the same three datasets. So it doesn't look like it's an issue with indexing our dev site instead of our prod site. Emailing Natasha to ask for feedback...

@andrewsu
Collaborator Author

Today, https://datasetsearch.research.google.com/search?query=site%3Adata.cvisb.org is back up to 5 datasets (same as in #67 (comment)).
