Include taxon id with taxon label in facet count of entity search endpoint #386
Comments
It's not exactly what you're asking for, but would a facet structure like this work?:
"facet_counts": {
  "category": {
    "disease": 27,
    "publication": 9,
    "anatomical entity": 5,
    "cell": 5,
    "gene": 2,
    "sequence feature": 2,
    "phenotype": 1,
    "quality": 1
  },
  "taxon": {
    "NCBITaxon:9031": 1,
    "NCBITaxon:9606": 1
  },
  "taxon_label": {
    "Gallus gallus": 1,
    "Homo sapiens": 1
  },
  "_taxon_map": {
    "NCBITaxon:9031": {
      "Gallus gallus": 1
    },
    "NCBITaxon:9606": {
      "Homo sapiens": 1
    }
  }
}
Two things are different here: 1) there's a new If so, I have this implemented in my fork of the ontobio library -- here's where the
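To make the proposed shape concrete, here is a minimal sketch of how a client might flatten `_taxon_map` into id/label/count records for a facet list. The function name `flatten_taxon_map` and the `TaxonFacet` type are hypothetical and not part of biolink or ontobio; the input simply mirrors the JSON above.

```python
from typing import Dict, List, TypedDict


class TaxonFacet(TypedDict):
    # Hypothetical record shape: one entry per (taxon id, label) pair.
    id: str
    label: str
    count: int


def flatten_taxon_map(facet_counts: Dict[str, dict]) -> List[TaxonFacet]:
    """Turn the nested `_taxon_map` facet into a flat list of records."""
    records: List[TaxonFacet] = []
    for taxon_id, labels in facet_counts.get("_taxon_map", {}).items():
        # Emit one record per label so a one-to-many mapping is not lost.
        for label, count in labels.items():
            records.append({"id": taxon_id, "label": label, "count": count})
    # Highest counts first; ties are ordered by label for a stable result.
    records.sort(key=lambda r: (-r["count"], r["label"]))
    return records


# Input copied from the facet structure proposed in this comment.
facet_counts = {
    "taxon": {"NCBITaxon:9031": 1, "NCBITaxon:9606": 1},
    "taxon_label": {"Gallus gallus": 1, "Homo sapiens": 1},
    "_taxon_map": {
        "NCBITaxon:9031": {"Gallus gallus": 1},
        "NCBITaxon:9606": {"Homo sapiens": 1},
    },
}

taxon_facets = flatten_taxon_map(facet_counts)
```

Each record carries the label for display and the id for the `taxon` filter, so no hard-coded mapping would be needed on the client.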
That's fine with me. If this is easier to implement or more consistent with how other things and data structures in biolink are implemented, I'd say go for it.
Faisal, is the main reason you chose that structure that it supports one-to-many id-to-label mappings? But is a list of objects going to cause even more issues in this case, @vincerubinetti?
I formatted it that way partly because I wasn't sure if there might be more than one label that matches a given taxon ID, and also because that structure more closely matches how facet pivots are returned from Solr. If IDs and labels are in fact one-to-one, I agree that the structure you proposed is more readable, and it's a trivial change on my end.
Let me do some research and see if I can confirm that the mapping is one-to-one.
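As a rough illustration of the two points discussed above, the sketch below converts a Solr-style facet pivot on `taxon,taxon_label` into the nested map and then checks whether ids and labels are one-to-one in that result. The pivot entries follow Solr's usual `field`/`value`/`count`/`pivot` layout, but the sample data and the helper names `pivot_to_taxon_map` and `is_one_to_one` are assumptions for illustration, not the code in the ontobio fork.

```python
from typing import Dict, List


def pivot_to_taxon_map(pivot: List[dict]) -> Dict[str, Dict[str, int]]:
    """Convert pivot entries (taxon -> taxon_label) into {id: {label: count}}."""
    taxon_map: Dict[str, Dict[str, int]] = {}
    for entry in pivot:
        labels = taxon_map.setdefault(entry["value"], {})
        # Child pivot entries hold the labels seen together with this taxon id.
        for child in entry.get("pivot", []):
            labels[child["value"]] = child["count"]
    return taxon_map


def is_one_to_one(taxon_map: Dict[str, Dict[str, int]]) -> bool:
    """True if every id has exactly one label and no label is shared by two ids."""
    labels_seen = set()
    for labels in taxon_map.values():
        if len(labels) != 1:
            return False
        (label,) = labels  # unpack the single key
        if label in labels_seen:
            return False
        labels_seen.add(label)
    return True


# Hypothetical pivot data shaped like a Solr facet.pivot result.
sample_pivot = [
    {
        "field": "taxon",
        "value": "NCBITaxon:9031",
        "count": 1,
        "pivot": [{"field": "taxon_label", "value": "Gallus gallus", "count": 1}],
    },
    {
        "field": "taxon",
        "value": "NCBITaxon:9606",
        "count": 1,
        "pivot": [{"field": "taxon_label", "value": "Homo sapiens", "count": 1}],
    },
]

taxon_map = pivot_to_taxon_map(sample_pivot)
one_to_one = is_one_to_one(taxon_map)
```

A check like this only covers the taxa present in a given response; confirming one-to-one across the whole index would need a query over all documents.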
I'm developing the 3.0 version of the monarch ui/website, and I've run into a limitation. @putmantime
Here is an example response from the /search/entity/{term} endpoint, searching "ssh":
Notice that taxon_label is being returned for facets, instead of taxon (id). This is nice for displaying a list of taxon facets, but not for actually filtering by them, because the endpoint only supports filtering by taxon (id), not taxon_label.
This requires the frontend to make a hard-coded label to id mapping for taxons. This duplicates information that we already have in biolink, is brittle, and is likely to get out of sync.
And yes, I can look up taxon from docs by finding the corresponding taxon_label field. However, then I would need to make sure all results are in docs so I have all the mappings, and that might go beyond the max rows [per page] param.
Possible solutions:
1) Support a taxon_label filter parameter (in addition to the taxon parameter) in the search endpoint. I guess this would be most useful if it was an exact match, rather than a fuzzy match. If there are multiple taxon ids that map to the same exact taxon label, then this option wouldn't be viable.
2) Return an additional taxon field in facet_counts with all the information I need: id, label, and count. This would leave the taxon_label facet untouched so current applications using biolink don't suddenly break.
3) Have some kind of taxon_map field at the top level of the response so I can go from label to id easily. Though, I think this is pretty ugly... don't want to add a top level thing for a special exception for just one type of facet.
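The docs-based workaround mentioned above can be sketched as follows, mainly to show why it is incomplete. The sketch assumes each entry in docs carries scalar `taxon` and `taxon_label` strings, which may not match the real response exactly; the helper names and the sample document are hypothetical.

```python
from typing import Dict, Iterable, List


def label_to_id_from_docs(docs: Iterable[dict]) -> Dict[str, str]:
    """Build a label -> taxon id mapping from the docs on the current page."""
    mapping: Dict[str, str] = {}
    for doc in docs:
        taxon_id = doc.get("taxon")
        label = doc.get("taxon_label")
        if taxon_id and label:
            # Keep the first id seen for a label; later duplicates are ignored.
            mapping.setdefault(label, taxon_id)
    return mapping


def unmapped_labels(facet_labels: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Facet labels that cannot be turned into a taxon filter from this page."""
    return sorted(label for label in facet_labels if label not in mapping)


# Hypothetical page of results with rows=1: only one doc is returned,
# while the facets are counted over every match.
docs = [{"id": "EXAMPLE:1", "taxon": "NCBITaxon:9606", "taxon_label": "Homo sapiens"}]
facet_labels = {"Gallus gallus": 1, "Homo sapiens": 1}

mapping = label_to_id_from_docs(docs)
missing = unmapped_labels(facet_labels, mapping)
```

Here `missing` contains "Gallus gallus": the label appears in the facet counts, but no document on the current page supplies its id, so the client could not filter by it without fetching more rows.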