Implement QB-like interface #65

Open
grantfitzsimmons opened this issue Aug 15, 2022 · 6 comments

@grantfitzsimmons
Contributor


KU Herpetology would like to allow users to search Preparations with more precisely specified criteria. Due to the way the Web Portal works, this is not currently possible.

It would be nice to tell the advanced search that only the following values will be found in this data: {ETOH, tissue, etc.}.

The current workaround is to create a splash image that explains how to search the data, but that is not very clean.

@maxpatiiuk
Member

maxpatiiuk commented Aug 15, 2022

For future reference:
After talking about this more, @grantfitzsimmons suggested replacing the Web Portal with Specify 7.

Benefits:

  • Less code to maintain
  • Fewer services to set up
  • All new features added in Specify 7 are automatically available in the Web Portal
  • No need to export the data, as the data is already in Specify 7

We identified missing features that must be implemented before this can happen.

Concerns:

  • Some web portals are currently pulling data from different databases. We need to find out how common this is
  • Usability is a concern. The Query Builder is not appropriate for a non-database crowd; full-text indexing or a simple list of fields to fill out, like what we have now, is. I think building a web portal with a QB interface would be geeky and powerful, but wrong

@beach53
Member

beach53 commented Aug 23, 2022

Related Issues at the Institutional and Community Levels:

  • National and International Aggregators are already serving web portal functions. GBIF is now hosting customized web portals with GBIF software for collections and projects. [https://www.gbif.org/news/5D3ijLXMbpiZDBj0y0z1J/gbif-launches-hosted-portal-service]
  • Collaborative projects (Symbiota databases) prefer thematic web portals to highlight research interests and specialties.
  • Only a minority of Specify collections use our portal, mostly smaller institutions that have no better options, or that want us to host them for lack of in-house IT support or due to campus security concerns.
  • What is the function of a collections web portal?
    -- Promote awareness of the existence and vitality of a collection (marketing): "Look at our nice web site!"
    -- Provide a mechanism for researchers to see what is in the collection, to promote use of specimens
    -- Identify collection strengths for internal use, e.g., seeing on a map where one's own collections are located

@mcruz-umich

We at U-M are about to launch a bunch of these sites. We just got "Fishes" up today:
http://ummz.fishes-specify-portal.apps.gnosis.lsa.umich.edu/

And plan to do one for Birds, Mammals, Mollusks, Insects and more!

The S7 replacement idea is interesting; however, the Specify Web Portal does have some interesting things in it that we would need to see in S7.

  1. Does S7 have the "Map" feature that shows the locations of all the occurrences on a map?
  2. Our portals are currently public, so if we move them from the web portal code to the S7 code and we use SSO in S7, will we have a problem with security? I suppose it also depends on whether the S7-driven version of the web portals runs off a cache or directly accesses the DB in read-only mode.
  3. One advantage of the web portal vs S7 is containers. I currently deploy one Specify Web Portal per container. This means traffic / load is focused on just one container. If we move to S7 for these apps, would we then have the combined load of the Collection Managers doing WorkBench and heavy query operations alongside the general public doing read requests of these publicly published sets of data? I would think we want to keep the one-app-per-container focus, for performance reasons and security too.

Thanks for your consideration and working on these ideas.

Sincerely,
Matthew

@mcruz-umich

I have developed the following process for removing "Tissues" from our publications.

It involves downloading the zip file from DataExporter, then unzipping it, modifying the csv, and then re-zipping it and deploying.

There is a bug in the "Schema Mapper" in which it checks for duplicate records BEFORE respecting the "distinct" checkbox. This bug means you cannot query at the gift or preparation level and then reduce multiple rows down to a single row for the same Cat #, even if you have the columns set to "do not show". The "Schema Mapper" logic needs to process the "distinct" checkbox FIRST, before checking for duplicates. Since that is not working in 6.8.00, I have developed the following process for removing gift records and stripping "Tissue - N" from the aggregated preparations field.

OpenOffice is used below to provide a visual view of the CSV data so that rows can be sorted and deleted easily.

  1. SP DATAEXPORTER: Export from DataExporter - this will include gifts and empty preparations

  2. OPENOFFICE: Open the csv in OpenOffice, sort by "preparations" and delete rows that have none

  3. SPECIFY: Run a query to get a list of the gifts

  4. SUBLIME TEXT: Open the csv in SublimeText and search for the gift numbers and delete those rows
    REGEX: ,(gift_num_1|gift_num_2|gift_num_3|...gift_num_n),

  5. SUBLIME TEXT: Search for "tissue" prepTypes and remove the string but keep the row. This will affect the Preparations aggregation column and may result in a blank cell if it had only contained tissue. (A scripted version of steps 4–5 is sketched after this list.)
    REGEX: (tissue( - \d)?;? ?)|(; tissue( - \d)?)+

  6. Open in OpenOffice again and sort by Catalog Number
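
For anyone who prefers to script steps 4–5 instead of doing them by hand in Sublime Text, here is a minimal Python sketch that applies the same regular expressions. The input path and the "giftnumber" / "preparations" column names are placeholders (guesses at the export's headers); adjust them to whatever your DataExporter CSV actually uses.

```python
# Minimal sketch of steps 4-5: drop rows for the listed gifts and strip
# "tissue" / "tissue - N" entries from the aggregated preparations column.
# Column names and file paths are placeholders -- adjust to your export.
import csv
import re

GIFT_NUMBERS = {"gift_num_1", "gift_num_2"}  # placeholder gift numbers from the Specify query (step 3)
TISSUE_RE = re.compile(r"(tissue( - \d)?;? ?)|(; tissue( - \d)?)+", re.IGNORECASE)

with open("occurrence.csv", newline="", encoding="utf-8") as src, \
     open("occurrence.cleaned.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Step 4: skip rows that belong to one of the listed gifts
        if row.get("giftnumber", "") in GIFT_NUMBERS:
            continue
        # Step 5: remove tissue entries but keep the row (the cell may end up blank)
        if "preparations" in row:
            row["preparations"] = TISSUE_RE.sub("", row["preparations"]).strip("; ")
        writer.writerow(row)
```

The cleaned file can then be re-zipped and deployed as in the manual process; sorting by Catalog Number (step 6) can still be done in OpenOffice if needed.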

@maxpatiiuk
Member

maxpatiiuk commented Sep 2, 2022

> (Quoting @mcruz-umich's comment above in full.)

  1. We added the ability to do spatial searches (Add Spatial search capability to query specify7#1713) and the ability to plot query results on a map (Plot Query Builder's data using Leaflet specify7#1714). Those features will be included in one of the future releases.
  2. The new Specify 7 security & permissions system should help. You can set up anonymous user access and set some permissions for that user. For more complicated use cases, you could probably resort to the current workflow of making a regular dump of the data and importing it into a separate, public Specify 7 instance.
  3. Getting a more powerful machine might help here. Also, read-only access to Specify 7 should not lead to very high CPU usage. The WorkBench is the most performance-hungry tool, and it won't be accessible to read-only users, though the second most power-hungry might be the Query Builder. A similar solution of maintaining a separate Specify 7 instance for the public can be used.

While maintaining separate internal and public Specify 7 instances adds complication, it is not that different from the current strategy of having both the Web Portal and Specify 7. The major benefit of replacing the Web Portal with Specify 7 is that Specify 7 is already far more capable by most metrics and will only get more capable as it is the sole development focus. A rough sketch of such a scheduled dump-and-import sync is below.
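
For illustration only, here is one way the dump-and-import sync could be scripted. This is a sketch under assumptions: both instances are backed by MySQL/MariaDB, the host, database, and user names shown are placeholders, and credentials are expected to come from an option file (e.g. ~/.my.cnf) rather than the command line.

```python
# Rough sketch: nightly sync from an internal Specify database to the database
# behind a separate, public, read-only Specify 7 instance.
# Host/database/user names below are placeholders; run this from cron (or similar)
# on a machine that can reach both database servers.
import subprocess

INTERNAL = {"host": "specify-internal.example.edu", "user": "dump_user", "db": "specify"}
PUBLIC = {"host": "specify-public.example.edu", "user": "load_user", "db": "specify"}
DUMP_FILE = "specify_dump.sql"

def sync() -> None:
    # Dump the internal database (schema + data) without blocking writers...
    with open(DUMP_FILE, "wb") as out:
        subprocess.run(
            ["mysqldump", "--single-transaction",
             "-h", INTERNAL["host"], "-u", INTERNAL["user"], INTERNAL["db"]],
            stdout=out, check=True,
        )
    # ...then load it into the database that the public instance serves.
    with open(DUMP_FILE, "rb") as dump:
        subprocess.run(
            ["mysql", "-h", PUBLIC["host"], "-u", PUBLIC["user"], PUBLIC["db"]],
            stdin=dump, check=True,
        )

if __name__ == "__main__":
    sync()
```

Whether this is preferable to serving the public directly from the internal instance with an anonymous, read-only user depends on the load-isolation and security concerns raised above.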

@mcruz-umich

> maintaining a separate Specify 7 instance for the public can be used

Great idea!

> While maintaining separate internal and public Specify 7 instances adds complication, it is not that different from the current strategy of having both the Web Portal and Specify 7. […]

I am now in agreement!
