New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional data to OpenSearch index. #766
Conversation
Adds the data_abstract (comment in yaml) to the opensearch indexer function.
Fixes test_submissions_csv in dashboard_test.py to use regex when checking against URLs in text as the URL on the webpage does not always match what is contained in the Flask config.
Adds testing for data abstract searching
This reverts commit 5c67dc8.
Adds table data to the OpenSearch index for submission searching in the OpenSearch document enhancers.
Updates test_search in search_test.py to include testing of table description text
This reverts commit 4cf908a.
Updates index to add file_description from additional resource objects within HEPSubmission/DataSubmission objects to the OpenSearch index.
Updates search_test OpenSearch testing to properly include some testing for searching of additional resources.
Updates the hard-coded/assumed variables used as expected data within the tests to accommodate the existence of a new test submission in the identifiers dictionary within conftest.py
Updates a broken test in test_dashboard.py, this test has issues running on my local system.
Now triggers rendering of LaTeX in record table descriptions as this was bugged and only occurred when the full table is loaded. Mentioned in issue #756
Fixes a bug where the table_options_region element ("Showing X of X values" and "Show All") on the records page would sometimes have the data of another table if loaded later.
Refactors the route to get data tables at /data/ to separate the data table (yaml file) and the headers query code. Implements a second route to select only the table data.
Updates the table renderer JS to properly use new route to select only data table.
Updates tests in response to previous commit.
Fixes an issue as a part of the large file loading bugs in #756 where the filesize threshold loading button would execute multiple queries. This occurred by multiple events being passed to the button. Now clears the buton events before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is great! I tested and it works well, but I had a few suggestions for improvements going a bit beyond the scope of the original issue:
- In addition to the free-text search, specific fields can now be searched with queries like
resources:hepdata_lib
ordata_abstract:"CERN-LHC"
. These examples could be given in the "Advanced Search Tips" (search_help.html
). - The original issue search: index
additional_resources
anddata_abstract
#557 mentions that the goal is "these additional resources become searchable and findable in automatic queries". In practice, this means that enough information should be given in the JSON format of the search results and record pages to enable (e.g.) a Python script to download the resource files. Currently, theresources
in the JSON format contains only a list ofdescription
fields of the resources. It should be expanded to give also aurl
andtype
so thatresources
is now a list of dictionaries (description
,url
,type
). Thetype
can be set tofile_type
of theDataResource
object, whileurl
is thefile_location
if it starts withhttp
, otherwise (for a resource file not an external link) construct theurl
from theid
similar to this line. - The
resources
of an individual data table should be added to thedata_tables
when viewing the JSON for a whole record and also to the JSON for an individual data table. This is not currently done. - You didn't modify the mapping in
record_mapping.py
, so the new fieldsdata_abstract
andresources
take default values (check http://localhost:9200/hepdata-main/_mapping ). This seems to work with the current code, but it might be necessary to give an explicit mapping after addressing point 2.
Fixes and issue where the get_table_details route was not properly routing when load_all flag is not set. This resulted in failure to find the correct route. Also removes method, as it likely does not matter.
Adds a function to generate/return the correct web link for a DataResource object. Adds associated test.
Update record_mapping to include resources/data_abstract at the record-level.
Adds function to generate a DataResource dictionary data from a DataResource object. Also adds testing.
Adds the full resource data to the OpenSeach index for both records and table resource objects.
Adds full resource data to the output of the process_data_tables function so the full resource data shows on an individual record's JSON output
Add search testing for new search functionality
Updates the "Advanced" search help tooltip in search_help.html to include new data_abstract searching, and searching by resource (type, description, url)
Changes use of SITE_URL in the tests to retrieve from app.config.get as this is safer.
Adds a shorthand for "resource.description" searching as "resources" in the OpenSearch query builder
Updates search help text in search_help.html to standardise the text for data abstract searching. Also updated to mention "resources" shorthand for description searching.
Add "app" fixture to broken tests that now seem to require it. (where app is used to get site url)
Adds resource informaton to the table_contents in get_table_details, which will be seen in the json output of an individual data table.
Replaces more use of SITE_URL importing directly from the config, which can be unsafe, to use the current_app value provided by flask. (adding missing imports)
Updates the search help widget in search_help.html to remove reference to non-shorthand "resources.description". Now just references "resources" as the shorthand term for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Thanks a lot. I'll merge now.
Closes #557. Adds the
additional_resources
of a submission object, and thedata_abstract
(comment
fromsubmission.yaml
) to the OpenSearch index as mentioned in #557. Also updates a number of tests which broke upon adding new test data.