Locality and Geocoorddetail batch import tool #4548
base: production
Conversation
return {
  missingRequiredHeaders:
    accumulator.missingRequiredHeaders.filter(
use removeItem() util?
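For context, one plausible shape for such a helper (this is only a sketch; the signature of the actual `removeItem()` util in the codebase is an assumption):

```typescript
// Hypothetical sketch of a removeItem()-style util; the real one in the
// codebase may differ. Returns a new array with the element at the given
// index dropped, leaving the original array untouched.
function removeItem<T>(array: readonly T[], index: number): T[] {
  return [...array.slice(0, index), ...array.slice(index + 1)];
}

// Example: dropping a matched header by index instead of filtering inline.
const remaining = removeItem(['guid', 'latitude1', 'longitude1'], 0);
// remaining is ['latitude1', 'longitude1']
```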
readonly error: LocalizedString;
}): JSX.Element {
  return (
    <p role="alert">
(optional) Would the ErrorMessage component be better suited here, or would it be too "loud"?
@@ -56,25 +49,43 @@ export function WbImportView(): JSX.Element {

function FilePicked({ file }: { readonly file: File }): JSX.Element {
  const fileType = inferDataSetType(file);
  const getSetDataSetName = useTriggerState(extractFileName(file.name));
We have a stripFileExtension util, which does something similar to extractFileName. Could you please decide if one of these functions should be removed in favor of the other?
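For reference, both utils presumably do something along these lines (this is a sketch of the shared behavior, not the actual implementations in the codebase, which may handle edge cases differently):

```typescript
// Sketch: drop the final extension from a file name. The real
// stripFileExtension / extractFileName utils may differ in edge cases
// (e.g. dotfiles, names with multiple dots, or names with no dot at all).
function stripFileExtension(fileName: string): string {
  const lastDot = fileName.lastIndexOf('.');
  // lastDot > 0 (not >= 0) so dotfiles like '.bashrc' are left intact.
  return lastDot > 0 ? fileName.slice(0, lastDot) : fileName;
}

stripFileExtension('kufish_coge_sample.csv'); // 'kufish_coge_sample'
stripFileExtension('archive.tar.gz'); // 'archive.tar'
```

If the two utils really do coincide on this behavior, keeping one and re-exporting or deleting the other would avoid the duplication.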
@@ -143,6 +143,9 @@ export const headerText = createDictionary({
    abgeschlossen ist.
  `,
},
coGeImportDataset: {
  'en-us': 'Import CoGe Dataset',
Oh, so CoGe is even exposed to the user? Would they know what CoGe means?
@@ -292,4 +292,17 @@ export const localityText = createDictionary({
  'uk-ua': 'DD MM SS.SS N/S/E/W (32 45 42.84 N)',
  'de-ch': 'DD MM SS.SS N/S/O/W (32 45 42.84 N)',
},
localityImportHeaderError: {
  'en-us': 'Errors Found in Column Headers',
(optional) is this better or worse?

- 'en-us': 'Errors Found in Column Headers',
+ 'en-us': 'Errors found in column headers',
@alesan99 To test this, I used a single Locality record on KUFish. Then I imported the following CSV using the tool.
The value in the GeoCoordDetail for the Locality remained untouched. Are you able to provide the line of the CSV which corresponds to a Locality/GeoCoordDetail for which this occurs?
Testing instructions

For testing purposes, here is an example set of data for use on the KUFish database: kufish_coge_sample.csv

Other datasets have been made for FITZ_NHMD, fwri, KUBirds, and lsumzmammals, and are available in the following folder in the Google Drive:
https://drive.google.com/drive/folders/1B0hKQMBaX82nBUOB95uEa3H0MIrGrBv7?usp=drive_link

A foundation for other datasets for use in other databases can be made by creating a Locality -> GeoCoordDetail query and exporting the results. For an example, here is the query used to populate most of kufish_coge_sample.csv:
https://kufish51623-edge.test.specifysystems.org/specify/query/191/

There should be no errors in the kufish_coge_sample dataset presently, so modifications can be made to accomplish the desired errors and edge cases.

Generally, the cases which would need to be tested are:

- Ensure an error message is displayed to the user when there is no Locality whose guid exists in the dataset
- Data in unknown columns is properly ignored and not accounted for when uploading
- If a Locality contains existing data for GeoCoordDetails, it should be deleted if there is at least one GeoCoordDetail field in the dataset
- If there are no values in any GeoCoordDetail field for a Locality in the dataset and the Locality contains a GeoCoordDetail in Specify, the GeoCoordDetail should not be overwritten
- Ensure that only the Locality fields which contain data in the dataset get updated for a Locality in the dataset (i.e., if longitude1 and datum have values in the dataset but latitude1 is either not a column or empty, ensure that latitude1 is never overwritten)
- Ensure parsing error messages are intuitive enough to diagnose and resolve the problem with the dataset
  - Potential ways to introduce invalid values include:
    - Entering a value in a field which exceeds the maximum length for the field (the maximum length for a field can be found in the Schema Configuration)
    - Entering an incorrect type of value for the field (such as entering a letter into a number/integer field)
    - Not following a UIFormatter for a field
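For illustration only, a row in such a dataset might look like the following (the column selection and all values here are hypothetical; the sample files in the Drive folder above are the authoritative examples):

```csv
guid,latitude1,longitude1,datum,source
8f14e45f-ceea-467f-a1d6-b1a1b7e0c9d2,38.9717,-95.2353,WGS 84,CoGe
```

The guid column is what the tool matches on; the remaining columns are the Locality/GeoCoordDetail fields to update.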
Issues

The original error message that appears if there is an unknown field stays behind the loading screen the whole time, and is still behind the create-record-set dialog at the end as well. This dialog should close once you start the upload.

cDT3fu9re.mp4

Could not consistently recreate this, but sometimes after you imported a file and got an importing error (e.g. field value is too long), if you tried to import the file again you would get a full Specify error where you would have to return to the homepage.

Changes

I think when you are importing, instead of just a loading screen for a long time, it should have something like the WorkBench has. Like this:

I also think that it should auto-create a record set. With the current behavior it is very easy to accidentally click out, and then you cannot go back to create one again. At the very least, I think you should not be able to click out of the 'create record set' dialog unless you explicitly click 'cancel'.

I also think the error dialog needs to include the header; just seeing the row isn't always super intuitive. Also, the column names in this dialog are too close together in my opinion.

Hm, after doing some more testing in KU_Birds, it looks like it only happens if I had a source defined. Removing the value from source no longer overwrites the GeoCoordDetails. I don't quite know "source"'s relation to GeoCoordDetails, but unless this is unintentional it looks like nothing was going wrong after all.
Functionality works great!

Changes

- I think the name should be changed; it looks kind of weird in the user tools since it's so long. Maybe something like 'Import Locality Dataset' to shorten it.
- I think there should be some sort of notification when it finishes. Since the import continues even if you close out of Specify, the user should get a notification when the record set is created.
- The error dialog still looks a little weird and does not behave as expected. I would recommend at least changing 'row number' to 'row' and changing how the columns move when the dialog is resized.

  O9Hliui8Ep.mp4

- There should be some way to change the record set name before you click the link to it. I know that once the record set is created you can select the pencil icon and change it there, but I don't know if users will want to go out of their way to change it. Maybe something like the WorkBench's create-record-set dialog, where the name is auto-filled but can be changed before saving. I like this dialog, but maybe add a pencil here or something so users can change the name before linking out?
- If you upload a dataset with errors, you can still click the 'Import Dataset' button and it'll just import the same one again. I think once you've imported a dataset that button should be grayed out until you upload a new dataset; otherwise users are stuck in the dialog until it finishes if they accidentally click it.

  qf78Dzv5ia.mp4

Overall, functionality works great, just some UI things I'd like changed. Great work!
Import tool looks excellent!

For the most part, I agree with everything that @emenslin mentioned for changes. I also think 'Import Locality Dataset' is a more concise name than 'Import Locality Repatriation Dataset'.

I noticed one small discrepancy: the row number in the parsing error notification does not match the one in the actual dataset. I understand that this happens because the row count starts at 0 and doesn't account for whether the first row is selected as the header. Although it's a nitpicky suggestion, I believe matching the row numbers would make errors more intuitive and easier to navigate, since most spreadsheet editors start at 1.
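The off-by-one described above amounts to a small index translation; a sketch of the mapping (function and parameter names here are hypothetical, not from the tool's code):

```typescript
// Convert a 0-based parser row index into the 1-based row number that
// spreadsheet editors display, skipping the header row when the first
// row of the file was selected as the header.
function toDisplayRow(parserIndex: number, hasHeaderRow: boolean): number {
  // +1 for 1-based numbering, +1 more if row 1 of the file is the header.
  return parserIndex + 1 + (hasHeaderRow ? 1 : 0);
}

toDisplayRow(0, true); // 2: the first data row sits on spreadsheet row 2
toDisplayRow(0, false); // 1
```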
Fixes #4319
The Locality and GeoCoordDetail batch import tool allows users to match and update the Latitude1, Longitude1, and Datum fields for Locality records and associate new GeoCoordDetails with the matched Locality records.
The primary motivation for the tool is that the georeferencing for existing records in the database can take place outside of Specify (such as with CoGe), and there is currently no easy way to repatriate the data back into Specify and update those records.
This process is facilitated by requiring the guids of the Locality records in the CSV. Specify will use these guids to match/find the corresponding Locality record(s).
With the Locality record, the tool does the following:

- If the latitude1, longitude1, and/or datum fields have values for the Locality record in the CSV, Specify will overwrite the current values in the database with those specified in the CSV

Demo
Currently, access to the tool can be found in the User Tools menu:
The primary interface is very similar to that of importing a file to be used as a DataSet in the WorkBench. Specify supports different encodings and delimiters, and displays the first 100 rows of the file:
The Import File button can be pressed to initiate the parsing/upload. If there are any unknown columns in the dataset, a warning will first be displayed to the user so they can decide how to proceed:
If there are any bad values discovered when parsing the results, Specify will display a downloadable message stating which line the error was found on, along with more information about how to resolve the parsing error:
If the parsing and upload are successful, Specify will display a brief results page stating how many records of each table were affected, along with the option to create a RecordSet of the modified Locality records:
Checklist
and self-explanatory (or properly documented)