Discussion: Structure and file formats for bulk import/export #20

Open
kintopp opened this issue Jun 27, 2018 · 5 comments

Comments

@kintopp
Member

kintopp commented Jun 27, 2018

Placeholder for discussion on how to structure the Excel spreadsheet used by data contributors providing bulk data to EM Places.

@kintopp kintopp added the import label Jun 27, 2018
@gklyne
Contributor

gklyne commented Jun 27, 2018

If using the spreadsheet import route, maybe consider RightField?

http://www.rightfield.org.uk

Last time I looked it was quite limited, but still maybe of some use to ease the import process.

If something more complex is required, there's a tool I wrote a while ago that might help:

https://github.com/wf4ever/ro-manager/tree/develop/src/checklist

It would need some hacking, but is essentially a generic design capable of quite sophisticated conversions from CSV to RDF.
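To make the CSV-to-RDF idea concrete, here is a minimal sketch of the simplest possible mapping: one triple per non-empty cell, written as N-Triples. It is not the ro-manager checklist tool and does not reflect its design; the `example.org` namespaces, the column names, and the function names are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical namespaces, chosen for illustration only.
BASE = "http://example.org/place/"
VOCAB = "http://example.org/vocab/"

def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def csv_to_ntriples(text, id_column="id"):
    """Turn each CSV row into one triple per non-empty column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            # Skip the identifier column and empty cells.
            if column == id_column or not value:
                continue
            lines.append(f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .')
    return lines

# Made-up sample data with an empty cell in the second row.
sample = "id,name,country\n1,Oxford,England\n2,Leiden,\n"
for triple in csv_to_ntriples(sample):
    print(triple)
```

A real conversion would also need typed literals, links between resources, and a proper vocabulary, which is where a more sophisticated tool earns its keep.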

@kintopp
Member Author

kintopp commented Jul 5, 2018

We have to think carefully about how to structure this: what kind of extended (i.e. non-core) data do we anticipate getting in bulk? Will the work required to conform data to our spreadsheet outweigh the benefit of uploading it in bulk? Should we start with a means of uploading contributions that create core data plus a limited set of extended metadata only? In other words, just enough to allow a related gazetteer to import its records into EM Places and link back to it. As a point of comparison, see http://whgazetteer.org/2018/06/06/contributing-to-world-historical-gazetteer-a-preview/

@kintopp
Member Author

kintopp commented Jul 6, 2018

On 5 July, discussed with Marnix the possibility of working backwards towards this from the export format, i.e. use the complete RDF sample to generate export formats from Timbuctoo, then reuse these as import formats where applicable (for example, multiple-worksheet Excel spreadsheets).

@gklyne
Contributor

gklyne commented Jul 9, 2018

My current plan is to populate as much as possible from GeoNames as a separate (and presumably initial) step in creating a new place record, then to generate additional data via a spreadsheet or other means.

What isn't yet clear in my mind is how we handle the data merging. I understand Timbuctoo has (or will have) an option to add or replace data, but we might end up needing something a little more subtle.
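To illustrate the difference between the two merge options, here is a small sketch that treats a record as a mapping from property to a set of values. This is not Timbuctoo's behaviour or API; the `merge_record` function, the property names, and the sample values are hypothetical and only show what "add" versus "replace" could mean.

```python
def merge_record(existing, incoming, mode="add"):
    """Merge two records, each a dict of property -> set of values.

    mode="add":     keep existing values and add the incoming ones.
    mode="replace": incoming values overwrite existing ones, but only
                    for properties that the incoming record mentions.
    """
    if mode not in ("add", "replace"):
        raise ValueError(f"unknown merge mode: {mode}")
    # Copy the existing record so the caller's data is left untouched.
    merged = {prop: set(values) for prop, values in existing.items()}
    for prop, values in incoming.items():
        if mode == "add":
            merged.setdefault(prop, set()).update(values)
        else:
            merged[prop] = set(values)
    return merged

# Made-up records standing in for a GeoNames-derived record and a spreadsheet row.
geonames = {"name": {"Oxford"}, "country": {"GB"}}
spreadsheet = {"name": {"Oxenford"}, "period": {"1500-1800"}}

print(merge_record(geonames, spreadsheet, mode="add"))
print(merge_record(geonames, spreadsheet, mode="replace"))
```

Neither mode covers cases such as "replace this one value but keep the others", which is the kind of subtlety that might call for something beyond a simple add/replace switch.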

The approach of export -> edit -> import seems plausible to me. Of course, there will be details...

In choosing the import/export format, I think we should take care to avoid mixing elements of data merging logic with the import/export capability in Timbuctoo. For example, if the format is tabular, how do we handle the deeper graph structures used for place relations and map resources?
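One common way to fit nested structures into a tabular format is to split them into several tables linked by identifiers, with each table becoming one worksheet. The sketch below shows that idea with CSV output from the standard library; the record layout and field names (`relations`, `part_of`, `place_id`) are assumptions for illustration, not the EM Places data model or Timbuctoo's export format.

```python
import csv
import io

# Made-up nested records; field names are assumptions for illustration.
places = [
    {"id": "p1", "name": "Oxford",
     "relations": [{"type": "part_of", "target": "p2"}]},
    {"id": "p2", "name": "Oxfordshire", "relations": []},
]

def flatten(places):
    """Split nested records into a 'places' table and a 'relations' table."""
    place_rows = [{"id": p["id"], "name": p["name"]} for p in places]
    relation_rows = [
        {"place_id": p["id"], "type": r["type"], "target": r["target"]}
        for p in places for r in p["relations"]
    ]
    return place_rows, relation_rows

def to_csv(rows, fieldnames):
    """Serialise a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

place_rows, relation_rows = flatten(places)
print(to_csv(place_rows, ["id", "name"]))
print(to_csv(relation_rows, ["place_id", "type", "target"]))
```

The cost of this layout is that contributors have to keep identifiers consistent across sheets, which is part of why a format that works for export may be awkward as an import template.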

@kintopp kintopp changed the title Define structure for Excel import spreadsheet Review structure for Excel import/export formats Jul 11, 2018
@kintopp kintopp added the export label Jul 11, 2018
@kintopp kintopp changed the title Review structure for Excel import/export formats Discussion: Structure and file formats for bulk import/export Aug 6, 2018
@kintopp
Member Author

kintopp commented Aug 6, 2018

Current plan is to work with Graham's Annalist tool as a temporary solution until Timbuctoo's editor is in place at the end of September. Timbuctoo's current export spreadsheet format (from sample shared by Marnix) looks well suited for export, less so as a template for import.
