Add something like parent_id to TSV specification #14

Open
frederik-elwert opened this issue Aug 16, 2019 · 6 comments

@frederik-elwert
Contributor

Currently, the TSV specification allows only a free-text entry for parent. This makes it impossible to link to parent entries that are part of the same dataset. Would it make sense to add a column like parent_id that lets a row specify the id of its parent entry?

@kgeographer kgeographer changed the title Add something like parent_id to TV specification Add something like parent_id to TSV specification Aug 16, 2019
@kgeographer
Contributor

Yes it does make sense, thanks. At the moment I'm working on a few modifications to this TSV spec based on other feedback, and will add this to that list. Should be up for comment within a few days.

@kgeographer
Contributor

kgeographer commented Aug 19, 2019

I've labeled the existing TSV spec as v0.1, created a draft v0.2, and modified the examples. I've labeled these "for comment"; before coding the parsing in WHG, I'd like to hear comments, corrections, etc.

Many thanks for weighing in.

@frederik-elwert
Contributor Author

Just a request for clarification: Now the spec for parent_id states:

URI for a web-published record of the parent_name above

How would I indicate that the parent is an entity in the same file, which would have an id but not necessarily a web-published record (yet)? Would something like #parent123 work? (Resembling a local id reference in XML.)

@kgeographer
Contributor

Ah, good point. When dataset files are uploaded to WHG, records are assigned a placeid in our system that will remain constant through any future updating. So they are effectively published and web-accessible. If parents are uploaded separately and first, then their URIs can be used in files that follow, but it is unreasonable to expect that workflow.

So would it work to allow (and parse) values like "#2345" for parent_id? On import, rows having a "#" in that position would be processed last, after placeids have been assigned to the other rows.
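
A minimal sketch of that import order, not WHG's actual parser: it reads a TSV, treats a parent_id beginning with "#" as a reference to another row in the same file, and moves those rows to the end of the processing queue. The column names `id`, `title`, and `parent_id` in the sample, and the helper names `read_tsv`, `is_local_ref`, and `order_for_import`, are assumptions made for illustration.

```python
import csv
import io


def read_tsv(text):
    """Parse TSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def is_local_ref(parent_id):
    """True if parent_id points at a row in the same file, e.g. '#2345'."""
    return bool(parent_id) and parent_id.startswith("#")


def order_for_import(rows):
    """Return rows with local-reference rows moved to the end.

    Rows whose parent_id is empty or a URI keep their relative order and
    come first; rows whose parent_id starts with '#' follow, also in their
    original relative order.
    """
    plain = [r for r in rows if not is_local_ref(r.get("parent_id"))]
    deferred = [r for r in rows if is_local_ref(r.get("parent_id"))]
    return plain + deferred


# Hypothetical sample: the child row appears before its parent in the file.
SAMPLE = (
    "id\ttitle\tparent_id\n"
    "2346\tChild place\t#2345\n"
    "2345\tParent place\t\n"
)

if __name__ == "__main__":
    for row in order_for_import(read_tsv(SAMPLE)):
        print(row["id"], row["parent_id"])
```

Deferring only the "#" rows is enough when every parent is a plain row; a "#" row whose parent is itself a "#" row would need a further step.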

@frederik-elwert
Contributor Author

Yes, that sounds reasonable. In practice, I assume processing might become a bit more complex when more than two levels of hierarchy are included. But I guess this could be solved.

@kgeographer
Contributor

Yes, solvable. Probably simplest as a database operation after all rows have been inserted. Settling this spec now means modifying a bunch of code and sample datasets, so I'd like to revise the spec as seldom as possible. Thanks again.
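
One way such a post-insert database operation might look, sketched with Python's built-in `sqlite3` module. The `places` table, its columns (`placeid`, `src_id`, `parent_id`, `parent_placeid`), and the function name are hypothetical and say nothing about WHG's real schema. Because every row already has a placeid by the time the update runs, a single statement resolves "#" references at any depth of hierarchy.

```python
import sqlite3

# Hypothetical schema: placeid is assigned on insert, src_id is the id from
# the uploaded file, parent_id is the raw value from the TSV column.
SCHEMA = """
CREATE TABLE places (
    placeid INTEGER PRIMARY KEY,
    src_id TEXT NOT NULL,
    title TEXT,
    parent_id TEXT,
    parent_placeid INTEGER
)
"""


def resolve_local_parents(conn):
    """Fill parent_placeid for rows whose parent_id is a '#<src_id>' reference.

    Runs after all rows are inserted. A reference that matches no row in the
    table leaves parent_placeid as NULL.
    """
    conn.execute(
        """
        UPDATE places
        SET parent_placeid = (
            SELECT p.placeid
            FROM places AS p
            WHERE p.src_id = substr(places.parent_id, 2)
        )
        WHERE parent_id LIKE '#%'
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Three levels, inserted child-first to show that order does not matter.
    conn.executemany(
        "INSERT INTO places (src_id, title, parent_id) VALUES (?, ?, ?)",
        [("c", "Village", "#b"), ("b", "District", "#a"), ("a", "Province", "")],
    )
    resolve_local_parents(conn)
    for row in conn.execute("SELECT src_id, placeid, parent_placeid FROM places"):
        print(row)
```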
