Use actual IDs for next build (no = or + in htid) #2

Open
bmschmidt opened this issue May 31, 2016 · 2 comments


@bmschmidt
Member

bmschmidt commented May 31, 2016

For some reason the 'filename' elements in the Bookworm use a 'filename' that replaces the colons and slashes in the HathiTrust id with other characters. (E.g., psia.ark:/13960/t5z623168 becomes psia.ark+=13960=t5z623168.) I assume this has something to do with certain ids not working as file paths on some operating systems. But can it be corrected before the Bookworm receives the filenames? It creates a number of problems all through the pipeline whenever we interface with Hathi resources, and it seems to me it would be much better if Bookworm just received the canonical Hathi id.

@organisciak
Member

That's the clean id, which is part of the PairTree structure HathiTrust uses. If we have something labelled 'filename', the clean id is correct.

I'm in favour of using the htid as often as possible and keeping the clean id behind the scenes. In Bookworm, we could store both filename and htid, emphasizing the latter.

@bmschmidt
Member Author

I think, to keep things simplest, the files hitting Bookworm should never even know of the clean id; it's easily derived from the htid, and I haven't yet seen a use case for it. It's true that filename is a required key in Bookworm, but we shouldn't use the clean id for it: bookworm.filename (as opposed to filename in a Hathi context) is just a synonym for 'unique document id', and that's better served by the htid.
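
To illustrate how cheap the derivation is in either direction, here is a minimal sketch, not the pipeline's actual code. The function names `htid_to_clean` and `clean_to_htid` are hypothetical. The `:` to `+` and `/` to `=` mappings come from the example in this thread; the `.` to `,` mapping, and the choice to leave the library prefix before the first dot untouched, are assumptions based on the Pairtree character-cleaning convention and should be checked against real ids before relying on them.

```python
def htid_to_clean(htid: str) -> str:
    """Derive a filesystem-safe clean id from a canonical htid (sketch)."""
    # Split off the library prefix (e.g. "psia"); only the volume part is cleaned.
    lib, sep, vol = htid.partition(".")
    # ':' -> '+' and '/' -> '=' are shown in the thread; '.' -> ',' is an assumption.
    vol = vol.replace(":", "+").replace("/", "=").replace(".", ",")
    return lib + sep + vol


def clean_to_htid(clean_id: str) -> str:
    """Recover the canonical htid from a clean id (inverse of the above)."""
    lib, sep, vol = clean_id.partition(".")
    vol = vol.replace("+", ":").replace("=", "/").replace(",", ".")
    return lib + sep + vol


if __name__ == "__main__":
    print(clean_to_htid("psia.ark+=13960=t5z623168"))
    print(htid_to_clean("psia.ark:/13960/t5z623168"))
```

With something like `clean_to_htid` applied before the files reach Bookworm, the required filename key could carry the htid directly, and anything that needs the on-disk name could recompute it on demand.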
