Status check #9

Open
bmschmidt opened this issue Jan 24, 2017 · 2 comments

@bmschmidt
Member

I want to do a status check: after the penultimate conference call, I think I was supposed to check in with @organisciak on the status of this.

I've built out the jsoncatalog.txt for the bookworm; it is available as the file "jsoncatalog_hathi.txt.gz", hosted on my personal web domain. (I'm not posting the URL because the file is huge and I don't want robots to try to download it.)

There are still a few issues. The "contributing library" is sometimes a code, because it's looked up in a sub-optimal place.

There's not a full field_descriptions.txt, because it's not yet determined exactly which fields we want in the bookworm. (I plan to do some visualizations with the MARC field item, for instance; but I doubt anyone else cares about that.)

But I think the code needs to be in shape to redo the metadata anyway (it mostly works at present), so it should be safe to build the bookworm with some subset of the data here.

Here are some random entries from the top of the first million or so items in the file; they are all from the non-PD portion of the collection, where we've looked less in the past.

{"cataloging_source":" ","scanner":"google","lc0":"J","lc1":"J","date":1956,"item_date":1956,"rights_changed_date":"2014-05-22","lc2":"500","literary_form":"Unknown","serial_killer_guess":"book","title":"Lok Sabha debates.--- 1956 pt.2 v.6:9-15","filename":"uc1.b3890417","contributing_library":"nrlf","searchstring":"<a href=https://babel.hathitrust.org/cgi/pt?id=uc1.b3890417><em>Lok Sabha debates.--- 1956 pt.2 v.6:9-15</em> (1956)","target_audience":"Unknown or not specified","cntry":"ii ","first_place":"New Delhi,","lc_class_from_lc":true,"first_publisher":"Lok Sabha Secretariat.","permalink":"https://babel.hathitrust.org/cgi/pt?id=uc1.b3890417","language":"eng","government_document":"f","subject_places":["a-ii---"],"record_date":null,"marc_record_created":"1984-02-11","resource_type":"serial"}
{"literary_form":"Unknown","contributing_library":"nrlf","permalink":"https://babel.hathitrust.org/cgi/pt?id=uc1.b5175500","serial_killer_guess":"book","cataloging_source":"d","scanner":"google","language":"rus","title":"Novoe i zabytoe /--- v.1","government_document":" ","target_audience":"Unknown or not specified","filename":"uc1.b5175500","cntry":"ru ","rights_changed_date":"2013-08-03","searchstring":"<a href=https://babel.hathitrust.org/cgi/pt?id=uc1.b5175500><em>Novoe i zabytoe /--- v.1</em> (1966)","first_place":"Moskva :","date":1966,"first_publisher":"Nauka,","marc_record_created":"1984-02-04","resource_type":"serial","record_date":1966}
{"lc2":"1174","scanner":"lit-dlps-dc","lc0":"H","lc1":"HG","date":1928,"item_date":1928,"rights_changed_date":"2013-08-19","cataloging_source":"d","literary_form":"Not fiction","serial_killer_guess":"book","title":"Valuta i valutna politika; nauchna anketa za prichiniti͡e na stopanskata kriza v Bŭlgarii͡a.","filename":"mdp.39015057135074","first_author_name":"Toshev, Gospodin P.","contributing_library":"University of Michigan","searchstring":"<a href=https://babel.hathitrust.org/cgi/pt?id=mdp.39015057135074><em>Valuta i valutna politika; nauchna anketa za prichiniti͡e na stopanskata kriza v Bŭlgarii͡a.</em> (1928)","target_audience":"Unknown or not specified","cntry":"bu ","first_place":"Sofii͡a,","lc_class_from_lc":true,"first_publisher":"Kooperativna pechatnit͡sa \"Franklin\",","permalink":"https://babel.hathitrust.org/cgi/pt?id=mdp.39015057135074","language":"bul","government_document":" ","record_date":1928,"marc_record_created":"1988-07-18","resource_type":"book"}
{"lc2":"2342.2.","scanner":"google","lc0":"L","lc1":"LB","date":1986,"item_date":1986,"rights_changed_date":"2013-11-23","cataloging_source":" ","literary_form":"Not fiction","serial_killer_guess":"book","title":"China : management and finance of higher education.","filename":"mdp.39015038055250","contributing_library":"University of Michigan","searchstring":"<a href=https://babel.hathitrust.org/cgi/pt?id=mdp.39015038055250><em>China : management and finance of higher education.</em> (1986)","target_audience":"Unknown or not specified","cntry":"dcu","first_place":"Washington, D.C., U.S.A. :","lc_class_from_lc":true,"first_publisher":"World Bank,","permalink":"https://babel.hathitrust.org/cgi/pt?id=mdp.39015038055250","language":"eng","government_document":"i","subject_places":["a-cc---"],"record_date":1986,"marc_record_created":"1988-07-18","resource_type":"book"}
{"lc2":"1","scanner":"google","lc0":"G","lc1":"GN","date":1996,"item_date":1996,"rights_changed_date":"2013-10-17","cataloging_source":" ","literary_form":"Unknown","serial_killer_guess":"serial","title":"Bulletin of the National Science Museum.--- v.22 1996","filename":"mdp.39015073103726","contributing_library":"University of Michigan","searchstring":"<a href=https://babel.hathitrust.org/cgi/pt?id=mdp.39015073103726><em>Bulletin of the National Science Museum.--- v.22 1996</em> (1996)","target_audience":"Unknown or not specified","cntry":"ja ","first_place":"Tokyo,","lc_class_from_lc":true,"first_publisher":"National Science Museum.","permalink":"https://babel.hathitrust.org/cgi/pt?id=mdp.39015073103726","language":"eng","government_document":"f","record_date":null,"marc_record_created":"1988-07-18","resource_type":"serial"}
{"cataloging_source":" ","scanner":"google","lc0":"D","lc1":"D","date":1982,"item_date":1982,"rights_changed_date":"2015-04-03","lc2":"1","literary_form":"Unknown","serial_killer_guess":"serial","title":"The Historian : a journal of history.--- v.44 1981/1982","filename":"mdp.39015068987661","contributing_library":"University of Michigan","searchstring":"<a href=https://babel.hathitrust.org/cgi/pt?id=mdp.39015068987661><em>The Historian : a journal of history.--- v.44 1981/1982</em> (1982)","target_audience":"Unknown or not specified","cntry":"riu","first_place":"[Kingston, R.I., etc.] :","lc_class_from_lc":true,"first_publisher":"Phi Alpha Theta,","permalink":"https://babel.hathitrust.org/cgi/pt?id=mdp.39015068987661","language":"eng","government_document":" ","record_date":1938,"marc_record_created":"1988-07-18","resource_type":"serial"}
{"cataloging_source":" ","scanner":"google","lc0":"H","lc1":"HC","date":1969,"item_date":1969,"rights_changed_date":"2013-08-08","lc2":"10","literary_form":"Unknown","serial_killer_guess":"book","title":"Mirovai͡a ėkonomika i mezhdunarodnye otnoshenii͡a.--- 1969:7-12","filename":"uc1.b3230826","contributing_library":"nrlf","searchstring":"<a href=https://babel.hathitrust.org/cgi/pt?id=uc1.b3230826><em>Mirovai͡a ėkonomika i mezhdunarodnye otnoshenii͡a.--- 1969:7-12</em> (1969)","target_audience":"Unknown or not specified","cntry":"ru ","first_place":"Moskva :","lc_class_from_lc":true,"first_publisher":"Pravda.","permalink":"https://babel.hathitrust.org/cgi/pt?id=uc1.b3230826","language":"rus","government_document":"o","record_date":null,"marc_record_created":"1988-07-18","resource_type":"serial"}
{"cataloging_source":"d","scanner":"google","first_publisher":"Deutsche Verlags-Anstalt","item_date":2005,"rights_changed_date":"2013-08-09","literary_form":"Unknown","serial_killer_guess":"book","title":"Osteuropa--- v.55:8 2005","filename":"uc1.32106020346950","contributing_library":"ucsc","searchstring":"<a href=https://babel.hathitrust.org/cgi/pt?id=uc1.32106020346950><em>Osteuropa--- v.55:8 2005</em> (2005)","target_audience":"Unknown or not specified","cntry":"gw ","first_place":"Stuttgart :","date":2005,"permalink":"https://babel.hathitrust.org/cgi/pt?id=uc1.32106020346950","language":"ger","government_document":" ","subject_places":["ee-----"],"record_date":null,"marc_record_created":"1975-09-01","resource_type":"serial"}
{"cataloging_source":"d","scanner":"google","first_publisher":"Badan Usaha Jaya Press Jajasan Jaya Raya],","item_date":1988,"rights_changed_date":"2015-09-03","literary_form":"Unknown","serial_killer_guess":"book","title":"Tempo.--- 1988 Index","filename":"mdp.39015066449201","contributing_library":"University of Michigan","searchstring":"<a href=https://babel.hathitrust.org/cgi/pt?id=mdp.39015066449201><em>Tempo.--- 1988 Index</em> (1988)","target_audience":"Unknown or not specified","cntry":"io ","first_place":"[Djakarta,","date":1988,"permalink":"https://babel.hathitrust.org/cgi/pt?id=mdp.39015066449201","language":"ind","government_document":" ","subject_places":["a-io---"],"record_date":1971,"marc_record_created":"1988-07-18","resource_type":"serial"}
{"cataloging_source":"d","scanner":"google","date":1974,"item_date":1974,"rights_changed_date":"2013-08-04","literary_form":"Not fiction","serial_killer_guess":"book","title":"Solar energy / c[Vlastimir A. Stevovich, Informatics, Inc.] ; csponsored by Advanced Research Project Agency.","filename":"mdp.39015002048653","first_author_name":"Stevovich, Vlastimir A.","contributing_library":"University of Michigan","searchstring":"<a href=https://babel.hathitrust.org/cgi/pt?id=mdp.39015002048653><em>Solar energy / c[Vlastimir A. Stevovich, Informatics, Inc.] ; csponsored by Advanced Research Project Agency.</em> (1974)","target_audience":"Unknown or not specified","cntry":"vau","first_place":"Rockville, Md. :","first_publisher":"Informatics Inc.,","permalink":"https://babel.hathitrust.org/cgi/pt?id=mdp.39015002048653","language":"eng","government_document":" ","record_date":1974,"marc_record_created":"1988-07-18","resource_type":"book"}
@organisciak
Member

Could you summarize all the values returned for contributing_library? e.g. cat test.json | jq '.contributing_library' | sort | uniq. I know NRLF is "University of California Northern Regional Library Facility"; Eleanor should have info for the others.
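For anyone who would rather do the same tally without jq, here is a minimal Python sketch. It assumes the catalog is newline-delimited JSON with one record per line, as in the sample entries above; the function name `count_contributing_libraries` and the `"<missing>"` placeholder are made up for illustration, and the three sample lines are trimmed-down copies of records from this thread.

```python
import json
from collections import Counter


def count_contributing_libraries(lines):
    """Tally the contributing_library value of each JSON record.

    `lines` is any iterable of strings, one JSON object per line.
    Records without the field are counted under "<missing>"
    (a placeholder chosen for this sketch).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get("contributing_library", "<missing>")] += 1
    return counts


# Trimmed-down records based on the sample entries in this issue.
sample = [
    '{"filename": "uc1.b3890417", "contributing_library": "nrlf"}',
    '{"filename": "mdp.39015057135074", "contributing_library": "University of Michigan"}',
    '{"filename": "uc1.32106020346950", "contributing_library": "ucsc"}',
]

counts = count_contributing_libraries(sample)
for library, n in counts.most_common():
    print(n, library)
```

To run it over the real gzipped file, something like `gzip.open("jsoncatalog_hathi.txt.gz", "rt", encoding="utf-8")` can be passed in place of `sample`, since an open text file iterates line by line.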

Also, send me the URL for the imperfect current version that you have hosted; I'll try to build it with the unigrams.

@bmschmidt
Member Author

I've fixed up the contributing libraries more recently by using the first few characters of the identifier instead of the contributing library code. That seems to do it.
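A rough Python sketch of that idea, not the actual bookworm code: it assumes "the first few characters" means the namespace before the first dot in the identifier (e.g. "mdp" in "mdp.39015057135074"). The lookup table and the function name `library_from_identifier` are hypothetical. The "mdp" entry matches the sample records above; the "uc1" label is an assumption and should be checked against an authoritative list.

```python
# Hypothetical lookup from identifier namespace to a display name.
NAMESPACE_TO_LIBRARY = {
    "mdp": "University of Michigan",    # matches the sample records above
    "uc1": "University of California",  # assumed label, not confirmed here
}


def library_from_identifier(identifier, fallback=None):
    """Guess the contributing library from the identifier's namespace.

    The namespace is taken to be everything before the first ".".
    Returns `fallback` when there is no dot or the namespace is unknown.
    """
    namespace, sep, _ = identifier.partition(".")
    if not sep:
        return fallback
    return NAMESPACE_TO_LIBRARY.get(namespace, fallback)


print(library_from_identifier("mdp.39015057135074"))
print(library_from_identifier("uc1.b3890417"))
print(library_from_identifier("xyz.123", fallback="Unknown"))
```

Keying on the namespace gives one label per source, which avoids the mix of codes ("nrlf", "ucsc") and full names seen in the sample records, at the cost of losing the finer distinction between facilities that share a namespace.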

I will look around for the data. I've noticed a few changes to the bookwormDB repo that need to be made. A few are already on my hosted version and I will push them this afternoon.

Even though this is the wrong place for it, I'm also noting here a few additional changes that need to happen to make the memory tables work better.

1. SET optimizer_search_depth=0;
2. Increase Memory Table limit by about 50%.
3. Ensure silencing of errors is working.
