Changes to first pass? #5

Open
bmschmidt opened this issue May 27, 2016 · 0 comments

Just in case anyone wants to comment, here are some sample records as the script is currently pulling them out. The fields I'm most excited about for general-purpose use are contributing library and scanner, which I think are important to know, and the distinction between 'item date' and 'record date' (the former from field 974, which should generally be higher quality), which should improve date resolution.

"subject_places" (Field 043) is also both interesting and quite highly populated. I've run Erez and JB's "Serial killer" algorithm to identify serial publications; looking at how that interacts with the MARC 'serial' field will give some interesting comparisons between this and the ngram viewer.

A few things, like "marc_record_created" date and "lc_class_from_lc", are unlikely to be interesting to anyone but myself; we might not even load them into the primary bookworm.

I've now got the lookup to convert "lc2" into a finer taxonomy of several thousand items. The only problem is that it's highly hierarchical; I'm not sure how to usefully represent all the branching trees in the database.
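
One possible way to fit a branching taxonomy into flat database columns is to store the path from the root to the deepest matching node, padded to a fixed depth. The sketch below is only an illustration of that idea: `TOY_TREE`, its ranges and labels, the helper names, and the `lc_level_N` column names are all hypothetical placeholders, not the real lookup or the LC outline.

```python
# Hypothetical toy taxonomy keyed by "lc1". Each node is
# (low, high, label, children). Ranges and labels are invented
# placeholders, not real LC outline data.
TOY_TREE = {
    "SB": [
        (1, 1110, "SB top", [
            (400, 500, "SB mid", [
                (450, 470, "SB leaf", []),
            ]),
        ]),
    ],
}


def path_for(lc1, lc2, tree=TOY_TREE):
    """Return labels from the root down to the deepest range containing lc2."""
    try:
        number = float(lc2)  # sample values look like "466." or "3444"
    except (TypeError, ValueError):
        return []
    path = []
    nodes = tree.get(lc1, [])
    while nodes:
        for low, high, label, children in nodes:
            if low <= number <= high:
                path.append(label)
                nodes = children
                break
        else:
            # No child range matched, so the path stops here.
            break
    return path


def as_columns(path, depth=3):
    """Flatten a path into fixed columns, padding short paths with None."""
    padded = path + [None] * (depth - len(path))
    return {f"lc_level_{i}": padded[i] for i in range(depth)}


row = as_columns(path_for("SB", "466."))
```

Fixed-depth columns lose any levels below `depth`, but they keep each level queryable as an ordinary categorical field; whether that trade-off suits the real taxonomy is an open question.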

Compared to the current bookworm, I don't have information about length and physical format. I think that's it, though?

{"cataloging_source": "e", "scanner": "google", "lc0": "F", "lc1": "F", "date": 1919, "first_author_birth": 1881, "item_date": 1919, "rights_changed_date": "2016-04-26", "lc2": "3444", "serial_killer_guess": "book", "title": "Organizaci\u00f3n de la iglesia y \u00f3rdenes religiosas en el virreinato del Per\u00fa en el siglo XVI : documentos del Archivo de Indias /--- V.1", "filename": "txu.059173017892026", "first_author_name": "Levillier, Roberto, 1881-", "contributing_library": "txu", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=txu.059173017892026><em>Organizaci\u00f3n de la iglesia y \u00f3rdenes religiosas en el virreinato del Per\u00fa en el siglo XVI : documentos del Archivo de Indias /--- V.1</em> (1919)", "cntry": "sp ", "first_place": "Madrid :", "lc_class_from_lc": false, "first_publisher": "Sucesores de Rivadeneyra,", "permalink": "https://babel.hathitrust.org/cgi/pt?id=txu.059173017892026", "language": "spa", "government_document": " ", "subject_places": ["s-pe---"], "record_date": 1919, "marc_record_created": "1989-08-28", "resource_type": "book"}
{"lc2": "466.", "scanner": "google", "lc0": "S", "lc1": "SB", "date": 1929, "item_date": 1929, "rights_changed_date": "2014-02-21", "cataloging_source": "d", "serial_killer_guess": "book", "title": "Japanese gardens,", "filename": "mdp.39015027407272", "first_author_name": "Taylor, Harriet (Osgood), Mrs.", "contributing_library": "University of Michigan", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=mdp.39015027407272><em>Japanese gardens,</em> (1929)", "cntry": "nyu", "first_place": "New York,", "lc_class_from_lc": true, "first_publisher": "Dodd, Mead and company,", "permalink": "https://babel.hathitrust.org/cgi/pt?id=mdp.39015027407272", "language": "eng", "government_document": " ", "subject_places": ["a-ja---"], "record_date": 1929, "marc_record_created": "1989-08-28", "resource_type": "book"}
{"contributing_library": "University of Michigan", "item_date": 2007, "permalink": "https://babel.hathitrust.org/cgi/pt?id=mdp.39015075632599", "serial_killer_guess": "book", "cataloging_source": "d", "scanner": "google", "language": "eng", "title": "School-age children in regulated family child care settings.", "government_document": "f", "rights_changed_date": "2013-12-29", "filename": "mdp.39015075632599", "cntry": "dcu", "subject_places": ["n-us---"], "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=mdp.39015075632599><em>School-age children in regulated family child care settings.</em> (2007)", "first_place": "[Washington, D.C.] :", "date": 2007, "first_publisher": "U.S. Dept. of Health and Human Services, Administration for Children and Families, Child Care Bureau,", "marc_record_created": "2008-05-16", "resource_type": "book", "record_date": 2007}
{"cataloging_source": "r", "scanner": "google", "lc0": "N", "lc1": "ND", "date": 1906, "first_author_birth": 1850, "item_date": 1906, "rights_changed_date": "2013-10-17", "lc2": "623.", "serial_killer_guess": "book", "title": "Raffael : des meisters gem\u00e4lde in 203 abbildungen /", "filename": "njp.32101067661536", "first_author_name": "Rosenberg, Adolf, 1850-1906.", "contributing_library": "Princeton University", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=njp.32101067661536><em>Raffael : des meisters gem\u00e4lde in 203 abbildungen /</em> (1906)", "first_author_death": 1906, "cntry": "gw ", "first_place": "Stuttgart und Leipzig :", "lc_class_from_lc": false, "first_publisher": "Deutsche verlags-anstalt,", "permalink": "https://babel.hathitrust.org/cgi/pt?id=njp.32101067661536", "language": "ger", "government_document": " ", "record_date": 1906, "marc_record_created": "1976-12-17", "resource_type": "book"}
{"contributing_library": "University of Wisconsin", "item_date": 1905, "permalink": "https://babel.hathitrust.org/cgi/pt?id=wu.89094626389", "serial_killer_guess": "book", "cataloging_source": "d", "scanner": "google", "language": "ger", "title": "Die Welt des Sichtbaren : eine Betrachtung u\u0308ber die Art und  Weise unseres Sehens /", "government_document": " ", "filename": "wu.89094626389", "cntry": "gw ", "rights_changed_date": "2013-08-01", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=wu.89094626389><em>Die Welt des Sichtbaren : eine Betrachtung u\u0308ber die Art und  Weise unseres Sehens /</em> (1905)", "first_place": "Leipzig :", "first_author_name": "Kiesel, Arthur.", "first_publisher": "R. Voigtla\u0308nder,", "marc_record_created": "1977-10-26", "date": 1905, "resource_type": "book", "record_date": 1905}
{"contributing_library": "nrlf", "item_date": 1914, "permalink": "https://babel.hathitrust.org/cgi/pt?id=uc1.$b23538", "serial_killer_guess": "book", "cataloging_source": "d", "scanner": "google", "language": "eng", "title": "The river Amazon from its sources to the sea /", "government_document": " ", "rights_changed_date": "2013-09-27", "filename": "uc1.$b23538", "cntry": "nyu", "subject_places": ["sa-----"], "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=uc1.$b23538><em>The river Amazon from its sources to the sea /</em> (1914)", "first_place": "New York :", "first_author_name": "Fountain, Paul.", "first_publisher": "Dodd, Mead,", "marc_record_created": "1985-05-16", "date": 1914, "resource_type": "book", "record_date": 1914}
{"cataloging_source": "d", "scanner": "ia", "date": 1860, "first_author_birth": 1804, "item_date": 1860, "rights_changed_date": "2013-08-10", "serial_killer_guess": "book", "title": "History of the town of Dunbarton, Merrimack County, New-Hampshire, from the grant by Mason's assigns, in 1751, to the year 1860.", "filename": "uc2.ark:/13960/t6m042443", "first_author_name": "Stark, Caleb, 1804-1864.", "contributing_library": "isrlf", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=uc2.ark:/13960/t6m042443><em>History of the town of Dunbarton, Merrimack County, New-Hampshire, from the grant by Mason's assigns, in 1751, to the year 1860.</em> (1860)", "first_author_death": 1864, "cntry": "xx ", "first_place": "Concord :", "first_publisher": "Lyon,", "permalink": "https://babel.hathitrust.org/cgi/pt?id=uc2.ark:/13960/t6m042443", "language": "eng", "government_document": " ", "record_date": 1860, "marc_record_created": "1978-04-19", "resource_type": "book"}
{"lc2": "1129.", "scanner": "google", "lc0": "J", "lc1": "JN", "date": 1894, "item_date": 1894, "rights_changed_date": "2015-02-15", "cataloging_source": "u", "serial_killer_guess": "book", "title": "Pamphlets and leaflets of the Liberal Publication Dept.--- 1894", "filename": "coo.31924093446569", "contributing_library": "Cornell University", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=coo.31924093446569><em>Pamphlets and leaflets of the Liberal Publication Dept.--- 1894</em> (1894)", "cntry": "enk", "first_place": "London :", "lc_class_from_lc": true, "first_publisher": "The Liberal Publication Department.", "permalink": "https://babel.hathitrust.org/cgi/pt?id=coo.31924093446569", "language": "eng", "government_document": "#", "subject_places": ["e-uk---"], "record_date": null, "marc_record_created": "2013-12-09", "resource_type": "serial"}
{"contributing_library": "Princeton University", "item_date": 1886, "permalink": "https://babel.hathitrust.org/cgi/pt?id=njp.32101067014249", "serial_killer_guess": "book", "cataloging_source": "d", "scanner": "google", "language": "ita", "title": "Relazione del direttore generale alla Commissione di vigilanza sul rendiconto dell' amministrazione del debito pubblico ...--- 1885-86", "government_document": " ", "filename": "njp.32101067014249", "cntry": "it ", "rights_changed_date": "2013-10-17", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=njp.32101067014249><em>Relazione del direttore generale alla Commissione di vigilanza sul rendiconto dell' amministrazione del debito pubblico ...--- 1885-86</em> (1886)", "first_place": "Roma :", "date": 1886, "first_publisher": "Tipografia Eredi Botta [etc.]", "marc_record_created": "1990-05-30", "resource_type": "serial", "record_date": null}
{"contributing_library": "Universidad Complutense de Madrid", "item_date": 1612, "permalink": "https://babel.hathitrust.org/cgi/pt?id=ucm.5320302075", "serial_killer_guess": "book", "cataloging_source": "c", "scanner": "google", "language": "lat", "title": "Quaestionum juris tam romani quam saxonici liber primus /", "government_document": " ", "filename": "ucm.5320302075", "cntry": "gw ", "rights_changed_date": "2016-03-31", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=ucm.5320302075><em>Quaestionum juris tam romani quam saxonici liber primus /</em> (1612)", "first_place": "[Haidelbergae] :", "first_author_name": "Pistoris, Hartmann", "first_publisher": "Gotthardi Voegelini,", "marc_record_created": "2008-06-25", "date": 1612, "resource_type": "book", "record_date": 1612}
{"cataloging_source": "#", "scanner": "google", "date": 1867, "first_author_birth": 1802, "item_date": 1867, "rights_changed_date": "2013-08-03", "serial_killer_guess": "book", "title": "Dzieje Rzeczypospolite\u0301j Polskie\u0301j /--- t.4-6 (1849-64)", "filename": "hvd.hnzmgk", "first_author_name": "Moraczewski, Je\u0328drzej, 1802-1855.", "contributing_library": "Harvard University", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=hvd.hnzmgk><em>Dzieje Rzeczypospolite\u0301j Polskie\u0301j /--- t.4-6 (1849-64)</em> (1867)", "first_author_death": 1855, "cntry": "pl#", "first_place": "Poznan\u0301 :", "first_publisher": "N. Kamien\u0301ski,", "permalink": "https://babel.hathitrust.org/cgi/pt?id=hvd.hnzmgk", "language": "pol", "government_document": "#", "subject_places": ["e-pl---"], "record_date": 1862, "marc_record_created": "1986-01-09", "resource_type": "book"}
{"cataloging_source": "d", "scanner": "google", "lc0": "Q", "lc1": "QR", "date": 1886, "first_author_birth": 1842, "item_date": 1886, "rights_changed_date": "2014-10-27", "lc2": "56", "serial_killer_guess": "book", "title": "Les microbes, les ferments et les moisissures.", "filename": "hvd.hc2mbd", "first_author_name": "Trouessart, E.-L. (Edouard-Louis), 1842-1927.", "contributing_library": "Harvard University", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=hvd.hc2mbd><em>Les microbes, les ferments et les moisissures.</em> (1886)", "first_author_death": 1927, "cntry": "fr ", "first_place": "Paris,", "lc_class_from_lc": false, "first_publisher": "Alcan,", "permalink": "https://babel.hathitrust.org/cgi/pt?id=hvd.hc2mbd", "language": "fre", "government_document": " ", "record_date": 1886, "marc_record_created": "1982-06-26", "resource_type": "book"}
{"cataloging_source": "u", "scanner": "cornell-ms", "date": 1892, "first_author_birth": 1852, "item_date": 1892, "rights_changed_date": "2015-07-19", "serial_killer_guess": "book", "title": "The Parsifal of Richard Wagner,", "filename": "coo1.ark:/13960/t2t448b1z", "first_author_name": "Kufferath, M. (Maurice), 1852-1919.", "contributing_library": "Cornell University", "searchstring": "<a href=https://babel.hathitrust.org/cgi/pt?id=coo1.ark:/13960/t2t448b1z><em>The Parsifal of Richard Wagner,</em> (1892)", "first_author_death": 1919, "cntry": "nyu", "first_place": "New York,", "first_publisher": "Tait", "permalink": "https://babel.hathitrust.org/cgi/pt?id=coo1.ark:/13960/t2t448b1z", "language": "eng", "government_document": " ", "record_date": 1892, "marc_record_created": "1974-06-04", "resource_type": "book"}
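
The two comparisons mentioned above (item date against record date, and the "Serial killer" guess against the MARC resource type) could be checked in a single pass over records like these. This is a minimal sketch, assuming the output is one JSON object per line; the `summarize` helper and the `SAMPLE` string are illustrative only, and `SAMPLE` holds abbreviated copies of three of the records above.

```python
import json
from collections import Counter

# Abbreviated copies of three sample records from this issue; only the
# fields used below are kept.
SAMPLE = """\
{"filename": "hvd.hnzmgk", "item_date": 1867, "record_date": 1862, "serial_killer_guess": "book", "resource_type": "book"}
{"filename": "coo.31924093446569", "item_date": 1894, "record_date": null, "serial_killer_guess": "book", "resource_type": "serial"}
{"filename": "mdp.39015027407272", "item_date": 1929, "record_date": 1929, "serial_killer_guess": "book", "resource_type": "book"}
"""


def summarize(lines):
    """Collect date disagreements and a guess-vs-MARC cross-tabulation."""
    date_mismatches = []
    crosstab = Counter()
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        item, record = rec.get("item_date"), rec.get("record_date")
        if item != record:
            # A missing or null record_date also counts as a disagreement.
            date_mismatches.append((rec["filename"], item, record))
        crosstab[(rec.get("serial_killer_guess"), rec.get("resource_type"))] += 1
    return date_mismatches, crosstab


date_mismatches, crosstab = summarize(SAMPLE.splitlines())
for (guess, marc_type), count in sorted(crosstab.items()):
    print(f"guess={guess} marc={marc_type} count={count}")
```

On the full output, the off-diagonal cells of the cross-tabulation (for example, a "book" guess on a MARC "serial") are the cases worth inspecting by hand.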