Process Experimental Sample 1 #178

Open
dcsw2 opened this issue Feb 21, 2023 · 6 comments
@dcsw2
Collaborator

dcsw2 commented Feb 21, 2023

21 Feb: a first sample to run through T-Res

dcsw2 created this issue from a note in Applications (To do) on Feb 21, 2023
dcsw2 changed the title from "Process Experimental Sample" to "Process Experimental Sample 1" on Feb 21, 2023
@dcsw2
Collaborator Author

dcsw2 commented Feb 21, 2023

SAMPLE REQUEST 1

-HMD+LWM collections only
-Date range: 1880-1900
-for every title, take 7 random days per year; this gives 7 issues per title per year. For each issue, include all articles, retaining issue metadata (e.g. we want to know which issue each article belongs to)
-all OCR qualities

NB: the objects of inquiry are both article and issue, so it's important to select content within 7 issues
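
The sampling rule above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script used to build the sample: the record layout (dicts with `title`, `date` and `issue_id` keys), the ISO date strings, the inclusive 1880-1900 range, and the function names `sample_issues` and `articles_in_issues` are all hypothetical. It also samples from the dates on which a title actually published, rather than from all calendar days.

```python
import random
from collections import defaultdict
from datetime import date


def sample_issues(issues, per_year=7, start_year=1880, end_year=1900, seed=0):
    """Pick up to `per_year` random issues for every (title, year) pair.

    `issues` is an iterable of dicts with the hypothetical keys
    'title', 'date' (ISO string, e.g. '1885-03-14') and 'issue_id'.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    by_title_year = defaultdict(list)
    for issue in issues:
        year = date.fromisoformat(issue["date"]).year
        if start_year <= year <= end_year:  # assumed inclusive on both ends
            by_title_year[(issue["title"], year)].append(issue)

    sampled = []
    for key in sorted(by_title_year):
        # Sort first so the draw does not depend on input order.
        candidates = sorted(by_title_year[key], key=lambda i: i["issue_id"])
        k = min(per_year, len(candidates))  # some titles have fewer issues
        sampled.extend(rng.sample(candidates, k))
    return sampled


def articles_in_issues(articles, sampled_issues):
    """Keep every article whose 'issue_id' belongs to a sampled issue."""
    wanted = {issue["issue_id"] for issue in sampled_issues}
    return [a for a in articles if a["issue_id"] in wanted]
```

Selecting at issue level first and then pulling in all articles by `issue_id` keeps the article-to-issue link intact, which matters because both articles and issues are objects of inquiry.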

Is below the right set of tasks? Please amend as needed!

@kmcdono2

This comment was marked as resolved.

@npedrazzini
Collaborator

Sounds good @dcsw2, I can do that. I can start working on it late this afternoon... if I start a script tonight, you might have the sample sometime tomorrow. I'll keep you updated, but ping me for anything else in the meantime. I'll be a bit busy with last-minute abstract writing and wrapping things up before I switch to part-time next week, but it's on my TODO for the day ✅

@kmcdono2
Collaborator

T-Res output + article metadata fields:

NLP, issue, art_num, title, collection, full_date, year, month, day, location, word_count, ocrquality, decade, mention, candidates, candidate_names, sent_idx, end_pos, tag, sentence, prediction, prediction_name, ed_score, latlong, wkdt_class
*Including toponym mentions that return NIL candidates
*Amended to leave out POS until @dcsw2 and I discuss
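
As a rough aid for reviewing the sample, the field list above could be checked against a file header like this. It is a sketch only: it assumes the sample is delivered as delimited text with a header row, which the thread does not state, and `check_header` is a hypothetical helper name.

```python
import csv
import io

# Field names copied from the list above (POS deliberately left out).
EXPECTED_COLUMNS = [
    "NLP", "issue", "art_num", "title", "collection", "full_date", "year",
    "month", "day", "location", "word_count", "ocrquality", "decade",
    "mention", "candidates", "candidate_names", "sent_idx", "end_pos", "tag",
    "sentence", "prediction", "prediction_name", "ed_score", "latlong",
    "wkdt_class",
]


def check_header(csv_text, delimiter=","):
    """Return (missing, unexpected) column names for the header of csv_text."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    # Strip whitespace so "decade, mention" and "decade,mention" compare equal.
    header = [name.strip() for name in next(reader, [])]
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, unexpected
```

Calling `check_header` on the first line of a sample file returns two empty lists when the header matches the agreed fields.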

@kmcdono2
Collaborator

kmcdono2 commented Feb 24, 2023

> Sounds good @dcsw2, I can do that. I can start working on it late this afternoon..

Sample in google drive here: https://drive.google.com/drive/folders/1GCQJXT2ZI_EtGgHQeqOyn6TYe4Ww7lQI

Sample stored in azure here: storageexplorer://v=1&accountid=%2Fsubscriptions%2Fb8871872-a5e3-473f-b9b9-f4baaab6a9a0%2FresourceGroups%2Flivingwithmachines%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Flivingwithmachines&subscriptionid=b8871872-a5e3-473f-b9b9-f4baaab6a9a0&resourcetype=Azure.BlobContainer&resourcename=topo

@kmcdono2
Collaborator

(Just leaving @fedenanni and @lukehare assigned, as they are active on this right now.) @fedenanni, when you're ready for @dcsw2 and me to review, just re-assign us! I'm trying to get better at this ;)

Projects
Applications
In progress
5 participants