Umbrella ticket: Output Format and Exploration #164

Open
dcsw2 opened this issue Oct 4, 2022 · 6 comments

@dcsw2
Collaborator

dcsw2 commented Oct 4, 2022

Add requests here for research requirements

@dcsw2 dcsw2 created this issue from a note in Applications (To do) Oct 4, 2022
@kmcdono2 kmcdono2 moved this from To do to In progress in Applications Oct 4, 2022
@kmcdono2
Collaborator

kmcdono2 commented Oct 5, 2022

Two core approaches to working with the toponyms:

  1. analyzing toponyms independently (see the discussion of this as a visualization in "Set up Observable notebook for data exploration", #166)
    This could provide sample-level information such as:
  • average, minimum, and maximum number of toponyms per article and per title
  • spatial "center" of resolved toponyms (e.g. where they cluster) per article and per title
  • number of toponyms of each type (location, building, street, etc.) in the total sample
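
A minimal sketch of how these sample-level figures could be computed, assuming each article is a dict with a list of toponyms that carry a type and, when resolved, coordinates. The record layout, the field names (`article_id`, `title_id`, `toponyms`, `type`, `lat`, `lon`), and the sample values are hypothetical and not the pipeline's actual output format. The "center" here is a naive arithmetic mean of latitude and longitude, which is only a rough stand-in for a proper spatial centroid.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample; unresolved toponyms have lat/lon set to None.
articles = [
    {"article_id": "a1", "title_id": "t1", "toponyms": [
        {"mention": "London", "type": "location", "lat": 51.5074, "lon": -0.1278},
        {"mention": "Fleet Street", "type": "street", "lat": 51.5142, "lon": -0.1071},
        {"mention": "the Old Mill", "type": "building", "lat": None, "lon": None},
    ]},
    {"article_id": "a2", "title_id": "t1", "toponyms": [
        {"mention": "Leeds", "type": "location", "lat": 53.8008, "lon": -1.5491},
    ]},
]

def count_stats(articles):
    """Average, minimum, and maximum number of toponyms per article."""
    counts = [len(a["toponyms"]) for a in articles]
    return {"mean": mean(counts), "min": min(counts), "max": max(counts)}

def centroid(articles):
    """Naive mean (lat, lon) of resolved toponyms, or None if none are resolved."""
    points = [
        (t["lat"], t["lon"])
        for a in articles
        for t in a["toponyms"]
        if t["lat"] is not None and t["lon"] is not None
    ]
    if not points:
        return None
    return (mean(p[0] for p in points), mean(p[1] for p in points))

def type_counts(articles):
    """Number of toponyms of each type across the given articles."""
    return Counter(t["type"] for a in articles for t in a["toponyms"])

print(count_stats(articles))
print(centroid(articles))
print(type_counts(articles))
```

Per-title figures would follow the same pattern: group the articles by `title_id` and call the same functions on each group.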

And, for every resolved and unresolved toponym in the corpus/sample:

  2. analyzing toponyms dependent on their linguistic context
    A basic format would allow us to analyze toponyms within the context of other words in the article. What I'm not sure about is the concrete format this should take (e.g. XML). For the paper we are working on, this would ideally include (not a complete list, but a start):
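
As a placeholder for the format discussion only, here is one possible token-level shape, assuming each token carries a part-of-speech tag and an optional toponym annotation. Every field name (`i`, `text`, `pos`, `toponym`, `resolved`), the tag values, and the choice of JSON Lines rather than XML are assumptions for illustration, not a decision recorded in this ticket. The sketch treats each toponym as a single token, which real multi-word place names would break.

```python
import json

# Hypothetical tokenized article; "toponym" is None for non-toponym tokens.
tokens = [
    {"i": 0, "text": "Fire", "pos": "NOUN", "toponym": None},
    {"i": 1, "text": "in", "pos": "ADP", "toponym": None},
    {"i": 2, "text": "Leeds", "pos": "PROPN",
     "toponym": {"type": "location", "resolved": True}},
    {"i": 3, "text": "yesterday", "pos": "NOUN", "toponym": None},
]

def context_window(tokens, size=1):
    """Yield (toponym token, left context, right context) for each toponym token."""
    for idx, tok in enumerate(tokens):
        if tok["toponym"] is not None:
            left = tokens[max(0, idx - size):idx]
            right = tokens[idx + 1:idx + 1 + size]
            yield tok, left, right

# Words surrounding each toponym, one token on either side.
windows = [
    (tok["text"], [l["text"] for l in left], [r["text"] for r in right])
    for tok, left, right in context_window(tokens)
]
print(windows)

# One JSON object per line, so the file can be streamed token by token.
jsonl = "\n".join(json.dumps(tok) for tok in tokens)
print(jsonl)
```

A flat record like this keeps resolved and unresolved toponyms in the same stream as the surrounding words, which is the property the context-dependent analysis needs regardless of the serialization chosen.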

@kmcdono2 kmcdono2 self-assigned this Oct 5, 2022
@kallewesterling
Collaborator

I'd be happy to talk through the format question, if you want, @kmcdono2!

@dcsw2
Collaborator Author

dcsw2 commented Oct 7, 2022

I sent an invite

@kallewesterling
Collaborator

Thank you @dcsw2 - looking forward to chatting!

@kmcdono2
Collaborator

We decided to divide this ticket into separate tasks: #166 covers the items under (1). Next week we will discuss how to prepare for (2).

@fedenanni
Contributor

Hi all - I am picking up the "access to POS tags" task and will keep track of progress here

@kmcdono2 kmcdono2 changed the title from "Output Format" to "Umbrella ticket: Output Format" Nov 15, 2022
@kmcdono2 kmcdono2 changed the title from "Umbrella ticket: Output Format" to "Umbrella ticket: Output Format and Exploration" Nov 15, 2022
@dcsw2 dcsw2 moved this from In progress to To review in Applications Feb 21, 2023