How to: news metadata -> full text -> pipeline #165

Open
kmcdono2 opened this issue Oct 4, 2022 · 7 comments

@kmcdono2
Collaborator

kmcdono2 commented Oct 4, 2022

Ticket to strategize getting a sample of full text articles into the toponym resolution pipeline.

@kmcdono2 kmcdono2 created this issue from a note in Applications (To do) Oct 4, 2022
@kallewesterling kallewesterling self-assigned this Oct 4, 2022
@kallewesterling
Collaborator

kallewesterling commented Oct 4, 2022

Starting a thread here for my own reference: this pull request is where @thobson88 has created a workflow for generating fulltext for a given set of items, based on locally downloaded zip files. (Not ideal for the pipeline outlined here.)

Meanwhile, I'm working on ingesting the fulltext into a separate table. (And then @thobson88's method will have to be used inside a VM for BNA access somehow!)
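
For illustration only, here is a minimal sketch of what loading fulltext from a zip archive into a separate table could look like. Everything in it is an assumption rather than the project's actual code or schema: it uses SQLite, a hypothetical `item_fulltext` table, and a made-up convention that each `.txt` file inside the archive is named after its item identifier.

```python
import io
import sqlite3
import zipfile


def ingest_fulltext(conn, zip_bytes):
    """Load every .txt member of a zip archive into a separate fulltext table.

    Returns the number of rows in the table after ingestion.
    """
    # Hypothetical schema: one row per item, kept apart from the metadata tables.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item_fulltext ("
        "item_code TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue  # skip non-text members such as XML metadata
            # Assumed convention: the file stem is the item identifier.
            item_code = name.rsplit("/", 1)[-1][: -len(".txt")]
            body = archive.read(name).decode("utf-8")
            conn.execute(
                "INSERT OR REPLACE INTO item_fulltext (item_code, body) "
                "VALUES (?, ?)",
                (item_code, body),
            )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM item_fulltext").fetchone()[0]


def make_sample_zip():
    """Build a tiny in-memory archive with invented content for testing."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("0001/art0001.txt", "A meeting was held in Leeds.")
        archive.writestr("0001/art0002.txt", "Floods reported near York.")
        archive.writestr("0001/metadata.xml", "<issue/>")
    return buffer.getvalue()
```

Keying the table on the item identifier would let the fulltext be joined back to the existing metadata without changing the metadata tables themselves.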

@kmcdono2
Collaborator Author

kmcdono2 commented Oct 5, 2022

@kallewesterling let us know how we can help test this method, when it's useful!

@kallewesterling
Collaborator

Thanks @kmcdono2, definitely! We're almost there with the fulltext table as well, which will be available outside of the BNA!

@kmcdono2
Collaborator Author

@kallewesterling @lukehare can we add some information here about how this process will work with the new API for toponym resolution?

@kallewesterling
Collaborator

Yeah, that's a great point. We need to keep this in mind. I assume we could just tack it on outside the db code base (separation of concerns, which makes the most sense to me), but I can also think of ways to build it into the db code (which would be easier for the end user)... food for thought. Let's chat more!

@ruthahnert

Would love to chat more about framing narratives

@kmcdono2
Collaborator Author

That would be great, @ruthahnert! I'll follow up on Slack, but you can see here and in other tickets how we are trying to prepare to understand this as part of working with newspaper full-text content.

@dcsw2 dcsw2 moved this from To do to To review in Applications Feb 21, 2023