
Unrecognized Sorbian Speaker #14

Open
PolMine opened this issue Jan 3, 2020 · 3 comments

@PolMine (Collaborator) commented Jan 3, 2020

There is an unrecognized speech given in Sorbian; see this snippet:

library(polmineR)

corpus("GERMAPARL") %>%
  subset(date == "2004-06-17") %>%
  subset(speaker == "Maria Michalk") %>%
  read()
@ChristophLeonhardt

I am not sure the speech is unrecognized; I would say it is recognized. Maria Michalk gives two speeches here: the first in German, the second (with interruptions and questions in between) partly in Sorbian.

library(polmineR)

speeches <- corpus("GERMAPARL") %>%
  subset(date == "2004-06-17") %>%
  subset(speaker == "Maria Michalk") %>%
  as.speeches(s_attribute_name = "speaker")

@ablaette (Collaborator)

I fully agree: there are two distinct speeches. However, if you look at the second one (in Sorbian), something is wrong with the HTML output. This is a polmineR issue rather than a GermaParl issue.

library(polmineR)

speeches <- corpus("GERMAPARL") %>%
  subset(date == "2004-06-17") %>%
  subset(speaker == "Maria Michalk") %>%
  as.speeches(s_attribute_name = "speaker")

html(speeches[[1]])
html(speeches[[2]])

@ChristophLeonhardt

Yes, I also noticed these odd tags when running read(speeches[[1]]).
