Unrecognized Sorbian Speaker #14

PolMine · 2020-01-03T09:30:24Z

There is an unrecognized speech given in Sorbian; see this snippet:

library(polmineR)

# Show everything Maria Michalk said in the debate on 2004-06-17
corpus("GERMAPARL") %>%
  subset(date == "2004-06-17") %>%
  subset(speaker == "Maria Michalk") %>%
  read()

ChristophLeonhardt · 2020-10-19T13:18:10Z

I am not sure the speech is unrecognized; I would say it is recognized. Maria Michalk gives two speeches here: the first in German, the second (with interruptions and questions in between) partly in Sorbian.

library(polmineR)

# Split Maria Michalk's contributions on 2004-06-17 into individual speeches
speeches <- corpus("GERMAPARL") %>%
  subset(date == "2004-06-17") %>%
  subset(speaker == "Maria Michalk") %>%
  as.speeches(s_attribute_name = "speaker")
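
One way to check how many speeches were detected is to inspect the object returned by as.speeches(). This is a minimal sketch, assuming the GERMAPARL corpus is installed locally and that length() and names() work on the returned bundle as they do on other polmineR bundles; the variable name n_speeches is only illustrative.

```r
library(polmineR)

# Rebuild the bundle of speeches so the sketch is self-contained
# (assumes the GERMAPARL corpus is available on this machine)
speeches <- corpus("GERMAPARL") %>%
  subset(date == "2004-06-17") %>%
  subset(speaker == "Maria Michalk") %>%
  as.speeches(s_attribute_name = "speaker")

# Number of speeches detected for this speaker on this date
n_speeches <- length(speeches)
print(n_speeches)

# Names assigned to the individual speeches in the bundle
print(names(speeches))
```

If two distinct speeches are detected, each can then be opened on its own, for example with read(speeches[[2]]).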

ablaette · 2020-10-19T17:53:46Z

I fully agree, there are two distinct speeches. However, if you look at the second one (in Sorbian), something is wrong with the HTML output. This is a polmineR issue rather than a GermaParl issue.

library(polmineR)

speeches <- corpus("GERMAPARL") %>%
  subset(date == "2004-06-17") %>%
  subset(speaker == "Maria Michalk") %>%
  as.speeches(s_attribute_name = "speaker")

html(speeches[[1]])
html(speeches[[2]])

ChristophLeonhardt · 2020-10-19T18:14:00Z

I also noticed these odd tags when doing read(speeches[[1]]), yes.
