boutta: P => V #22

Open
brendano opened this issue Jun 4, 2013 · 0 comments

brendano commented Jun 4, 2013

Nathan noticed this today:

"boutta", short for "about to", is currently tagged as P in the 0.3 data release, but it should be V, since it behaves like a modal auxiliary verb, similar to "ought to". In fact, the Brown clusters have already figured this out: they group "boutta" with the "tryna", "gonna", and "finna" variants ("trying to", "going to", "fixing to"): http://www.ark.cs.cmu.edu/TweetNLP/paths/0011001.html

This might also be related to immediate future auxiliaries as mentioned in the NAACL paper (for "finna" and Texan English).

Current examples of the problem, just for "boutta":

~/twi/pos/ark-tweet-nlp/data/twpos-data-v0.3 % grep -ni boutta *.conll
oct27.conll:22611:boutta P
oct27.conll:26789:Boutta P

There are some further inconsistencies. Here are examples of words from this cluster in the data. I haven't looked at them in context yet, but I highly doubt the P reading is correct.

daily547.conll:1422 Tryna V
daily547.conll:2499 tryna V
daily547.conll:3934 Bouta P
oct27.conll:1534 fiNna R
oct27.conll:3469 fina V
oct27.conll:3923 gon V
oct27.conll:6065 tryna V
oct27.conll:7890 tryna V
oct27.conll:8455 gne V
oct27.conll:11337 tryna V
oct27.conll:13993 gon V
oct27.conll:19302 finna P
oct27.conll:21114 gon V
oct27.conll:22610 boutta P
oct27.conll:24181 tryna V
oct27.conll:26788 Boutta P
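
A check like the one above could be scripted so it can be rerun after retagging. The following is only a minimal sketch, not part of the tagger or the data release: it assumes each line of a .conll file holds one token and one tag separated by whitespace, with blank lines between tweets. The word list `CLUSTER_FORMS` and the function `find_non_verb_tags` are hypothetical names, and the list only covers the spellings mentioned in this issue, not the full Brown cluster.

```python
# Hypothetical word list: only the spellings mentioned in this issue.
CLUSTER_FORMS = {"boutta", "bouta", "tryna", "gonna", "finna", "fina", "gon", "gne"}


def find_non_verb_tags(lines, forms=CLUSTER_FORMS, expected="V"):
    """Yield (line_number, token, tag) for listed forms not tagged `expected`.

    Assumes one "token<whitespace>tag" pair per line; blank lines and any
    line that does not split into exactly two fields are skipped.
    """
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) != 2:
            continue  # tweet separator or unexpected layout
        token, tag = parts
        # Case-insensitive match, like `grep -i`, so "Boutta" and "fiNna" count.
        if token.lower() in forms and tag != expected:
            yield lineno, token, tag


if __name__ == "__main__":
    # Made-up sample in the assumed layout, for illustration only.
    sample = ["boutta\tP", "", "Tryna\tV", "fiNna\tR", "go\tV"]
    for lineno, token, tag in find_non_verb_tags(sample):
        print(f"{lineno}\t{token}\t{tag}")
```

To run it on a real file, pass the open file object instead of `sample`, for example `find_non_verb_tags(open("oct27.conll", encoding="utf-8"))`. Line numbers are 1-based, as in `grep -n`.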

@ghost ghost assigned brendano Jun 4, 2013