Geo-Twitter2019

Description

In this project, we use a novel non-parametric skip-gram model to capture dialectal changes in English at multiple geographic resolutions. This repository contains the tweet IDs we used to train the model. You are free to crawl the data using these IDs and preprocess it with our tools to replicate our research results.
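
As a rough sketch of the crawling step, the snippet below reads tweet IDs and groups them into batches for a lookup request. The one-ID-per-line layout, the helper names `read_tweet_ids` and `batch_ids`, and the batch size are assumptions made for illustration, not part of this repository; check the actual ID files and the limits of whichever Twitter API client you use.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_tweet_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield tweet IDs from lines, skipping blanks (assumes one ID per line)."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id:
            yield tweet_id


def batch_ids(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group IDs into lists of at most `size` items for batched lookups."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical IDs standing in for the contents of an ID file.
sample_lines = ["1001\n", "1002\n", "\n", "1003\n"]
batches = list(batch_ids(read_tweet_ids(sample_lines), size=2))
print(batches)
```

Each batch can then be passed to a tweet-lookup call. Tweets that have since been deleted or made private will not be returned, so a fresh crawl may yield fewer tweets than the counts reported in this README.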

Dataset

Count	USA	UK	Total
Tweets	2,075,394	1,088,232	3,163,626
Tokens	41,637,107	22,012,953	63,650,060
Terms	865,784	469,570	1,167,790
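
The USA and UK term counts sum to more than the Total column, which is consistent with the total counting the combined vocabulary once. Assuming that a "term" is a unique word type and that the total is the union of both vocabularies (the table does not state this explicitly), the number of shared terms can be estimated by inclusion-exclusion:

```python
# Counts copied from the table above.
usa_terms = 865_784
uk_terms = 469_570
total_terms = 1_167_790

# Assumption: the total is the size of the union of both vocabularies,
# so |USA and UK| = |USA| + |UK| - |USA or UK|.
shared_terms = usa_terms + uk_terms - total_terms
print(shared_terms)  # 167564
```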

Note: the CMU geo data contains only 378K tweets.

Model

To use our model implementation, visit the DialectGram GitHub page. The GitHub repository contains four models:

baseline models: frequency model and syntactic model
GEODIST model: region-specific embeddings
DialectGram model: a novel approach to compose dialect-sensitive word embeddings, based on Adaptive Skip-gram.
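
To make the idea of a frequency baseline concrete, here is a minimal sketch that scores words by how differently they occur in two regions. It is an illustrative assumption about what such a baseline might look like, not the implementation in the DialectGram repository; the function name `frequency_scores`, the add-one smoothing, and the toy corpora are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def frequency_scores(usa_tokens: List[str], uk_tokens: List[str]) -> Dict[str, float]:
    """Return the absolute log ratio of smoothed relative frequencies per word."""
    usa_counts, uk_counts = Counter(usa_tokens), Counter(uk_tokens)
    vocab = set(usa_counts) | set(uk_counts)
    # Add-one smoothing so words unseen in one region do not divide by zero.
    usa_total = len(usa_tokens) + len(vocab)
    uk_total = len(uk_tokens) + len(vocab)
    scores = {}
    for word in vocab:
        p_usa = (usa_counts[word] + 1) / usa_total
        p_uk = (uk_counts[word] + 1) / uk_total
        scores[word] = abs(math.log(p_usa / p_uk))
    return scores


# Toy corpora; real input would be the preprocessed tweets for each region.
usa = "the truck is parked near the gas station".split()
uk = "the lorry is parked near the petrol station".split()
scores = frequency_scores(usa, uk)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:4])
```

In this toy example the region-specific words (truck, gas, lorry, petrol) receive the highest scores, while words used equally in both regions score zero. A score of this kind reflects only how often a word occurs, not how it is used.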

Demo

You can play with our demo on the website: demo

Acknowledgement

We would like to acknowledge the following resources, which we drew on when implementing our models:

Citation

Jiang, Hang*; Haoshen Hong*; Yuxing Chen*; and Vivek Kulkarni. 2019. DialectGram: Automatic Detection of Dialectal Changes with Multi-geographic Resolution Analysis. To appear in Proceedings of the Society for Computation in Linguistics. New Orleans: Linguistic Society of America.

@inproceedings{Jiang:Hong:Chen:2020:SCiL,
  Author = {Jiang, Hang  and  Hong, Haoshen  and  Chen, Yuxing  and  Kulkarni, Vivek},
  Title = {DialectGram: Automatic Detection of Dialectal Changes with Multi-geographic Resolution Analysis},
  Booktitle = {Proceedings of the Society for Computation in Linguistics},
  Location = {New Orleans},
  Publisher = {Linguistic Society of America},
  Address = {Washington, D.C.},
  Year = {2020}}
