GitHub - aginglexicon/the-aging-lexicon: resources

The following resources were compiled by participants of the Symposium on the Aging Lexicon that took place in Basel, June 7-9, 2018.

The listing of resources aims to help future efforts to understand adult age differences in the development of lexical and semantic knowledge.

Corpora

TalkBank: The goal of TalkBank is to foster fundamental research in the study of human communication. It contains a number of diverse speech and text corpora. Some are public; others require contacting TalkBank for permission.
BYU corpora: Collection of free and commercial corpora by Mark Davies in English, Spanish and Portuguese.
Taaluniversum corpora: A variety of Dutch text corpora (e.g. SoNaR) provided by the Dutch Taaluniversum.

Norms

Subtitle norms

SUBTLEX-NL: frequencies based on Dutch subtitles
SUBTLEX-US: frequencies based on American English subtitles
SUBTLEX-CH: frequencies based on Chinese subtitles
SUBTLEX-ESP: frequencies based on Spanish subtitles
SUBTLEX-DE: frequencies based on German subtitles
SUBTLEX-GR: frequencies based on Greek subtitles (Dimitropoulou et al., 2010)
SUBTLEX-UK: frequencies based on British English subtitles
SUBTLEX-PL: frequencies based on Polish subtitles

Lexical norms

Age-of-acquisition (AoA) norms for over 50,000 English words
Age-of-acquisition (AoA) and concreteness norms for over 30,000 Dutch words
Affective ratings for nearly 14,000 English words
Affective ratings for over 4,000 Dutch words
MacArthur-Bates Communicative Development Inventories
Chinese lexical database: a large-scale lexical database for Mandarin Chinese that provides over 150 descriptive and lexical-distributional variables for more than 30,000 words in simplified Chinese.

Word association norms

University of South Florida (USF) Free Association Norms: English word association norms for approx. 5,000 cues.
Small World of Words: Word association norms for over 12,000 cues in English and Dutch.

Concept and category norms

Leuven concept data: norms for over 400 concrete nouns including typicality, similarity within particular domains, category naming data, exemplar generation data, frequency, AoA, etc.
McRae feature norms: Feature norms from McRae, Cree, Seidenberg, & McNorgan (2005)

Semantic vectors

SNAUT: Interface and access to semantic vectors for Dutch and English based on word2vec
Latent Semantic Analysis: Interface to obtain semantic similarity for words and documents
GloVe vectors: Pretrained word vectors in English. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
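The "linear substructures" mentioned for GloVe can be illustrated with a minimal sketch. The three-dimensional vectors below are invented for illustration only and are not taken from any pretrained model; the helper names `cosine` and `analogy` are likewise hypothetical. With real pretrained vectors, the same arithmetic would be applied to much higher-dimensional data loaded from the distributed files.

```python
import math

# Toy vectors, made up for this example (NOT real GloVe values).
# Dimension 1 loosely stands for "royalty", 2 for "male", 3 for "female".
toy_vectors = {
    "king":  [0.8, 0.9, 0.1],
    "queen": [0.8, 0.1, 0.9],
    "man":   [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
    "apple": [0.1, 0.2, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

# "man" is to "king" as "woman" is to ...?
print(analogy("man", "king", "woman", toy_vectors))
```

Cosine similarity is used here because it is a common choice for comparing word vectors; other distance measures would work in the same way.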

Repositories

ESPAL: phonology, part-of-speech, subtitle frequencies, etc. in Castilian and Latin American Spanish
Erin Buchanan's word norms: Concept features, LSA and BEAGLE similarity estimates

Behavioral data

Lexicon Projects

Priming studies

Eye-tracking

Provo Corpus: A Large Eye-Tracking Corpus with Predictability Norms
Ghent Eye-Tracking Corpus (GECO): Includes bilingual data
Eye tracking in young and older adults

Imaging data

CMU fMRI dataset: 60 concrete concepts in 12 categories, collected while nine English speakers were presented with 60 line drawings of objects with text labels and were instructed to think of the same properties of the stimulus object consistently during each presentation. For each concept there are 6 instances of ~20k neural activity features (brain blood oxygenation levels).
Trento EEG dataset for 60 concepts: concepts in 2 categories (work tools and land mammals), collected while seven Italian speakers were silently naming photographic images that represent these concepts. For each concept there are 6 instances of ~15k neural activity features (spectral power in voltage signals).

Tools

Statistical packages

R

tm package: Text Mining in R
NetworkToolBox: an R package to analyze brain, cognitive, and psychometric networks
SemNetCleaner: automated R package to clean semantic fluency data

Python

spaCy: Industrial-Strength Natural Language Processing in Python

Citizen-science studies and crowd-source platforms

Prolific Academic: research-focussed crowd-sourcing platform

Other

Executive control/Inhibition measures from an individual differences study with young and old adults: https://osf.io/rygex/
Meta-analysis of aging effects on inhibition tasks (data and analysis script): https://osf.io/fthku/
Nun Study
Wuggy non-word generator: Wuggy is a pseudoword generator particularly geared towards making nonwords for psycholinguistic experiments. Wuggy makes pseudowords in Basque, Dutch, English, French, German, Serbian (Cyrillic and Latin), Spanish, and Vietnamese.

References

Overview and commentary papers, and references to the resources
