tasukuigarashi/j-liwc2015

J-LIWC2015

Overview

This repository explains how to analyze Japanese texts with J-LIWC2015, a Japanese version of the LIWC2015 dictionary (Pennebaker, Booth, Jordan, & Blackburn, 2015). The psychometric properties of the J-LIWC2015 dictionary are reported in Igarashi, Okuda, & Sasahara (2022).

Unlike English, Japanese sentences have no explicit word boundaries, so a Japanese text must be segmented into words before the main analysis. This repository provides basic instructions for using J-LIWC2015 and sample scripts that preprocess texts for word segmentation.
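
As a small illustration of the point above, the sketch below shows that whitespace splitting, which is enough to tokenize English, returns a Japanese sentence as a single token. The two sample sentences are illustrative only and do not come from this repository.

```python
# Whitespace tokenization works for English but not for Japanese.
english = "I am a student"
japanese = "私は学生です"  # same meaning; written without spaces between words

english_tokens = english.split()
japanese_tokens = japanese.split()

print(english_tokens)   # ['I', 'am', 'a', 'student']
print(japanese_tokens)  # ['私は学生です']
```

Because the Japanese sentence comes back as one unsegmented token, a dictionary lookup on it would miss every word, which is why a morphological analyzer such as MeCab is needed first.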

Prerequisites

Make sure you have installed/downloaded all of the following prerequisites in your development environment.

  1. MeCab/IPADIC (instructions in Japanese)
    • morphological analysis (word segmentation) and part of speech analysis (POS tagging)
  2. Japanese_Dictionary.dic
  3. user_dict.dic (in this repository)
    • user dictionary for MeCab/IPADIC

Steps to use J-LIWC2015

  1. Preprocessing: Analyze Japanese text file(s) with MeCab/IPADIC and the user dictionary (user_dict.dic) for word segmentation and POS tagging
  2. Main analysis: Use the LIWC2015 software or other natural language processing libraries for category-by-category word frequency analysis
  3. Postprocessing (optional): Add POS tagging information to the output of the LIWC2015 software
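
The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the Colab script itself: it parses MeCab's default IPADIC output format (surface form, a tab, then comma-separated features, ending with `EOS`) into a space-separated string and per-POS percentages. The helper names (`parse_mecab_output`, `to_wakachi`, `pos_rates`, `run_mecab`) are hypothetical, and reporting POS rates as percentages of all tokens is an assumption about what pos_rate.txt contains. `run_mecab` assumes the `mecab-python3` package and a working MeCab/IPADIC installation; the rest runs on a hard-coded sample without MeCab.

```python
from collections import Counter


def parse_mecab_output(mecab_text):
    """Parse MeCab's default IPADIC output into (surface, pos) pairs."""
    tokens = []
    for line in mecab_text.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, features = line.partition("\t")
        pos = features.split(",")[0] if features else ""
        tokens.append((surface, pos))
    return tokens


def to_wakachi(tokens):
    """Join surface forms with spaces (word segmentation output)."""
    return " ".join(surface for surface, _ in tokens)


def pos_rates(tokens):
    """Percentage of tokens per top-level POS tag (assumed output format)."""
    counts = Counter(pos for _, pos in tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pos: 100.0 * n / total for pos, n in counts.items()}


def run_mecab(text, user_dict="user_dict.dic"):
    """Run MeCab with a user dictionary; MeCab is imported lazily."""
    import MeCab  # assumption: provided by the mecab-python3 package

    tagger = MeCab.Tagger(f"-u {user_dict}")
    return tagger.parse(text)


# Hard-coded sample in the default IPADIC format, so the sketch runs without MeCab.
sample = (
    "私\t名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ\n"
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n"
    "学生\t名詞,一般,*,*,*,*,学生,ガクセイ,ガクセイ\n"
    "です\t助動詞,*,*,*,特殊・デス,基本形,です,デス,デス\n"
    "EOS\n"
)
tokens = parse_mecab_output(sample)
print(to_wakachi(tokens))  # 私 は 学生 です
print(pos_rates(tokens))
```

With MeCab installed, `parse_mecab_output(run_mecab(text))` would replace the hard-coded sample.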

Sample scripts are available at Google Colab.

Standard analysis: Python + LIWC software

  1. Preprocessing: Analyze a Japanese text (e.g., sample.txt) with MeCab/IPADIC and user_dict.dic in Python. The Colab example generates two output files:
    • wakachi.txt: word segmentation output
    • pos_rate.txt: POS tagging output
  2. Main analysis: Run the LIWC2015 software, click Dictionary > Load New Dictionary, and choose J-LIWC2015.dic. Then analyze wakachi.txt (word segmentation output). The output file is:
    • LIWC2015 Results (wakachi).txt: word frequency analysis output
  3. Postprocessing (optional): Combine LIWC2015 Results (wakachi).txt (word frequency analysis output) with pos_rate.txt (POS tagging output) to obtain more detailed POS information about the text (see the example in Colab). The merged output file is:
    • result.txt: word frequency and POS tagging output
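
The optional postprocessing step could look something like the sketch below. It assumes that both files are tab-delimited with a header row and share a `Filename` column; that layout is an assumption made for illustration, so check it against the actual files before reuse. The function names (`merge_on_key`, `write_tsv`) are hypothetical, and the toy values are made up.

```python
import csv
import io


def merge_on_key(liwc_file, pos_file, key="Filename"):
    """Left-join POS columns onto LIWC result rows using a shared key column."""
    liwc_rows = list(csv.DictReader(liwc_file, delimiter="\t"))
    pos_by_key = {row[key]: row for row in csv.DictReader(pos_file, delimiter="\t")}
    merged = []
    for row in liwc_rows:
        combined = dict(row)
        extra = pos_by_key.get(row[key], {})
        combined.update({k: v for k, v in extra.items() if k != key})
        merged.append(combined)
    return merged


def write_tsv(rows, out_file):
    """Write merged rows as tab-delimited text, preserving column order."""
    if not rows:
        return
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    writer = csv.DictWriter(
        out_file, fieldnames=fieldnames, delimiter="\t",
        restval="", lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Toy in-memory inputs standing in for the two files (values are made up).
liwc = io.StringIO("Filename\tWC\tposemo\nwakachi.txt\t4\t0.00\n")
pos = io.StringIO("Filename\t名詞\t助詞\nwakachi.txt\t50.0\t25.0\n")

merged = merge_on_key(liwc, pos)
out = io.StringIO()
write_tsv(merged, out)
print(out.getvalue())
```

For real files, the `io.StringIO` objects would be replaced with files opened using an explicit encoding and `newline=""`, as the `csv` module recommends.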

Advanced analysis: Not using LIWC software

Non-commercial users of the LIWC2015 software can use J-LIWC2015.dic in other programming languages (Python, R, etc.) with MeCab/IPADIC. This lets users run preprocessing, main analysis, and optional postprocessing in a single workflow (see the example in Colab). Note that the developer does not formally support use of the dictionary outside the LIWC2015 software (use it at your own risk). Outputs generated with and without the LIWC2015 software are also not guaranteed to be compatible.
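
A dictionary-based count outside the LIWC software might be sketched as follows. This is not the Colab implementation. It assumes the common LIWC `.dic` text layout: a `%` line, tab-separated `id` and `name` category lines, a second `%` line, then one entry per line with its category ids, where a trailing `*` marks a prefix wildcard. Whether J-LIWC2015.dic follows exactly this layout has not been verified here. The function names and the toy dictionary entries are hypothetical and are not taken from J-LIWC2015.

```python
from collections import Counter


def parse_liwc_dic(lines):
    """Parse an assumed LIWC-style .dic layout into exact and prefix entries."""
    categories = {}  # category id -> category name
    exact = {}       # word -> list of category names
    prefixes = []    # (prefix, list of category names) for entries ending in '*'
    section = 0      # number of '%' delimiter lines seen so far
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            section += 1
            continue
        parts = line.split("\t")
        if section == 1 and len(parts) >= 2:
            categories[parts[0]] = parts[1]
        elif section >= 2:
            word, ids = parts[0], parts[1:]
            names = [categories[i] for i in ids if i in categories]
            if word.endswith("*"):
                prefixes.append((word[:-1], names))
            else:
                exact[word] = names
    return exact, prefixes


def category_percentages(tokens, exact, prefixes):
    """Percentage of tokens that fall into each category."""
    counts = Counter()
    for tok in tokens:
        names = set(exact.get(tok, []))
        for prefix, prefix_names in prefixes:
            if tok.startswith(prefix):
                names.update(prefix_names)
        counts.update(names)  # each token counts at most once per category
    total = len(tokens)
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Toy dictionary with made-up entries (NOT taken from J-LIWC2015.dic).
toy_dic = "%\n1\tpronoun\n2\tposemo\n%\n私\t1\n嬉し*\t2\n"
exact, prefixes = parse_liwc_dic(toy_dic.splitlines())

tokens = "私 は 嬉しい です".split()  # already segmented, e.g. by MeCab
result = category_percentages(tokens, exact, prefixes)
print(result)
```

As noted above, results from a sketch like this are not guaranteed to match those produced by the LIWC2015 software.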

Notes

Requests for distribution of the dictionary file (J-LIWC2015.dic) are not accepted. Questions about commercial use of J-LIWC2015 should be directed to Receptiviti.

Reference

Igarashi, T., Okuda, S., & Sasahara, K. (2022). Development of the Japanese Version of the Linguistic Inquiry and Word Count Dictionary 2015. Frontiers in Psychology, 13:841534. https://doi.org/10.3389/fpsyg.2022.841534

Licence

MIT