
Automatic key feature identification based on app user reviews


GIST-NJU/KEFE


KEFE

KEFE is an approach that uses app descriptions and user reviews (written in Chinese) to identify key features, that is, features that have a significant relationship with app rating scores.

Applying KEFE involves three main steps: 1) applying a textual pattern-based approach and a deep learning classifier to extract features from the app description; 2) applying another classifier to match features with their relevant user reviews; and 3) applying regression analysis to identify key features.
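To make step 3 more concrete, the following is a minimal sketch of the general idea of relating feature mentions to rating scores. It is not KEFE's actual regression model (see the paper for that): it assumes each review has already been labelled as matching (1) or not matching (0) a feature, and fits a one-variable least-squares line per feature. The function names `ols_slope` and `rank_features`, the feature names, and the sample ratings are all hypothetical.

```python
from typing import Dict, List, Tuple


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Slope of the least-squares line y = a + b*x (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("x has no variance; the slope is undefined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov_xy / var_x


def rank_features(records: List[Tuple[str, float, int]]) -> List[Tuple[str, float]]:
    """Rank features by the absolute slope of rating regressed on the match label.

    Each record is (feature, rating_score, label), with label 0 or 1.
    With a binary label, the slope equals the mean rating of matching reviews
    minus the mean rating of non-matching reviews.
    """
    by_feature: Dict[str, Tuple[List[float], List[float]]] = {}
    for feature, rating, label in records:
        xs, ys = by_feature.setdefault(feature, ([], []))
        xs.append(float(label))
        ys.append(float(rating))
    slopes = {f: ols_slope(xs, ys) for f, (xs, ys) in by_feature.items()}
    return sorted(slopes.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical data: reviews mentioning "payment" tend to have low ratings.
records = [
    ("payment", 1.0, 1), ("payment", 2.0, 1), ("payment", 5.0, 0), ("payment", 4.0, 0),
    ("transfer", 4.0, 1), ("transfer", 4.0, 0), ("transfer", 5.0, 1), ("transfer", 4.0, 0),
]
ranking = rank_features(records)
for feature, slope in ranking:
    print(f"{feature}: slope = {slope:+.2f}")
```

A slope far from zero suggests that reviews mentioning the feature are rated differently from the rest. A real analysis would also need a significance test and would have to account for several features at once, which this sketch leaves out.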

More details of KEFE can be found in the following paper:

Huayao Wu, Wenjun Deng, Xintao Niu, and Changhai Nie. Identifying Key Features from App User Reviews. International Conference on Software Engineering (ICSE), pp. 922-932, 2021

Usage

KEFE is developed and tested using Python 3, pyltp and tensorflow. Please install the following specific package versions:

pip install pyltp==0.2.1
pip install tensorflow==1.15.0

More instructions for installing pyltp and tensorflow can be found on their respective websites: pyltp, tensorflow.

  1. Download the model files, which include:

    The ltp-model should be put into the pyltp-resource directory, and the other three should be put into the bert-master directory.

  2. To extract feature-describing phrases from a given app description, run:

    python feature_extraction.py -i [app_description].csv
    # for example
    # python feature_extraction.py -i example/description.csv
  3. To identify key features of a given app, run:

    python feature_identification.py -f [features].txt -r [reviews].txt
    # for example
    # python feature_identification.py -f example/alipay_features.txt -r example/alipay_reviews.txt

    The above command first applies the classification model to match features with user reviews (this can take a long time for a large volume of user reviews), and then identifies key features. If a file of existing matches between features and user reviews is already available, run:

    python feature_identification.py -f [features].txt -m [matching].txt
    # for example
    # python feature_identification.py -f example/alipay_features.txt -m example/alipay_matching.txt

    Format of Files

    • Each line of the [reviews].txt file should follow the format [review_text]-*-[review_date]-*-[rating_score]
    • Each line of the [matching].txt file should follow the format [feature]-*-[review_text]-*-[review_date]-*-[rating_score]-*-[label], where a label of 0 indicates a non-matching pair and a label of 1 indicates a matching pair.
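The two line formats above can be read with a small parser such as the sketch below. It is not part of KEFE; the names `Review`, `Match`, `parse_review_line` and `parse_matching_line` are hypothetical. It assumes that the rating score is numeric, that the feature text never contains the `-*-` delimiter, and it keeps the date as a plain string because the date format is not specified here.

```python
from typing import NamedTuple

DELIMITER = "-*-"


class Review(NamedTuple):
    text: str
    date: str      # kept as a string; the date format is not specified
    rating: float  # assumption: the rating score is numeric


class Match(NamedTuple):
    feature: str
    text: str
    date: str
    rating: float
    label: int     # 0 = non-matching pair, 1 = matching pair


def parse_review_line(line: str) -> Review:
    """Parse one line of [reviews].txt."""
    # Split from the right so a review text containing "-*-" stays intact.
    text, date, rating = line.rstrip("\n").rsplit(DELIMITER, 2)
    return Review(text, date, float(rating))


def parse_matching_line(line: str) -> Match:
    """Parse one line of [matching].txt."""
    # The feature is the first field (assumed not to contain the delimiter);
    # the last three fields are split from the right, as for reviews.
    feature, rest = line.rstrip("\n").split(DELIMITER, 1)
    text, date, rating, label_str = rest.rsplit(DELIMITER, 3)
    label = int(label_str)
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label}")
    return Match(feature, text, date, float(rating), label)


# Hypothetical sample lines; the date format shown is only an example.
review = parse_review_line("Great app-*-2020-01-01-*-5\n")
match = parse_matching_line("transfer-*-Transfers are slow-*-2020-01-02-*-2-*-1\n")
print(review)
print(match)
```

Splitting from the right is a deliberate choice: free-form review text is the only field likely to contain the delimiter, so the fixed trailing fields are peeled off first.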

Dataset and Replication Package

The dataset (app descriptions and the raw user reviews collected) and the replication package can be downloaded from the following links:
